Fix LLM giving wrong classifications
Quick answer
To fix wrong classifications from an LLM, use clear, specific prompts with examples and set the temperature to a low value like 0 for deterministic output. Use the OpenAI SDK's chat.completions.create method with a classification prompt and validate outputs programmatically.
Setup
Install the official openai Python package and set your API key as an environment variable for secure authentication.
```shell
pip install "openai>=1.0"
```

(The quotes prevent the shell from interpreting `>=` as a redirect.)

Output:

```
Collecting openai
Downloading openai-1.x.x-py3-none-any.whl
Installing collected packages: openai
Successfully installed openai-1.x.x
```
Step by step
Use a clear prompt with explicit instructions and examples to guide the LLM's classification. Set temperature=0 to reduce randomness and improve accuracy. Parse the response to extract the classification.
```python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

prompt = (
    "Classify the following text into categories: Positive, Negative, or Neutral.\n"
    "Text: 'I love this product!'\nClassification: Positive\n"
    "Text: 'This is the worst experience.'\nClassification: Negative\n"
    "Text: 'The item is okay, nothing special.'\nClassification: Neutral\n"
    "Text: 'The service was disappointing.'\nClassification:"
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
    temperature=0,
    max_tokens=10,
)

classification = response.choices[0].message.content.strip()
print(f"Classification: {classification}")
```

Output:

```
Classification: Negative
```
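The few-shot prompt above can also be assembled programmatically from a list of labeled examples, which keeps the formatting consistent as you add categories or examples. A minimal sketch (the `build_prompt` helper is illustrative, not part of the SDK):

```python
def build_prompt(examples, query, labels):
    """Assemble a few-shot classification prompt from labeled examples."""
    lines = [f"Classify the following text into categories: {', '.join(labels)}."]
    for text, label in examples:
        lines.append(f"Text: '{text}'\nClassification: {label}")
    # End with the query and an open "Classification:" for the model to complete.
    lines.append(f"Text: '{query}'\nClassification:")
    return "\n".join(lines)

examples = [
    ("I love this product!", "Positive"),
    ("This is the worst experience.", "Negative"),
    ("The item is okay, nothing special.", "Neutral"),
]
prompt = build_prompt(examples, "The service was disappointing.",
                      ["Positive", "Negative", "Neutral"])
print(prompt)
```

The resulting string can be passed as the user message content, exactly as in the example above.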
Common variations
- Use `temperature` between 0 and 0.3 for slightly more creative but still reliable classifications.
- Try different models like `gpt-4o-mini` for faster, cheaper inference.
- Use async calls with `asyncio` for concurrent classification requests.
```python
import os
import asyncio
from openai import AsyncOpenAI  # the async client is required for awaited calls

client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def classify_text(text: str) -> str:
    prompt = (
        "Classify the following text as Positive, Negative, or Neutral:\n"
        f"Text: '{text}'\nClassification:"
    )
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.1,
        max_tokens=10,
    )
    return response.choices[0].message.content.strip()

async def main():
    texts = [
        "I love this product!",
        "The service was disappointing.",
        "It's okay, nothing special.",
    ]
    # Run all classification requests concurrently.
    results = await asyncio.gather(*(classify_text(t) for t in texts))
    for text, classification in zip(texts, results):
        print(f"Text: {text}\nClassification: {classification}\n")

asyncio.run(main())
```

Output:

```
Text: I love this product!
Classification: Positive

Text: The service was disappointing.
Classification: Negative

Text: It's okay, nothing special.
Classification: Neutral
```
Troubleshooting
- If classifications are inconsistent, lower `temperature` to `0` for deterministic output.
- Ensure prompts are explicit and include examples to reduce ambiguity.
- Check for token limits; increase `max_tokens` if the classification output is truncated.
- Validate and normalize output strings programmatically to handle unexpected responses.
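The last point can be handled with a small helper that maps the raw model reply onto a fixed label set and falls back to a safe default when nothing matches. A minimal sketch (the `normalize_label` function is an illustrative assumption, not an SDK feature):

```python
VALID_LABELS = {"positive", "negative", "neutral"}

def normalize_label(raw: str, default: str = "Neutral") -> str:
    """Map a raw LLM reply onto a known label, tolerating case,
    punctuation, and extra surrounding words."""
    cleaned = raw.strip().strip(".,:;!\"'").lower()
    if cleaned in VALID_LABELS:
        return cleaned.capitalize()
    # Handle replies like "Classification: Negative" or "The text is negative."
    for label in VALID_LABELS:
        if label in cleaned:
            return label.capitalize()
    return default  # unexpected response; fall back rather than crash

print(normalize_label("Negative."))                 # -> Negative
print(normalize_label("Classification: positive"))  # -> Positive
print(normalize_label("I cannot classify this"))    # -> Neutral (fallback)
```

Applying this to `response.choices[0].message.content` guarantees downstream code only ever sees one of the expected labels.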
Key Takeaways
- Use explicit prompts with examples to guide LLM classification.
- Set temperature to 0 for consistent, deterministic results.
- Validate and parse LLM output to handle unexpected classifications.