How-to · Beginner · 3 min read

Fix an LLM giving wrong classifications

Quick answer
To fix wrong classifications from an LLM, use clear, specific prompts with examples and set the temperature to a low value like 0 for deterministic output. Use the OpenAI SDK's chat.completions.create method with a classification prompt and validate outputs programmatically.

Setup

Install the official openai Python package and set your API key as an environment variable for secure authentication.

bash
pip install "openai>=1.0"
output
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl
Installing collected packages: openai
Successfully installed openai-1.x.x

Step by step

Use a clear prompt with explicit instructions and examples to guide the LLM's classification. Set temperature=0 to reduce randomness and improve accuracy. Parse the response to extract the classification.

python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

prompt = (
    "Classify the following text into categories: Positive, Negative, or Neutral.\n"
    "Text: 'I love this product!'\nClassification: Positive\n"
    "Text: 'This is the worst experience.'\nClassification: Negative\n"
    "Text: 'The item is okay, nothing special.'\nClassification: Neutral\n"
    "Text: 'The service was disappointing.'\nClassification:"
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
    temperature=0,
    max_tokens=10
)

classification = response.choices[0].message.content.strip()
print(f"Classification: {classification}")
output
Classification: Negative

Common variations

  • Keep temperature between 0 and 0.3; higher values add variety, but for classification low values give more consistent results.
  • Use a smaller model such as gpt-4o-mini (used above) for faster, cheaper inference, or step up to a larger model like gpt-4o when accuracy matters more than cost.
  • Use async calls with asyncio for concurrent classification requests.
python
import os
import asyncio
from openai import AsyncOpenAI

# The async client (AsyncOpenAI, not OpenAI) is required so the request can be awaited.
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def classify_text(text: str) -> str:
    prompt = f"Classify the following text as Positive, Negative, or Neutral:\nText: '{text}'\nClassification:"
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.1,
        max_tokens=10
    )
    return response.choices[0].message.content.strip()

async def main():
    texts = [
        "I love this product!",
        "The service was disappointing.",
        "It's okay, nothing special."
    ]
    results = await asyncio.gather(*(classify_text(t) for t in texts))
    for text, classification in zip(texts, results):
        print(f"Text: {text}\nClassification: {classification}\n")

asyncio.run(main())
output
Text: I love this product!
Classification: Positive

Text: The service was disappointing.
Classification: Negative

Text: It's okay, nothing special.
Classification: Neutral
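When classifying many texts concurrently, an unbounded asyncio.gather can exceed API rate limits. One way to cap in-flight requests is an asyncio.Semaphore. The sketch below uses a hypothetical stand-in classifier (fake_classify) in place of the real API call so it runs without a key; swap in a real classify_text for actual use.

```python
import asyncio

# Stand-in for an async API call such as classify_text above;
# replace with the real call when running against the API.
async def fake_classify(text: str) -> str:
    await asyncio.sleep(0)  # simulate network latency
    return "Positive" if "love" in text else "Negative"

async def classify_batch(texts, limit=5):
    # The semaphore caps in-flight requests so large batches
    # don't trip API rate limits.
    sem = asyncio.Semaphore(limit)

    async def bounded(text):
        async with sem:
            return await fake_classify(text)

    return await asyncio.gather(*(bounded(t) for t in texts))

results = asyncio.run(
    classify_batch(["I love this product!", "The service was disappointing."])
)
print(results)  # ['Positive', 'Negative']
```

The limit of 5 here is arbitrary; tune it to your account's rate limits.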

Troubleshooting

  • If classifications are inconsistent, lower temperature to 0 for deterministic output.
  • Ensure prompts are explicit with examples to reduce ambiguity.
  • Check for token limits; increase max_tokens if classification output is truncated.
  • Validate and normalize output strings programmatically to handle unexpected responses.
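The last bullet can be sketched as a small normalization helper. The names normalize_label and the "Unknown" fallback are illustrative choices, not part of any SDK:

```python
VALID_LABELS = {"positive", "negative", "neutral"}

def normalize_label(raw: str) -> str:
    # Strip whitespace, trailing punctuation, and casing the model may add.
    cleaned = raw.strip().strip(".!").lower()
    if cleaned in VALID_LABELS:
        return cleaned.capitalize()
    # Fall back to a sentinel so bad outputs are easy to spot downstream.
    return "Unknown"

print(normalize_label(" Negative. "))  # Negative
print(normalize_label("POSITIVE"))     # Positive
print(normalize_label("it's fine"))    # Unknown
```

Anything that doesn't map to a known label is flagged rather than silently passed through, which makes prompt regressions visible in logs.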

Key Takeaways

  • Use explicit prompts with examples to guide LLM classification.
  • Set temperature to 0 for consistent, deterministic results.
  • Validate and parse LLM output to handle unexpected classifications.
Verified 2026-04 · gpt-4o-mini