
AI for adaptive testing

Quick answer
Use a large language model (LLM) such as gpt-4o to generate test questions and grade a learner's answers, then raise or lower the difficulty of the next question based on each result. Because the difficulty follows the learner's performance, each learner spends more time on questions near their current level, which can make the assessment more informative and engaging than a fixed question sequence.
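The adjustment rule itself is small. The sketch below is the same one-step-up, one-step-down rule used in this article's loops, with an assumed upper bound (max_level=5) that the original code does not have. The function name next_difficulty is hypothetical.

```python
def next_difficulty(current: int, correct: bool, min_level: int = 1, max_level: int = 5) -> int:
    """Move one level up after a correct answer and one level down after an incorrect one,
    clamped to [min_level, max_level]. max_level=5 is an assumed cap, not from the article."""
    step = 1 if correct else -1
    return max(min_level, min(max_level, current + step))


# Replay the answer pattern of the first example run: correct, incorrect, correct, correct, incorrect
level = 1
history = []
for was_correct in [True, False, True, True, False]:
    history.append(level)  # difficulty at which each question was asked
    level = next_difficulty(level, was_correct)
print(history, level)
```

The clamp keeps the level inside the range the prompt describes, so a long streak of correct answers cannot push the model toward an undefined "level 12".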

PREREQUISITES

  • Python 3.8+
  • OpenAI API key
  • pip install "openai>=1.0"

Setup

Install the openai Python package and store your API key in the OPENAI_API_KEY environment variable so it stays out of your source code.

bash
pip install "openai>=1.0"
export OPENAI_API_KEY="your-api-key"
output
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl (xx kB)
Installing collected packages: openai
Successfully installed openai-1.x.x
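Reading os.environ["OPENAI_API_KEY"] directly, as the examples below do, fails with a bare KeyError when the variable is missing. A minimal sketch of a fail-fast check with a clearer message follows; require_api_key is a hypothetical helper, not part of the openai package.

```python
import os


def require_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, or raise with an actionable message."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"{var_name} is not set; export it in your shell before running the examples.")
    return key


# Usage with the SDK (not executed here):
# client = OpenAI(api_key=require_api_key())
```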

Step by step

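This example runs a five-question adaptive loop with gpt-4o: one call generates a question, a second call grades the learner's answer, and the difficulty goes up one level after a correct answer and down one level (never below 1) after an incorrect one.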
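Keeping the loop separate from the API call makes the adaptive logic testable without network access. In this sketch, ask is a hypothetical callable that sends a prompt and returns the reply text, and fake_ask is an offline stub that simply treats "A" as the right answer; neither is part of the openai package.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: send a prompt to a model, get its reply text back
AskFn = Callable[[str], str]


def run_adaptive_test(ask: AskFn, answers: List[str], max_level: int = 5) -> List[Tuple[int, str]]:
    """Run one generate/grade/adjust round per answer; return (difficulty, verdict) pairs."""
    difficulty = 1
    log: List[Tuple[int, str]] = []
    for answer in answers:
        question = ask(f"Generate a math question at difficulty level {difficulty} with 4 multiple choice answers.")
        verdict = ask(
            f"Question: {question}\nUser answer: {answer}\n"
            "Is the answer correct? Reply with 'Correct' or 'Incorrect'."
        ).strip()
        log.append((difficulty, verdict))
        # "Incorrect" does not start with "correct", so this check separates the two verdicts
        if verdict.lower().startswith("correct"):
            difficulty = min(max_level, difficulty + 1)
        else:
            difficulty = max(1, difficulty - 1)
    return log


def fake_ask(prompt: str) -> str:
    """Offline stub, not a real grader: every question has 'A' as its correct option."""
    if prompt.startswith("Question:"):
        answer_line = next(line for line in prompt.splitlines() if line.startswith("User answer:"))
        return "Correct" if answer_line.split(":", 1)[1].strip() == "A" else "Incorrect"
    return "What is 2 + 2?\nA) 4\nB) 3\nC) 5\nD) 22"


log = run_adaptive_test(fake_ask, ["A", "B", "A", "A", "B"])
for level, verdict in log:
    print(level, verdict)

# With the real SDK, ask could wrap the chat call, for example:
# ask = lambda p: client.chat.completions.create(
#     model="gpt-4o", messages=[{"role": "user", "content": p}]
# ).choices[0].message.content
```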

python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

MAX_DIFFICULTY = 5  # upper bound of the difficulty scale described in the prompt
difficulty = 1      # start at the easiest level

# Simple adaptive testing loop: generate, answer, grade, adjust
for i in range(5):
    prompt = (
        f"Generate a math question at difficulty level {difficulty} on a scale of 1 to "
        f"{MAX_DIFFICULTY}, with 4 multiple choice answers. Do not reveal the correct answer."
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    question = response.choices[0].message.content
    print(f"Question {i+1}:\n{question}\n")

    # Read the learner's answer from the terminal
    user_answer = input("Your answer: ")

    # Ask the model to grade the answer; temperature=0 makes the verdict more repeatable
    eval_prompt = (
        f"Question: {question}\nUser answer: {user_answer}\n"
        "Is the answer correct? Reply with exactly one word: 'Correct' or 'Incorrect'."
    )
    eval_response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": eval_prompt}],
        temperature=0,
    )
    evaluation = eval_response.choices[0].message.content.strip()
    print(f"Evaluation: {evaluation}\n")

    # Adjust difficulty; startswith() also accepts "Correct." while "Incorrect" does not match
    if evaluation.lower().startswith("correct"):
        difficulty = min(MAX_DIFFICULTY, difficulty + 1)
    else:
        difficulty = max(1, difficulty - 1)

print("Adaptive test complete.")
output
Question 1:
<generated math question at difficulty level 1, with 4 answer choices>

Your answer: 2
Evaluation: Correct

Question 2:
<generated math question at difficulty level 2, with 4 answer choices>

Your answer: 3
Evaluation: Incorrect

Question 3:
<generated math question at difficulty level 1, with 4 answer choices>

Your answer: 1
Evaluation: Correct

Question 4:
<generated math question at difficulty level 2, with 4 answer choices>

Your answer: 4
Evaluation: Correct

Question 5:
<generated math question at difficulty level 3, with 4 answer choices>

Your answer: 2
Evaluation: Incorrect

Adaptive test complete.
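An exact comparison such as evaluation.lower() == "correct" treats a reply like "Correct." or "**Correct**" as incorrect. A minimal sketch of a more tolerant verdict parser follows; parse_verdict is a hypothetical helper, and returning None for unclear replies (so the caller can re-ask or count the item as incorrect) is a design choice, not SDK behavior.

```python
import re
from typing import Optional


def parse_verdict(reply: str) -> Optional[bool]:
    """Map a free-text grading reply to True (correct), False (incorrect), or None (unclear)."""
    # Replace everything except lowercase letters and spaces, so "**Correct.**" becomes "correct"
    words = re.sub(r"[^a-z ]", " ", reply.lower()).split()
    if not words:
        return None
    if words[0] == "incorrect":
        return False
    if words[0] == "correct":
        return True
    return None  # e.g. "Not correct" or a hedged answer: let the caller decide
```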

Common variations

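Use the AsyncOpenAI client with asyncio when one process serves many learners at once, so their API calls overlap instead of running one after another; a single learner answering questions in sequence will not see a speedup. You can also try other models such as claude-3-5-sonnet-20241022, but Anthropic models are usually called through Anthropic's own anthropic SDK rather than the OpenAI client. Streaming (stream=True) displays a question while it is still being generated, which reduces the perceived wait.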
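To illustrate why async helps with many concurrent learners, the sketch below fires one question request per learner with asyncio.gather. fake_generate is a hypothetical stand-in that sleeps to simulate network latency; with the real SDK it would be an awaited AsyncOpenAI chat call.

```python
import asyncio
import time
from typing import List


async def fake_generate(prompt: str) -> str:
    """Stand-in for an awaited API call such as AsyncOpenAI().chat.completions.create(...)."""
    await asyncio.sleep(0.2)  # simulated network latency
    return f"(question for: {prompt})"


async def generate_for_learners(levels: List[int]) -> List[str]:
    # One request per learner, all in flight at once; gather preserves input order
    prompts = [
        f"Generate a science question at difficulty level {lvl} with 4 multiple choice answers."
        for lvl in levels
    ]
    return await asyncio.gather(*(fake_generate(p) for p in prompts))


start = time.perf_counter()
questions = asyncio.run(generate_for_learners([1, 2, 3, 1, 2]))
elapsed = time.perf_counter() - start
print(len(questions), f"{elapsed:.2f}s")
```

Five simulated 0.2 s calls finish in roughly the time of one, because the waits overlap; run sequentially they would take about five times as long.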

python
import os
import asyncio
from openai import AsyncOpenAI  # the async client; the sync OpenAI client cannot be awaited

client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def adaptive_test():
    difficulty = 1
    loop = asyncio.get_running_loop()
    for i in range(3):
        prompt = f"Generate a science question at difficulty level {difficulty} with 4 multiple choice answers."
        response = await client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}]
        )
        question = response.choices[0].message.content
        print(f"Question {i+1}:\n{question}\n")

        # input() blocks, so run it in a worker thread to keep the event loop free
        user_answer = await loop.run_in_executor(None, input, "Your answer: ")

        eval_prompt = f"Question: {question}\nUser answer: {user_answer}\nIs the answer correct? Reply with exactly one word: 'Correct' or 'Incorrect'."
        eval_response = await client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": eval_prompt}],
            temperature=0,
        )
        evaluation = eval_response.choices[0].message.content.strip()
        print(f"Evaluation: {evaluation}\n")

        # startswith() tolerates trailing punctuation such as "Correct."
        if evaluation.lower().startswith("correct"):
            difficulty += 1
        else:
            difficulty = max(1, difficulty - 1)

asyncio.run(adaptive_test())
output
Question 1:
<generated science question at difficulty level 1, with 4 answer choices>

Your answer: B
Evaluation: Correct

Question 2:
<generated science question at difficulty level 2, with 4 answer choices>

Your answer: A
Evaluation: Incorrect

Question 3:
<generated science question at difficulty level 1, with 4 answer choices>

Your answer: C
Evaluation: Correct
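Streaming, mentioned under the variations above, delivers a reply as a sequence of text fragments. A minimal sketch of printing and assembling those fragments follows; print_stream is a hypothetical helper, and the commented lines show how the fragments could be pulled from an openai (>= 1.0) stream, where some chunks carry None or no choices.

```python
import sys
from typing import Iterable, Optional


def print_stream(deltas: Iterable[Optional[str]]) -> str:
    """Print text fragments as they arrive and return the assembled reply."""
    parts = []
    for delta in deltas:
        if not delta:  # skip empty fragments, e.g. the final chunk whose content is None
            continue
        sys.stdout.write(delta)
        sys.stdout.flush()  # show each fragment immediately instead of buffering
        parts.append(delta)
    sys.stdout.write("\n")
    return "".join(parts)


# With the SDK, the fragments could come from:
#   stream = client.chat.completions.create(model="gpt-4o-mini", messages=..., stream=True)
#   deltas = (c.choices[0].delta.content for c in stream if c.choices)
question = print_stream(["What is ", None, "the boiling point", " of water?", None])
```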

Troubleshooting

  • If you get authentication errors, verify your OPENAI_API_KEY environment variable is set correctly.
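  • If responses are slow, try a smaller model such as gpt-4o-mini. Streaming (stream=True) does not shorten total generation time, but it shows text as soon as the first tokens arrive.
  • Request errors often mean messages is malformed: it must be a list of dicts, each with a "role" (such as "user") and a "content" string.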
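A local check can catch a malformed messages list before it reaches the API. The sketch below is deliberately strict and covers only the plain-text messages used in this article; the API itself accepts more roles (for example tool) and structured content, and check_messages is a hypothetical helper.

```python
from typing import Any

# Roles used in simple text chats; the API supports others that this check rejects
VALID_ROLES = {"system", "user", "assistant"}


def check_messages(messages: Any) -> None:
    """Raise ValueError if messages is not a non-empty list of {"role", "content"} dicts."""
    if not isinstance(messages, list) or not messages:
        raise ValueError("messages must be a non-empty list")
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            raise ValueError(f"messages[{i}] must be a dict, got {type(msg).__name__}")
        if msg.get("role") not in VALID_ROLES:
            raise ValueError(f"messages[{i}] has unsupported role {msg.get('role')!r}")
        if not isinstance(msg.get("content"), str):
            raise ValueError(f"messages[{i}]['content'] must be a string")
```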

Key Takeaways

  • Use LLMs like gpt-4o to generate and evaluate adaptive test questions dynamically.
  • Adjust question difficulty in real-time based on AI evaluation of user answers.
  • Use the async client (AsyncOpenAI) to serve many learners concurrently, and streaming to show questions while they are generated.
Verified 2026-04 · gpt-4o, gpt-4o-mini, claude-3-5-sonnet-20241022