AI for grading and assessment
Quick answer
Use large language models (LLMs) such as gpt-4o to automate grading: include the student answer and a rubric in the prompt, and the model returns a score and feedback for open-ended responses. With a clear rubric and temperature=0, grading is reasonably consistent, though spot-checking a sample against human grades is still advisable.

Prerequisites
- Python 3.8+
- An OpenAI API key
- pip install openai>=1.0
Setup
Install the openai Python package and set your API key as an environment variable for secure access.
pip install openai

output
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl (xx kB)
Installing collected packages: openai
Successfully installed openai-1.x.x
Step by step
This example shows how to grade a short student answer against a rubric using gpt-4o. The prompt includes the question, student response, and grading criteria. The model returns a score and feedback.
import os
from openai import OpenAI
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
question = "Explain the water cycle in your own words."
student_answer = "Water evaporates from lakes, forms clouds, and then falls as rain."
rubric = "Score 0-5 based on completeness and accuracy. Provide brief feedback."
prompt = f"Question: {question}\nStudent answer: {student_answer}\nRubric: {rubric}\nGrade and feedback:"
response = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": prompt}]
)
print(response.choices[0].message.content)

output
Score: 4/5
Feedback: Good explanation covering evaporation, condensation, and precipitation. Could mention collection for full completeness.
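The model returns its grade as free text, so anything downstream (a gradebook, a spreadsheet export) needs the numeric score pulled out. One minimal way to do that is a small regex helper; the `parse_score` function below is an illustrative addition, not part of the OpenAI library, and assumes the rubric's "Score: X/5" format.

```python
import re

def parse_score(feedback_text, max_score=5):
    """Extract a numeric score from model feedback like 'Score: 4/5'.

    Returns None when no score pattern is found, so callers can flag
    the response for manual review instead of crashing.
    """
    match = re.search(
        r"Score:\s*(\d+(?:\.\d+)?)\s*/\s*" + str(max_score), feedback_text
    )
    if match:
        return float(match.group(1))
    return None

print(parse_score("Score: 4/5 Feedback: Good explanation."))  # 4.0
```

Returning None for unparseable feedback keeps one malformed response from halting a whole batch.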
Common variations
- Use gpt-4o-mini for faster, cheaper grading with slightly less detail.
- Implement async calls with asyncio for batch grading.
- Use streaming to display partial feedback as the model generates it.
- Put multiple student answers in a single prompt for comparative grading.
import asyncio
import os
from openai import AsyncOpenAI

# AsyncOpenAI (not OpenAI) is required for awaitable calls; creating
# one client and reusing it avoids opening a connection per request.
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def grade_answer(question, answer, rubric):
    prompt = f"Question: {question}\nStudent answer: {answer}\nRubric: {rubric}\nGrade and feedback:"
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

async def main():
    question = "Explain photosynthesis."
    answer = "Plants use sunlight to make food from carbon dioxide and water."
    rubric = "Score 0-5 based on accuracy and detail."
    result = await grade_answer(question, answer, rubric)
    print(result)

asyncio.run(main())

output
Score: 5/5
Feedback: Accurate and concise explanation of photosynthesis.
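The batch-grading variation mentioned above amounts to launching many of these coroutines concurrently with asyncio.gather. The sketch below is one possible shape for that: `grade_batch` accepts any coroutine grader with the (question, answer, rubric) signature, so it works with a real grader like the one above or, as shown here, with a stand-in grader that lets the example run without an API key.

```python
import asyncio

async def grade_batch(grader, question, answers, rubric):
    """Grade a list of answers concurrently.

    `grader` is any coroutine function taking (question, answer, rubric);
    asyncio.gather runs the calls concurrently and preserves input order.
    """
    tasks = [grader(question, answer, rubric) for answer in answers]
    return await asyncio.gather(*tasks)

# Stand-in grader so the sketch runs offline: scores by word count.
async def fake_grader(question, answer, rubric):
    return f"Score: {min(5, len(answer.split()))}/5"

answers = [
    "Water evaporates.",
    "Water evaporates, condenses, and falls as rain.",
]
results = asyncio.run(
    grade_batch(fake_grader, "Explain the water cycle.", answers, "Score 0-5")
)
print(results)  # ['Score: 2/5', 'Score: 5/5']
```

With a real API-backed grader, consider adding a semaphore inside `grade_batch` so large classes do not exceed your rate limits.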
Troubleshooting
- If you get 401 Unauthorized, verify that your OPENAI_API_KEY environment variable is set correctly.
- If responses are incomplete, increase max_tokens in the API call.
- For inconsistent grading, refine your rubric prompt for clarity and specificity.
- Use temperature=0 to reduce randomness and improve grading consistency.
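One way to apply the last two fixes everywhere is to centralize the request parameters. The `grading_params` helper below is a hypothetical convenience function, not part of the openai library; it simply bundles the deterministic settings so every grading call uses them.

```python
def grading_params(prompt, model="gpt-4o", max_tokens=300):
    """Build keyword arguments for client.chat.completions.create.

    temperature=0 makes grading deterministic; max_tokens caps feedback
    length so a too-small default does not truncate it mid-sentence.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,
        "max_tokens": max_tokens,
    }

params = grading_params("Question: ...\nGrade and feedback:")
print(params["temperature"], params["max_tokens"])  # 0 300
```

It is then used as `client.chat.completions.create(**grading_params(prompt))`, keeping individual grading calls short and uniform.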
Key takeaways
- Use gpt-4o or gpt-4o-mini for automated grading with prompt-based rubrics.
- Include clear grading criteria in prompts to improve scoring accuracy and feedback quality.
- Async and streaming API calls enable scalable and interactive grading workflows.
- Set temperature=0 to ensure consistent and deterministic grading results.