AI for grading and assessment
Quick answer
Use large language models (LLMs) such as gpt-4o to automate grading: include the student answer and a rubric in the prompt, and the model returns a score and feedback for open-ended responses. With a clear rubric and temperature=0, grading is reasonably consistent, though spot-checking a sample against human grades is still advisable.

Prerequisites
- Python 3.8+
- An OpenAI API key
- pip install openai>=1.0
Setup
Install the openai Python package and set your API key as an environment variable for secure access.
pip install openai

output
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl (xx kB)
Installing collected packages: openai
Successfully installed openai-1.x.x
Step by step
This example shows how to grade a short student answer against a rubric using gpt-4o. The prompt includes the question, student response, and grading criteria. The model returns a score and feedback.
import os
from openai import OpenAI
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
question = "Explain the water cycle in your own words."
student_answer = "Water evaporates from lakes, forms clouds, and then falls as rain."
rubric = "Score 0-5 based on completeness and accuracy. Provide brief feedback."
prompt = f"Question: {question}\nStudent answer: {student_answer}\nRubric: {rubric}\nGrade and feedback:"
response = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": prompt}]
)
print(response.choices[0].message.content)

output
Score: 4/5
Feedback: Good explanation covering evaporation, condensation, and precipitation. Could mention collection for full completeness.
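The model returns its grade as free text, so anything downstream (a gradebook, a spreadsheet export) needs the numeric score pulled out. One minimal way to do that is a small regex helper; the `parse_score` function below is an illustrative addition, not part of the OpenAI library, and assumes the rubric's "Score: X/5" format.

```python
import re

def parse_score(feedback_text, max_score=5):
    """Extract a numeric score from model feedback like 'Score: 4/5'.

    Returns None when no score pattern is found, so callers can flag
    the response for manual review instead of crashing.
    """
    match = re.search(
        r"Score:\s*(\d+(?:\.\d+)?)\s*/\s*" + str(max_score), feedback_text
    )
    if match:
        return float(match.group(1))
    return None

print(parse_score("Score: 4/5 Feedback: Good explanation."))  # 4.0
```

Returning None for unparseable feedback keeps one malformed response from halting a whole batch.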
Common variations
- Use gpt-4o-mini for faster, cheaper grading with slightly less detail.
- Implement async calls with asyncio for batch grading.
- Use streaming to display partial feedback as the model generates it.
- Put multiple student answers in a single prompt for comparative grading.
import asyncio
import os
from openai import AsyncOpenAI

# AsyncOpenAI (not OpenAI) is required for awaitable calls; creating
# one client and reusing it avoids opening a connection per request.
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def grade_answer(question, answer, rubric):
    prompt = f"Question: {question}\nStudent answer: {answer}\nRubric: {rubric}\nGrade and feedback:"
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

async def main():
    question = "Explain photosynthesis."
    answer = "Plants use sunlight to make food from carbon dioxide and water."
    rubric = "Score 0-5 based on accuracy and detail."
    result = await grade_answer(question, answer, rubric)
    print(result)

asyncio.run(main())

output
Score: 5/5
Feedback: Accurate and concise explanation of photosynthesis.
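The batch-grading variation mentioned above amounts to launching many of these coroutines concurrently with asyncio.gather. The sketch below is one possible shape for that: `grade_batch` accepts any coroutine grader with the (question, answer, rubric) signature, so it works with a real grader like the one above or, as shown here, with a stand-in grader that lets the example run without an API key.

```python
import asyncio

async def grade_batch(grader, question, answers, rubric):
    """Grade a list of answers concurrently.

    `grader` is any coroutine function taking (question, answer, rubric);
    asyncio.gather runs the calls concurrently and preserves input order.
    """
    tasks = [grader(question, answer, rubric) for answer in answers]
    return await asyncio.gather(*tasks)

# Stand-in grader so the sketch runs offline: scores by word count.
async def fake_grader(question, answer, rubric):
    return f"Score: {min(5, len(answer.split()))}/5"

answers = [
    "Water evaporates.",
    "Water evaporates, condenses, and falls as rain.",
]
results = asyncio.run(
    grade_batch(fake_grader, "Explain the water cycle.", answers, "Score 0-5")
)
print(results)  # ['Score: 2/5', 'Score: 5/5']
```

With a real API-backed grader, consider adding a semaphore inside `grade_batch` so large classes do not exceed your rate limits.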
Troubleshooting
- If you get 401 Unauthorized, verify that your OPENAI_API_KEY environment variable is set correctly.
- If responses are incomplete, increase max_tokens in the API call.
- For inconsistent grading, refine your rubric prompt for clarity and specificity.
- Use temperature=0 to reduce randomness and improve grading consistency.
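One way to apply the last two fixes everywhere is to centralize the request parameters. The `grading_params` helper below is a hypothetical convenience function, not part of the openai library; it simply bundles the deterministic settings so every grading call uses them.

```python
def grading_params(prompt, model="gpt-4o", max_tokens=300):
    """Build keyword arguments for client.chat.completions.create.

    temperature=0 makes grading deterministic; max_tokens caps feedback
    length so a too-small default does not truncate it mid-sentence.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,
        "max_tokens": max_tokens,
    }

params = grading_params("Question: ...\nGrade and feedback:")
print(params["temperature"], params["max_tokens"])  # 0 300
```

It is then used as `client.chat.completions.create(**grading_params(prompt))`, keeping individual grading calls short and uniform.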
Key takeaways
- Use gpt-4o or gpt-4o-mini for automated grading with prompt-based rubrics.
- Include clear grading criteria in prompts to improve scoring accuracy and feedback quality.
- Async and streaming API calls enable scalable and interactive grading workflows.
- Set temperature=0 to ensure consistent and deterministic grading results.