How-to · Intermediate · 3 min read

How to manage prompts in production

Quick answer
Manage prompts in production by using version control systems like Git to track changes, employing prompt templating libraries such as Jinja2 for dynamic content, and implementing automated testing to validate prompt outputs. This approach ensures maintainability, consistency, and easier iteration of LLM prompts.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0"
  • pip install jinja2

Setup

Install the necessary packages and set the OPENAI_API_KEY environment variable for API access. Use pip to install the openai and jinja2 libraries for API calls and prompt templating.

bash
pip install "openai>=1.0" jinja2
output
Collecting openai
Collecting jinja2
Successfully installed openai-1.x.x jinja2-3.x.x

Step by step

Use Jinja2 templates to create dynamic prompts, store them in version control, and call the OpenAI API with rendered prompts. This example shows a simple prompt template and how to render it with variables before sending to the model.

python
import os
from openai import OpenAI
from jinja2 import Template

# Initialize OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Define a prompt template with placeholders
prompt_template = """
You are a helpful assistant.
Answer the question:
{{ question }}
"""

# Render the prompt with dynamic content
template = Template(prompt_template)
prompt = template.render(question="What is RAG in AI?")

# Call the chat completion API
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}]
)

print("Response:", response.choices[0].message.content)
output
Response: Retrieval-Augmented Generation (RAG) is a technique that combines retrieval of relevant documents with generative models to produce accurate and context-aware answers.
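To make "store them in version control" concrete, here is a minimal sketch of loading a template from a versioned prompts directory with Jinja2's FileSystemLoader. The answer_question.j2 filename and directory layout are illustrative, and a temporary directory stands in for a folder checked into Git so the sketch is self-contained.

```python
import os
import tempfile

from jinja2 import Environment, FileSystemLoader

# In production, prompts/ lives in your Git repo alongside the code;
# here we create a throwaway copy so the example runs on its own.
prompts_dir = tempfile.mkdtemp()
with open(os.path.join(prompts_dir, "answer_question.j2"), "w") as f:
    f.write("You are a helpful assistant.\nAnswer the question:\n{{ question }}\n")

# Load templates by name from the version-controlled directory
env = Environment(loader=FileSystemLoader(prompts_dir))
template = env.get_template("answer_question.j2")
prompt = template.render(question="What is RAG in AI?")
print(prompt)
```

Because each template is a plain text file, every prompt change shows up as a normal Git diff and can go through code review like any other change.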

Common variations

You can manage prompts asynchronously, use streaming responses for real-time output, or switch models easily by parameterizing the model name. Additionally, integrate prompt testing frameworks to validate outputs before deployment.

python
import asyncio
import os

from jinja2 import Template
from openai import AsyncOpenAI

# Streaming with await requires the async client, not OpenAI
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def async_prompt(question: str, model: str = "gpt-4o-mini") -> str:
    prompt_template = "You are a helpful assistant. Answer: {{ question }}"
    template = Template(prompt_template)
    prompt = template.render(question=question)

    # With stream=True the call returns an async iterator of chunks
    stream = await client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True
    )

    collected = []
    async for chunk in stream:
        delta = chunk.choices[0].delta.content or ""
        print(delta, end="", flush=True)
        collected.append(delta)
    print()
    return ''.join(collected)

asyncio.run(async_prompt("Explain prompt management."))
output
Prompt management in production involves versioning, templating, testing, and monitoring to ensure consistent and reliable AI outputs.
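One lightweight way to validate prompts before deployment, as suggested above, is to test the templates themselves rather than the model's output: this sketch uses Jinja2's meta API to detect a template's required variables and fail fast when one is missing. The helper names are illustrative, not part of any framework.

```python
from jinja2 import Environment, meta

env = Environment()
prompt_template = "You are a helpful assistant. Answer: {{ question }}"

def required_variables(template_source: str) -> set:
    """Return the set of variable names a template expects."""
    ast = env.parse(template_source)
    return meta.find_undeclared_variables(ast)

def validate_render(template_source: str, **variables) -> str:
    """Render the template, raising if any expected variable is missing."""
    missing = required_variables(template_source) - set(variables)
    if missing:
        raise ValueError(f"Missing template variables: {missing}")
    return env.from_string(template_source).render(**variables)

prompt = validate_render(prompt_template, question="Explain prompt management.")
print(prompt)
```

Checks like these run in a normal test suite with no API calls, so a broken or half-rendered prompt is caught in CI before it ever reaches the model.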

Troubleshooting

  • If prompts produce inconsistent outputs, verify template variables are correctly rendered before API calls.
  • If API errors occur, check your OPENAI_API_KEY environment variable is set and valid.
  • For latency issues, consider caching prompt templates and responses.
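For the template-caching suggestion above, a minimal sketch: memoize template compilation with functools.lru_cache so each prompt source is parsed only once per process. (Caching model responses is a separate concern and would need a store keyed by the rendered prompt.)

```python
from functools import lru_cache

from jinja2 import Template

@lru_cache(maxsize=128)
def compiled_template(source: str) -> Template:
    """Compile each distinct template source only once per process."""
    return Template(source)

# Repeated calls with the same source reuse the cached compiled object
t1 = compiled_template("Answer: {{ question }}")
t2 = compiled_template("Answer: {{ question }}")
assert t1 is t2
print(t1.render(question="What is caching?"))
```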

Key Takeaways

  • Use version control like Git to track prompt changes and enable collaboration.
  • Employ templating engines such as Jinja2 to create dynamic, reusable prompts.
  • Implement automated tests to validate prompt outputs and catch regressions early.
  • Parameterize model names and prompt variables for flexible production deployments.
  • Monitor prompt performance and errors to maintain reliability in production.
Verified 2026-04 · gpt-4o-mini