How to use Whisper with LangChain
Quick answer
Use the `openai` Python SDK to transcribe audio with the `whisper-1` model, then feed the transcription into a LangChain pipeline (for example, a summarization prompt or a custom chain). This enables automated audio-to-text workflows within LangChain.

Prerequisites
- Python 3.8+
- OpenAI API key
- `pip install "openai>=1.0" "langchain>=0.2"`
Setup
Install the required packages and set your OpenAI API key as an environment variable.
- Install packages: `pip install "openai>=1.0" "langchain>=0.2" langchain-openai`
- Set the environment variable: `export OPENAI_API_KEY='your_api_key'` (Linux/macOS) or `setx OPENAI_API_KEY "your_api_key"` (Windows)
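Before running anything, it can help to confirm the key is actually visible to Python. A minimal sanity-check sketch (the helper name is illustrative):

```python
import os

def api_key_configured() -> bool:
    """Return True if OPENAI_API_KEY is set and non-empty."""
    return bool(os.environ.get("OPENAI_API_KEY"))

if __name__ == "__main__":
    print("API key configured:", api_key_configured())
```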
Step by step
This example shows how to transcribe an audio file using OpenAI's Whisper model via the openai SDK, then use LangChain to process the transcription text.
```python
import os

from openai import OpenAI
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI  # requires: pip install langchain-openai

# Initialize the OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Transcribe an audio file with Whisper
with open("audio.mp3", "rb") as audio_file:
    transcript_response = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

transcription = transcript_response.text
print("Transcription:", transcription)

# Use LangChain to process the transcription (e.g., summarize it)
prompt = PromptTemplate(
    input_variables=["text"],
    template="Summarize the following transcription:\n{text}",
)
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Compose the prompt and model into a chain, then run it
chain = prompt | llm
summary = chain.invoke({"text": transcription}).content
print("Summary:", summary)
```

Output
Transcription: Hello, this is a sample audio transcription from Whisper.
Summary: This is a sample audio transcription.
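One practical caveat: OpenAI's hosted transcription endpoint rejects audio uploads larger than 25 MB, so it is worth checking the file size before sending it. A minimal sketch (the helper name and constant are illustrative):

```python
import os

# OpenAI's hosted audio endpoints reject uploads larger than 25 MB
MAX_UPLOAD_BYTES = 25 * 1024 * 1024

def check_audio_size(path: str) -> int:
    """Return the file size in bytes, raising if it exceeds the API limit."""
    size = os.path.getsize(path)
    if size > MAX_UPLOAD_BYTES:
        raise ValueError(
            f"{path} is {size} bytes; split it before sending to whisper-1"
        )
    return size
```

Files over the limit need to be split into smaller chunks (for example, by silence or fixed duration) and transcribed piece by piece.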
Common variations
- Use async calls with `asyncio` and `await` for transcription and LangChain calls.
- Stream transcription results if supported by your SDK version.
- Use the `OpenAIWhisperParser` document loader from `langchain_community` for more integrated audio workflows.
- Swap `gpt-4o-mini` for other OpenAI models for different LLM capabilities.
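The async variation can be sketched with the SDK's `AsyncOpenAI` client (assuming `openai>=1.0`; the helper names are illustrative, and the import is done inside the function so the sketch loads even without the SDK installed):

```python
import asyncio
import os

async def transcribe_async(path: str) -> str:
    """Transcribe one audio file with whisper-1 using the async client."""
    from openai import AsyncOpenAI  # assumes openai>=1.0 is installed

    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    with open(path, "rb") as audio_file:
        response = await client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
        )
    return response.text

async def transcribe_many(paths: list[str]) -> list[str]:
    # gather runs the transcription requests concurrently
    return list(await asyncio.gather(*(transcribe_async(p) for p in paths)))

if __name__ == "__main__":
    print(asyncio.run(transcribe_async("audio.mp3")))
```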
Troubleshooting
- If you get authentication errors, verify that your `OPENAI_API_KEY` environment variable is set correctly.
- For file errors, ensure the audio file path is correct and the file format is supported (mp3, wav, m4a, etc.).
- If transcription is slow, check your network connection and API rate limits.
- Update the `openai` and `langchain` packages regularly to avoid compatibility issues.
Key Takeaways
- Use OpenAI's `whisper-1` model via the `openai` SDK for accurate audio transcription.
- Integrate Whisper transcription results into LangChain pipelines for automated audio processing workflows.
- Set environment variables and install up-to-date `openai` and `langchain` packages to ensure compatibility.