How to load a YouTube transcript with LangChain
Quick answer
Use LangChain's `YoutubeLoader` class (from `langchain_community.document_loaders`) to load transcripts from YouTube videos. Build the loader from a video URL with `YoutubeLoader.from_youtube_url()`. It fetches the transcript automatically and returns it as a list of `Document` objects ready for processing with LangChain.
Prerequisites
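- Python 3.8+
- `pip install "langchain>=0.2.0"` (quote the requirement so the shell does not treat `>` as a redirect)
- `pip install langchain-community`
- `pip install youtube-transcript-api`
- `pip install requests`
- OpenAI API key (for downstream usage)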
Setup
Install the required packages: `langchain`, `langchain-community` (which provides the YouTube loader), and `youtube-transcript-api`, which the loader uses internally to fetch YouTube transcripts.
```shell
pip install langchain langchain-community youtube-transcript-api requests
```
Step by step
Create a loader from the video URL with `YoutubeLoader.from_youtube_url()`, then call `load()`. The loader returns a list of `Document` objects whose `page_content` holds the transcript text.
```python
from langchain_community.document_loaders import YoutubeLoader

# Replace with your YouTube video URL
video_url = "https://www.youtube.com/watch?v=dQw4w9WgXcQ"

# The constructor expects a video ID; from_youtube_url() extracts it from the URL
loader = YoutubeLoader.from_youtube_url(video_url)
docs = loader.load()

for doc in docs:
    print(doc.page_content[:500])  # print the first 500 characters of the transcript
```
Output
```text
We're no strangers to love You know the rules and so do I A full commitment's what I'm thinking of You wouldn't get this from any other guy ...
```
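LangChain's `YoutubeLoader` constructor expects a video ID, not a full URL. `YoutubeLoader.from_youtube_url()` handles that conversion for you. If you prefer to validate URLs yourself or pass the ID directly, the sketch below shows one way to extract it with the standard library. `extract_video_id` is a hypothetical helper written for this example, not LangChain's internal implementation. It only covers the URL shapes named in its comments.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Hypothetical helper: return the video ID from common YouTube URL shapes, or None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()

    # Short links: https://youtu.be/<id>
    if host == "youtu.be":
        video_id = parsed.path.lstrip("/").split("/")[0]
        return video_id or None

    if host == "youtube.com" or host.endswith(".youtube.com"):
        # Standard links: https://www.youtube.com/watch?v=<id>
        if parsed.path == "/watch":
            return parse_qs(parsed.query).get("v", [None])[0]
        # Embed and Shorts links: /embed/<id>, /shorts/<id>
        parts = parsed.path.strip("/").split("/")
        if len(parts) >= 2 and parts[0] in ("embed", "shorts"):
            return parts[1]

    return None
```

When the helper returns an ID, you can pass it straight to the constructor, as in `YoutubeLoader(video_id)`. A `None` result is a cheap early sign that the URL is malformed.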
Common variations
- Pass the `language` parameter to `YoutubeLoader` to request a specific transcript language, if one is available.
- Combine the loaded documents with LangChain text splitters or embeddings for downstream NLP tasks.
- Use async loading by awaiting `loader.aload()` inside an async function.
```python
import asyncio

from langchain_community.document_loaders import YoutubeLoader


async def load_async():
    loader = YoutubeLoader.from_youtube_url("https://www.youtube.com/watch?v=dQw4w9WgXcQ")
    # aload() is inherited from LangChain's BaseLoader, which runs the blocking load in an executor
    docs = await loader.aload()
    for doc in docs:
        print(doc.page_content[:300])


asyncio.run(load_async())
```
Output
```text
We're no strangers to love You know the rules and so do I A full commitment's what I'm thinking of You wouldn't get this from any other guy ...
```
Troubleshooting
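- If `load()` returns an empty list, or you see a `TranscriptsDisabled` or `NoTranscriptFound` error from `youtube-transcript-api`, the video may have no transcript, transcripts may be disabled, or no transcript may exist in the requested language.
- Make sure the video URL is correct and the video is publicly accessible.
- If the loader reports that `youtube_transcript_api` cannot be imported, install it separately with `pip install youtube-transcript-api`.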
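To confirm that the dependencies are visible to the interpreter that runs LangChain, you can look up their modules with the standard library's `importlib.util.find_spec`, which locates a module without importing it. This is a minimal sketch. `missing_modules` and `REQUIRED_MODULES` are names made up for this example. The names listed are module names, not pip package names.

```python
import importlib.util

# Top-level module names (not pip package names) the YouTube loader needs at runtime
REQUIRED_MODULES = ("langchain_community", "youtube_transcript_api")


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
        print("Try: pip install langchain-community youtube-transcript-api")
    else:
        print("All required modules were found.")
```

Run it with the same Python executable (or virtual environment) you use for LangChain. A package installed for a different interpreter is a common cause of import errors.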
Key Takeaways
- Use LangChain's `YoutubeLoader.from_youtube_url()` to fetch YouTube transcripts by URL.
- Install `youtube-transcript-api`; the loader requires it to fetch transcripts.
- You can load transcripts synchronously with `load()` or asynchronously with `aload()`.
- Handle missing transcripts by checking that the video is available and that its transcripts are enabled.