How to load a JSON file with LangChain
Quick answer
Use LangChain's TextLoader or a custom loader to read JSON files, either by loading the file content as text or by parsing the JSON into documents. You can then process these documents with LangChain chains or vector stores.
PREREQUISITES
- Python 3.8+
- pip install "langchain>=0.2" "openai>=1.0"
- OpenAI API key (free tier works)
Setup
Install LangChain and the OpenAI Python SDK, then set your OpenAI API key as an environment variable.
pip install langchain openai
# Set the environment variable in your shell (replace the placeholder with your own key)
# export OPENAI_API_KEY="your-api-key"  # Linux/macOS
# setx OPENAI_API_KEY "your-api-key"    # Windows
Step by step
Parse the JSON file with Python's json module, then wrap the serialized content in a LangChain Document.
import json
# Document lives in langchain_core, which is installed together with langchain
from langchain_core.documents import Document
# Path to your JSON file
json_file_path = "data.json"
# Load and parse the JSON content
with open(json_file_path, "r", encoding="utf-8") as f:
    data = json.load(f)
# Serialize the parsed data to a readable string (or extract only the fields you need)
json_text = json.dumps(data, indent=2)
# Create a LangChain Document
documents = [Document(page_content=json_text)]
# Print the document content
print(documents[0].page_content)
output
{
  "key1": "value1",
  "key2": 123,
  "key3": [
    "item1",
    "item2"
  ]
}
Common variations
You can customize loading by splitting a JSON array into one Document per element, or by reading the file asynchronously so that file I/O does not block an event loop.
import json
from langchain_core.documents import Document
# Load a JSON file and create one document per element if the top level is a list
with open("data_list.json", "r", encoding="utf-8") as f:
    data_list = json.load(f)
# Assume data_list is a list of dicts
documents = [Document(page_content=json.dumps(item)) for item in data_list]
for doc in documents:
    print(doc.page_content)
# Async example (requires: pip install aiofiles)
import aiofiles
import asyncio
async def load_json_async(path):
    async with aiofiles.open(path, mode="r", encoding="utf-8") as f:
        content = await f.read()
    return json.loads(content)
async def main():
    data = await load_json_async("data.json")
    print(data)
# asyncio.run(main())  # Uncomment to run the async example
output
{...} # Prints each JSON object as a string
Troubleshooting
- If you see FileNotFoundError, verify that the JSON file path is correct.
- If JSON parsing fails, check the file content for valid JSON syntax.
- For encoding errors, ensure the file is UTF-8 encoded.
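Each of these failure modes raises a distinct Python exception, so they can be reported separately. The following is a minimal sketch using only the standard library. The helper name load_json_checked and its error messages are hypothetical and are not part of LangChain.

```python
import json
from pathlib import Path


def load_json_checked(path):
    """Load a JSON file and raise errors with more specific messages.

    Hypothetical helper for illustration; not a LangChain API.
    """
    file_path = Path(path)
    try:
        # Decode explicitly as UTF-8 so encoding problems surface here
        text = file_path.read_text(encoding="utf-8")
    except FileNotFoundError:
        raise FileNotFoundError(f"JSON file not found: {file_path.resolve()}") from None
    except UnicodeDecodeError as exc:
        raise ValueError(f"{file_path} is not valid UTF-8: {exc}") from exc

    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        # lineno and colno point at the position where parsing failed
        raise ValueError(
            f"Invalid JSON in {file_path} at line {exc.lineno}, "
            f"column {exc.colno}: {exc.msg}"
        ) from exc
```

The returned value can be passed to json.dumps and wrapped in a Document exactly as in the step-by-step example.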
Key Takeaways
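- Use Python's built-in json module to parse JSON files before creating LangChain Document objects.
- This guide converts JSON content to text or to multiple documents manually. LangChain also provides a JSONLoader in langchain_community.document_loaders (it depends on the jq package) if you prefer a built-in loader.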
- For large JSON files, consider async file reading or splitting JSON arrays into multiple documents for better processing.
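As a sketch of the array-splitting approach, the function below turns a list of JSON objects into (page_content, metadata) pairs, one per element. The name records_to_pairs, the content_key parameter, the "source" and "index" metadata keys, and the sample records are all hypothetical. The code uses only the standard library, so it runs without LangChain installed.

```python
import json


def records_to_pairs(records, content_key=None, source="data_list.json"):
    """Turn a list of JSON objects into (page_content, metadata) pairs.

    Hypothetical helper: if content_key is given and present in a record,
    that field becomes the text and the remaining fields become metadata;
    otherwise the whole record is serialized as the text.
    """
    pairs = []
    for index, record in enumerate(records):
        metadata = {"source": source, "index": index}
        if content_key is not None and content_key in record:
            text = str(record[content_key])
            # Keep the other fields as metadata (they override "source"
            # and "index" if a record happens to use those names)
            metadata.update({k: v for k, v in record.items() if k != content_key})
        else:
            text = json.dumps(record, ensure_ascii=False)
        pairs.append((text, metadata))
    return pairs


# Sample records, made up for illustration
records = [
    {"title": "Intro", "body": "Hello world"},
    {"title": "Usage", "body": "Call the loader"},
]
pairs = records_to_pairs(records, content_key="body")
for text, metadata in pairs:
    print(text, metadata)
```

Each pair can then be wrapped as Document(page_content=text, metadata=metadata), which keeps the searchable text separate from fields you may want to filter on later.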