Code Intermediate medium · 6 min

TextLoader and CSVLoader: plain text and tabular data

What you will learn

Load plain text files and CSV data into LangChain documents for processing by LLMs.

Why this matters

Real-world RAG and document processing pipelines depend on reliable file ingestion; TextLoader and CSVLoader are the standard entry points for feeding unstructured text and structured tabular data into LangChain workflows without manual parsing.

Skip if: your files are larger than available memory (TextLoader reads the whole file at once, so use a streaming approach instead), or your CSV contains binary data, complex nested structures, or needs custom parsing logic beyond standard CSV formatting.

Explanation

TextLoader and CSVLoader are LangChain document loaders that convert raw file content into Document objects, a standardized format that text splitters, embedding models, and vector stores can consume. TextLoader reads a single text file and wraps its entire content in one Document. CSVLoader parses a CSV file and creates one Document per data row, rendering each row as "column: value" lines so that tabular data becomes searchable text. Both return a list of Document objects, each with a page_content field (the text) and a metadata dict: TextLoader records the source path, and CSVLoader records the source and the row index. Use TextLoader for logs, articles, and raw text; use CSVLoader for database dumps, spreadsheets, and other structured data that benefits from row-level documents.
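To make the row-to-text step concrete, here is a minimal standard-library sketch of the transformation. It illustrates the behaviour only and is not LangChain's actual implementation: <code>SimpleDocument</code> and <code>rows_to_documents</code> are hypothetical names standing in for Document and CSVLoader.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for LangChain's Document (not the real class)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def rows_to_documents(csv_text: str, source: str) -> list:
    """Turn each CSV data row into one document of 'column: value' lines."""
    # The header row supplies the keys; it does not become a document itself.
    reader = csv.DictReader(io.StringIO(csv_text))
    docs = []
    for i, row in enumerate(reader):
        content = "\n".join(f"{key}: {value}" for key, value in row.items())
        docs.append(SimpleDocument(content, {"source": source, "row": i}))
    return docs


sample = "name,age,city\nAlice,30,New York\nBob,25,San Francisco\n"
docs = rows_to_documents(sample, source="data.csv")
print(len(docs))
print(docs[0].page_content)
print(docs[1].metadata)
```

The point of the sketch is the shape of the result: one text blob per row, with the row index kept in metadata so a retrieved document can be traced back to its origin.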

Analogy

TextLoader is like photocopying an entire book into one digital document; CSVLoader is like converting a spreadsheet into individual index cards, one row per card, so you can shuffle and retrieve them independently.

Code

python

from langchain_community.document_loaders import TextLoader, CSVLoader

text_file = '/tmp/sample.txt'
csv_file = '/tmp/data.csv'

# Create a small two-line text file to load.
with open(text_file, 'w') as f:
    f.write('The quick brown fox jumps over the lazy dog.\n')
    f.write('This is a second line of text.')

# Create a small CSV file: one header row and two data rows.
with open(csv_file, 'w') as f:
    f.write('name,age,city\n')
    f.write('Alice,30,New York\n')
    f.write('Bob,25,San Francisco\n')

# TextLoader returns a list containing one Document for the whole file.
text_loader = TextLoader(text_file)
text_docs = text_loader.load()

print('=== TextLoader Output ===')
for doc in text_docs:
    print(f'Content:\n{doc.page_content}')
    print(f'Metadata: {doc.metadata}\n')

# CSVLoader returns one Document per data row.
csv_loader = CSVLoader(file_path=csv_file)
csv_docs = csv_loader.load()

print('=== CSVLoader Output ===')
for doc in csv_docs:
    print(f'Content:\n{doc.page_content}')
    print(f'Metadata: {doc.metadata}\n')

Output

=== TextLoader Output ===
Content:
The quick brown fox jumps over the lazy dog.
This is a second line of text.
Metadata: {'source': '/tmp/sample.txt'}

=== CSVLoader Output ===
Content:
name: Alice
age: 30
city: New York
Metadata: {'source': '/tmp/data.csv', 'row': 0}

Content:
name: Bob
age: 25
city: San Francisco
Metadata: {'source': '/tmp/data.csv', 'row': 1}

What just happened?

TextLoader read the entire text file as a single document with the file path stored in metadata. CSVLoader parsed the CSV, used the header row for the field names, created one Document per data row, converted each row into key-value text format ('name: Alice\nage: 30\n...'), and stored the row index in metadata. Both returned lists of Document objects ready for vector storage or LLM input.

Common gotcha

CSVLoader creates one document per row: a CSV with 10,000 rows yields 10,000 documents, which can inflate memory use and embedding costs. The source_column argument does not change this. CSVLoader(csv_file, source_column='your_id') only sets each document's 'source' metadata from the named column instead of the file path. To reduce the number of documents, pre-process the CSV before loading, for example by filtering rows or grouping several rows into one chunk.
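One way to pre-process a CSV into chunks is to group several rows into each piece of text before anything is embedded. The sketch below uses only the standard library; <code>group_rows</code>, the output dict layout, and the batch size of 2 are hypothetical choices for illustration, not part of LangChain.

```python
import csv
import io


def group_rows(csv_text: str, rows_per_group: int) -> list:
    """Group CSV rows into batches; each batch becomes one text chunk.

    Returns dicts with 'text' (the joined rows) and 'rows' (the original
    zero-based row indices), so a retrieved chunk can be traced back.
    """
    if rows_per_group < 1:
        raise ValueError("rows_per_group must be at least 1")
    reader = csv.DictReader(io.StringIO(csv_text))
    groups = []
    batch, indices = [], []
    for i, row in enumerate(reader):
        batch.append(", ".join(f"{k}: {v}" for k, v in row.items()))
        indices.append(i)
        if len(batch) == rows_per_group:
            groups.append({"text": "\n".join(batch), "rows": indices})
            batch, indices = [], []
    if batch:  # keep the final, possibly shorter, batch
        groups.append({"text": "\n".join(batch), "rows": indices})
    return groups


sample = "name,age\nAlice,30\nBob,25\nCarol,41\n"
groups = group_rows(sample, rows_per_group=2)
print(len(groups))
print(groups[0]["text"])
```

Grouping trades retrieval precision for a smaller document count, so the right batch size depends on how specific your queries need to be.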

Error recovery

FileNotFoundError

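TextLoader and CSVLoader require the file to exist at the given path. Check that the path is absolute, or correct relative to your current working directory; print <code>os.path.abspath(path)</code> to see what the loader will actually open. Depending on the langchain-community version, the loader may re-raise the failure as a <code>RuntimeError</code> ("Error loading ...") with the original FileNotFoundError as its cause.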
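A small pre-flight check can surface path problems before the loader runs. This is a sketch using only the standard library; <code>resolve_existing_file</code> is a hypothetical helper name, not a LangChain function.

```python
import os
import tempfile
from pathlib import Path


def resolve_existing_file(path_str: str) -> Path:
    """Return the absolute path, or raise FileNotFoundError naming it."""
    path = Path(path_str).expanduser().resolve()
    if not path.is_file():
        # Include the working directory, since relative paths resolve from it.
        raise FileNotFoundError(f"No file at {path} (cwd: {Path.cwd()})")
    return path


# Demo with a temporary file that certainly exists.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(b"hello")
    demo_path = tmp.name

resolved = resolve_existing_file(demo_path)
print(resolved.is_absolute())
os.remove(demo_path)  # clean up the demo file
```

The resolved path can then be passed to the loader as a string, for example <code>TextLoader(str(resolved))</code>.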

encoding errors (UnicodeDecodeError)

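Files may be encoded in UTF-16 or Latin-1 rather than UTF-8, and when no encoding is given the loader falls back to the platform default, which is often not UTF-8 on Windows. Pass the file's actual encoding to the loader: <code>TextLoader(file, encoding='utf-8')</code> for a UTF-8 file, or <code>TextLoader(file, encoding='latin-1')</code> for a Latin-1 file. CSVLoader accepts <code>encoding</code> as a kwarg as well. If you do not know the encoding, both loaders also take <code>autodetect_encoding=True</code>, which in current versions relies on the chardet package being installed.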
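If you would rather find a working encoding yourself before constructing the loader, a simple try-in-order approach is often enough. This is a standard-library sketch under stated assumptions: <code>detect_encoding</code> is a hypothetical helper, and the candidate list is an example rather than a recommendation.

```python
import os
import tempfile


def detect_encoding(path, candidates=("utf-8", "latin-1")):
    """Return the first candidate encoding that decodes the whole file.

    latin-1 maps every byte to a character, so it never fails; keep it
    last and treat it as a last resort rather than a true detection.
    """
    for enc in candidates:
        try:
            with open(path, encoding=enc) as f:
                f.read()
            return enc
        except UnicodeDecodeError:
            continue
    raise ValueError(f"none of {candidates} could decode {path}")


# Demo: a file written as Latin-1 is not valid UTF-8.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write("café".encode("latin-1"))
    demo_path = tmp.name

detected = detect_encoding(demo_path)
print(detected)
os.remove(demo_path)  # clean up the demo file
```

The detected value can then be passed on, as in <code>TextLoader(path, encoding=detected)</code>. Note that a successful decode does not prove the encoding is right, only that it is possible.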

ImportError on CSVLoader

Ensure langchain-community is installed: <code>pip install langchain-community</code>. CSVLoader is not in langchain-core.

Experienced dev note

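CSVLoader by default joins all columns into a single text field per row, which is rarely what you want for structured data retrieval. Note that csv_args (for example, CSVLoader(..., csv_args={'delimiter': ','})) is passed through to Python's csv.DictReader and only controls parsing, such as the delimiter and quote character; it does not change which columns end up in page_content. Recent langchain-community releases also accept content_columns and metadata_columns for that purpose, so check whether your installed version supports them. In production, inspect the output shape on a small sample before feeding millions of rows to a vector database. Also, TextLoader loads the entire file into memory: for files >100MB, read and split the file in a streaming fashion instead. Finally, always validate that metadata is being set correctly; it's your lifeline for tracing where a retrieved document came from.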
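A stream-based alternative for large text files can be as simple as a generator that reads a bounded number of characters at a time. The sketch below uses only the standard library; <code>iter_text_chunks</code> and the chunk size are hypothetical, and cutting at fixed character offsets will split words, so a real pipeline would likely split on paragraph or sentence boundaries.

```python
import os
import tempfile
from typing import Iterator


def iter_text_chunks(path: str, chunk_chars: int = 1000,
                     encoding: str = "utf-8") -> Iterator[str]:
    """Yield successive pieces of at most chunk_chars characters.

    Only one chunk is held in memory at a time, unlike reading the
    whole file with a single read() call.
    """
    if chunk_chars < 1:
        raise ValueError("chunk_chars must be at least 1")
    with open(path, encoding=encoding) as f:
        while True:
            chunk = f.read(chunk_chars)
            if not chunk:  # empty string signals end of file
                break
            yield chunk


# Demo on a tiny file with a deliberately small chunk size.
with tempfile.NamedTemporaryFile("w", delete=False, suffix=".txt",
                                 encoding="utf-8") as tmp:
    tmp.write("abcdefghij")
    demo_path = tmp.name

chunks = list(iter_text_chunks(demo_path, chunk_chars=4))
print(chunks)
os.remove(demo_path)  # clean up the demo file
```

Each yielded chunk could be wrapped in a Document with the source path and chunk index in its metadata, which keeps the traceability described above.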

Check your understanding

You have a CSV with 5,000 product rows and want to store them in a vector database for semantic search. How would you verify that CSVLoader isn't creating memory issues, and what would you change if each row is too large to fit efficiently?

Answer hint

A correct answer would mention: checking the length of the documents list (it should equal the number of data rows); estimating memory from the total size of the page_content strings, since <code>sys.getsizeof()</code> on the docs list measures only the list object and not the documents it references; selecting only the relevant CSV columns before loading; chunking rows into groups rather than one document per row; and using metadata to track original row IDs for retrieval.
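The size check can be sketched as follows. The strings here are hypothetical stand-ins for the page_content of 5,000 loaded documents; with real loader output you would build the list from <code>doc.page_content</code> instead.

```python
import sys

# Hypothetical stand-ins for the page_content of 5,000 loaded rows.
contents = [f"name: product {i}\nprice: {i}.99" for i in range(5000)]

# getsizeof on the list counts the list object itself (its references),
# not the strings those references point to.
list_overhead = sys.getsizeof(contents)

# Total encoded text size is a better proxy for what gets embedded.
content_bytes = sum(len(text.encode("utf-8")) for text in contents)

# The longest row shows whether any single document is unusually large.
longest = max(len(text) for text in contents)

print(len(contents))
print(list_overhead < content_bytes)
print(longest)
```

If the longest row is far larger than the typical one, that row is a candidate for column filtering or splitting before it reaches the vector database.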

VERSION TextLoader and CSVLoader have been stable in langchain-community >= 0.0.1, with no breaking changes in 1.2.x. In langchain < 0.1.0 they lived in langchain.document_loaders; in 1.2.x, import them from langchain_community.document_loaders.

Next, explore how to chain loaders with text splitters using <code>load_and_split()</code> to break large documents into manageable chunks for embedding and retrieval.
