How to load a GitHub repository with LangChain

Quick answer
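Use LangChain's GitLoader to load documents directly from a GitHub repository. GitLoader clones the repository (or opens an existing local clone) and converts its text files into LangChain documents for further processing. If you would rather fetch files through the GitHub API without cloning, GithubFileLoader does that, given a GitHub access token.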

PREREQUISITES

  • Python 3.8+
  • pip install "langchain>=0.2.0" langchain-community GitPython
  • Git installed on your system
  • OpenAI API key (optional for downstream tasks)

Setup

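Install LangChain, the langchain-community package (which provides GitLoader), and GitPython, which GitLoader uses to clone and read repositories. Git itself must also be installed on your system. Set your OpenAI API key as an environment variable (OPENAI_API_KEY) only if you plan to pass the loaded documents to OpenAI models. The quotes around the version specifier keep the shell from treating ">" as a redirection.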

bash
pip install "langchain>=0.2.0" langchain-community GitPython
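Because GitLoader drives git through the GitPython package, a missing piece usually surfaces only when load() runs. The pre-flight check below is a minimal sketch that uses only the Python standard library; check_prerequisites is a hypothetical helper, and it assumes GitPython is importable as the git module and langchain-community as langchain_community.

```python
import importlib.util
import os
import shutil


def check_prerequisites() -> dict:
    """Hypothetical helper: report which GitLoader prerequisites are available."""
    return {
        # The git executable must be on PATH; GitPython calls it under the hood.
        "git_on_path": shutil.which("git") is not None,
        # GitPython is installed under the module name `git`.
        "gitpython_installed": importlib.util.find_spec("git") is not None,
        # GitLoader lives in the langchain-community package.
        "langchain_community_installed": importlib.util.find_spec("langchain_community") is not None,
        # Only needed if you later send the documents to OpenAI models.
        "openai_key_set": bool(os.environ.get("OPENAI_API_KEY")),
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

Run it once after installing; any entry reported as missing points at the prerequisite to fix before cloning.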

Step by step

This example uses GitLoader to clone the LangChain repository into ./langchain_repo, check out a branch, and load each tracked text file as a document.

python
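from langchain_community.document_loaders import GitLoader

# Clone the repo into ./langchain_repo. If that folder already holds a clone
# of the same URL, GitLoader reuses it instead of cloning again.
repo_url = "https://github.com/langchain-ai/langchain"
loader = GitLoader(
    repo_path="./langchain_repo",
    clone_url=repo_url,
    branch="master",  # GitLoader defaults to "main"; this repo's default branch is "master"
)
docs = loader.load()

# Print the first 500 characters of the first document
print(docs[0].page_content[:500])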
output
# langchain/__init__.py
"""LangChain is a framework for developing applications powered by language models."""

from langchain.schema import *  # noqa
from langchain.chains import *  # noqa
from langchain.llms import *  # noqa
from langchain.prompts import *  # noqa
from langchain.vectorstores import *  # noqa
from langchain.embeddings import *  # noqa
from langchain.document_loaders import *  # noqa

__version__ = "0.2.0"

This output is only illustrative. GitLoader walks the repository's git tree, so which file it returns first, and therefore what this prints, depends on the repository layout at the commit you check out.
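For a repository as large as LangChain, load() reads every matching file into memory before returning. GitLoader also implements lazy_load(), which yields one document at a time, so you can inspect a few files without building the full list. The sketch below is a minimal example; preview_documents is a hypothetical helper that relies only on lazy_load(), page_content, and the "source" metadata key (the file path relative to the repository root) that GitLoader attaches to each document.

```python
from itertools import islice
from typing import Any, List, Tuple


def preview_documents(loader: Any, limit: int = 3, chars: int = 200) -> List[Tuple[str, str]]:
    """Hypothetical helper: return (source, snippet) pairs for the first `limit` documents.

    islice stops pulling from loader.lazy_load() after `limit` documents,
    so the rest of the repository is never read into memory.
    """
    previews = []
    for doc in islice(loader.lazy_load(), limit):
        source = doc.metadata.get("source", "<unknown>")
        previews.append((source, doc.page_content[:chars]))
    return previews
```

With the loader from the example above, preview_documents(loader, limit=5) returns the first five (source, snippet) pairs. The clone and branch checkout still happen when the first document is requested.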

Common variations

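GitLoader itself accepts a branch argument and a file_filter callable, so you can load only specific file types from a specific branch. If you would rather not clone the repository, GithubFileLoader (also in langchain_community.document_loaders) fetches files through the GitHub API and requires a GitHub personal access token. Either way, the loaded documents can then be passed to text splitters, vector stores, or chains for LLM processing.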

python
from langchain_community.document_loaders import GitLoader

# Reuse the local clone from the previous example and keep only Python files
loader = GitLoader(
    repo_path="./langchain_repo",
    branch="master",
    file_filter=lambda file_path: file_path.endswith(".py"),
)
docs = loader.load()
print(f"Loaded {len(docs)} Python files from the repo.")
output
Loaded <N> Python files from the repo.

The exact count depends on the commit you check out.
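Because file_filter is just a callable that receives each file's path (GitLoader passes the path joined with repo_path) and returns True to keep the file, you can combine several conditions in one function. The sketch below is a minimal example; make_file_filter and its parameters are hypothetical, not part of LangChain, and the directory names in the usage note are only illustrations.

```python
from pathlib import PurePath
from typing import Callable, Iterable


def make_file_filter(
    extensions: Iterable[str],
    exclude_dirs: Iterable[str] = (),
) -> Callable[[str], bool]:
    """Hypothetical helper: build a file_filter callable for GitLoader.

    Keeps a path only if its suffix (case-insensitive) is in `extensions`
    and none of its directory components appear in `exclude_dirs`.
    """
    allowed = {ext.lower() for ext in extensions}
    excluded = set(exclude_dirs)

    def file_filter(file_path: str) -> bool:
        path = PurePath(file_path)
        if path.suffix.lower() not in allowed:
            return False
        # parts[:-1] are the directories; the last part is the file name.
        return not any(part in excluded for part in path.parts[:-1])

    return file_filter
```

For example, passing file_filter=make_file_filter([".py"], exclude_dirs=["tests"]) to a GitLoader call loads Python sources while skipping anything under a tests directory.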

Troubleshooting

  • If cloning fails, make sure git is installed and on your system PATH. GitLoader also needs the GitPython package and raises an ImportError if it is missing.
  • Check your internet connection and repository URL for typos.
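  • For private repositories, configure SSH keys or use a personal access token.
  • If the branch checkout fails, pass a branch that exists in the repository; GitLoader checks out "main" unless you set branch.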
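A mistyped repository URL often only shows up as a failed clone. To catch it earlier, you can check the URL's shape before building the loader. The sketch below is a hypothetical helper, not part of LangChain; it accepts only the https://github.com/owner/repo and git@github.com:owner/repo forms (with or without .git) and does not check that the repository actually exists.

```python
import re
from typing import Tuple

# Matches https://github.com/owner/repo[.git][/] and git@github.com:owner/repo[.git]
_GITHUB_URL = re.compile(
    r"^(?:https://github\.com/|git@github\.com:)"
    r"(?P<owner>[A-Za-z0-9-]+)/(?P<repo>[A-Za-z0-9._-]+?)"
    r"(?:\.git)?/?$"
)


def parse_github_repo(url: str) -> Tuple[str, str]:
    """Hypothetical helper: split a GitHub clone URL into (owner, repo).

    Raises ValueError for anything that does not look like a GitHub repo URL.
    """
    match = _GITHUB_URL.match(url.strip())
    if match is None:
        raise ValueError(f"Not a recognizable GitHub repository URL: {url!r}")
    return match.group("owner"), match.group("repo")
```

For example, parse_github_repo("https://github.com/langchain-ai/langchain.git") returns ("langchain-ai", "langchain"), while a mistyped host such as github.con raises ValueError immediately instead of failing partway through a clone.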

Key Takeaways

  • Use LangChain's GitLoader to clone a GitHub repo and load its files as documents, or GithubFileLoader to fetch files through the GitHub API without cloning.
  • Ensure git is installed and repo URLs are correct to avoid cloning errors.
  • Use the branch and file_filter arguments to load only the files you need, for example by file extension.
  • Set environment variables for API keys and authentication securely.
  • Loaded documents can be used directly with LangChain chains and LLMs.
Verified 2026-04