
Replicate Cog explained

Quick answer
Cog is Replicate's open-source tool for packaging machine learning models into containers, and models packaged this way can be hosted on Replicate's platform. To run a hosted model from Python, use the replicate client library: install it with pip install replicate, then call replicate.run() with the model name and inputs to receive the outputs directly in Python.

PREREQUISITES

  • Python 3.8+
  • pip install replicate
  • Replicate API token set as REPLICATE_API_TOKEN environment variable

Setup

Install the replicate Python package and set your API token as an environment variable to authenticate with Replicate's API.

bash
pip install replicate

# On Linux/macOS
export REPLICATE_API_TOKEN="your_token_here"

# On Windows (PowerShell); setx persists the variable for new terminal sessions only
setx REPLICATE_API_TOKEN "your_token_here"
output
Collecting replicate
  Downloading replicate-0.10.0-py3-none-any.whl (20 kB)
Installing collected packages: replicate
Successfully installed replicate-0.10.0

# export prints nothing; setx prints a short confirmation message
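
Before calling the API, it can help to fail fast when the token is missing. The following is a minimal sketch, not part of the replicate package: require_token is a hypothetical helper name, and it only checks that the variable is present and non-blank, not that the token is valid.

```python
import os


def require_token(env=os.environ, name="REPLICATE_API_TOKEN"):
    """Return the token stored under `name`, or raise if it is missing or blank.

    `require_token` is a hypothetical helper for illustration; it does not
    contact Replicate, so it cannot tell whether the token is actually valid.
    """
    token = env.get(name, "").strip()
    if not token:
        raise RuntimeError(f"{name} is not set; export it before running a model")
    return token
```

Calling require_token() at the top of a script turns a confusing authentication failure later on into an immediate, readable error.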

Step by step

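Use the replicate.run() function to run a model by specifying its name and input parameters. The call waits for the prediction to finish and returns the model's output as Python objects; image models typically return a list with one entry per generated image.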

python
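import replicate

# The client reads REPLICATE_API_TOKEN from the environment.

# Run the Stable Diffusion model to generate an image from a prompt.
# NOTE: the version ID after the colon is illustrative only; copy the current
# 64-character ID from the model's page on replicate.com.
output = replicate.run(
    "stability-ai/stable-diffusion:db21e45d73e3a2b4a8a6f0a1e1f5a7b3d9f9f7a3f9a1b2c3d4e5f6g7h8i9j0k",
    input={"prompt": "A futuristic cityscape at sunset"}
)

# Image models typically return a list, one entry per generated image.
for item in output:
    print("Output URL:", item)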
output
Output URL: https://replicate.delivery/pbxt/abc123def456ghi789jkl0/image.png
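
The model string passed to replicate.run() takes the form owner/name, optionally followed by a colon and a version ID. As a rough sketch, a reference can be checked locally before sending it to the API. parse_model_ref is a hypothetical helper, not a replicate function, and the 64-character lowercase hex format for version IDs is an assumption about how Replicate identifies versions.

```python
import re

# Assumption: version IDs are 64-character lowercase hexadecimal strings.
_VERSION_RE = re.compile(r"[0-9a-f]{64}")


def parse_model_ref(ref):
    """Split 'owner/name' or 'owner/name:version' into (owner, name, version).

    `version` is None when the reference has no ':version' suffix.
    Raises ValueError when the reference does not match the expected shape.
    """
    path, colon, version = ref.partition(":")
    owner, slash, name = path.partition("/")
    if not (owner and slash and name) or "/" in name:
        raise ValueError(f"expected 'owner/name[:version]', got {ref!r}")
    if colon and not _VERSION_RE.fullmatch(version):
        raise ValueError(f"version {version!r} is not a 64-character hex ID")
    return owner, name, (version if colon else None)
```

A check like this catches truncated or mistyped version strings before they surface as a model-not-found error from the API.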

Common variations

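You can run different models by changing the model name string. Async usage is supported with replicate.async_run(). Output types depend on the model: image models usually return URLs (or file objects that wrap a URL), while language models usually return text, often as a sequence of chunks that you join into a single string.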

python
import asyncio
import replicate

async def main():
    output = await replicate.async_run(
        "meta/meta-llama-3-8b-instruct",
        input={"prompt": "Explain RAG in AI"}
    )
    # Language models typically return a sequence of text chunks; join them.
    print("Async output:", "".join(output))

asyncio.run(main())
output
Async output: RAG (Retrieval-Augmented Generation) is a technique that combines retrieval of documents with generative models to improve accuracy and relevance.
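
Because output shapes differ between models, a small normalization layer can keep calling code uniform. The sketch below assumes outputs arrive either as a single item or as a list or tuple of items, and that each item converts to its text or URL with str(). Both function names are hypothetical, not part of the replicate package.

```python
def output_to_text(output):
    """Return text output as one string.

    Accepts a plain string or any iterable of text chunks.
    """
    if isinstance(output, str):
        return output
    return "".join(str(chunk) for chunk in output)


def output_to_urls(output):
    """Return a list of URL strings from one item or a list/tuple of items.

    Only list and tuple are unpacked, so a single file-like item is never
    iterated by accident; each item is converted with str().
    """
    items = list(output) if isinstance(output, (list, tuple)) else [output]
    return [str(item) for item in items]
```

For example, output_to_text(["RAG ", "combines retrieval"]) yields one string, and output_to_urls() always yields a list even when a model returns a single image.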

Troubleshooting

  • If you get authentication errors, verify your REPLICATE_API_TOKEN environment variable is set correctly.
  • For model not found errors, check the exact model name and version string on replicate.com.
  • Network timeouts may require retry logic or checking your internet connection.
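
The retry logic mentioned above could look roughly like the following. This is a generic exponential-backoff wrapper, not a feature of the replicate package. run_with_retry is a hypothetical name, and the default exception types are built-in placeholders, so pass whichever exception classes your client actually raises on network failures.

```python
import time


def run_with_retry(call, attempts=3, base_delay=1.0,
                   retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call `call()` until it succeeds or `attempts` tries are used up.

    Waits base_delay, then 2 * base_delay, then 4 * base_delay, ... between
    tries. `retry_on` defaults to built-in placeholder exception types
    (an assumption); exceptions outside `retry_on` propagate immediately.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of tries: surface the last error
            sleep(base_delay * 2 ** attempt)
```

A model call would then be wrapped as run_with_retry(lambda: replicate.run(model_name, input=inputs)), where model_name and inputs stand for your own values.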

Key Takeaways

  • Use pip install replicate and set REPLICATE_API_TOKEN to authenticate.
  • Run models with replicate.run(model_name, input=...) for simple synchronous inference.
  • Async inference is supported via replicate.async_run() for non-blocking calls.
  • Model outputs vary by type: images return URLs, text returns strings or JSON.
  • Check model names and tokens carefully to avoid common errors.
Verified 2026-04 · stability-ai/stable-diffusion, meta/meta-llama-3-8b-instruct