Replicate Cog explained

Quick answer

The replicate package is Replicate's Python client for running AI models hosted on Replicate's platform through a simple interface. It is separate from Cog, Replicate's open-source tool for packaging models into containers, which this guide does not cover. Install the client with pip install replicate, then call replicate.run() with a model name and inputs; the output comes back directly as Python data.

PREREQUISITES

Python 3.8+
pip install replicate
Replicate API token set as REPLICATE_API_TOKEN environment variable

Setup

Install the replicate Python package and set your API token as an environment variable to authenticate with Replicate's API.

bash

pip install replicate

# On Linux/macOS
export REPLICATE_API_TOKEN="your_token_here"

# On Windows (PowerShell); setx only takes effect in new terminal sessions
setx REPLICATE_API_TOKEN "your_token_here"

output

Collecting replicate
  Downloading replicate-0.10.0-py3-none-any.whl (20 kB)
Installing collected packages: replicate
Successfully installed replicate-0.10.0

# No output for environment variable set command
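
Because a missing token only shows up later as an authentication error, it can help to check for it up front. The following is a minimal sketch, not part of the replicate package: require_token and TOKEN_VAR are hypothetical names introduced here for illustration.

```python
import os
from typing import Mapping, Optional

# Name of the environment variable described in the prerequisites
TOKEN_VAR = "REPLICATE_API_TOKEN"


def require_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API token from env, or raise with a clear message.

    Hypothetical helper for illustration; it is not provided by replicate.
    """
    env = os.environ if env is None else env
    token = env.get(TOKEN_VAR, "").strip()
    if not token:
        raise RuntimeError(
            f"{TOKEN_VAR} is not set; export it, or open a new terminal after setx"
        )
    return token
```

Calling require_token() at the top of a script, before any call to the client, turns a confusing API error into an immediate and readable one.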

Step by step

Use the replicate.run() function to run a model by passing its name (optionally with a version) and a dictionary of input parameters. The call blocks until the prediction finishes, then returns the output as Python data.

python

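import replicate

# The client reads REPLICATE_API_TOKEN from the environment

# Run the Stable Diffusion model to generate an image from a prompt.
# The version hash after the colon is a placeholder; copy the real one
# from the model's page on replicate.com.
output = replicate.run(
    "stability-ai/stable-diffusion:db21e45d73e3a2b4a8a6f0a1e1f5a7b3d9f9f7a3f9a1b2c3d4e5f6g7h8i9j0k",
    input={"prompt": "A futuristic cityscape at sunset"}
)

# Image models usually return a list with one item per generated image
for item in output:
    print("Output URL:", item)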

output

Output URL: https://replicate.delivery/pbxt/abc123def456ghi789jkl0/image.png
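
The model reference passed to replicate.run() has the form owner/name, optionally followed by a colon and a version ID. A malformed reference is a common source of model-not-found errors, so a quick local check can save a round trip. The sketch below assumes version IDs are 64 lowercase hexadecimal characters; parse_model_ref and ModelRef are hypothetical names, not part of the replicate package.

```python
import re
from typing import NamedTuple, Optional

# Assumed shape: owner/name, optionally followed by :version (64 hex characters)
_MODEL_REF = re.compile(
    r"^([A-Za-z0-9._-]+)/([A-Za-z0-9._-]+)(?::([0-9a-f]{64}))?$"
)


class ModelRef(NamedTuple):
    owner: str
    name: str
    version: Optional[str]  # None when no version is pinned


def parse_model_ref(ref: str) -> ModelRef:
    """Split a model reference into its parts, or raise ValueError."""
    match = _MODEL_REF.match(ref)
    if match is None:
        raise ValueError(f"not a valid owner/name[:version] reference: {ref!r}")
    return ModelRef(*match.groups())
```

For example, parse_model_ref("meta/meta-llama-3-8b-instruct") yields a ModelRef with version set to None, while a version string containing characters outside 0-9 and a-f is rejected before any request is sent.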

Common variations

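You can run a different model by changing the model name string. For non-blocking calls inside asyncio code, use replicate.async_run(). Output types depend on the model: image models usually return URLs (or file objects that wrap them), while text models return a string, a sequence of string chunks, or JSON-like data.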

python

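import asyncio
import replicate

async def main():
    # async_run awaits the prediction without blocking the event loop
    output = await replicate.async_run(
        "meta/meta-llama-3-8b-instruct",
        input={"prompt": "Explain RAG in AI"}
    )
    # Text models often return a sequence of string chunks; join them for display
    text = output if isinstance(output, str) else "".join(output)
    print("Async output:", text)

asyncio.run(main())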

output

Async output: RAG (Retrieval-Augmented Generation) is a technique that combines retrieval of documents with generative models to improve accuracy and relevance.
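
Since output shapes differ between image and text models, a small normalization layer can keep calling code uniform. This is a rough sketch under stated assumptions: output_to_text and output_to_urls are hypothetical helpers, and the assumption that str() on a file-like output yields its URL may not hold for every client version.

```python
from typing import Any, List


def output_to_text(output: Any) -> str:
    """Join a text model's output (a string or chunks of strings) into one string."""
    if isinstance(output, str):
        return output
    return "".join(str(chunk) for chunk in output)


def output_to_urls(output: Any) -> List[str]:
    """Return image outputs as a list of strings, one per item.

    Assumption: str(item) gives a usable URL for each item.
    """
    if isinstance(output, (list, tuple)):
        return [str(item) for item in output]
    # A single item (URL string or file-like object) becomes a one-element list
    return [str(output)]
```

With these in place, code that consumes results can call output_to_text() or output_to_urls() without caring whether a given model returned one value or several.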

Troubleshooting

If you get authentication errors, verify your REPLICATE_API_TOKEN environment variable is set correctly.
For model-not-found errors, check the exact model name and version string on replicate.com.
If requests time out, check your internet connection and consider adding retry logic.
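
The retry logic mentioned above could look roughly like the following sketch, which uses exponential backoff. call_with_retries is a hypothetical helper, not part of the replicate package, and the default exception types are placeholders: which exceptions the client actually raises on a timeout depends on its HTTP layer, so pass the appropriate ones through retry_on.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on the given exceptions with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next attempt
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("attempts must be at least 1")
```

A call such as call_with_retries(lambda: replicate.run(model, input=inputs)) would wrap a single run, where model and inputs stand for your own values. Keep in mind that each retry starts a new prediction, so it is worth keeping the attempt count small.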

Key Takeaways

Use pip install replicate and set REPLICATE_API_TOKEN to authenticate.
Run models with replicate.run(model_name, input=...) for simple synchronous inference.
Async inference is supported via replicate.async_run() for non-blocking calls.
Model outputs vary by type: image models usually return URLs, text models return strings, string chunks, or JSON.
Check model names and tokens carefully to avoid common errors.

Verified 2026-04 · stability-ai/stable-diffusion, meta/meta-llama-3-8b-instruct