Concept Beginner · 3 min read

What is an encoder-only model in AI?

Quick answer
An encoder-only model in AI is a neural network architecture that encodes input data into a fixed representation without decoding or generating output sequences. It focuses solely on encoding input features and is commonly used for tasks such as classification and embedding generation.

How it works

An encoder-only model works by transforming input data into a dense vector representation that captures its essential features. Think of it like a translator who listens to a sentence and summarizes its meaning into a compact code without speaking back. This contrasts with encoder-decoder models, which both encode input and generate output sequences.

In practice, these models use layers of self-attention and feed-forward networks to build contextual embeddings of the input tokens. The output is a fixed-size vector or sequence of vectors representing the input's semantic content.
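
The self-attention step described above can be sketched in a few lines. This is a minimal, illustrative single-head version with randomly initialized weights, not BERT's actual multi-head implementation; all names and sizes here are made up for the example:

python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # X: (seq_len, d_model) token vectors; project into queries, keys, values
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (seq_len, seq_len) pairwise relevance
    weights = softmax(scores, axis=-1)   # each row is a distribution over tokens
    return weights @ V                   # context-mixed token representations

rng = np.random.default_rng(0)
seq_len, d_model = 5, 16
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (5, 16)

Each output row is a weighted mix of every input token's value vector, which is how the model builds contextual embeddings: a token's representation depends on all the tokens around it.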

Concrete example

Here is a simple example using the Hugging Face transformers library to load an encoder-only model, bert-base-uncased, and encode a sentence into embeddings:

python
from transformers import BertTokenizer, BertModel
import torch

# Load pretrained tokenizer and encoder only model
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

# Input text
text = "Encoder only models create fixed representations."
inputs = tokenizer(text, return_tensors='pt')

# Forward pass to get last hidden states (embeddings)
with torch.no_grad():
    outputs = model(**inputs)

# outputs.last_hidden_state shape: (batch_size, sequence_length, hidden_size);
# sequence_length includes the [CLS] and [SEP] tokens the tokenizer adds
embeddings = outputs.last_hidden_state
print(embeddings.shape)  # torch.Size([1, sequence_length, 768])

When to use it

Use encoder-only models when your task requires understanding or classifying input data without generating new text. Common use cases include:

  • Text classification (e.g., sentiment analysis)
  • Feature extraction for downstream tasks
  • Embedding generation for similarity search
  • Named entity recognition and token-level classification

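For embedding-based similarity search, the per-token vectors in last_hidden_state are typically pooled into a single sentence vector and compared with cosine similarity. Below is a minimal sketch of that pooling step; it uses random stand-in arrays in place of real model outputs, so the shapes match BERT but the values are synthetic:

python
import numpy as np

# Stand-in for model outputs: last_hidden_state of shape (seq_len, hidden_size),
# plus an attention mask marking real tokens (1) vs padding (0)
rng = np.random.default_rng(42)
hidden = rng.normal(size=(6, 768))
mask = np.array([1, 1, 1, 1, 0, 0], dtype=float)

def mean_pool(hidden_states, attention_mask):
    # Average only over real tokens, ignoring padding positions
    m = attention_mask[:, None]
    return (hidden_states * m).sum(axis=0) / m.sum()

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

emb = mean_pool(hidden, mask)
print(emb.shape)        # (768,)
print(cosine(emb, emb)) # 1.0, up to floating-point rounding

Mean pooling with the attention mask is one common choice; other approaches use the [CLS] token's vector or max pooling, and the best option depends on the downstream task.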
Do not use encoder-only models when you need to generate or translate text sequences; for that, use encoder-decoder or decoder-only models.

Key terms

Encoder-only model: A model architecture that encodes input into fixed representations without generating output sequences.
Self-attention: A mechanism allowing the model to weigh the importance of different input tokens relative to each other.
Embedding: A dense vector representation capturing semantic information of input data.
Encoder-decoder model: A model architecture that encodes input and then decodes it to generate output sequences.
BERT: A popular encoder-only transformer model used for various NLP tasks.

Key Takeaways

  • Encoder-only models transform input data into fixed embeddings without generating output text.
  • They excel at classification, feature extraction, and embedding generation tasks.
  • Use encoder-only models like BERT when you need deep understanding but no text generation.
  • Encoder-decoder or decoder-only models are better suited for text generation tasks.
  • Self-attention layers let encoder-only models capture contextual relationships in the input.
Verified 2026-04 · bert-base-uncased