High severity · Intermediate · Fix: 5-10 min

DuplicateDocumentError

haystack.document_stores.base.DuplicateDocumentError

What this error means
Haystack raises DuplicateDocumentError when attempting to write documents that violate the configured duplicate write policy in the document store.

Stack trace

traceback
Traceback (most recent call last):
  File "app.py", line 42, in <module>
    document_store.write_documents(docs)
  File "/usr/local/lib/python3.9/site-packages/haystack/document_stores/base.py", line 210, in write_documents
    raise DuplicateDocumentError("Duplicate document detected based on write policy")
haystack.document_stores.base.DuplicateDocumentError: Duplicate document detected based on write policy
QUICK FIX
Set the document store's duplicate write policy (the duplicate_documents parameter) to 'overwrite' or 'skip' so that writing a document whose ID already exists no longer raises DuplicateDocumentError.

Why it happens

Haystack document stores apply a duplicate write policy keyed on document IDs. In Haystack 1.x, an ID that is not set explicitly is derived from a hash of the document content, so documents with identical content end up with identical IDs. When you write a document whose ID is already in the index and the policy is 'fail', this error is raised. The policy protects data integrity, but it means document IDs and the write policy have to be chosen deliberately.

Detection

Wrap document_store.write_documents calls in exception handling and log every DuplicateDocumentError together with the IDs in the batch, so duplicates can be traced to their source instead of surfacing only as a crash.
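A minimal sketch of that logging, not a definitive implementation. The helper names (doc_id, write_and_log_duplicates) and the logger name are hypothetical. The exception class is passed in as an argument so the helper does not hard-code an import path; in practice you would pass DuplicateDocumentError.

```python
import logging

logger = logging.getLogger("ingestion")  # hypothetical logger name


def doc_id(doc):
    """Return the ID of a dict-style or object-style document, or None."""
    if isinstance(doc, dict):
        return doc.get("id")
    return getattr(doc, "id", None)


def write_and_log_duplicates(store, docs, duplicate_error):
    """Write docs; on a duplicate error, log the batch IDs and re-raise."""
    try:
        store.write_documents(docs)
    except duplicate_error as exc:
        logger.error(
            "Duplicate write rejected: %s (batch ids: %s)",
            exc,
            [doc_id(d) for d in docs],
        )
        raise  # the caller still decides how to recover
```

Usage would look like write_and_log_duplicates(document_store, docs, DuplicateDocumentError). Re-raising keeps the original behaviour and only adds the log line.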

Causes & fixes

1

Writing documents with IDs that already exist in the document store and the write policy is set to 'fail'.

✓ Fix

Change the write policy to 'overwrite' or 'skip' in the document store configuration, or ensure new documents have unique IDs.

2

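Documents with identical content receive identical auto-generated IDs, because the default ID is derived from a hash of the content.

✓ Fix

Assign explicit, unique IDs, or include metadata in the hash (in Haystack 1.x, the Document id_hash_keys option) so that documents with the same text but different origins get different IDs.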

3

Multiple parallel writes causing race conditions leading to duplicate document insert attempts.

✓ Fix

Implement write synchronization or retry logic to avoid concurrent writes of the same document.
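One way to sketch the synchronization for cause 3, assuming all writers are threads in the same process. The helper name write_serialized is hypothetical, and the sketch assumes a Haystack 1.x style write_documents that accepts a per-call duplicate_documents argument. A lock like this does not coordinate separate processes or machines.

```python
import threading

# One lock per process; every writer thread must go through write_serialized.
_write_lock = threading.Lock()


def write_serialized(store, docs, policy="skip"):
    """Run write_documents under a lock so the store's duplicate check and
    insert cannot interleave between threads of this process."""
    with _write_lock:
        # Assumption: the store accepts a per-call duplicate_documents policy.
        store.write_documents(docs, duplicate_documents=policy)
```

With 'skip', a thread that arrives second finds the document already present and leaves it untouched instead of failing.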

Code: broken vs fixed

Broken - triggers the error
python
from haystack.document_stores import FAISSDocumentStore

# 'fail' makes the store reject IDs that are already in the index
store = FAISSDocumentStore(duplicate_documents='fail')
docs = [{"id": "doc1", "content": "text"}]
store.write_documents(docs)  # First write succeeds
store.write_documents(docs)  # Second write raises DuplicateDocumentError
Fixed - works correctly
python
from haystack.document_stores import FAISSDocumentStore

# 'overwrite' replaces an existing document that has the same ID
store = FAISSDocumentStore(duplicate_documents='overwrite')

docs = [{"id": "doc1", "content": "text"}]
store.write_documents(docs)  # First write inserts doc1
store.write_documents(docs)  # Second write overwrites doc1 without error
print("Documents written successfully")
Changed the document store's duplicate_documents parameter to 'overwrite' so duplicate document writes update existing entries instead of raising errors.

Workaround

Catch DuplicateDocumentError around write_documents calls, then filter out the documents that already exist (or assign them new IDs) before retrying the write.
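The workaround could be sketched roughly as follows. The helper name write_skipping_existing is hypothetical. The sketch assumes dict-style documents with an "id" key, a store that exposes get_documents_by_id returning objects with an id attribute (as Haystack 1.x document stores do), and a failed write that stores nothing before raising.

```python
def write_skipping_existing(store, docs, duplicate_error):
    """Try a plain write; on a duplicate error, retry with unseen IDs only.

    Returns the number of documents actually written.
    """
    try:
        store.write_documents(docs)
        return len(docs)
    except duplicate_error:
        ids = [d["id"] for d in docs]
        # Ask the store which of these IDs it already holds.
        existing = {found.id for found in store.get_documents_by_id(ids)}
        fresh = [d for d in docs if d["id"] not in existing]
        if fresh:
            store.write_documents(fresh)
        return len(fresh)
```

Pass DuplicateDocumentError as duplicate_error. If another writer can insert the same IDs between the lookup and the retry, the retry can still fail, so this is best combined with some form of write synchronization.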

Prevention

Design your document ingestion pipeline to assign unique, stable IDs and choose a duplicate write policy that matches your intent ('overwrite' to replace existing documents, 'skip' to keep them) to avoid duplicate write errors.
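As an illustration of stable ID assignment, the sketch below derives each ID from the content plus a source label. The hashing scheme and the helper names (make_doc_id, to_docs) are assumptions made for this example, not Haystack's built-in ID generation.

```python
import hashlib


def make_doc_id(content, source):
    """Derive a stable ID from the content and where it came from."""
    digest = hashlib.sha256(f"{source}\n{content}".encode("utf-8")).hexdigest()
    return digest[:32]  # shortened for readability; still deterministic


def to_docs(records):
    """Turn (source, content) pairs into dict documents with stable IDs."""
    return [
        {
            "id": make_doc_id(content, source),
            "content": content,
            "meta": {"source": source},
        }
        for source, content in records
    ]
```

Re-ingesting the same file yields the same IDs, so 'overwrite' or 'skip' behaves predictably, while identical text from two different sources no longer collides.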

Python 3.9+ · haystack >=1.0.0 · tested on 1.14.0
Verified 2026-04