
AI for ecommerce fraud detection

Quick answer

Train a machine learning model on engineered transaction features, such as amount, time of day, and whether the payment is foreign, so that it learns normal purchase patterns and flags anomalous, suspicious activity automatically. Random forests are a solid baseline, and transformer-based classifiers are another option. Then use an LLM to explain each flagged transaction in plain language for a human reviewer.
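The feature engineering step can be sketched with pandas. This is a minimal sketch, not a production pipeline: the raw columns (customer_id, timestamp, billing_country, ip_country) and the derived features are hypothetical choices for illustration. Each transaction is compared only with the same customer's earlier transactions, so a transaction never leaks into its own baseline.

```python
import pandas as pd

# Hypothetical raw transaction log; column names are assumptions for this sketch
raw = pd.DataFrame({
    "customer_id": ["a", "a", "a", "b", "b"],
    "timestamp": pd.to_datetime([
        "2024-05-01 10:00", "2024-05-01 10:05", "2024-05-03 02:30",
        "2024-05-02 14:00", "2024-05-02 14:01",
    ]),
    "amount": [50.0, 60.0, 900.0, 20.0, 25.0],
    "billing_country": ["US", "US", "US", "DE", "DE"],
    "ip_country": ["US", "US", "RU", "DE", "FR"],
})


def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    """Turn raw transactions into numeric, model-ready features (illustrative)."""
    df = df.sort_values(["customer_id", "timestamp"]).copy()
    feats = pd.DataFrame(index=df.index)
    feats["amount"] = df["amount"]
    feats["hour"] = df["timestamp"].dt.hour
    # Night-time purchases (00:00-05:59) as a binary flag
    feats["is_night"] = (feats["hour"] < 6).astype(int)
    # Billing country differs from the IP's country
    feats["country_mismatch"] = (df["billing_country"] != df["ip_country"]).astype(int)
    grouped = df.groupby("customer_id")
    # Seconds since the same customer's previous transaction (NaN for the first one)
    feats["secs_since_prev"] = grouped["timestamp"].diff().dt.total_seconds()
    # Amount relative to the mean of the customer's earlier transactions only
    prior_mean = grouped["amount"].transform(lambda s: s.shift().expanding().mean())
    feats["amount_vs_prior_mean"] = df["amount"] / prior_mean
    # First transactions have no history; use -1 as a sentinel value
    return feats.fillna(-1)


features = engineer_features(raw)
print(features)
```

The resulting numeric columns can be passed to RandomForestClassifier the same way as the amount, is_foreign, and hour columns in the step-by-step example.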

PREREQUISITES

Python 3.8+
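OpenAI API key, exported as OPENAI_API_KEY
Anthropic API key, exported as ANTHROPIC_API_KEY (only for the Claude variation)
pip install "openai>=1.0"
pip install scikit-learn pandas numpy
pip install anthropic (only for the Claude variation)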

Setup

Install required Python packages and set your OpenAI API key as an environment variable.

bash

pip install openai scikit-learn pandas numpy
export OPENAI_API_KEY="your-api-key"

output (abbreviated; exact versions and dependencies will vary)

Collecting openai
Collecting scikit-learn
Collecting pandas
Collecting numpy
Successfully installed numpy-<version> openai-<version> pandas-<version> scikit-learn-<version>
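Because the examples read keys with os.environ[...], a missing variable surfaces as a bare KeyError. A small helper such as the hypothetical require_env below (a sketch, not part of either SDK) fails earlier with an actionable message.

```python
import os


def require_env(name: str) -> str:
    """Return a non-empty environment variable or raise a descriptive error."""
    value = os.environ.get(name, "").strip()
    if not value:
        # Point the user at the fix instead of surfacing a bare KeyError
        raise RuntimeError(
            f"{name} is not set. Run: export {name}='your-key' "
            "in the shell that launches this script."
        )
    return value
```

Call it before creating a client, for example OpenAI(api_key=require_env("OPENAI_API_KEY")).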

Step by step

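This example trains a small fraud detection model with scikit-learn and then uses an LLM to explain the transactions it flags. The eight-row dataset is synthetic and perfectly separable (every foreign transaction is fraudulent), so the perfect scores below only show that the pipeline runs; they say nothing about real-world accuracy.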

python

import os
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from openai import OpenAI

# Sample synthetic data for fraud detection
data = {
    'amount': [100, 2000, 150, 5000, 300, 7000, 50, 4000],
    'is_foreign': [0, 1, 0, 1, 0, 1, 0, 1],
    'hour': [10, 2, 14, 3, 16, 1, 9, 4],
    'fraud': [0, 1, 0, 1, 0, 1, 0, 1]
}
df = pd.DataFrame(data)

# Features and labels
X = df[['amount', 'is_foreign', 'hour']]
y = df['fraud']

# Train/test split, stratified so the 2-row test set holds one example of each class
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Train Random Forest
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Predict and evaluate
preds = model.predict(X_test)
print(classification_report(y_test, preds))

# Initialize OpenAI client
client = OpenAI(api_key=os.environ['OPENAI_API_KEY'])

# Use LLM to explain flagged transactions
for i, pred in enumerate(preds):
    if pred == 1:
        # Cast NumPy integers to plain ints so the prompt shows clean values
        transaction = {k: int(v) for k, v in X_test.iloc[i].items()}
        prompt = f"Explain why this transaction might be fraudulent: {transaction}"
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}]
        )
        explanation = response.choices[0].message.content
        print(f"Transaction: {transaction}\nExplanation: {explanation}\n")

output (example; the test rows and the LLM's wording will vary)

              precision    recall  f1-score   support

           0       1.00      1.00      1.00         1
           1       1.00      1.00      1.00         1

    accuracy                           1.00         2
   macro avg       1.00      1.00      1.00         2
weighted avg       1.00      1.00      1.00         2

Transaction: {'amount': 4000, 'is_foreign': 1, 'hour': 4}
Explanation: This transaction might be fraudulent because it involves a high amount of $4000, occurs during an unusual hour (4 AM), and is marked as foreign, which are common indicators of fraud.
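The prompt in the example gives the LLM only the raw feature values, so its explanation may not reflect what the classifier actually used. One option, sketched below under assumptions, is to include the model's fraud probability and the random forest's feature importances in the prompt. build_explanation_prompt is a hypothetical helper, and feature_importances_ is a global, impurity-based ranking over the training data, not a per-transaction attribution.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

FEATURES = ["amount", "is_foreign", "hour"]


def build_explanation_prompt(model, transaction, top_k=3):
    """Hypothetical helper: ground the LLM prompt in the classifier's own outputs."""
    row = pd.DataFrame([transaction], columns=FEATURES)
    # Column of predict_proba that corresponds to the fraud label (1)
    fraud_col = list(model.classes_).index(1)
    fraud_prob = model.predict_proba(row)[0, fraud_col]
    # Global importances: a ranking across the training data, not this transaction
    ranked = sorted(zip(FEATURES, model.feature_importances_),
                    key=lambda pair: pair[1], reverse=True)
    importance_lines = "\n".join(f"- {name}: {weight:.2f}" for name, weight in ranked[:top_k])
    return (
        "You are assisting a fraud analyst.\n"
        f"Transaction: {transaction}\n"
        f"Model fraud probability: {fraud_prob:.2f}\n"
        "Feature importances across the training data:\n"
        f"{importance_lines}\n"
        "In 2-3 sentences, explain which attributes of this transaction most likely "
        "drove the score. Do not state facts that are not listed above."
    )


# Refit the toy model from the step-by-step example so this block runs on its own
df = pd.DataFrame({
    "amount": [100, 2000, 150, 5000, 300, 7000, 50, 4000],
    "is_foreign": [0, 1, 0, 1, 0, 1, 0, 1],
    "hour": [10, 2, 14, 3, 16, 1, 9, 4],
    "fraud": [0, 1, 0, 1, 0, 1, 0, 1],
})
model = RandomForestClassifier(random_state=42).fit(df[FEATURES], df["fraud"])

prompt = build_explanation_prompt(model, {"amount": 4000, "is_foreign": 1, "hour": 4})
print(prompt)
```

The returned string can be sent as the user message to client.chat.completions.create in the same way as the prompt in the loop above.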

Common variations

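For higher throughput, use the SDKs' asynchronous clients (AsyncOpenAI in openai, AsyncAnthropic in anthropic) so that many flagged transactions can be explained concurrently. You can also switch providers, for example to Anthropic's claude-3-5-sonnet-20241022, for a different explanation style; this requires pip install anthropic and an ANTHROPIC_API_KEY environment variable. Both SDKs also support streaming responses, which display text as it is generated.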

python

import asyncio
import os
import anthropic

async def explain_fraud_async(transaction):
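    # AsyncAnthropic is the asyncio client; the synchronous Anthropic client cannot be awaited
    client = anthropic.AsyncAnthropic(api_key=os.environ['ANTHROPIC_API_KEY'])
    prompt = f"Explain why this transaction might be fraudulent: {transaction}"
    response = await client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=256,
        system="You are a fraud detection assistant.",
        messages=[{"role": "user", "content": prompt}]
    )
    # response.content is a list of content blocks; the first block holds the text
    print(response.content[0].text)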

asyncio.run(explain_fraud_async({'amount': 4000, 'is_foreign': 1, 'hour': 4}))

output

This transaction is suspicious because it involves a large amount, occurs at an unusual time, and is foreign, which are common fraud indicators.
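When many transactions are flagged, sending every request at once can hit provider rate limits. A common pattern is to cap in-flight requests with asyncio.Semaphore and collect the results with asyncio.gather. In this sketch, explain_all and fake_explain are hypothetical names; fake_explain stands in for an awaited AsyncOpenAI or AsyncAnthropic request so the concurrency logic runs without an API key.

```python
import asyncio


async def explain_all(transactions, explain_one, max_concurrency=5):
    """Apply explain_one to every transaction with at most max_concurrency calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(tx):
        # Each call waits here until one of the max_concurrency slots is free
        async with semaphore:
            return await explain_one(tx)

    # gather returns results in the same order as the input transactions
    return await asyncio.gather(*(guarded(tx) for tx in transactions))


async def fake_explain(tx):
    """Hypothetical stand-in for an awaited SDK call."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"amount={tx['amount']}: flagged for review"


flagged = [{"amount": a, "is_foreign": 1, "hour": 3} for a in (2000, 5000, 7000, 4000)]
results = asyncio.run(explain_all(flagged, fake_explain, max_concurrency=2))
for line in results:
    print(line)
```

Tune max_concurrency to your account's rate limits; keeping the cap inside explain_all means the rest of the pipeline does not need to know about it.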

Troubleshooting

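If you get authentication errors, check that OPENAI_API_KEY (or ANTHROPIC_API_KEY for the Claude variation) is exported in the same shell that runs the script; if the variable is missing entirely, os.environ[...] raises a KeyError before any API call is made.
If responses are slow, keep a small model such as gpt-4o-mini, cap output length with max_tokens, send requests concurrently with the async clients, or enable streaming so text appears as it is generated.
If detection accuracy is poor, engineer more informative features, train on more labeled data, and account for class imbalance, since real fraud is rare (for example, class_weight="balanced" in RandomForestClassifier).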
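Poor detection accuracy often comes from class imbalance, because real fraud is rare. This sketch uses scikit-learn's make_classification to generate roughly 2% positive labels as a synthetic stand-in for fraud data, trains with class_weight="balanced", and compares decision thresholds on predict_proba instead of relying on predict(). The threshold values are arbitrary examples; in practice, choose one on validation data based on the cost of missed fraud versus false alarms.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced dataset: roughly 2% positives ("fraud"); illustrative only
X, y = make_classification(n_samples=5000, n_features=6, n_informative=4,
                           weights=[0.98, 0.02], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# class_weight="balanced" weights each class inversely to its frequency
model = RandomForestClassifier(class_weight="balanced", random_state=0)
model.fit(X_train, y_train)

# Score with probabilities and choose the decision threshold explicitly
fraud_prob = model.predict_proba(X_test)[:, list(model.classes_).index(1)]
for threshold in (0.5, 0.3, 0.1):
    flagged = (fraud_prob >= threshold).astype(int)
    print(f"threshold={threshold:.1f} "
          f"precision={precision_score(y_test, flagged, zero_division=0):.2f} "
          f"recall={recall_score(y_test, flagged):.2f}")
```

Lowering the threshold can only keep or raise recall, usually at the cost of precision, so the threshold is the main knob for trading missed fraud against reviewer workload.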


Key Takeaways

Combine traditional ML models with LLMs for fraud detection and explanation.
Use feature engineering on transaction data to improve model accuracy.
Use asynchronous and streaming API calls to scale LLM explanations across many flagged transactions.

Verified 2026-04 · gpt-4o-mini, claude-3-5-sonnet-20241022
