AI for ecommerce fraud detection
Quick answer
Use machine learning models and LLMs to analyze transaction patterns and detect anomalies in ecommerce transactions. Engineer features from the transaction data, then use models such as random forests or transformer-based classifiers to flag suspicious activity automatically.
Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install "openai>=1.0"
- pip install scikit-learn pandas numpy
Setup
Install required Python packages and set your OpenAI API key as an environment variable.
pip install openai scikit-learn pandas numpy
output
Collecting openai
Collecting scikit-learn
Collecting pandas
Collecting numpy
Successfully installed openai scikit-learn pandas numpy-1.24.3
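Before running the examples, it can help to confirm that the key is actually visible to Python. The following is a minimal sketch using only the standard library; the helper name `require_api_key` is hypothetical and not part of the OpenAI SDK.

```python
import os


def require_api_key(name="OPENAI_API_KEY", env=None):
    """Return the API key stored under `name`, or raise a clear error.

    `env` defaults to os.environ; passing a plain dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set. Export it in your shell before running the examples."
        )
    return key
```

The returned value can be passed to the client, for example `OpenAI(api_key=require_api_key())`.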
Step by step
This example shows how to train a simple fraud detection model using scikit-learn and then use an LLM to explain flagged transactions.
import os

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from openai import OpenAI

# Sample synthetic data for fraud detection
data = {
    'amount': [100, 2000, 150, 5000, 300, 7000, 50, 4000],
    'is_foreign': [0, 1, 0, 1, 0, 1, 0, 1],
    'hour': [10, 2, 14, 3, 16, 1, 9, 4],
    'fraud': [0, 1, 0, 1, 0, 1, 0, 1]
}
df = pd.DataFrame(data)

# Features and labels
X = df[['amount', 'is_foreign', 'hour']]
y = df['fraud']

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Train Random Forest
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Predict and evaluate
preds = model.predict(X_test)
print(classification_report(y_test, preds))

# Initialize OpenAI client
client = OpenAI(api_key=os.environ['OPENAI_API_KEY'])

# Use LLM to explain flagged transactions
for i, pred in enumerate(preds):
    if pred == 1:
        transaction = X_test.iloc[i].to_dict()
        prompt = f"Explain why this transaction might be fraudulent: {transaction}"
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}]
        )
        explanation = response.choices[0].message.content
        print(f"Transaction: {transaction}\nExplanation: {explanation}\n")
output
              precision    recall  f1-score   support

           0       1.00      1.00      1.00         1
           1       1.00      1.00      1.00         1

    accuracy                           1.00         2
   macro avg       1.00      1.00      1.00         2
weighted avg       1.00      1.00      1.00         2
Transaction: {'amount': 4000, 'is_foreign': 1, 'hour': 4}
Explanation: This transaction might be fraudulent because it involves a high amount of $4000, occurs during an unusual hour (4 AM), and is marked as foreign, which are common indicators of fraud.
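The model above is trained on three raw columns. Derived features can carry more signal, although whether they help on a given dataset is an empirical question. The sketch below shows one way to derive a few features from a single transaction dict. The feature names (`log_amount`, `is_night`, `foreign_at_night`) and the night window (hours 0 through 5) are illustrative assumptions, not part of the example above.

```python
import math


def engineer_features(tx, night_start=0, night_end=5):
    """Derive extra features from a raw transaction dict.

    Expects the keys used in the example: 'amount', 'is_foreign', 'hour'.
    """
    is_night = int(night_start <= tx["hour"] <= night_end)
    return {
        # log1p compresses the long right tail of transaction amounts
        "log_amount": math.log1p(tx["amount"]),
        "is_foreign": int(tx["is_foreign"]),
        "is_night": is_night,
        # Interaction term: foreign transaction made during night hours
        "foreign_at_night": int(tx["is_foreign"]) * is_night,
    }


features = engineer_features({"amount": 4000, "is_foreign": 1, "hour": 4})
print(features)
```

A list of such dicts can be turned into a training table with `pd.DataFrame([engineer_features(t) for t in transactions])`.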
Common variations
For higher throughput, make the API calls asynchronously. You can also switch to another provider's model, such as Anthropic's claude-3-5-sonnet-20241022, for a different explanation style, or enable streaming to display responses as they are generated.
import asyncio
import os

import anthropic


async def explain_fraud_async(transaction):
    # AsyncAnthropic is the awaitable client; the synchronous Anthropic client cannot be awaited
    client = anthropic.AsyncAnthropic(api_key=os.environ['ANTHROPIC_API_KEY'])
    prompt = f"Explain why this transaction might be fraudulent: {transaction}"
    response = await client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=256,
        system="You are a fraud detection assistant.",
        messages=[{"role": "user", "content": prompt}]
    )
    # response.content is a list of content blocks; print the text of the first one
    print(response.content[0].text)


asyncio.run(explain_fraud_async({'amount': 4000, 'is_foreign': 1, 'hour': 4}))
output
This transaction is suspicious because it involves a large amount, occurs at an unusual time, and is foreign, which are common fraud indicators.
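To explain many flagged transactions at once, the async calls can be run concurrently with a cap on how many are in flight. The following is a minimal sketch using only `asyncio`. The names `explain_all` and `fake_explain` are hypothetical, and `fake_explain` stands in for a real API call such as the one above (which would need to return the text instead of printing it).

```python
import asyncio


async def explain_all(transactions, explain, max_concurrency=5):
    """Run `explain(tx)` for every transaction, at most `max_concurrency` at a time.

    `explain` is any coroutine function that takes a transaction dict and returns a string.
    Results come back in the same order as `transactions`.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(tx):
        # The semaphore blocks here once max_concurrency calls are already running
        async with semaphore:
            return await explain(tx)

    return await asyncio.gather(*(bounded(tx) for tx in transactions))


async def fake_explain(tx):
    """Placeholder for a real LLM call; sleeps briefly to simulate network latency."""
    await asyncio.sleep(0.01)
    return f"amount={tx['amount']} flagged for review"


flagged = [
    {"amount": 4000, "is_foreign": 1, "hour": 4},
    {"amount": 7000, "is_foreign": 1, "hour": 1},
]
results = asyncio.run(explain_all(flagged, fake_explain, max_concurrency=2))
print(results)
```

Capping concurrency is one way to stay within provider rate limits; a suitable value for `max_concurrency` depends on the account and model.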
Troubleshooting
- If you get authentication errors, verify your API key is set correctly in os.environ.
- If model responses are slow, try smaller models like gpt-4o-mini or enable streaming.
- For poor fraud detection accuracy, improve feature engineering or use larger datasets.
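For the slow-response case, streaming shows text as it is generated instead of waiting for the full reply. The sketch below assumes the chat completions interface of the OpenAI Python SDK (v1+) with `stream=True`, where each chunk carries incremental text in `chunk.choices[0].delta.content`. The helper name `stream_explanation` is hypothetical, and the client is passed in as an argument so the function does not depend on a global.

```python
def stream_explanation(client, transaction, model="gpt-4o-mini"):
    """Print an explanation as it streams in and return the full text.

    `client` is expected to be an `openai.OpenAI` instance (assumption: SDK v1+).
    """
    prompt = f"Explain why this transaction might be fraudulent: {transaction}"
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,  # yield chunks incrementally instead of one final response
    )
    parts = []
    for chunk in stream:
        # Some chunks carry no choices or no text (e.g. the final one), so guard both
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
            parts.append(delta)
    print()
    return "".join(parts)


# Usage (requires the openai package and OPENAI_API_KEY to be set):
# from openai import OpenAI
# text = stream_explanation(OpenAI(), {"amount": 4000, "is_foreign": 1, "hour": 4})
```

Streaming does not reduce total generation time; it only reduces the wait before the first words appear.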
Key Takeaways
- Combine traditional ML models with LLMs for fraud detection and explanation.
- Use feature engineering on transaction data to improve model accuracy.
- Leverage asynchronous and streaming API calls for scalable fraud detection systems.