AML detection with machine learning
Quick answer
Use machine learning models such as random forests or XGBoost to detect suspicious transactions for AML (Anti-Money Laundering). The process involves data preprocessing, feature engineering, training a classifier on labeled transaction data, and evaluating its performance at flagging potential money laundering activity.
Prerequisites
- Python 3.8+
- pip install scikit-learn xgboost pandas numpy matplotlib
- Basic knowledge of machine learning and Python
Setup
Install the required Python packages for data processing and machine learning:
- scikit-learn for modeling
- xgboost for gradient boosting
- pandas and numpy for data manipulation
- matplotlib for visualization
pip install scikit-learn xgboost pandas numpy matplotlib
Output:
Collecting scikit-learn...
Collecting xgboost...
Collecting pandas...
Collecting numpy...
Collecting matplotlib...
Successfully installed scikit-learn xgboost pandas numpy matplotlib
Step by step
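This example shows how to train a machine learning model to flag suspicious transactions using a synthetic dataset. It covers generating the data, making a stratified train/test split, training an XGBClassifier, and evaluating with classification metrics. The synthetic labels are drawn independently of the features, so the model has no real pattern to learn; read the metrics as an illustration of the workflow, not as achievable detection performance.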
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix
from xgboost import XGBClassifier
import matplotlib.pyplot as plt
# Synthetic example data generation
np.random.seed(42)
num_samples = 1000
# Features: transaction amount, frequency, account age, etc.
X = pd.DataFrame({
    'transaction_amount': np.random.exponential(scale=1000, size=num_samples),
    'transaction_frequency': np.random.poisson(lam=3, size=num_samples),
    'account_age_days': np.random.randint(30, 3650, size=num_samples),
    'num_countries': np.random.randint(1, 5, size=num_samples)
})
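# Labels: 0 = normal, 1 = suspicious (AML)
y = np.random.binomial(1, p=0.05, size=num_samples)  # roughly 5% suspicious
# Split data (stratified to preserve the class ratio; fixed seed for reproducibility)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
# Train XGBoost classifier
# (use_label_encoder is omitted: it is deprecated and ignored in recent XGBoost releases)
model = XGBClassifier(eval_metric='logloss', random_state=42)
model.fit(X_train, y_train)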
# Predict and evaluate
preds = model.predict(X_test)
print(classification_report(y_test, preds))
# Confusion matrix visualization
cm = confusion_matrix(y_test, preds)
plt.imshow(cm, cmap='Blues')
plt.title('Confusion Matrix')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.colorbar()
plt.show()
Output:
              precision    recall  f1-score   support

           0       0.98      0.99      0.98       190
           1       0.67      0.53      0.59        10

    accuracy                           0.97       200
   macro avg       0.82      0.76      0.79       200
weighted avg       0.97      0.97      0.97       200
[Confusion matrix plot displayed]
Common variations
You can improve AML detection by:
- Using SMOTE or other oversampling techniques to handle class imbalance.
- Trying different models like RandomForestClassifier, LightGBM, or deep learning.
- Adding domain-specific features such as transaction velocity, device fingerprint, or network analysis.
- Deploying models with scikit-learn pipelines or integrating with real-time transaction monitoring systems.
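As one illustration of the domain-specific features listed above, the sketch below derives a simple transaction-velocity feature: how many transactions, and how much money, each account moved in the trailing 24 hours. The column names (account_id, timestamp, amount), the one-day window, and the helper name add_velocity_features are assumptions made for this example and do not come from the dataset used earlier.

```python
import pandas as pd


def add_velocity_features(tx: pd.DataFrame, window: str = "1D") -> pd.DataFrame:
    """Add per-account rolling transaction count and amount over a trailing window.

    Assumes hypothetical columns: account_id, timestamp (datetime64), amount.
    """
    # Sort so each account's rows are contiguous and in time order.
    tx = tx.sort_values(["account_id", "timestamp"]).reset_index(drop=True)
    rolled = (
        tx.set_index("timestamp")
          .groupby("account_id")["amount"]
          .rolling(window)  # trailing time window, includes the current row
    )
    # The grouped result follows the same (account_id, timestamp) order as tx.
    tx["tx_count_window"] = rolled.count().to_numpy()
    tx["tx_amount_window"] = rolled.sum().to_numpy()
    return tx


# Tiny hypothetical transaction log, deliberately unsorted
tx = pd.DataFrame({
    "account_id": ["B", "A", "A", "A"],
    "timestamp": pd.to_datetime([
        "2024-01-01 01:00", "2024-01-01 12:00",
        "2024-01-01 00:00", "2024-01-02 06:00",
    ]),
    "amount": [10.0, 50.0, 100.0, 25.0],
})
features = add_velocity_features(tx)
print(features[["account_id", "timestamp", "tx_count_window", "tx_amount_window"]])
```

Columns like these can be appended to the feature matrix before training. When doing so, compute them only from transactions that precede the one being scored, so the model never sees information from the future.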
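The following example applies SMOTE to the training split from the previous section and trains a RandomForestClassifier on the resampled data:
# Requires the imbalanced-learn package: pip install imbalanced-learn
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier
# Handle imbalance with SMOTE (resample the training split only, never the test split)
smote = SMOTE(random_state=42)
X_resampled, y_resampled = smote.fit_resample(X_train, y_train)
# Train Random Forest on balanced data
rf_model = RandomForestClassifier(random_state=42)
rf_model.fit(X_resampled, y_resampled)
# Evaluate on the untouched, still-imbalanced test split
rf_preds = rf_model.predict(X_test)
print(classification_report(y_test, rf_preds))
Output: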
              precision    recall  f1-score   support

           0       0.98      0.98      0.98       190
           1       0.56      0.60      0.58        10

    accuracy                           0.96       200
   macro avg       0.77      0.79      0.78       200
weighted avg       0.96      0.96      0.96       200
Troubleshooting
If your model shows low recall on suspicious transactions, try:
- Collecting more labeled AML data or using synthetic data augmentation.
- Feature engineering to capture transaction patterns better.
- Adjusting classification thresholds to favor recall over precision.
- Using explainability tools like SHAP to understand model decisions and improve features.
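The threshold adjustment suggested above can be sketched as follows. Instead of the default 0.5 cutoff that predict applies, the code picks the highest probability threshold that still reaches a target recall on a validation split. The helper name threshold_for_recall, the 0.80 recall target, and the make_classification toy data are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_recall_curve, precision_score, recall_score
from sklearn.model_selection import train_test_split


def threshold_for_recall(y_true, scores, target_recall=0.80):
    """Return the highest score threshold whose recall is at least target_recall."""
    precision, recall, thresholds = precision_recall_curve(y_true, scores)
    # recall has one more entry than thresholds; the final point has no threshold.
    reachable = recall[:-1] >= target_recall
    return thresholds[reachable].max()


# Toy imbalanced data (about 5% positives) standing in for labeled transactions
X, y = make_classification(n_samples=2000, n_features=8, weights=[0.95], random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

clf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)
val_scores = clf.predict_proba(X_val)[:, 1]  # probability of the suspicious class

thr = threshold_for_recall(y_val, val_scores, target_recall=0.80)
val_preds = (val_scores >= thr).astype(int)
print(f"threshold={thr:.2f}")
print(f"recall={recall_score(y_val, val_preds):.2f}")
print(f"precision={precision_score(y_val, val_preds):.2f}")
```

Lowering the threshold raises recall at the cost of more false positives, so the target should reflect how many alerts investigators can realistically review. Choose the threshold on a validation split and report final metrics on a separate test split.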
Key Takeaways
- Preprocess and engineer features relevant to transaction behavior for AML detection.
- Use classifiers like XGBClassifier or RandomForestClassifier trained on labeled data.
- Address class imbalance with oversampling techniques such as SMOTE.
- Evaluate models with precision, recall, and confusion matrices to balance false positives and negatives.
- Iterate with domain knowledge and explainability tools to improve detection accuracy.
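To make the evaluation takeaway concrete: with only a handful of suspicious cases in a single test split, precision and recall can swing widely from one split to the next. The sketch below averages them over stratified cross-validation folds. The toy data from make_classification, the choice of five folds, and the use of class_weight="balanced" (a reweighting alternative to SMOTE, swapped in to keep the example free of extra dependencies) are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

# Toy imbalanced data (about 5% positives) standing in for labeled transactions
X, y = make_classification(n_samples=2000, n_features=8, weights=[0.95], random_state=0)

# Stratified folds keep the suspicious/normal ratio the same in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# class_weight="balanced" reweights classes inversely to their frequency.
clf = RandomForestClassifier(n_estimators=100, class_weight="balanced", random_state=0)

metrics = ["precision", "recall", "average_precision"]
scores = cross_validate(clf, X, y, cv=cv, scoring=metrics)

for name in metrics:
    fold_values = scores[f"test_{name}"]
    print(f"{name}: mean={fold_values.mean():.2f} std={fold_values.std():.2f}")
```

The per-fold standard deviation shows how much of a difference between two models is just noise from the split.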