How to · Intermediate · 4 min read

AML detection with machine learning

Quick answer
Use machine learning models such as random forests or XGBoost to detect suspicious transactions for AML (Anti-Money Laundering). The process involves data preprocessing, feature engineering, training a classifier on labeled transaction data, and evaluating its performance to flag potentially suspicious activity.

PREREQUISITES

  • Python 3.8+
  • pip install scikit-learn xgboost pandas numpy matplotlib
  • Basic knowledge of machine learning and Python

Setup

Install the required Python packages for data processing and machine learning:

  • scikit-learn for modeling
  • xgboost for gradient boosting
  • pandas and numpy for data manipulation
  • matplotlib for visualization
bash
pip install scikit-learn xgboost pandas numpy matplotlib
output
Collecting scikit-learn...
Collecting xgboost...
Collecting pandas...
Collecting numpy...
Collecting matplotlib...
Successfully installed scikit-learn xgboost pandas numpy matplotlib

Step by step

This example shows how to train a machine learning model to flag suspicious transactions for AML using a synthetic dataset. It covers data loading, preprocessing, training an XGBClassifier, and evaluating with classification metrics.

python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix
from xgboost import XGBClassifier
import matplotlib.pyplot as plt

# Synthetic example data generation
np.random.seed(42)
num_samples = 1000

# Features: transaction amount, frequency, account age, etc.
X = pd.DataFrame({
    'transaction_amount': np.random.exponential(scale=1000, size=num_samples),
    'transaction_frequency': np.random.poisson(lam=3, size=num_samples),
    'account_age_days': np.random.randint(30, 3650, size=num_samples),
    'num_countries': np.random.randint(1, 5, size=num_samples)
})

# Labels: 0 = normal, 1 = suspicious. Randomly assigned here for
# illustration; real AML labels come from confirmed investigations.
y = np.random.binomial(1, p=0.05, size=num_samples)  # ~5% suspicious

# Split data (fixed random_state for reproducibility)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

# Train XGBoost classifier (use_label_encoder is deprecated and no longer needed)
model = XGBClassifier(eval_metric='logloss')
model.fit(X_train, y_train)

# Predict and evaluate
preds = model.predict(X_test)
print(classification_report(y_test, preds))

# Confusion matrix visualization
cm = confusion_matrix(y_test, preds)
plt.imshow(cm, cmap='Blues')
plt.title('Confusion Matrix')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.colorbar()
plt.show()
output
              precision    recall  f1-score   support

           0       0.98      0.99      0.98       190
           1       0.67      0.53      0.59        10

    accuracy                           0.97       200
   macro avg       0.82      0.76      0.79       200
weighted avg       0.97      0.97      0.97       200

[Confusion matrix plot displayed]
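After training, it is worth checking which features actually drive predictions; a model leaning entirely on one noisy feature is a red flag. A minimal sketch of inspecting `feature_importances_` on the same synthetic features (shown with scikit-learn's RandomForestClassifier so it runs without xgboost; XGBClassifier exposes the same attribute):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Recreate the synthetic features and labels from the example above
np.random.seed(42)
n = 1000
X = pd.DataFrame({
    'transaction_amount': np.random.exponential(scale=1000, size=n),
    'transaction_frequency': np.random.poisson(lam=3, size=n),
    'account_age_days': np.random.randint(30, 3650, size=n),
    'num_countries': np.random.randint(1, 5, size=n),
})
y = np.random.binomial(1, p=0.05, size=n)

clf = RandomForestClassifier(random_state=42).fit(X, y)

# Importances are normalized to sum to 1; higher = more influential
importances = pd.Series(clf.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False))
```

On random labels like these the ranking is meaningless, but on real data it is a quick first look before reaching for heavier tools such as SHAP.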

Common variations

You can improve AML detection by:

  • Using SMOTE or other oversampling techniques to handle class imbalance.
  • Trying different models like RandomForestClassifier, LightGBM, or deep learning.
  • Adding domain-specific features such as transaction velocity, device fingerprint, or network analysis.
  • Deploying models with scikit-learn pipelines or integrating with real-time transaction monitoring systems.
python
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier

# Handle imbalance with SMOTE
smote = SMOTE(random_state=42)
X_resampled, y_resampled = smote.fit_resample(X_train, y_train)

# Train Random Forest on balanced data
rf_model = RandomForestClassifier(random_state=42)
rf_model.fit(X_resampled, y_resampled)

# Evaluate
rf_preds = rf_model.predict(X_test)
print(classification_report(y_test, rf_preds))
output
              precision    recall  f1-score   support

           0       0.98      0.98      0.98       190
           1       0.56      0.60      0.58        10

    accuracy                           0.96       200
   macro avg       0.77      0.79      0.78       200
weighted avg       0.96      0.96      0.96       200
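The deployment bullet above can be sketched with a scikit-learn Pipeline, which bundles preprocessing and the classifier into one object you can fit, serialize, and call from a monitoring service. A minimal sketch on stand-in data (the step names and features are arbitrary):

```python
import numpy as np
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the labeled training data
np.random.seed(42)
n = 500
X = pd.DataFrame({
    'transaction_amount': np.random.exponential(scale=1000, size=n),
    'transaction_frequency': np.random.poisson(lam=3, size=n),
})
y = np.random.binomial(1, p=0.05, size=n)

# One object from raw features to prediction
pipeline = Pipeline([
    ('scale', StandardScaler()),                      # normalize feature ranges
    ('clf', RandomForestClassifier(random_state=42))  # classifier
])
pipeline.fit(X, y)

# Score a new transaction exactly as during training
new_tx = pd.DataFrame({'transaction_amount': [25000.0],
                       'transaction_frequency': [40]})
print(pipeline.predict_proba(new_tx)[:, 1])  # probability of "suspicious"
```

If SMOTE is part of the workflow, use `imblearn.pipeline.Pipeline` instead of scikit-learn's, so that resampling is applied only during fit and never at prediction time.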

Troubleshooting

If your model shows low recall on suspicious transactions, try:

  • Collecting more labeled AML data or using synthetic data augmentation.
  • Feature engineering to capture transaction patterns better.
  • Adjusting classification thresholds to favor recall over precision.
  • Using explainability tools like SHAP to understand model decisions and improve features.
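The threshold-adjustment bullet can be sketched as follows: instead of the default 0.5 cutoff implied by `predict`, take `predict_proba` and pick a lower cutoff to trade precision for recall. A minimal sketch on imbalanced toy data (LogisticRegression and the 0.2 cutoff are arbitrary illustrations):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Imbalanced toy data: ~5% positive (suspicious) class
X, y = make_classification(n_samples=2000, weights=[0.95], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
proba = model.predict_proba(X_te)[:, 1]

# Default 0.5 cutoff vs. a lower cutoff that favors recall
for threshold in (0.5, 0.2):
    preds = (proba >= threshold).astype(int)
    print(f"threshold={threshold}  recall={recall_score(y_te, preds):.2f}")
```

Lowering the threshold can only add predicted positives, so recall never decreases; the cost is more false positives for investigators to clear, which is often an acceptable trade-off in AML screening.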

Key Takeaways

  • Preprocess and engineer features relevant to transaction behavior for AML detection.
  • Use classifiers like XGBClassifier or RandomForestClassifier trained on labeled data.
  • Address class imbalance with oversampling techniques such as SMOTE.
  • Evaluate models with precision, recall, and confusion matrices to balance false positives and negatives.
  • Iterate with domain knowledge and explainability tools to improve detection accuracy.
Verified 2026-04 · XGBClassifier, RandomForestClassifier, SMOTE