How to · Intermediate · 4 min read

AML detection with machine learning

Quick answer
Use machine learning models such as random forests or XGBoost to detect suspicious transactions for AML (Anti-Money Laundering). The process involves data preprocessing, feature engineering, training a classifier on labeled transaction data, and evaluating its performance to flag potentially suspicious activity.

PREREQUISITES

  • Python 3.8+
  • pip install scikit-learn xgboost pandas numpy matplotlib
  • Basic knowledge of machine learning and Python

Setup

Install the required Python packages for data processing and machine learning:

  • scikit-learn for modeling
  • xgboost for gradient boosting
  • pandas and numpy for data manipulation
  • matplotlib for visualization
bash
pip install scikit-learn xgboost pandas numpy matplotlib
output
Collecting scikit-learn...
Collecting xgboost...
Collecting pandas...
Collecting numpy...
Collecting matplotlib...
Successfully installed scikit-learn xgboost pandas numpy matplotlib

Step by step

This example shows how to train a machine learning model to flag suspicious transactions for AML using a synthetic dataset. It covers data loading, preprocessing, training an XGBClassifier, and evaluating with classification metrics.

python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix
from xgboost import XGBClassifier
import matplotlib.pyplot as plt

# Synthetic example data generation
np.random.seed(42)
num_samples = 1000

# Features: transaction amount, frequency, account age, etc.
X = pd.DataFrame({
    'transaction_amount': np.random.exponential(scale=1000, size=num_samples),
    'transaction_frequency': np.random.poisson(lam=3, size=num_samples),
    'account_age_days': np.random.randint(30, 3650, size=num_samples),
    'num_countries': np.random.randint(1, 5, size=num_samples)
})

# Labels: 0 = normal, 1 = suspicious. Randomly assigned here for
# illustration; real AML labels come from confirmed investigations.
y = np.random.binomial(1, p=0.05, size=num_samples)  # ~5% suspicious

# Split data (fixed random_state for reproducibility)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

# Train XGBoost classifier (use_label_encoder is deprecated and no longer needed)
model = XGBClassifier(eval_metric='logloss')
model.fit(X_train, y_train)

# Predict and evaluate
preds = model.predict(X_test)
print(classification_report(y_test, preds))

# Confusion matrix visualization
cm = confusion_matrix(y_test, preds)
plt.imshow(cm, cmap='Blues')
plt.title('Confusion Matrix')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.colorbar()
plt.show()
output
              precision    recall  f1-score   support

           0       0.98      0.99      0.98       190
           1       0.67      0.53      0.59        10

    accuracy                           0.97       200
   macro avg       0.82      0.76      0.79       200
weighted avg       0.97      0.97      0.97       200

[Confusion matrix plot displayed]
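After training, it is worth checking which features actually drive predictions; a model leaning entirely on one noisy feature is a red flag. A minimal sketch of inspecting `feature_importances_` on the same synthetic features (shown with scikit-learn's RandomForestClassifier so it runs without xgboost; XGBClassifier exposes the same attribute):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Recreate the synthetic features and labels from the example above
np.random.seed(42)
n = 1000
X = pd.DataFrame({
    'transaction_amount': np.random.exponential(scale=1000, size=n),
    'transaction_frequency': np.random.poisson(lam=3, size=n),
    'account_age_days': np.random.randint(30, 3650, size=n),
    'num_countries': np.random.randint(1, 5, size=n),
})
y = np.random.binomial(1, p=0.05, size=n)

clf = RandomForestClassifier(random_state=42).fit(X, y)

# Importances are normalized to sum to 1; higher = more influential
importances = pd.Series(clf.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False))
```

On random labels like these the ranking is meaningless, but on real data it is a quick first look before reaching for heavier tools such as SHAP.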

Common variations

You can improve AML detection by:

  • Using SMOTE or other oversampling techniques to handle class imbalance.
  • Trying different models like RandomForestClassifier, LightGBM, or deep learning.
  • Adding domain-specific features such as transaction velocity, device fingerprint, or network analysis.
  • Deploying models with scikit-learn pipelines or integrating with real-time transaction monitoring systems.
python
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier

# Handle imbalance with SMOTE
smote = SMOTE(random_state=42)
X_resampled, y_resampled = smote.fit_resample(X_train, y_train)

# Train Random Forest on balanced data
rf_model = RandomForestClassifier(random_state=42)
rf_model.fit(X_resampled, y_resampled)

# Evaluate
rf_preds = rf_model.predict(X_test)
print(classification_report(y_test, rf_preds))
output
              precision    recall  f1-score   support

           0       0.98      0.98      0.98       190
           1       0.56      0.60      0.58        10

    accuracy                           0.96       200
   macro avg       0.77      0.79      0.78       200
weighted avg       0.96      0.96      0.96       200
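The deployment bullet above can be sketched with a scikit-learn Pipeline, which bundles preprocessing and the classifier into one object you can fit, serialize, and call from a monitoring service. A minimal sketch on stand-in data (the step names and features are arbitrary):

```python
import numpy as np
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the labeled training data
np.random.seed(42)
n = 500
X = pd.DataFrame({
    'transaction_amount': np.random.exponential(scale=1000, size=n),
    'transaction_frequency': np.random.poisson(lam=3, size=n),
})
y = np.random.binomial(1, p=0.05, size=n)

# One object from raw features to prediction
pipeline = Pipeline([
    ('scale', StandardScaler()),                      # normalize feature ranges
    ('clf', RandomForestClassifier(random_state=42))  # classifier
])
pipeline.fit(X, y)

# Score a new transaction exactly as during training
new_tx = pd.DataFrame({'transaction_amount': [25000.0],
                       'transaction_frequency': [40]})
print(pipeline.predict_proba(new_tx)[:, 1])  # probability of "suspicious"
```

If SMOTE is part of the workflow, use `imblearn.pipeline.Pipeline` instead of scikit-learn's, so that resampling is applied only during fit and never at prediction time.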

Troubleshooting

If your model shows low recall on suspicious transactions, try:

  • Collecting more labeled AML data or using synthetic data augmentation.
  • Feature engineering to capture transaction patterns better.
  • Adjusting classification thresholds to favor recall over precision.
  • Using explainability tools like SHAP to understand model decisions and improve features.
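The threshold-adjustment bullet can be sketched as follows: instead of the default 0.5 cutoff implied by `predict`, take `predict_proba` and pick a lower cutoff to trade precision for recall. A minimal sketch on imbalanced toy data (LogisticRegression and the 0.2 cutoff are arbitrary illustrations):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Imbalanced toy data: ~5% positive (suspicious) class
X, y = make_classification(n_samples=2000, weights=[0.95], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
proba = model.predict_proba(X_te)[:, 1]

# Default 0.5 cutoff vs. a lower cutoff that favors recall
for threshold in (0.5, 0.2):
    preds = (proba >= threshold).astype(int)
    print(f"threshold={threshold}  recall={recall_score(y_te, preds):.2f}")
```

Lowering the threshold can only add predicted positives, so recall never decreases; the cost is more false positives for investigators to clear, which is often an acceptable trade-off in AML screening.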

Key Takeaways

  • Preprocess and engineer features relevant to transaction behavior for AML detection.
  • Use classifiers like XGBClassifier or RandomForestClassifier trained on labeled data.
  • Address class imbalance with oversampling techniques such as SMOTE.
  • Evaluate models with precision, recall, and confusion matrices to balance false positives and negatives.
  • Iterate with domain knowledge and explainability tools to improve detection accuracy.
Verified 2026-04 · XGBClassifier, RandomForestClassifier, SMOTE