How do apps like PayPal or your bank know a transaction is suspicious before it’s even completed? The answer lies in a clever use of Machine Learning, system design, and a bit of math magic.
What is Fraud Detection?
Fraud detection is the process of identifying illegal or suspicious financial activity—like someone stealing your credit card and trying to buy a ₹1,00,000 laptop from somewhere you've never been.
Real-time fraud detection means catching that activity while it's happening, not hours or days later.
In this article, we’ll break down everything you need to ace this system design question: intuition, theory, architecture, data, modeling, deployment, evaluation, and trade-offs. This is a deep dive tailored for interviews, not just a tutorial.
Table of Contents
Problem Definition
System Requirements & Constraints
High-Level System Design
Data Collection
Feature Engineering
Data Preprocessing
Model Selection
Real-Time Serving
Evaluation Metrics
Monitoring & Retraining
Interview Tips
1. Problem Definition
Design a real-time machine learning system to detect fraudulent financial transactions.
Clarify:
Fraudulent = Financial transaction made without legitimate user authorization.
Real-time = Decision needed within milliseconds (say 100–300ms).
Scale = Millions of transactions per day (tens to hundreds per second on average, with much higher peaks).
Label delay = Ground-truth fraud labels might arrive days later.
Skewed data = Fraud is <0.5% of transactions.
2. System Requirements & Constraints
Functional Requirements
Predict if a transaction is fraudulent in real time
Send fraud alerts if prediction score exceeds threshold
Allow human reviewers to inspect borderline cases
Non-Functional Requirements
Latency: ≤ 200ms per prediction
Throughput: Thousands of transactions/sec at peak
Evaluation Metric: Maximize recall while keeping false positives low (i.e., maintain high precision)
Scalability: Handle traffic spikes (e.g., Black Friday)
Assumptions to clarify:
Is this a B2C system (e.g., consumer banking app)?
Do we block fraudulent transactions, or just flag?
Is historical data available? What are the features?
3. High-Level System Design
Here’s a simplified architecture:
[Transaction]
↓
[Real-Time Feature Extractor]
↓
[Model Inference API]
↓
[Decision Engine (Threshold / Rule)]
↓
[Allow / Flag / Block]
Key Components:
Streaming Source (Kafka, Pub/Sub)
Feature Store + Real-Time Features
ML Inference Server (FastAPI, TorchServe, BentoML)
Decision Engine (Threshold, Rules + Model); see the sketch after this list
Feedback Collector (for retraining)
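To make the Decision Engine concrete, here is a minimal sketch; the thresholds and the hard rule below are illustrative assumptions, not production values:
# Minimal decision engine: hard business rules first, then model-score thresholds
def decide(score: float, txn: dict) -> str:
    if txn.get("amount", 0) > 500_000:  # hypothetical hard rule
        return "block"
    if score >= 0.90:  # high confidence: auto-block
        return "block"
    if score >= 0.50:  # borderline: route to human review
        return "flag"
    return "allow"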
4. Data Collection
Types of data:
Transaction Data: timestamp, amount, merchant, device, IP.
User History: avg spend, transaction frequency.
Location & Device: last known IP, device fingerprint.
Merchant Info: risk category, average chargeback rate.
Label: is_fraud (0/1), comes from chargebacks or manual review.
Code Example (Loading Mock Data):
import pandas as pd
df = pd.read_csv("transactions.csv")
print(df.columns.tolist())
# ['transaction_id', 'user_id', 'amount', 'timestamp', 'location',
# 'device', 'merchant_id', 'is_fraud']
5. Feature Engineering
Useful features to mention:
Amount deviation: |txn_amount - user_avg|
Velocity: Number of transactions in the past 5 mins (sketched after the code example below)
Location anomaly: Geo distance from last txn
Device anomaly: Is this a new device?
Time anomaly: Is txn at an unusual hour?
Code Example:
import numpy as np
# Example: Create amount deviation feature
df['user_avg'] = df.groupby('user_id')['amount'].transform('mean')
df['amount_deviation'] = np.abs(df['amount'] - df['user_avg'])
# Example: Hour of day
df['timestamp'] = pd.to_datetime(df['timestamp'])
df['hour'] = df['timestamp'].dt.hour
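The velocity feature mentioned above can be computed with a time-based rolling window. A minimal sketch, where the column name txn_count_5min is illustrative:
# Velocity: number of transactions by the same user in the past 5 minutes
df = df.sort_values(['user_id', 'timestamp'])
df['txn_count_5min'] = (
    df.groupby('user_id')
      .rolling('5min', on='timestamp')['amount']
      .count()
      .values
)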
6. Data Preprocessing
Key steps:
Handle missing values.
Encode categorical features (Label Encoding or One-Hot).
Normalize numerical features (for models like LR).
Address class imbalance (fraud is rare); a sketch follows the code below.
Code Example:
from sklearn.preprocessing import LabelEncoder, StandardScaler
# Encode categorical features (in production, fit encoders on training data
# only and persist them so the serving path applies identical transforms)
df['device'] = LabelEncoder().fit_transform(df['device'])
df['location'] = LabelEncoder().fit_transform(df['location'])
# Normalize features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(df[['amount', 'amount_deviation']])
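For the class-imbalance step, one simple approach is to weight the rare fraud class by the negative-to-positive ratio; the value computed here feeds scale_pos_weight in the training section below:
# Fraud is rare, so compute the negative/positive ratio as a class weight
neg = (df['is_fraud'] == 0).sum()
pos = (df['is_fraud'] == 1).sum()
imbalance_ratio = neg / pos
print(f"Imbalance ratio (neg/pos): {imbalance_ratio:.0f}")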
7. Model Selection
Models to propose:
Logistic Regression: simple, interpretable
Random Forest / XGBoost: good performance for tabular data
LightGBM: very fast and accurate
Online Learning (Vowpal Wabbit): real-time updates
Model Training Code:
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
# Prepare data
features = ['amount', 'amount_deviation', 'device', 'hour']
X = df[features]
y = df['is_fraud']
# Train/Test split (stratify preserves the rare fraud class in both splits)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.2, random_state=42
)
# Train model, weighting the positive class by the neg/pos ratio computed earlier
model = XGBClassifier(scale_pos_weight=imbalance_ratio)
model.fit(X_train, y_train)
# Evaluate
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))
8. Real-Time Serving
Serving Stack Options:
API server: FastAPI, Flask, or BentoML
Feature lookup: Redis, Feast (see the sketch after this list)
Stream processor: Kafka, Flink, Spark Streaming
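To ground the feature-lookup step, here is a minimal Redis sketch; the key scheme and JSON encoding are assumptions for illustration:
import json
import redis

r = redis.Redis(host="localhost", port=6379)

def get_user_features(user_id: str) -> dict:
    # Assumes an upstream pipeline writes user aggregates as JSON
    # under keys like "user_feats:<user_id>"
    raw = r.get(f"user_feats:{user_id}")
    return json.loads(raw) if raw else {}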
Inference API Code (FastAPI):
from fastapi import FastAPI
import joblib
import pandas as pd
app = FastAPI()
model = joblib.load("fraud_model.pkl")
@app.post("/predict")
def predict_fraud(transaction: dict):
    # NOTE: in production, apply the same encoders/feature pipeline used at
    # training time, and keep the feature columns in the training order
    features = pd.DataFrame([transaction])
    prediction = model.predict(features)[0]
    return {"fraud": bool(prediction)}
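A quick way to exercise the endpoint, assuming it runs locally on port 8000 and the request fields match the training features:
import requests

txn = {"amount": 129999.0, "amount_deviation": 125000.0, "device": 3, "hour": 2}
resp = requests.post("http://localhost:8000/predict", json=txn)
print(resp.json())  # e.g. {"fraud": true}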
9. Evaluation Metrics
Focus on these during interviews:
Precision: How many flagged transactions were actually fraud?
Recall: How many frauds were detected?
F1 Score: Harmonic mean of precision and recall.
AUC-ROC / AUC-PR: Overall ranking ability; the precision-recall curve is more informative under heavy class imbalance.
Confusion Matrix: TN, FP, FN, TP counts (sklearn prints [[TN, FP], [FN, TP]]); especially important here.
Evaluation Code:
from sklearn.metrics import classification_report, confusion_matrix
y_pred = model.predict(X_test)
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
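Since the alerting threshold drives the precision/recall trade-off, it helps to sweep thresholds on predicted probabilities rather than relying on the default 0.5 cut-off. A minimal sketch:
from sklearn.metrics import precision_score, recall_score

y_scores = model.predict_proba(X_test)[:, 1]
for threshold in [0.3, 0.5, 0.7, 0.9]:
    y_hat = (y_scores >= threshold).astype(int)
    p = precision_score(y_test, y_hat, zero_division=0)
    r = recall_score(y_test, y_hat)
    print(f"threshold={threshold:.1f}  precision={p:.2f}  recall={r:.2f}")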
10. Monitoring & Retraining
What to Monitor:
Prediction volume and fraud rate
Data drift in key features (see the PSI sketch below)
Model confidence distribution
Latencies (P50, P95, P99)
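One lightweight way to quantify data drift in a key feature is the Population Stability Index (PSI); a minimal sketch, where the alert cut-off is a common rule of thumb rather than a fixed standard:
import numpy as np

def psi(expected, actual, bins=10):
    # PSI between a reference distribution (e.g., training data) and live traffic
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected) + 1e-6
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual) + 1e-6
    return np.sum((a_pct - e_pct) * np.log(a_pct / e_pct))

# Rule of thumb: PSI > 0.2 suggests meaningful drift worth investigating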
Feedback Loop
Human reviews → labeled frauds → model retraining
Daily/weekly retraining
Use delayed ground truth carefully (label delay)
Interview Tips
Things to emphasize:
Real-time constraints: latency, freshness, scale
Feature engineering choices and why
Handling imbalance and label delay
Trade-offs: interpretability vs accuracy, batch vs stream
How the model is monitored and updated
Conclusion
Building a robust real-time fraud detection system is a challenging but incredibly rewarding endeavor. It requires a deep understanding of machine learning principles, data engineering practices, and the specific nuances of the fraud landscape. By mastering the concepts outlined in this article, you'll not only be better equipped to build such systems but also stand out in your next Machine Learning interview.