How do apps like PayPal or your bank know a transaction is suspicious before it’s even completed? The answer lies in a clever use of Machine Learning, system design, and a bit of math magic.
What is Fraud Detection?
Fraud detection is the process of identifying illegal or suspicious financial activity—like someone stealing your credit card and trying to buy a ₹1,00,000 laptop from somewhere you've never been.
Real-time fraud detection means catching that activity while it's happening, not hours or days later.
In this article, we’ll break down everything you need to ace this system design question: intuition, theory, architecture, data, modeling, deployment, evaluation, and trade-offs. This is a deep dive tailored for interviews, not just a tutorial.
Table of Contents
Problem Definition
System Requirements & Constraints
High-Level System Design
Data Collection
Feature Engineering
Data Preprocessing
Model Selection
Real-Time Serving
Evaluation Metrics
Monitoring & Retraining
Interview Tips
1. Problem Definition
Design a real-time machine learning system to detect fraudulent financial transactions.
Clarify:
Fraudulent = Financial transaction made without legitimate user authorization.
Real-time = Decision needed within milliseconds (say 100–300ms).
Scale = Millions of transactions per day (tens to hundreds per second on average, with much higher peaks).
Label delay = Ground-truth fraud labels might arrive days later.
Skewed data = Fraud is <0.5% of transactions.
2. System Requirements & Constraints
Functional Requirements
Predict if a transaction is fraudulent in real time
Send fraud alerts if prediction score exceeds threshold
Allow human reviewers to inspect borderline cases
Non-Functional Requirements
Latency: ≤ 200ms per prediction
Throughput: Thousands of transactions/sec at peak
Evaluation Metric: Maximize recall while keeping false positives low (i.e., maintain high precision)
Scalability: Handle traffic spikes (e.g., Black Friday)
Assumptions to clarify:
Is this a B2C system (e.g., consumer banking app)?
Do we block fraudulent transactions, or just flag?
Is historical data available? What are the features?
3. High-Level System Design
Here’s a simplified architecture:
[Transaction]
↓
[Real-Time Feature Extractor]
↓
[Model Inference API]
↓
[Decision Engine (Threshold / Rule)]
↓
[Allow / Flag / Block]
Key Components:
Streaming Source (Kafka, Pub/Sub)
Feature Store + Real-Time Features
ML Inference Server (FastAPI, TorchServe, BentoML)
Decision Engine (Threshold, Rules + Model); see the sketch after this list
Feedback Collector (for retraining)
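To make the Decision Engine concrete, here is a minimal sketch; the thresholds and the hard rule below are illustrative assumptions, not production values:
# Minimal decision engine: hard business rules first, then model-score thresholds
def decide(score: float, txn: dict) -> str:
    if txn.get("amount", 0) > 500_000:  # hypothetical hard rule
        return "block"
    if score >= 0.90:  # high confidence: auto-block
        return "block"
    if score >= 0.50:  # borderline: route to human review
        return "flag"
    return "allow"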
4. Data Collection
Types of data:
Transaction Data: timestamp, amount, merchant, device, IP.
User History: avg spend, transaction frequency.
Location & Device: last known IP, device fingerprint.
Merchant Info: risk category, average chargeback rate.
Label: is_fraud (0/1), comes from chargebacks or manual review.
Code Example (Loading Mock Data):
import pandas as pd
df = pd.read_csv("transactions.csv")
print(df.columns.tolist())
# ['transaction_id', 'user_id', 'amount', 'timestamp', 'location',
# 'device', 'merchant_id', 'is_fraud']
5. Feature Engineering
Useful features to mention:
Amount deviation: |txn_amount - user_avg|
Velocity: Number of transactions in the past 5 mins (sketched after the code example below)
Location anomaly: Geo distance from last txn
Device anomaly: Is this a new device?
Time anomaly: Is txn at an unusual hour?
Code Example:
import numpy as np
# Example: Create amount deviation feature
df['user_avg'] = df.groupby('user_id')['amount'].transform('mean')
df['amount_deviation'] = np.abs(df['amount'] - df['user_avg'])
# Example: Hour of day
df['timestamp'] = pd.to_datetime(df['timestamp'])
df['hour'] = df['timestamp'].dt.hour
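The velocity feature mentioned above can be computed with a time-based rolling window. A minimal sketch, where the column name txn_count_5min is illustrative:
# Velocity: number of transactions by the same user in the past 5 minutes
df = df.sort_values(['user_id', 'timestamp'])
df['txn_count_5min'] = (
    df.groupby('user_id')
      .rolling('5min', on='timestamp')['amount']
      .count()
      .values
)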
6. Data Preprocessing
Key steps:
Handle missing values.
Encode categorical features (Label Encoding or One-Hot).
Normalize numerical features (for models like LR).
Address class imbalance (fraud is rare); a sketch follows the code below.
Code Example:
from sklearn.preprocessing import LabelEncoder, StandardScaler
# Encode categorical features (in production, fit encoders on training data
# only and persist them so the serving path applies identical transforms)
df['device'] = LabelEncoder().fit_transform(df['device'])
df['location'] = LabelEncoder().fit_transform(df['location'])
# Normalize features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(df[['amount', 'amount_deviation']])
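For the class-imbalance step, one simple approach is to weight the rare fraud class by the negative-to-positive ratio; the value computed here feeds scale_pos_weight in the training section below:
# Fraud is rare, so compute the negative/positive ratio as a class weight
neg = (df['is_fraud'] == 0).sum()
pos = (df['is_fraud'] == 1).sum()
imbalance_ratio = neg / pos
print(f"Imbalance ratio (neg/pos): {imbalance_ratio:.0f}")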
7. Model Selection
Models to propose:
Logistic Regression: simple, interpretable
Random Forest / XGBoost: good performance for tabular data
LightGBM: very fast and accurate
Online Learning (Vowpal Wabbit): real-time updates
Model Training Code:
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
# Prepare data
features = ['amount', 'amount_deviation', 'device', 'hour']
X = df[features]
y = df['is_fraud']
# Train/Test split (stratify preserves the rare fraud class in both splits)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.2, random_state=42
)
# Train model, weighting the positive class by the neg/pos ratio computed earlier
model = XGBClassifier(scale_pos_weight=imbalance_ratio)
model.fit(X_train, y_train)
# Evaluate
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))
8. Real-Time Serving
Serving Stack Options:
API server: FastAPI, Flask, or BentoML
Feature lookup: Redis, Feast (see the sketch after this list)
Stream processor: Kafka, Flink, Spark Streaming
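To ground the feature-lookup step, here is a minimal Redis sketch; the key scheme and JSON encoding are assumptions for illustration:
import json
import redis

r = redis.Redis(host="localhost", port=6379)

def get_user_features(user_id: str) -> dict:
    # Assumes an upstream pipeline writes user aggregates as JSON
    # under keys like "user_feats:<user_id>"
    raw = r.get(f"user_feats:{user_id}")
    return json.loads(raw) if raw else {}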
Inference API Code (FastAPI):
from fastapi import FastAPI
import joblib
import pandas as pd
app = FastAPI()
model = joblib.load("fraud_model.pkl")
@app.post("/predict")
def predict_fraud(transaction: dict):
    # NOTE: in production, apply the same encoders/feature pipeline used at
    # training time, and keep the feature columns in the training order
    features = pd.DataFrame([transaction])
    prediction = model.predict(features)[0]
    return {"fraud": bool(prediction)}
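A quick way to exercise the endpoint, assuming it runs locally on port 8000 and the request fields match the training features:
import requests

txn = {"amount": 129999.0, "amount_deviation": 125000.0, "device": 3, "hour": 2}
resp = requests.post("http://localhost:8000/predict", json=txn)
print(resp.json())  # e.g. {"fraud": true}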
9. Evaluation Metrics
Focus on these during interviews:
Precision: How many flagged transactions were actually fraud?
Recall: How many frauds were detected?
F1 Score: Harmonic mean of precision and recall.
AUC-ROC / AUC-PR: Overall ranking ability; the precision-recall curve is more informative under heavy class imbalance.
Confusion Matrix: TN, FP, FN, TP counts (sklearn prints [[TN, FP], [FN, TP]]); especially important here.
Evaluation Code:
from sklearn.metrics import classification_report, confusion_matrix
y_pred = model.predict(X_test)
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
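Since the alerting threshold drives the precision/recall trade-off, it helps to sweep thresholds on predicted probabilities rather than relying on the default 0.5 cut-off. A minimal sketch:
from sklearn.metrics import precision_score, recall_score

y_scores = model.predict_proba(X_test)[:, 1]
for threshold in [0.3, 0.5, 0.7, 0.9]:
    y_hat = (y_scores >= threshold).astype(int)
    p = precision_score(y_test, y_hat, zero_division=0)
    r = recall_score(y_test, y_hat)
    print(f"threshold={threshold:.1f}  precision={p:.2f}  recall={r:.2f}")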
10. Monitoring & Retraining
What to Monitor:
Prediction volume and fraud rate
Data drift in key features (see the PSI sketch below)
Model confidence distribution
Latencies (P50, P95, P99)
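One lightweight way to quantify data drift in a key feature is the Population Stability Index (PSI); a minimal sketch, where the alert cut-off is a common rule of thumb rather than a fixed standard:
import numpy as np

def psi(expected, actual, bins=10):
    # PSI between a reference distribution (e.g., training data) and live traffic
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected) + 1e-6
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual) + 1e-6
    return np.sum((a_pct - e_pct) * np.log(a_pct / e_pct))

# Rule of thumb: PSI > 0.2 suggests meaningful drift worth investigating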
Feedback Loop
Human reviews → labeled frauds → model retraining
Daily/weekly retraining
Use delayed ground truth carefully (label delay)
Interview Tips
Things to emphasize:
Real-time constraints: latency, freshness, scale
Feature engineering choices and why
Handling imbalance and label delay
Trade-offs: interpretability vs accuracy, batch vs stream
How the model is monitored and updated
Conclusion
Building a robust real-time fraud detection system is a challenging but incredibly rewarding endeavor. It requires a deep understanding of machine learning principles, data engineering practices, and the specific nuances of the fraud landscape. By mastering the concepts outlined in this article, you'll not only be better equipped to build such systems but also stand out in your next Machine Learning interview.