ML Fraud Detection for Ewallet: Model Design & Deployment

A digital wallet processes hundreds of transactions per second. Most are legitimate. A small fraction — often less than 1% — are not. The challenge is telling them apart in under 300 milliseconds, without blocking the legitimate ones.

This is the core problem Sun* solved for G-Pay, a Singapore-based eWallet handling 1 million transactions daily. The system needed to catch fraud in real time, hold up under peak load, and maintain 99.99% fraud detection accuracy — all without introducing friction for legitimate users.

In this article, we will share a practical perspective on how modern eWallet platforms approach fraud detection with machine learning.

Below, we will walk you through: the architecture decisions that matter, how real-time scoring actually works, and what it takes to keep a deployed model effective over time.

Key takeaways

Effective fraud detection is as much a distributed systems problem as an ML problem, requiring low-latency architecture, scalable serving, and reliable real-time pipelines.
Strong feature engineering — combining transaction, behavioral, graph, and temporal signals — often improves fraud detection more than increasing model complexity alone.
Modern fraud systems balance security and user experience through real-time risk scoring, adaptive thresholds, and tiered responses such as MFA or automated blocking.
Fraud models must continuously monitor, retrain, and adapt to concept drift, because static models quickly become ineffective against evolving attack patterns.

Why Traditional Fraud Detection in eWallet Apps is a Hard ML Problem?

Detecting fraud in eWallet and mobile money ecosystems is fundamentally different from many other machine learning applications. It sits at the intersection of extreme data imbalance, real-time system constraints, and adversarial behavior, making it one of the most operationally demanding use cases for applied ML.

Extreme Class Imbalance and the “Accuracy Paradox”

The primary challenge is extreme class imbalance (“needle in a haystack”), as fraud is under 1% of eWallet transactions. This leads to the accuracy paradox: a model can be >99% accurate by predicting all transactions as legitimate, failing its goal. To overcome the bias toward the majority class, strategies like specialist training techniques that force the model to pay closer attention to the rare fraud cases and specialized fraud-focused evaluation metrics are essential.

Strict Real-Time Latency Budgets

Unlike systems using batch processing, eWallet transactions require near-instant clearance. Fraud systems must complete a complex pipeline (feature retrieval, pattern evaluation, model inference, decision) within tight, often sub-hundred-millisecond budgets. This tension exists between using sophisticated models for better detection and optimizing them carefully to ensure fast system performance and good user experience.

Blind Spots to Collusive “Mule” Networks – interconnected accounts used to move and hide stolen funds

Many rule-based systems and tabular machine learning models evaluate transactions in isolation, focusing on individual accounts or events. However, modern fraud increasingly operates through organized “mule networks” – interconnected accounts used to distribute and obscure illicit funds. Without a relational or graph-based view of the ecosystem, these systems miss the underlying connections between entities, allowing coordinated fraud patterns to remain undetected.

Concept Drift and Adversarial Adaptation

Unlike static prediction problems, fraud is an evolving target. As soon as a detection mechanism becomes effective, attackers adjust their tactics-shifting to new methods such as account takeovers, SIM swaps, or social engineering schemes. This continuous evolution means that historical training data quickly becomes outdated. Models that are not regularly updated or designed to adapt in near real time will see their performance degrade, often silently, over time.

The High-Stakes Trade-off Between Security and Friction

False negatives – missed fraud – result in direct financial losses and potential regulatory exposure. False positives-incorrectly blocking legitimate users can be equally damaging, leading to poor user experience, reduced trust, and increased operational costs due to manual reviews. Striking the right balance between precision and recall is not just a modeling challenge; it is a business-critical decision that directly impacts growth, retention, and risk exposure.

How ML changes financial fraud detection

Machine learning is transforming fraud detection by replacing static, manually updated rules with systems that continuously learn and adapt from data. Instead of relying on predefined logic, ML models identify patterns in context and evolve alongside emerging fraud tactics.

Tree-based models remain widely used for their speed and accuracy, while graph-based methods help uncover coordinated fraud networks through relationships between accounts, devices, and transactions. Modern ML systems also leverage real-time data pipelines and feature stores to evaluate transactions within milliseconds, balancing security with seamless eWallet experiences.

To address highly imbalanced fraud data, institutions apply specialized training techniques that improve detection of rare fraud cases. At the same time, growing model complexity is driving investment in explainability and privacy-preserving methods to ensure transparency and secure collaboration.

System Architecture for ML Fraud Detection at Scale

Before considering any specific model, a well-designed ML fraud system requires clarity on its end-to-end architecture. The most sophisticated model in the world is useless if it cannot be served reliably at sub-200ms latency, if its features drift from training to serving, or if there is no feedback loop to retrain it on evolving fraud patterns.

The end-to-end flow

An effective machine learning fraud framework operates as an integrated chain of specialized modules rather than a solitary model. These components work in unison to convert an initial transaction trigger into an actionable, real-time assessment of risk.

System Architecture for ML Fraud Detection at Scale

Offline training vs. real-time serving

One of the most important design decisions in an ML fraud system is the clean separation between offline model training and real-time model serving. Training happens on large historical datasets, typically in batch, using the full richness of available features including those that would be too expensive to compute in real time. Serving must be optimized entirely for latency and reliability. Coupling these two concerns is a common source of system fragility.

Fraud detection is a distributed systems engineering problem as much as it is an ML problem. Teams that treat it purely as a modeling exercise consistently underperform teams that treat model quality and system reliability as equally important constraints.

Feature Engineering for Payment & Behavioral Data

Raw transaction data alone is not enough for effective ML fraud detection for eWallet systems. The goal of feature engineering in ML fraud detection for eWallet platforms is to add context – helping models understand not just what happened, but whether it is unusual for a given user, connected to broader patterns, or suspicious in timing.

In practice, this is structured across four core dimensions: transaction signals, behavioral baselines, network relationships, and time. Transaction features capture immediate context (e.g., velocity, device, merchant), while behavioral features define what “normal” looks like for each user. Graph features extend this further by exposing hidden connections across accounts, and temporal features ensure recent and sequential activity is properly weighted.

All of this is supported by a feature store, which ensures features are consistently computed and served in real time without breaking latency constraints.

Feature Dimension	Focus	Examples	Value
Transaction Signals	Payment-level context	Amount ratio, velocity, device	Detects abnormal transactions
Behavioral Baselines	User-specific patterns	Spend average, geo consistency	Flags deviations from normal behavior
Graph / Network	Cross-entity relationships	Shared devices, IP links	Identifies fraud rings
Temporal Features	Time & sequence	Sliding windows, recent activity	Captures short-term anomalies
Feature Store	Serving infrastructure	Real-time + batch features	Enables low-latency, consistent scoring

Strong feature engineering is ultimately about capturing meaningful context efficiently, which is often more impactful than increasing model complexity.

Real-Time Scoring Pipeline & Threshold Optimization

The real-time scoring pipeline must operate within a strict service-level objective – commonly requiring a complete fraud decision in under 300 milliseconds. The sequence begins the moment a transaction arrives at the gateway, triggering a compact risk request.

A streaming architecture, using tools like Kafka, pulls the incoming transaction features while simultaneously fetching precomputed historical context, such as a user’s 30-day transaction average from an in-memory feature store like Redis. This enriched data is passed to the ML model for inference, which produces a continuous risk score that a calibration layer then maps to an immediate operational action.

Achieving a sub-300ms pipeline requires extensive optimization. End-to-end response times of 42–45 milliseconds were demonstrated using FastAPI/Docker (lightweight, containerized services). The typical latency breakdown is: API request handling (1.2ms), JSON parsing (0.8ms), feature scaling (0.5ms), and XGBoost inference (39.5ms).

Heavy historical aggregations are pre-computed and cached. Further gains result from model optimizations like tree pruning, 16-bit quantization, parallel evaluation, and early-stopping based on prediction confidence.

Due to severe data imbalance, the default 0.5 probability threshold is inadequate for financial fraud detection. Instead, threshold optimization uses metrics that measure how often the model catches real fraud vs. how often it wrongly flags legitimate users, which better assess minority-class performance than standard ROC analysis. This calibration is a cost-benefit trade-off: lowering the threshold increases fraud capture (recall) but also increases false positives, incurring an operational cost ($5-$15 per manual review) and user friction.

Raising it reduces false alarms but lets more fraud pass. The optimal threshold balances expected fraud losses with tolerance for user friction, varying by segment, transaction type, and business context.

Modern fraud systems use a tiered decision framework based on a continuous risk score. Transactions below a low threshold (e.g., probability < 0.1) are instantly approved. Intermediate scores trigger a step-up challenge (like MFA) or human review. High-scoring transactions are automatically blocked. This adaptive authentication scales security friction dynamically, ensuring seamless routine activity while enforcing tighter verification for high-risk cases.

What This Looks Like in Production: G-Pay

To make this concrete, here is what the above architecture looks like when it runs at scale.

G-Pay is a Singapore-based eWallet built to handle 1M transactions per day. The platform needed to meet three simultaneous requirements that define most serious eWallet builds: high transaction throughput with low latency, full PCI DSS compliance, and a fraud detection layer that could keep pace with the volume without generating excessive false positives.

Sun* designed and deployed the ML fraud detection system as part of a broader platform built on Golang microservices, Java, PostgreSQL, and React Native.

Golang was chosen specifically for its concurrency model — it handles the high-volume, parallel nature of transaction processing without the overhead that slower runtimes introduce at this scale.

The fraud detection layer sits within this architecture, receiving transaction signals, pulling pre-computed behavioral context, running model inference, and returning a risk decision — all within the platform’s latency budget.

The result:

99.99% fraud detection accuracy at 1 million daily transactions,
with 99.9% uptime maintained through horizontal scaling designed for peak load periods

A system optimized purely for recall (catching all fraud) would block too many legitimate users. One optimized purely for precision (never wrongly blocking) would miss real fraud. Hitting 99.99% accuracy at this volume means the threshold calibration, feature engineering, and model architecture are working together correctly in production — not just in evaluation.

Production Deployment, Monitoring, and Continuous Learning

Deploying a fraud model is not the end of the ML lifecycle — it is the start of the operational phase, where long-term value is sustained or lost.

Three capabilities determine whether a fraud system remains effective over time: deployment, monitoring, and continuous learning.

To meet strict 300ms SLAs, models are deployed as modular microservices using Docker, Kubernetes, and FastAPI, with optimizations like pruning, quantization, and early stopping reducing inference latency to roughly 42–45ms per transaction.

New models are introduced through champion-challenger testing, shadow mode, and canary releases, allowing validation on live traffic before full deployment.

Production systems require both infrastructure monitoring (latency, uptime, output health) and statistical monitoring (precision-recall, drift detection). Techniques like Kolmogorov-Smirnov and Population Stability Index detect shifts in fraud behavior that can make training data obsolete.

Because fraud patterns constantly evolve, institutions rely on continuous retraining pipelines, often using rolling 90-day datasets and automated retraining triggers through platforms like MLflow.

A fraud model without retraining is a depreciating asset. Over time, static models become increasingly predictable to the adversary.

Sun* built and deployed the ML fraud detection system powering G-Pay — a Singapore eWallet processing 1 million transactions daily, achieving 99.99% fraud detection accuracy while maintaining 99.9% uptime under peak load. If you’re building a fraud detection layer for your eWallet or payment platform — whether from scratch or on top of an existing rules-based system — we can help you assess your current architecture and design what comes next.Talk to our team about your fraud detection system.

ML Fraud Detection for eWallets: Model Design & Deployment

Key takeaways

Why Traditional Fraud Detection in eWallet Apps is a Hard ML Problem?

Extreme Class Imbalance and the “Accuracy Paradox”

Strict Real-Time Latency Budgets

Blind Spots to Collusive “Mule” Networks – interconnected accounts used to move and hide stolen funds

Concept Drift and Adversarial Adaptation

The High-Stakes Trade-off Between Security and Friction

How ML changes financial fraud detection

System Architecture for ML Fraud Detection at Scale

The end-to-end flow

Offline training vs. real-time serving

Feature Engineering for Payment & Behavioral Data

Real-Time Scoring Pipeline & Threshold Optimization

What This Looks Like in Production: G-Pay

Production Deployment, Monitoring, and Continuous Learning

Categories

Lastest News

Tags

Follow us