March 16, 2026

The Feedback Loop: How ALF Gets Smarter Without Getting Dangerous

An AI system that learns without constraints can overfit, amplify biases, and make increasingly confident bets on patterns that don't exist. ALF's learning engine solves this with bounded governance.

The promise of AI in trading is simple: the system should get better over time. It observes outcomes, identifies what works, and adjusts. The more trades it sees, the better its predictions become.

The reality is more complicated. An AI system that learns without constraints can overfit to recent data, amplify its own biases, and make increasingly confident bets on patterns that exist only in its training history. The same feedback loop that makes the system smarter can make it dangerous — confidently wrong, with no mechanism to recognise its own deterioration.

ALF’s activation learning engine solves this by combining adaptive learning with bounded governance. The system gets smarter over time. But it gets smarter within constraints — and every adjustment is transparent, auditable, and subject to human oversight.

The Problem with Static Models

Most AI trading systems use static models. The model is trained on historical data, deployed to production, and runs until someone decides to retrain it. Performance degrades as market conditions drift from the training data. Nobody notices until the drawdown is significant enough to trigger a review.

This is the “deploy and pray” model. It works during the regime the model was trained on and fails when conditions change. And since markets change constantly — new correlations, new volatility regimes, new participants — the gap between the model’s assumptions and reality widens every day.

The alternative — frequent retraining — introduces its own risks. Retrain too often and you overfit to noise. Retrain too seldom and you miss genuine regime changes. The cadence of retraining becomes a critical parameter with no obvious right answer.

ALF’s approach is different. Instead of periodically retraining models, we continuously adjust the fusion weights — how much each model’s output contributes to the composite signal — based on real-world trade outcomes. The models themselves remain stable. Their relative influence evolves.
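As a rough illustration of what "fusion weights" means here, the sketch below combines per-channel signals into a composite as a weighted average. The function name `fuse_signals`, the `sentiment` channel, and the signal range of -1 to +1 are assumptions made for illustration; the article does not specify ALF's actual fusion formula.

```python
from typing import Dict


def fuse_signals(signals: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of channel signals (a hypothetical fusion rule)."""
    if signals.keys() != weights.keys():
        raise ValueError("signals and weights must cover the same channels")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[name] * signals[name] for name in signals) / total


# Illustrative values only: each signal is assumed to lie in [-1, +1].
signals = {"technical": 0.6, "pattern": -0.2, "sentiment": 0.1}
weights = {"technical": 0.5, "pattern": 0.3, "sentiment": 0.2}
composite = fuse_signals(signals, weights)
```

Under this rule the models' outputs are left untouched; only the `weights` dictionary changes as outcomes accumulate.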

Thompson Sampling: The Mathematical Foundation

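The learning engine uses Thompson Sampling on a Beta-Bernoulli model: each trade outcome is treated as a success or a failure, and each channel’s reliability carries a Beta prior, which is conjugate to that Bernoulli likelihood. It is a mathematically principled approach to the exploration-exploitation trade-off.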

In plain language: the system maintains a probability distribution for each signal channel’s reliability. Every time a trade outcome is observed, the distribution is updated: successful outcomes increase the channel’s estimated reliability; unsuccessful outcomes decrease it. When calculating fusion weights, the system samples from these distributions, so more reliable channels tend to receive higher weights and less reliable channels lower ones.
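The update-then-sample cycle described above can be sketched in a few lines. This is a minimal illustration, not ALF's implementation: the uniform Beta(1, 1) starting prior, the class and function names, and the step of normalising the sampled values into weights are all assumptions.

```python
import random
from dataclasses import dataclass
from typing import Dict


@dataclass
class ChannelPosterior:
    alpha: float = 1.0  # 1 + observed successes (uniform prior is an assumption)
    beta: float = 1.0   # 1 + observed failures

    def update(self, success: bool) -> None:
        """Conjugate Beta-Bernoulli update: one outcome bumps one counter."""
        if success:
            self.alpha += 1
        else:
            self.beta += 1

    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


def sample_weights(posteriors: Dict[str, ChannelPosterior],
                   rng: random.Random) -> Dict[str, float]:
    """Draw one reliability sample per channel, then normalise to sum to 1."""
    draws = {name: rng.betavariate(p.alpha, p.beta) for name, p in posteriors.items()}
    total = sum(draws.values())
    return {name: draw / total for name, draw in draws.items()}


rng = random.Random(7)  # seeded so the example is repeatable
posteriors = {"technical": ChannelPosterior(), "pattern": ChannelPosterior()}
for outcome in [True, True, True, False]:
    posteriors["technical"].update(outcome)
for outcome in [False, False, True, False]:
    posteriors["pattern"].update(outcome)
weights = sample_weights(posteriors, rng)
```

Because the weights come from random draws rather than from the posterior means, two calls with the same posteriors can return different weights. That randomness is what keeps exploration alive.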

Why Thompson Sampling rather than simpler approaches? Three reasons:

It handles uncertainty honestly. A channel with 10 observations and 8 successes has a different reliability profile than a channel with 1,000 observations and 800 successes — even though both have an 80% success rate. The first has much wider uncertainty. Thompson Sampling captures this naturally through the shape of the probability distribution.

It balances exploration and exploitation. A simpler system might permanently downweight a channel that had a bad run. Thompson Sampling maintains exploration — even a temporarily underperforming channel will occasionally receive higher weight, giving it the chance to prove that conditions have changed. This prevents the system from prematurely abandoning a signal source based on a short-term anomaly.

It converges efficiently. As more data accumulates, the probability distributions narrow, and the weights stabilise around values that reflect genuine long-run performance. The system adapts quickly when new evidence is strong and changes slowly when evidence is ambiguous.
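Two of these points can be checked numerically. The sketch below, which assumes a uniform Beta(1, 1) prior, computes the posterior standard deviation for the 10-observation and 1,000-observation channels mentioned above, then estimates how often a weaker channel's sample beats a stronger one's. The Beta parameters used for the "weak" and "strong" channels are made up for illustration.

```python
import math
import random
from typing import Tuple


def beta_std(successes: int, failures: int) -> float:
    """Standard deviation of a Beta(1 + successes, 1 + failures) posterior."""
    a, b = 1 + successes, 1 + failures
    return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))


small_sample = beta_std(8, 2)      # 10 observations, 80% success rate
large_sample = beta_std(800, 200)  # 1,000 observations, 80% success rate


def exploration_rate(weak: Tuple[float, float], strong: Tuple[float, float],
                     trials: int, rng: random.Random) -> float:
    """Fraction of trials in which the weak channel's draw exceeds the strong one's."""
    wins = 0
    for _ in range(trials):
        if rng.betavariate(*weak) > rng.betavariate(*strong):
            wins += 1
    return wins / trials


rng = random.Random(0)
rate = exploration_rate(weak=(5, 7), strong=(9, 3), trials=10_000, rng=rng)
```

The first posterior has a standard deviation of roughly 0.12 and the second roughly 0.013, even though both channels have the same 80% success rate. The exploration rate is small but above zero: the weaker channel still wins some draws.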

Bounded Learning

Here’s where governance matters. An unconstrained learning system could — in theory — zero out a signal channel entirely, or concentrate all weight on a single channel that happens to be performing well in the short term. Both are dangerous: the first discards information that may be valuable in different conditions; the second creates concentration risk.

ALF’s learning engine operates within explicit bounds:

Weight floors and ceilings. No channel can be weighted below a minimum or above a maximum. Even if pattern recognition has a terrible month, it maintains a minimum contribution to the composite signal. Even if technical analysis has a stellar quarter, it can’t dominate the fusion to the exclusion of other perspectives.

Adjustment rate limits. Weights can’t change faster than a defined rate. This prevents the system from overreacting to short-term performance — a single bad trade doesn’t trigger a dramatic reallocation. Adjustments accumulate gradually, reflecting sustained performance trends rather than individual outcomes.

Readiness thresholds. The learning engine requires a minimum number of observations before it begins adjusting weights. Until sufficient data has accumulated, the system operates on configured default weights. This prevents premature optimisation based on a handful of trades.

Circuit breaker integration. If the circuit breaker fires — indicating market conditions outside normal parameters — the learning engine pauses. Weight adjustments based on data from abnormal conditions would contaminate the model’s understanding of normal performance. The system waits for stable conditions before resuming learning.
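The first two bounds, weight floors and ceilings and adjustment rate limits, can be sketched as two small functions. The specific numbers (a 0.1 floor, a 0.6 ceiling, a 0.05 maximum step) and both function names are hypothetical, and the bisection-based clamp is one reasonable way to keep weights summing to 1 while respecting the bounds, not necessarily the one ALF uses.

```python
from typing import List


def clamp_to_bounds(weights: List[float], floor: float, ceiling: float) -> List[float]:
    """Return weights in [floor, ceiling] that sum to 1.

    Finds, by bisection, a common shift tau with sum(clip(w - tau)) == 1.
    """
    n = len(weights)
    if not (n * floor <= 1.0 <= n * ceiling):
        raise ValueError("bounds make a total of 1 impossible")

    def clip(x: float) -> float:
        return min(max(x, floor), ceiling)

    lo = min(weights) - ceiling  # every weight clips to the ceiling: sum >= 1
    hi = max(weights) - floor    # every weight clips to the floor: sum <= 1
    for _ in range(100):
        tau = (lo + hi) / 2
        if sum(clip(w - tau) for w in weights) > 1.0:
            lo = tau
        else:
            hi = tau
    return [clip(w - hi) for w in weights]


def rate_limited_step(current: List[float], target: List[float],
                      max_step: float) -> List[float]:
    """Move toward target, scaling the whole move so no weight shifts by more than max_step."""
    largest = max(abs(t - c) for c, t in zip(current, target))
    scale = 1.0 if largest <= max_step else max_step / largest
    return [c + scale * (t - c) for c, t in zip(current, target)]


current = [0.4, 0.3, 0.3]
raw_target = [0.9, 0.05, 0.05]  # what unconstrained learning might ask for
bounded = clamp_to_bounds(raw_target, floor=0.1, ceiling=0.6)
stepped = rate_limited_step(current, bounded, max_step=0.05)
```

Scaling the whole move, rather than clipping each weight's change separately, keeps the weights summing to 1. It also keeps them inside the bounds whenever both the current and target weights already are.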

Governance Constraint

These bounds aren't limitations on the AI's intelligence. They're constraints that prevent intelligence from becoming recklessness. The system can learn. It can't learn itself into a corner.
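The readiness threshold and the circuit breaker pause described above both act as gates on learning rather than as limits on the weights themselves. A minimal sketch, assuming a simple observation counter and a boolean pause flag (the class name, method names, and threshold value are hypothetical):

```python
from typing import Dict


class LearningGate:
    """Decides whether outcomes may be learned from and which weights to use."""

    def __init__(self, min_observations: int) -> None:
        self.min_observations = min_observations
        self.observations = 0
        self.paused = False  # set to True while the circuit breaker is tripped

    def record_outcome(self) -> bool:
        """Count an outcome for learning; return False if it was discarded."""
        if self.paused:
            return False  # outcomes from abnormal conditions are not learned from
        self.observations += 1
        return True

    @property
    def ready(self) -> bool:
        return self.observations >= self.min_observations

    def effective_weights(self, learned: Dict[str, float],
                          defaults: Dict[str, float]) -> Dict[str, float]:
        """Fall back to configured defaults until enough outcomes are seen."""
        return learned if self.ready else defaults


defaults = {"technical": 0.5, "pattern": 0.5}
learned = {"technical": 0.6, "pattern": 0.4}

gate = LearningGate(min_observations=3)
gate.record_outcome()
gate.record_outcome()
before = gate.effective_weights(learned, defaults)  # still the defaults

gate.paused = True                    # circuit breaker fires
accepted = gate.record_outcome()      # discarded, counter unchanged
gate.paused = False                   # stable conditions return
gate.record_outcome()
after = gate.effective_weights(learned, defaults)   # now the learned weights
```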

The Audit Trail of Learning

Every weight adjustment the learning engine produces is recorded with the same deterministic audit trail as every other decision in the platform:

What the prior weights were.

What outcomes triggered the adjustment.

What the posterior weights are.

When the adjustment was computed.

Whether it was within bounds.

Whether a human reviewed it.

This means you can reconstruct the complete learning history for any time period. If performance changed, you can see exactly which weight adjustments contributed. If a channel’s influence increased, you can trace it back to the specific trade outcomes that drove the change.

For regulatory purposes, this is essential. When AI systems learn and adapt, regulators need to understand what changed, why, and who was responsible for overseeing the change. ALF’s learning audit trail answers all three questions with cryptographically verifiable evidence.
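The article does not say how the evidence is made cryptographically verifiable. One common technique is a hash chain, in which each record stores the hash of the one before it. The sketch below is an assumption-laden illustration of that idea: the field names mirror the items listed above, while `WeightAdjustment`, `verify_chain`, the all-zero genesis hash, and the trade identifiers are hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace
from typing import Dict, List, Optional

GENESIS = "0" * 64  # placeholder "previous hash" for the first record (assumption)


@dataclass(frozen=True)
class WeightAdjustment:
    prior_weights: Dict[str, float]
    outcomes: List[str]              # identifiers of the trades behind the change
    posterior_weights: Dict[str, float]
    computed_at: str                 # ISO 8601 timestamp
    within_bounds: bool
    reviewed_by: Optional[str]       # None until a human has reviewed it
    prev_hash: str                   # digest of the previous record in the chain

    def digest(self) -> str:
        """SHA-256 over a canonical JSON encoding of the record."""
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def verify_chain(records: List[WeightAdjustment], head: str) -> bool:
    """Check every link, then compare the final digest with the trusted head."""
    expected = GENESIS
    for record in records:
        if record.prev_hash != expected:
            return False
        expected = record.digest()
    return expected == head


first = WeightAdjustment(
    prior_weights={"technical": 0.5, "pattern": 0.5},
    outcomes=["trade-001", "trade-002"],
    posterior_weights={"technical": 0.55, "pattern": 0.45},
    computed_at="2026-03-16T09:00:00Z",
    within_bounds=True,
    reviewed_by=None,
    prev_hash=GENESIS,
)
second = WeightAdjustment(
    prior_weights=first.posterior_weights,
    outcomes=["trade-003"],
    posterior_weights={"technical": 0.57, "pattern": 0.43},
    computed_at="2026-03-16T10:00:00Z",
    within_bounds=True,
    reviewed_by="operator-1",
    prev_hash=first.digest(),
)
head = second.digest()  # would need to be stored or published separately

tampered = replace(first, posterior_weights={"technical": 0.9, "pattern": 0.1})
```

Here `verify_chain([first, second], head)` succeeds, while swapping in `tampered` breaks the link to `second` and verification fails.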

Human Oversight of Machine Learning

The operator can intervene in the learning process at multiple levels:

Review adjustments. See what the learning engine is proposing and understand the outcome data driving the proposal. This isn’t a dashboard you check once a month. The adjustments and their rationale are part of the normal operational view.

Freeze weights. If the operator believes market conditions are abnormal in ways the system hasn’t detected — a known upcoming event, a structural change in the market — they can freeze weight adjustments temporarily. The system continues generating signals with current weights but doesn’t learn from outcomes during the freeze period.

Override weights. The operator can manually set weights, overriding the learning engine’s output. The override and its rationale are recorded in the audit trail. The learning engine resumes from the overridden state when the operator releases control.

Set bounds. The operator defines the minimum and maximum weights for each channel, the adjustment rate limits, and the readiness thresholds. These meta-parameters govern how the learning engine operates.

This is human-in-the-loop applied to machine learning itself: not just to individual trading decisions, but to the system’s ongoing evolution. The AI gets smarter. The human controls how.
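The freeze and override controls described above amount to a small state machine around the current weights. The following is a sketch under assumptions: the class, its method names, and the in-memory log are hypothetical stand-ins, and a real system would write to the audit trail rather than to a list.

```python
from typing import Dict, List, Tuple


class WeightController:
    """Holds the live weights and applies operator freeze/override controls."""

    def __init__(self, weights: Dict[str, float]) -> None:
        self.weights = dict(weights)
        self.frozen = False
        self.overridden = False
        self.log: List[Tuple[str, str]] = []  # (action, rationale) pairs

    def freeze(self, rationale: str) -> None:
        self.frozen = True
        self.log.append(("freeze", rationale))

    def unfreeze(self, rationale: str) -> None:
        self.frozen = False
        self.log.append(("unfreeze", rationale))

    def override(self, weights: Dict[str, float], rationale: str) -> None:
        """Operator sets weights by hand; learning is held off until release."""
        self.weights = dict(weights)
        self.overridden = True
        self.log.append(("override", rationale))

    def release(self, rationale: str) -> None:
        """Hand control back; learning resumes from the overridden weights."""
        self.overridden = False
        self.log.append(("release", rationale))

    def propose(self, learned: Dict[str, float]) -> bool:
        """Apply a learned adjustment unless frozen or overridden."""
        if self.frozen or self.overridden:
            return False
        self.weights = dict(learned)
        return True


controller = WeightController({"technical": 0.5, "pattern": 0.5})
controller.freeze("scheduled event")
rejected = controller.propose({"technical": 0.6, "pattern": 0.4})
controller.unfreeze("event passed")
controller.override({"technical": 0.3, "pattern": 0.7}, "structural change")
controller.release("conditions normal")
resumed_from = dict(controller.weights)
applied = controller.propose({"technical": 0.35, "pattern": 0.65})
```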

The Complete Loop

Signal generation. Fusion. Validation. Human review. Execution. Outcome capture. Learning. Weight adjustment. Better signal generation.

Each iteration improves the system’s understanding of which signal sources are reliable in current market conditions. Each iteration is bounded, auditable, and subject to human oversight. The system gets smarter without getting dangerous.

That’s the feedback loop institutional capital can trust: one that shows its working, respects its constraints, and keeps a human in charge of its own evolution.

Scott Davies is the Chief Architect and Founder of ALF Capital, where AI learning is bounded, transparent, and governed — not a black box that optimises itself.
