Our Prediction Model
A transparent look at what powers our predictions — the data, the methods, and the honest performance numbers.
Our Approach
We predict expected goals (pG) rather than match outcomes directly. Why? Because goals come from chances, and chance quality is more consistent and predictable than whether individual shots hit the net.
We predict: "How many goals' worth of chances will each team create?"
Then we derive: win/draw/loss probabilities from those two goal predictions, using statistical methods.
This approach lets us build on xG — a metric validated by years of academic research and now standard across professional football analytics.
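A minimal sketch of the "predict goals, then derive outcomes" step. It assumes the two teams' goal counts are independent Poisson variables, which is a common textbook choice and not necessarily the exact statistical method used here. The function names and the example inputs (1.6 and 1.1) are illustrative only.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k goals when the expected count is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def outcome_probabilities(home_pg: float, away_pg: float, max_goals: int = 10):
    """Return (home_win, draw, away_win), assuming independent Poisson scores."""
    home_win = draw = away_win = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_pg) * poisson_pmf(a, away_pg)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    # The score grid is truncated at max_goals, so renormalise to sum to 1.
    total = home_win + draw + away_win
    return home_win / total, draw / total, away_win / total


# Hypothetical predicted goals for one fixture.
home, draw, away = outcome_probabilities(1.6, 1.1)
print(f"home {home:.3f}, draw {draw:.3f}, away {away:.3f}")
```

Working from a full scoreline grid also makes it easy to read off other quantities, such as the chance of a specific score, from the same two numbers.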
Technical Overview
Model Type
Gradient Boosted Decision Trees (LightGBM)
Two separate models: one for home goals, one for away goals. The Poisson objective suits count data such as goals and guarantees that predictions are never negative.
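Why does a Poisson objective rule out negative predictions? With this objective the trees are fitted in log space and the raw score is exponentiated to give the predicted goal count, so the output is always positive. The sketch below shows that mapping and the Poisson deviance for a single match in plain Python. It is an illustration of the idea, not the LightGBM internals, and the function names are hypothetical.

```python
import math


def poisson_prediction(raw_score: float) -> float:
    """Map an unbounded raw score (the log of the goal rate) to a goal rate."""
    return math.exp(raw_score)


def poisson_deviance(actual_goals: int, predicted: float) -> float:
    """Poisson deviance for one match; it is zero when the prediction is exact."""
    if actual_goals == 0:
        # The y * log(y / mu) term vanishes when y is 0.
        return 2.0 * predicted
    return 2.0 * (
        actual_goals * math.log(actual_goals / predicted)
        - (actual_goals - predicted)
    )


# Even a strongly negative raw score yields a small positive goal rate.
for raw in (-3.0, 0.0, 0.7):
    print(raw, round(poisson_prediction(raw), 3))
```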
Feature Set
240+ engineered features per match
Form metrics, head-to-head history, home/away splits, xG trends, defensive records, and contextual factors.
Training Data
6,500+ matches across multiple leagues
Historical data from 2016 to the present, including detailed match statistics and actual xG values.
Updates
Weekly model retraining
Model is retrained every Sunday with the latest match data to capture evolving team performance.
What the Model Analyzes
Recent Form (Last 5-10 Matches)
Not just wins and losses, but the quality of performances:
- xG created and conceded per match
- Goals scored vs. xG (over/underperformance)
- Form in different contexts (home, away, vs. top teams)
- Momentum trends (improving or declining?)
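The form metrics above can be sketched as rolling averages over a team's most recent matches. This is a simplified illustration: the `TeamMatch` record, the field names, the five-match window, and the way momentum is measured (later half of the window minus earlier half) are all assumptions, not a description of the production features.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class TeamMatch:
    played_on: date
    goals_for: int
    xg_for: float
    xg_against: float


def recent_form(history: list[TeamMatch], before: date, window: int = 5) -> dict[str, float]:
    """Summarise the last `window` matches played strictly before `before`."""
    past = sorted(
        (m for m in history if m.played_on < before), key=lambda m: m.played_on
    )
    recent = past[-window:]
    if not recent:
        return {"xg_for": 0.0, "xg_against": 0.0, "goals_minus_xg": 0.0, "xg_trend": 0.0}

    n = len(recent)
    xg_diff = [m.xg_for - m.xg_against for m in recent]
    half = n // 2
    # Momentum: mean xG difference in the later half minus the earlier half.
    trend = 0.0
    if half > 0:
        trend = sum(xg_diff[-half:]) / half - sum(xg_diff[:half]) / half

    return {
        "xg_for": sum(m.xg_for for m in recent) / n,
        "xg_against": sum(m.xg_against for m in recent) / n,
        # Positive means the team scored more than its chances suggested.
        "goals_minus_xg": sum(m.goals_for - m.xg_for for m in recent) / n,
        "xg_trend": trend,
    }
```

Context-specific versions (home only, away only, against top teams) would apply the same function to a filtered `history`.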
Head-to-Head History
Some matchups have consistent patterns:
- Historical goal averages in this specific fixture
- xG patterns when these teams meet
- Recent results weighted more than older ones
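One way to weight recent meetings more heavily is an exponentially decayed average, sketched below. The decay scheme and the two-year half-life are assumptions chosen for illustration; the actual weighting is not specified above.

```python
from datetime import date


def weighted_h2h_goals(
    meetings: list[tuple[date, int]],
    as_of: date,
    half_life_days: float = 730.0,
) -> float:
    """Recency-weighted average of total goals in past meetings.

    Each meeting is (played_on, total_goals). A meeting's weight halves
    every `half_life_days`, so older fixtures count for less.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for played_on, total_goals in meetings:
        if played_on >= as_of:
            continue  # never use meetings on or after the prediction date
        age_days = (as_of - played_on).days
        weight = 0.5 ** (age_days / half_life_days)
        weighted_sum += weight * total_goals
        weight_total += weight
    return weighted_sum / weight_total if weight_total else 0.0
```

With a short half-life the feature tracks the latest meetings closely; with a long one it approaches the plain historical average.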
Home/Away Performance
Home advantage is real but varies significantly:
- Team-specific home/away performance splits
- Stadium effects and crowd factors
- Travel considerations for away teams
Attacking & Defensive Metrics
Both sides of the ball matter:
- Shot quality and shot volume
- Defensive xG conceded
- Clean sheet rates
- Conversion efficiency
Contextual Factors
Match context influences performance:
- Days since last match (fatigue)
- League position and points needed
- Season phase (early, mid, end)
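Two of these contextual factors are simple to compute, as the sketch below shows. The function names are hypothetical, and splitting the season into equal thirds (with a 38-matchday default, as in the Premier League) is an assumption made for the example.

```python
from datetime import date
from typing import Optional


def days_since_last_match(previous_dates: list[date], match_date: date) -> Optional[int]:
    """Rest days before `match_date`, or None if the team has no earlier match."""
    earlier = [d for d in previous_dates if d < match_date]
    if not earlier:
        return None
    return (match_date - max(earlier)).days


def season_phase(matchday: int, total_matchdays: int = 38) -> str:
    """Label a matchday as early, mid, or end of season (equal thirds)."""
    fraction = matchday / total_matchdays
    if fraction <= 1 / 3:
        return "early"
    if fraction <= 2 / 3:
        return "mid"
    return "end"
```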
How We Validate Performance
Walk-Forward Validation
We test on data the model hasn't seen, simulating real-world usage. Train on past → predict future → measure accuracy → repeat.
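The train, predict, measure, repeat loop can be sketched as a generator of date-based splits. The function name and the choice to define folds by start dates (for example, one fold per season) are assumptions for illustration.

```python
from datetime import date
from typing import Iterator


def walk_forward_splits(
    match_dates: list[date], fold_starts: list[date]
) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train_indices, test_indices) for each fold.

    A fold trains on every match before its start date and tests on the
    matches from that start date up to the next fold's start.
    """
    boundaries = sorted(fold_starts)
    for i, start in enumerate(boundaries):
        end = boundaries[i + 1] if i + 1 < len(boundaries) else None
        train = [j for j, d in enumerate(match_dates) if d < start]
        test = [
            j
            for j, d in enumerate(match_dates)
            if d >= start and (end is None or d < end)
        ]
        if train and test:
            yield train, test
```

Each fold's training set grows as time moves forward, and no test match ever precedes a match in its own training set.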
Why this matters:
Many prediction sites accidentally "cheat" by testing on data that overlaps with training. Our walk-forward approach ensures the performance numbers you see reflect what you'd actually experience.
Temporal Data Filtering
Every calculation uses only data available before the match date. When we predict a match on January 15th, we only use data from January 14th and earlier. No future peeking, ever.
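In code, the rule amounts to a strict date filter applied before any feature is computed. The sketch below assumes each match is a dictionary with a `played_on` date; that record layout and the function name are hypothetical.

```python
from datetime import date


def available_before(matches: list[dict], match_date: date) -> list[dict]:
    """Keep only matches played strictly before `match_date`."""
    return [m for m in matches if m["played_on"] < match_date]


# Predicting a match on January 15th: only January 14th and earlier survive.
matches = [
    {"played_on": date(2024, 1, 14), "home_goals": 2, "away_goals": 1},
    {"played_on": date(2024, 1, 15), "home_goals": 0, "away_goals": 0},
    {"played_on": date(2024, 1, 16), "home_goals": 1, "away_goals": 3},
]
usable = available_before(matches, date(2024, 1, 15))
print([m["played_on"].isoformat() for m in usable])
```

The comparison is strict, so other matches played on the same day as the fixture being predicted are excluded as well.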
The Honest Cost
Proper validation cut our reported 1X2 accuracy from an inflated 80%+ to a realistic ~72%. That 72% is what you'll actually experience, and it is still well above the 33% you would get by picking one of the three outcomes at random.
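For reference, 1X2 accuracy here means the share of matches where the most probable predicted outcome was the actual result. A minimal sketch, with hypothetical function names and made-up toy inputs that are not real results:

```python
def predicted_outcome(home_win: float, draw: float, away_win: float) -> str:
    """Return the most probable outcome: '1' (home), 'X' (draw) or '2' (away)."""
    probs = {"1": home_win, "X": draw, "2": away_win}
    return max(probs, key=probs.get)


def accuracy_1x2(predictions: list[tuple[float, float, float]], results: list[str]) -> float:
    """Share of matches where the top predicted outcome matched the result."""
    hits = sum(predicted_outcome(*p) == r for p, r in zip(predictions, results))
    return hits / len(results)


# Toy data for illustration only.
toy_predictions = [(0.5, 0.3, 0.2), (0.2, 0.3, 0.5), (0.3, 0.4, 0.3)]
toy_results = ["1", "1", "X"]
print(round(accuracy_1x2(toy_predictions, toy_results), 3))
```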
Real Performance Numbers
Walk-Forward Validated Results
Based on validation across 2,000+ Premier League matches (2016-2024)
What We Don't Capture (Yet)
Transparency means acknowledging limitations:
Player-level data: We don't yet model individual injuries, suspensions, or player form.
Tactical matchups: Formation interactions and manager tactical adjustments aren't modeled.
Motivation factors: Relegation desperation, title pressure, and derby intensity are hard to quantify.
Weather and pitch: Environmental conditions aren't currently factored in.
These are areas for future improvement. Reaching ~72% without them suggests the foundation is solid.
The Bottom Line
Our model uses established machine learning techniques on quality historical data with honest validation practices. The numbers we show are real, validated on unseen data, with no future peeking.
~72% accuracy on 1X2 won't make you rich overnight, but it's a genuine edge over random chance. Use it as one input in your decision-making, not the only input.