Our Prediction Model

A transparent look at what powers our predictions — the data, the methods, and the honest performance numbers.

Our Approach

We predict expected goals (xG) rather than match outcomes directly. Why? Because goals come from chances, and chance quality is more consistent and predictable than whether an individual shot hits the net.

We predict: "How many goals' worth of chances will each team create?"

Then we derive: Win/draw/loss probabilities using statistical methods.

This approach lets us build on xG — a metric validated by years of academic research and now standard across professional football analytics.
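
As an illustration of how win/draw/loss probabilities can be derived from two goal expectations, here is a minimal sketch. It assumes home and away goals are independent Poisson variables, which is a common textbook choice and not necessarily the exact method used in production. The function names and the `max_goals` cutoff are hypothetical.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k goals when the expected number is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def outcome_probabilities(home_xg: float, away_xg: float, max_goals: int = 10):
    """Return (home_win, draw, away_win) under an independent-Poisson assumption.

    Scorelines above max_goals are ignored, so the three sums are
    renormalised to add up to exactly 1.
    """
    home_win = draw = away_win = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_xg) * poisson_pmf(a, away_xg)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    total = home_win + draw + away_win
    return home_win / total, draw / total, away_win / total


home, draw, away = outcome_probabilities(1.6, 1.1)
print(f"1: {home:.3f}  X: {draw:.3f}  2: {away:.3f}")
```

The independence assumption tends to understate draws slightly in real football data, which is one reason a production system might use a different derivation.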

Technical Overview

Model Type

Gradient Boosted Decision Trees (LightGBM)

Two separate models: one for home goals, one for away goals. A Poisson objective ensures predictions are never negative.
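
Why does a Poisson objective rule out negative predictions? Poisson regression objectives, including LightGBM's, fit a raw score in log space and report its exponential, which is always positive. The sketch below is a deliberately tiny stand-in, not our LightGBM setup: it fits a single constant score by gradient descent on the Poisson loss. The function name, step count, and learning rate are illustrative assumptions.

```python
import math


def fit_constant_poisson(goals, steps=200, lr=0.1):
    """Fit one raw score s so that exp(s) minimises the mean Poisson loss.

    Mean Poisson negative log-likelihood (up to a constant): exp(s) - mean(y) * s
    Its gradient with respect to s:                          exp(s) - mean(y)
    """
    s = 0.0
    n = len(goals)
    for _ in range(steps):
        grad = sum(math.exp(s) - y for y in goals) / n
        s -= lr * grad
    return s


goals = [0, 1, 1, 2, 3]             # toy goal counts
raw_score = fit_constant_poisson(goals)
prediction = math.exp(raw_score)    # the reported prediction is exp(raw score)
print(round(prediction, 3))
```

Because the prediction is an exponential, it stays above zero however negative the raw score becomes.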

Feature Set

240+ engineered features per match

Form metrics, head-to-head history, home/away splits, xG trends, defensive records, and contextual factors.

Training Data

6,500+ matches across multiple leagues

Historical data from 2016 to the present, including detailed match statistics and actual xG values.

Updates

Weekly model retraining

The model is retrained every Sunday with the latest match data to capture evolving team performance.

What the Model Analyzes

Recent Form (Last 5-10 Matches)

Not just wins and losses, but the quality of performances:

  • xG created and conceded per match
  • Goals scored vs. xG (over/underperformance)
  • Form in different contexts (home, away, vs. top teams)
  • Momentum trends (improving or declining?)
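
To make the form metrics above concrete, here is a minimal sketch of rolling-window features for one team. The record layout (`xg_for`, `xg_against`, `goals_for`) and the function name are hypothetical, and our real feature pipeline is considerably larger.

```python
from statistics import mean


def recent_form(matches, window=5):
    """Summarise a team's last `window` matches.

    `matches` is a list of dicts ordered oldest to newest, each with the
    hypothetical keys xg_for, xg_against and goals_for.
    """
    recent = matches[-window:]
    if not recent:
        return None  # no history yet, e.g. a newly promoted team
    return {
        "xg_for_avg": mean(m["xg_for"] for m in recent),
        "xg_against_avg": mean(m["xg_against"] for m in recent),
        # Positive means scoring more than the chances suggest.
        "finishing_delta": mean(m["goals_for"] - m["xg_for"] for m in recent),
    }


history = [
    {"xg_for": 1.0, "xg_against": 2.0, "goals_for": 1},
    {"xg_for": 2.0, "xg_against": 1.0, "goals_for": 3},
]
print(recent_form(history, window=5))
```

Running the same function with two different window sizes and comparing the averages is one simple way to express a momentum trend.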

Head-to-Head History

Some matchups have consistent patterns:

  • Historical goal averages in this specific fixture
  • xG patterns when these teams meet
  • Recent results weighted more than older ones
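
One way to weight recent meetings more heavily than older ones is exponential decay. The sketch below assumes a two-year half-life, which is an illustrative value rather than our tuned setting, and the function name and input layout are hypothetical.

```python
def weighted_h2h_goals(meetings, half_life_days=730.0):
    """Recency-weighted average of total goals in past meetings.

    `meetings` is a list of (days_ago, total_goals) pairs. A meeting that
    is `half_life_days` old counts half as much as one played today.
    """
    if not meetings:
        return None
    weights = [0.5 ** (days_ago / half_life_days) for days_ago, _ in meetings]
    weighted_sum = sum(w * goals for w, (_, goals) in zip(weights, meetings))
    return weighted_sum / sum(weights)


# A recent 4-goal game pulls the average up more than an old 2-goal game.
print(weighted_h2h_goals([(0, 4), (730, 2)]))
```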

Home/Away Performance

Home advantage is real but varies significantly:

  • Team-specific home/away performance splits
  • Stadium effects and crowd factors
  • Travel considerations for away teams

Attacking & Defensive Metrics

Both sides of the ball matter:

  • Shot quality and shot volume
  • Defensive xG conceded
  • Clean sheet rates
  • Conversion efficiency

Contextual Factors

Match context influences performance:

  • Days since last match (fatigue)
  • League position and points needed
  • Season phase (early, mid, end)

How We Validate Performance

Walk-Forward Validation

We test on data the model hasn't seen, simulating real-world usage. Train on past → predict future → measure accuracy → repeat.

Why this matters:

Many prediction sites accidentally "cheat" by testing on data that overlaps with training. Our walk-forward approach ensures the performance numbers you see reflect what you'd actually experience.
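
The train, predict, measure, repeat loop can be sketched as an expanding-window split over chronologically sorted matches. The function name and the fold sizes below are illustrative assumptions, not our production settings.

```python
def walk_forward_splits(n_matches, initial_train, test_size):
    """Yield (train_indices, test_indices) for matches sorted by date.

    Each fold trains on every match before the test block, then the
    test block is absorbed into the training window for the next fold.
    """
    start = initial_train
    while start + test_size <= n_matches:
        yield range(0, start), range(start, start + test_size)
        start += test_size


for train_idx, test_idx in walk_forward_splits(10, initial_train=4, test_size=2):
    print(f"train on {len(train_idx)} matches, test on {list(test_idx)}")
```

Every test index is strictly later than every training index, which is the property that prevents overlap between training and evaluation data.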

Temporal Data Filtering

Every calculation uses only data available before the match date. When we predict a match on January 15th, we only use data from January 14th and earlier. No future peeking, ever.
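
In code, the rule amounts to a strict date comparison applied before any feature is computed. This is a minimal sketch with a hypothetical record layout (a `date` key per match) and a hypothetical function name.

```python
from datetime import date


def matches_before(history, match_date):
    """Keep only matches played strictly before `match_date`."""
    return [m for m in history if m["date"] < match_date]


history = [
    {"date": date(2024, 1, 10), "home_goals": 2, "away_goals": 1},
    {"date": date(2024, 1, 14), "home_goals": 0, "away_goals": 0},
    {"date": date(2024, 1, 15), "home_goals": 3, "away_goals": 2},  # same day: excluded
]
usable = matches_before(history, date(2024, 1, 15))
print([m["date"].isoformat() for m in usable])
```

Using a strict `<` rather than `<=` matters: other matches played on the same day may not have finished when a prediction is made.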

The Honest Cost

Proper validation lowered our reported 1X2 accuracy from an inflated 80%+ to a realistic ~72%. That ~72% is what you'll actually experience, and it is still well above the 33% expected from random guessing.

Real Performance Numbers

Walk-Forward Validated Results

1X2 Success Rate: ~72% (vs. 33% random chance)
Double Chance Success: ~91% (vs. 67% random chance)
Home Goals MAE: 0.73 (average prediction error, in goals)
Away Goals MAE: 0.65 (average prediction error, in goals)

Based on validation across 2,000+ Premier League matches (2016-2024)
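
For readers who want to see how figures like these are computed, here is a minimal sketch. It assumes the 1X2 pick is the single most likely outcome and the double chance pick is the two most likely outcomes. That definition, the record layout, and the function names are assumptions made for illustration.

```python
def result_of(home_goals, away_goals):
    """Map a scoreline to '1' (home win), 'X' (draw) or '2' (away win)."""
    if home_goals > away_goals:
        return "1"
    if home_goals == away_goals:
        return "X"
    return "2"


def evaluate(rows):
    """Compute 1X2 accuracy, double chance accuracy and goal MAEs."""
    n = len(rows)
    hits_1x2 = hits_dc = 0
    err_home = err_away = 0.0
    for r in rows:
        ranked = sorted(r["probs"], key=r["probs"].get, reverse=True)
        actual = result_of(r["home_goals"], r["away_goals"])
        hits_1x2 += actual == ranked[0]    # top pick was right
        hits_dc += actual in ranked[:2]    # one of the top two was right
        err_home += abs(r["pred_home"] - r["home_goals"])
        err_away += abs(r["pred_away"] - r["away_goals"])
    return {
        "accuracy_1x2": hits_1x2 / n,
        "accuracy_double_chance": hits_dc / n,
        "mae_home": err_home / n,
        "mae_away": err_away / n,
    }


rows = [
    {"probs": {"1": 0.60, "X": 0.25, "2": 0.15},
     "pred_home": 1.8, "pred_away": 0.9, "home_goals": 2, "away_goals": 0},
    {"probs": {"1": 0.20, "X": 0.35, "2": 0.45},
     "pred_home": 1.0, "pred_away": 1.4, "home_goals": 1, "away_goals": 1},
]
print(evaluate(rows))
```

The random-chance baselines follow from counting outcomes: one of three for 1X2 (about 33%) and two of three for double chance (about 67%).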

What We Don't Capture (Yet)

Transparency means acknowledging limitations:

Player-level data: We don't yet model individual injuries, suspensions, or player form.

Tactical matchups: Formation interactions and manager tactical adjustments aren't modeled.

Motivation factors: Relegation desperation, title pressure, and derby intensity are hard to quantify.

Weather and pitch: Environmental conditions aren't currently factored in.

These are opportunities for future improvement. The fact that we reach ~72% without them suggests the foundation is solid.

The Bottom Line

Our model uses established machine learning techniques on quality historical data with honest validation practices. The numbers we show are real, validated on unseen data, with no future peeking.

~72% accuracy on 1X2 won't make you rich overnight, but it's a genuine edge over random chance. Use it as one input in your decision-making, not the only input.
