How MLB Predictions Work
Full methodology · Data sources · Limitations
OVERVIEW
SandmanEdge uses a two-stage prediction pipeline. First, a statistical confidence engine scores the matchup using 20 weighted factors from the database. That score is passed to the SandmanEdge engine along with detailed team and player data, which generates the final picks, reasoning, and score prediction. Neither stage should be used for real betting decisions.
STAGE 1 — STATISTICAL CONFIDENCE SCORE
Before calling the SandmanEdge engine, we calculate a Stat Score in the 52–85% range representing the home team's statistical edge. This is a pure formula with no AI involved. It is built from 20 weighted factors that can add up to 105 points in total on top of the 50-point baseline. The engine then reads this score alongside all the detailed context and outputs its own Engine Confidence, which can differ significantly when it identifies things the formula missed, like narrative momentum, travel fatigue, or a team hiding injuries.
FACTORS (hover each for details in the Score Breakdown):
1
Starting Pitcher
±12 pts
Starting pitcher ERA, WHIP, strikeouts per 9. Recent form (last 3 starts). Home/away splits. Matchup vs opposing lineup batting avg. Elite ace (ERA <2.50) vs struggling starter (ERA >5.00) = full ±12 swing. Most critical factor in baseball.
2
Bullpen Strength
±10 pts
Team bullpen ERA over last 30 days. Save percentage. High-leverage reliever availability (checked against injuries). Strong pen (ERA <3.00, 85%+ saves) vs weak (ERA >4.50, <70% saves) = ±10.
3
Team Offense
±10 pts
Runs per game, OPS (on-base + slugging), batting avg vs pitcher handedness. Hot streak (5+ runs in 4 of last 5) vs cold streak (<3 runs) = significant edge.
4
Win %
±8 pts
Season win percentage differential. A 65% team vs 45% team earns the full ±8. Baseball has more parity than the NBA, so this weight is slightly lower than in the basketball model.
5
Form (L10)
±8 pts
Last 10 games record. Baseball momentum matters — hot teams (8-2 or better) vs cold teams (2-8) earn the full ±8.
6
Park Factors
±6 pts
Stadium run environment. Coors Field (park factor 1.20) heavily favors hitters. Petco Park (0.85) suppresses runs. Factor in stadium dimensions, elevation, wind patterns.
7
Weather
±5 pts
Temperature, wind speed/direction. Hot weather (85°F+) and wind blowing out = offense boost. Cold (<55°F) and wind in = pitcher friendly. Rain delays affect bullpen usage.
8
Injuries
±10 pts
MVP-caliber hitter (OPS >1.000) Out = -5 pts. Ace starter injury = -4 pts. All-Star hitter = -3.5, everyday starter = -2, bench = -1. Multiplied by status (Out×1.0, Doubtful×0.75, Questionable×0.45).
9
Rest Days
±4 pts
Days since last game. Teams on back-to-back games are less affected than in the NBA/NHL due to roster depth, but bullpen fatigue still matters: 3+ games in 3 days = -2 pts for bullpen strain.
10
Clutch (1-run)
±3 pts
Win % in games decided by 1 run (minimum 5 such games). Captures late-inning execution and closer reliability. Capped at ±3 due to small sample.
11
Travel Fatigue
±4 pts
Cross-country trips (2000+ miles) = -3 for away team. Time zone shifts (3+ hours) add -1. Home team on 5+ game homestand gets +1. Less impactful than in the NBA/NHL because MLB schedules commonly include off days for travel.
12
Home/Away Splits
±4 pts
Team performance at home vs away. Some teams vastly overperform at home (e.g., Rockies at Coors). Compares each team's runs-per-game differential at home with its runs-per-game differential on the road.
13
Batting vs Pitcher
±4 pts
Team batting avg vs LHP/RHP matchup. When both teams face an RHP starter, a team that hits .280 vs RHP gets the edge over a team that hits .240 vs RHP. Handedness matchup is critical in baseball.
14
Home Field
+3 flat
Flat bonus for home team. MLB home teams win ~54% historically — less than NBA (58%) or NCAA (60%), but still meaningful.
15
Str. of Schedule
±4 pts
Average opponent win% across all completed games. Teams with an SOS of .540 or higher have faced tougher opponents; an SOS of .460 or lower means a soft schedule. Every 4% SOS gap = ±2 pts, capped at ±4.
16
Schedule Density
±3 pts
Games played in last 7 days. MLB average is ~6 games/week. 7+ = heavy, 6 = compressed, ≤4 = light. Captures bullpen wear and roster fatigue from packed schedules.
17
Momentum
±3 pts
L10 vs L11-30 trend direction. Hot streaks and cold streaks matter in baseball. Improving win% and run differential get a boost.
18
Fatigue Stack
±2 pts
Compound modifier when rest + travel + density all stack. 3 flags vs 0-1 = ±2; 2 flags vs 0 = ±1. Captures bullpen/lineup fatigue compounding.
19
Pace
±2 pts
Pace differential based on average game totals (runs). High-scoring games (>9.5 runs) vs low (<7.5) indicate lineup tempo mismatch.
20
Vegas (30%)
blend
Vegas implied win probability blended at 30% weight. When moneylines are NULL (pre-season, early lines), skip this blend and mark prediction as stats-only.
The raw score starts at 50 and all factors are added or subtracted. This score is passed to the SandmanEdge engine as a starting point, not a final answer.
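The Stage 1 arithmetic described above can be sketched roughly as follows. This is a minimal illustration, not the production formula: the function names, the percentage-scale blend, and the use of raw (vig-included) implied probability are assumptions, and the way the result is mapped into the displayed 52–85% range is not specified in this document, so it is left out.

```python
from typing import Dict, Optional, Tuple

BASELINE = 50.0       # raw score starts at 50 (from the text)
VEGAS_WEIGHT = 0.30   # Vegas blend weight (from the text)


def american_to_implied(moneyline: int) -> float:
    """Convert American odds to an implied win probability (vig not removed)."""
    if moneyline < 0:
        return -moneyline / (-moneyline + 100.0)
    return 100.0 / (moneyline + 100.0)


def stat_score(
    factor_points: Dict[str, float],
    home_moneyline: Optional[int] = None,
) -> Tuple[float, bool]:
    """Return (score, stats_only) for the home team.

    factor_points maps a factor name to its signed points, e.g.
    {"starting_pitcher": 6.0, "home_field": 3.0}. The names are hypothetical.
    """
    raw = BASELINE + sum(factor_points.values())
    stats_only = home_moneyline is None
    if stats_only:
        # Moneyline is NULL: skip the blend and mark as stats-only.
        return raw, True
    # Assumption: blend on the 0-100 scale, 70% stats and 30% Vegas.
    vegas_pct = 100.0 * american_to_implied(home_moneyline)
    blended = (1.0 - VEGAS_WEIGHT) * raw + VEGAS_WEIGHT * vegas_pct
    return blended, False
```

For example, factor points totalling +9 with a home moneyline of -150 (60% implied) give 0.7 × 59 + 0.3 × 60 = 59.3, while the same factors with no moneyline stay at 59 and are flagged stats-only.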
STAGE 2 — SANDMANEDGE ANALYSIS
The SandmanEdge engine receives all available data and generates a structured prediction. Here is exactly what data is analyzed:
→ Team Season Stats
Win-loss record, runs per game, runs against, batting average, OBP, slugging, ERA, WHIP, K/9, home runs, errors, and fielding %. All analyzed as per-game rates.
→ Home / Away Splits
Each team's record, scoring average, and scoring margin split by home vs away games. Reveals if a team performs significantly differently at home — some teams have massive home/away splits that the overall record masks.
→ Starting Pitcher Matchup
The confirmed starting pitcher for each team with ERA calculated from their last 5 starts (game logs). Pitcher handedness (LHP/RHP) is factored in — left-handed starters are rarer and teams tend to hit worse against them.
→ Bullpen Strength
Bullpen ERA calculated from actual reliever game logs (non-starter appearances). Falls back to team ERA if insufficient data. Includes recent workload context.
→ Park Factors & Weather
Stadium run environment factor (Coors Field +20%, Petco Park -15%) applied bidirectionally. Weather includes temperature impact and wind direction — outbound wind boosts offense, inbound wind suppresses it.
→ Injury Report
All players listed as Out, Doubtful, or Questionable with position and body part affected. Injuries are sorted by player importance so the most impactful absences appear first.
→ Recent Form
Results, scores, and opponents from each team's last 10 completed games. Context on momentum and recent schedule difficulty.
→ Head-to-Head History
Last 5 meetings between these two teams with dates, scores, and winner. Some teams have significant H2H edges regardless of overall record.
→ Rest & Fatigue
Days since each team's last game plus travel distance (GPS-based) between ballparks. Back-to-backs, cross-country trips, and compressed schedules are all factored in.
→ Divisional Context
Matchup type detection: divisional (high familiarity, rivalry intensity), same-league, or interleague. Divisional matchups weight H2H and trends more heavily.
→ Clutch Record
Win % in close games (1-run margin). Some teams consistently win close games while others collapse.
→ Vegas Lines
Actual moneyline, run line, and over/under from SportsDataIO. The engine evaluates whether the predicted margin supports or challenges the run line.
→ Calibration Feedback
The engine's recent accuracy is fed back in: ML/ATS/O&U records, accuracy by confidence bucket, and average score deviation. If recent predictions are off, the engine self-corrects.
→ Statistical Confidence Score
The Stage 1 score (52–85%) from 20 weighted factors is passed to the engine as a starting point, blended with 30% Vegas implied probability.
→ Advanced Statistical Models
Four independent models run alongside the confidence engine: ELO ratings (chess-style power ratings updated after each game), Pythagorean W% (expected win rate from runs scored/allowed), Log5 (head-to-head probability from each team's true strength), and an Ensemble that blends all models with Vegas implied probability. These are displayed in the Advanced Models card below the Score Breakdown.
→ Adaptive Factor Weights
The engine tracks which confidence factors historically correlate with correct vs. incorrect predictions. Factors that consistently predict well get boosted (up to 1.5x), while unreliable factors get dampened (down to 0.5x). This self-tuning requires 20+ graded predictions to activate.
→ Closing Line Value (CLV)
After games are graded, the system compares the lines at prediction time to the closing lines. Consistently beating the closing line (positive CLV) indicates genuine edge over the market. Track CLV trends on the History page's CLV Tracker tab.
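The three named models under Advanced Statistical Models (Pythagorean W%, Log5, ELO) are standard formulas and can be sketched as below. The exponent of 1.83, the K-factor of 4, and the 400-point Elo scale are common conventions used here as assumptions; this document does not state the values SandmanEdge actually uses, and the function names are illustrative.

```python
from typing import Tuple


def pythagorean_win_pct(runs_scored: float, runs_allowed: float,
                        exponent: float = 1.83) -> float:
    """Expected win rate from runs scored and allowed (exponent is an assumption)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def log5(p_a: float, p_b: float) -> float:
    """Probability that team A beats team B, given each team's win rate
    against average opposition. Undefined when both rates are 0 or both are 1."""
    return (p_a - p_a * p_b) / (p_a + p_b - 2.0 * p_a * p_b)


def elo_expected(rating_a: float, rating_b: float) -> float:
    """Expected score for team A on the conventional 400-point Elo scale."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, a_won: bool,
               k: float = 4.0) -> Tuple[float, float]:
    """Return both teams' ratings after one game (K-factor is an assumption)."""
    delta = k * ((1.0 if a_won else 0.0) - elo_expected(rating_a, rating_b))
    return rating_a + delta, rating_b - delta
```

As a quick check, a .600 team facing a .500 team gets a Log5 probability of 0.60, and two 1500-rated teams move to 1502 and 1498 after one game at K = 4.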
HOW EACH PICK IS GENERATED
MONEYLINE
SandmanEdge predicts the winner and confidence percentage. The moneyline odds shown are calculated from the engine's confidence using standard probability-to-odds conversion — they are our model's implied odds, not the sportsbook's line.
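The standard probability-to-odds conversion mentioned above looks roughly like this for American odds. The function name and the rounding to whole numbers are assumptions; no vig is added, which is consistent with these being model-implied odds rather than a sportsbook line.

```python
def probability_to_american(p: float) -> int:
    """Convert a win probability (0 < p < 1) to fair American odds."""
    if not 0.0 < p < 1.0:
        raise ValueError("probability must be strictly between 0 and 1")
    if p >= 0.5:
        # Favorite: negative odds, the amount risked to win 100.
        return -round(100.0 * p / (1.0 - p))
    # Underdog: positive odds, the amount won on a 100 stake.
    return round(100.0 * (1.0 - p) / p)
```

A 75% confidence converts to -300, and the complementary 25% converts to +300.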
RUN LINE
The run line displayed is the real Vegas line from SportsDataIO. If the run line is -1.5 and the engine predicts a 2-run win, the favorite covers. The line never changes between predictions — only the cover pick changes based on the engine's score prediction.
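The cover logic above reduces to comparing the predicted margin with the line. A minimal sketch, with a hypothetical function name and a push case added for whole-number lines:

```python
def run_line_pick(pred_favorite_runs: float, pred_underdog_runs: float,
                  favorite_line: float = -1.5) -> str:
    """Return which side covers given a predicted score and the favorite's line."""
    margin = pred_favorite_runs - pred_underdog_runs
    adjusted = margin + favorite_line  # e.g. a 2-run win at -1.5 leaves +0.5
    if adjusted > 0:
        return "favorite"
    if adjusted < 0:
        return "underdog"
    return "push"  # only possible when the line is a whole number
```

With a -1.5 line, a predicted 5-3 win covers for the favorite, while a predicted 4-3 win does not.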
OVER / UNDER
SandmanEdge predicts a final score for both teams. The predicted total is compared to the Vegas O/U line. Confidence is based on how far the predicted total deviates from the line. Confidence is capped at 75% because totals are notoriously difficult to predict.
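The over/under logic above can be illustrated as follows. Only the 75% cap comes from this document; the 50% starting point and the 5 points of confidence per run of deviation are hypothetical placeholders, since the actual mapping from deviation to confidence is not specified here.

```python
from typing import Tuple


def over_under_pick(pred_home_runs: float, pred_away_runs: float,
                    vegas_total: float,
                    base_pct: float = 50.0,      # hypothetical starting point
                    pct_per_run: float = 5.0,    # hypothetical slope
                    cap_pct: float = 75.0) -> Tuple[str, float]:
    """Return (pick, confidence %) by comparing the predicted total to the line."""
    diff = (pred_home_runs + pred_away_runs) - vegas_total
    if diff > 0:
        pick = "over"
    elif diff < 0:
        pick = "under"
    else:
        pick = "push"
    # Confidence grows with the size of the deviation and is capped at 75%.
    confidence = min(cap_pct, base_pct + pct_per_run * abs(diff))
    return pick, confidence
```

Under these placeholder values, a predicted 5-4 final against an 8.5 line yields an over pick at 52.5%, and any deviation of 5 runs or more hits the 75% cap.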
LOW CONFIDENCE FLAG
The engine flags low_confidence: true when it detects significant uncertainty — for example, key injury statuses that are listed as Questionable (could play or not), very evenly matched teams where the data is inconclusive, or missing data for a key player. When flagged, a gold warning banner appears above the picks. This is SandmanEdge being honest about its own limitations rather than always projecting false certainty.
KNOWN LIMITATIONS
—
Player stats require game logs to be refreshed regularly. If logs are stale, last-5-game averages may not reflect current form.
—
Opponent runs per game (runs allowed) may be null if the SQL population query has not been run. This reduces the accuracy of the scoring margin factor.
—
The pace estimate is based on combined game totals (runs), not a direct measure of game tempo. It is an approximation.
—
Vegas lines (run line, moneyline, O/U) are only as current as the last data refresh. Lines can move significantly on game day.
—
SandmanEdge's analysis reflects the data it is given — garbage in, garbage out. Stale injuries or missing player logs will hurt prediction quality.
—
No model can consistently predict MLB games at better than ~60% accuracy over a full season. Anyone claiming otherwise is misleading you.
SandmanEdge Prediction Engine · For entertainment purposes only · Not financial or betting advice