The basic blueprint for predicting a player’s upcoming season from his past is the Marcel method, introduced by prominent baseball and hockey analyst Tom Tango back in 2005. It’s a three-step process that
- starts with a weighted average of a player’s three most recent seasons;
- regresses that player’s performance toward the league mean, based on games played (we’ll explain why and how in a moment); and
- applies an age adjustment to account for developing rookies and declining veterans.

With regard to the first step, Tango proposes a weighting of 5-4-3, while I prefer a 4-2-1 approach that assumes that every season’s data has twice the predictive power as the season previous.
Vollman, Rob, Awad, Tom, Fyffe, Iain. Hockey Abstract Presents Stat Shot. ECW Press. Kindle Edition.
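The three steps quoted above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function names, the `regression_attempts` constant, the peak age, and the aging slopes are all assumptions chosen for the example, and the regression is expressed in attempts (at-bats, shots) rather than games played.

```python
def marcel_rate(successes, attempts, league_rate,
                weights=(5, 4, 3), regression_attempts=1200.0):
    """Project a rate stat from up to three seasons, listed most recent first."""
    # Step 1: weight each season's totals (most recent season weighted most).
    weighted_successes = sum(w * s for w, s in zip(weights, successes))
    weighted_attempts = sum(w * a for w, a in zip(weights, attempts))
    # Step 2: regress toward the league mean by mixing in league-average
    # attempts; the less a player has played, the stronger the pull.
    # (regression_attempts is an illustrative constant, not from the source.)
    return ((weighted_successes + regression_attempts * league_rate)
            / (weighted_attempts + regression_attempts))


def age_adjust(rate, age, peak_age=29, rise=0.006, decline=0.003):
    """Step 3: nudge the projection up before an assumed peak age, down after."""
    slope = rise if age < peak_age else decline
    return rate * (1.0 + (peak_age - age) * slope)


# Hypothetical batter: hits and at-bats for the last three seasons, newest first.
hits, at_bats = (150, 140, 120), (500, 520, 480)
for weights in [(5, 4, 3), (4, 2, 1)]:
    projection = age_adjust(marcel_rate(hits, at_bats, 0.250, weights), age=26)
    print(weights, round(projection, 3))
```

Swapping `weights` between `(5, 4, 3)` and `(4, 2, 1)` is the only difference between the two weighting schemes mentioned in the quote; everything else in the sketch stays the same.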
mlb_df = load_data(get_data_url('2018_batting.csv'), 'Name', ['AB', 'H'])
mlb_df.head()
player_id | name | ab | h |
---|---|---|---|
abreujo02 | Jose Abreu | 499 | 132 |
acunaro01 | Ronald Acuna | 433 | 127 |
adamewi01 | Willy Adames | 288 | 80 |
adamja01 | Jason Adam | 0 | 0 |
adamsau02 | Austin L. Adams | 0 | 0 |
def hierarchical_normal(name, shape, μ=None):
    # Non-centered parameterization: name = μ + Δ * σ
    if μ is None:
        μ = pm.Normal(f"μ_{name}", 0., 5.)

    Δ = pm.Normal(f"Δ_{name}", shape=shape)
    σ = pm.HalfNormal(f"σ_{name}", 2.5)

    return pm.Deterministic(name, μ + Δ * σ)
with pm.Model() as mlb_model:
    η = hierarchical_normal("η", n_player)
    ba = pm.Deterministic("ba", pm.math.sigmoid(η))

    hits = pm.Binomial("hits", batter_df['ab'], ba, observed=batter_df['h'])
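Written out, the model specified by `hierarchical_normal` and `mlb_model` reads roughly as follows. The notation is mine rather than the post's: $h_i$ and $n_i$ stand for player $i$'s hits and at-bats, $p_i$ for his batting average (`ba`), and the positional arguments `5.` and `2.5` are read as standard deviations, as in PyMC3.

$$
\begin{aligned}
\mu_{\eta} &\sim N(0, 5^2) \\
\sigma_{\eta} &\sim \textrm{HalfNormal}(2.5) \\
\Delta_i &\sim N(0, 1) \\
\eta_i &= \mu_{\eta} + \Delta_i \cdot \sigma_{\eta} \\
p_i &= \frac{1}{1 + e^{-\eta_i}} \\
h_i &\sim \textrm{Binomial}(n_i, p_i)
\end{aligned}
$$

The shared $\mu_{\eta}$ and $\sigma_{\eta}$ are what pull each player's estimate toward the league mean, with a stronger pull for players who have few at-bats.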
with mlb_model:
    mlb_trace = pm.sample(**SAMPLE_KWARGS)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (3 chains in 3 jobs)
NUTS: [σ_η, Δ_η, μ_η]
Sampling 3 chains: 100%|██████████| 6000/6000 [00:49<00:00, 120.10draws/s]
az.plot_energy(mlb_trace);
(az.rhat(mlb_trace).max().to_array().max())
<xarray.DataArray ()> array(1.01)
(mlb_trace['ba'][:, harper_ix] < mlb_trace['ba'][:, trout_ix]).mean()
0.96666666666666667
(mlb_trace['ba'][:, harper_ix] < mlb_trace['ba'][:, hoskins_ix]).mean()
0.45700000000000002
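The two comparisons above estimate a posterior probability as the fraction of draws in which one player's batting average falls below another's. A self-contained sketch of that calculation follows, using made-up Beta draws as a stand-in for the real trace; the `prob_less` helper, the player labels, and the Beta parameters are hypothetical. In the actual trace the two columns come from the same joint posterior draws, whereas here they are sampled independently.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in posterior draws of two players' batting averages
# (synthetic Beta samples, not output from the model above).
player_a = rng.beta(50, 150, size=4000)
player_b = rng.beta(60, 140, size=4000)


def prob_less(x, y):
    """Monte Carlo estimate of P(x < y) from paired posterior draws."""
    # Comparing draw by draw gives an array of booleans; their mean is
    # the share of draws in which x is below y.
    return float(np.mean(x < y))


print(prob_less(player_a, player_b))
```

A value near 0.5 means the posterior cannot separate the two players, while a value near 0 or 1 means one is very likely the better hitter.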
nhl_df = load_data(get_data_url('2017_2018_goalies.csv'), 'Player', ['SA', 'SV'])
nhl_df.head()
player_id | name | sa | sv |
---|---|---|---|
allenja01 | Jake Allen | 1614 | 1462 |
andercr01 | Craig Anderson | 1768 | 1588 |
anderfr01 | Frederik Andersen | 2211 | 2029 |
appleke01 | Ken Appleby | 55 | 52 |
bernijo01 | Jonathan Bernier | 1092 | 997 |
with pm.Model() as nhl_model:
    η = hierarchical_normal("η", n_goalie)
    svp = pm.Deterministic("svp", pm.math.sigmoid(η))

    saves = pm.Binomial("saves", nhl_df['sa'], svp, observed=nhl_df['sv'])
with nhl_model:
    nhl_trace = pm.sample(nuts_kwargs={'target_accept': 0.9}, **SAMPLE_KWARGS)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (3 chains in 3 jobs)
NUTS: [σ_η, Δ_η, μ_η]
Sampling 3 chains: 100%|██████████| 6000/6000 [00:21<00:00, 276.80draws/s]