Probabilistic Programming in Python with PyMC3

Probabilistic Programming

• Write a program to simulate the data generating process
• Language runtime/software library performs inference

Probabilistic Programming — storytelling enables inference

Benefits of probabilistic programming

• Model-agnostic inference algorithms
• Quantification of uncertainty
• Incorporation of domain knowledge
• Modularity

The Monty Hall Problem

Initially, we have no information about which door the prize is behind.

In [9]:
import pymc3 as pm

with pm.Model() as monty_model:
    # uniform prior: the prize is equally likely to be behind doors 0, 1, or 2
    prize = pm.DiscreteUniform('prize', 0, 2)


If we choose door one:

                Monty can open
Prize behind    Door 1   Door 2   Door 3
Door 1          No       Yes      Yes
Door 2          No       No       Yes
Door 3          No       Yes      No
In [10]:
import numpy as np
from theano import tensor as tt

with monty_model:
    p_open = pm.Deterministic('p_open',
        tt.switch(tt.eq(prize, 0),
                  # it is behind the first door: Monty opens door 2 or 3
                  np.array([0., 0.5, 0.5]),
                  tt.switch(tt.eq(prize, 1),
                            # it is behind the second door: Monty must open door 3
                            np.array([0., 0., 1.]),
                            # it is behind the third door: Monty must open door 2
                            np.array([0., 1., 0.]))))


Monty opened the third door, revealing a goat.

In [11]:
with monty_model:
    # we chose door one; Monty opened door three (index 2)
    opened = pm.Categorical('opened', p_open, observed=2)


Should we switch our choice of door?

In [12]:
with monty_model:
    monty_trace = pm.sample(1000, init=None, random_seed=SEED)

monty_df = pm.trace_to_dataframe(monty_trace)

Assigned Metropolis to prize
100%|██████████| 1000/1000 [00:00<00:00, 2850.75it/s]

In [13]:
monty_df.head()

Out[13]:
   prize  p_open__0  p_open__1  p_open__2
0      1        0.0        0.0        1.0
1      0        0.0        0.5        0.5
2      0        0.0        0.5        0.5
3      0        0.0        0.5        0.5
4      0        0.0        0.5        0.5
In [14]:
monty_df.prize.mean()

Out[14]:
0.67300000000000004
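
Since the posterior puts no mass on door three (Monty opened it to reveal a goat), this mean already pins down the answer: roughly a third of the probability sits on door one and two thirds on door two, so we should switch. The door probabilities can also be read off the samples directly with plain pandas:

monty_df.prize.value_counts(normalize=True).sort_index()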
In [16]:
fig

Out[16]:

From the PyMC3 documentation:

PyMC3 is a Python package for Bayesian statistical modeling and Probabilistic Machine Learning which focuses on advanced Markov chain Monte Carlo and variational fitting algorithms. Its flexibility and extensibility make it applicable to a large suite of problems.

PyMC3 features

• Uses Theano as a computational backend
• Automatic differentiation, dynamic C compilation, GPU integration
• Implements Hamiltonian Monte Carlo and No-U-Turn sampling
• High-level GLM (R-like syntax, sketched below) and Gaussian process specification
• Provides a framework for operator variational inference
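
As a sketch of the R-like GLM syntax, assuming PyMC3 3.x's GLM.from_formula and a hypothetical DataFrame df with columns y and x:

with pm.Model() as glm_model:
    # build a linear model from a patsy-style formula
    pm.glm.GLM.from_formula('y ~ x', df)
    glm_trace = pm.sample(1000)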
In [22]:
from IPython.display import HTML

HTML(mcmc_video)

Out[22]:

Statistical workflow

1. Tell a story about how the data may have been generated
2. Encode that story as a probabilistic model
3. Perform inference
4. Check model diagnostics
5. Compare models
6. Iterate (GOTO 1)

Case study: NBA foul calls

Last Two Minute Report

In [24]:
nba_df.head()

Out[24]:
   seconds_left review_decision away home  score_away  score_home disadvantaged_team committing_team
0         111.0             INC  UTA  POR       104.0       113.0                POR             UTA
1         102.0             CNC  UTA  POR       104.0       113.0                UTA             POR
2          98.0              CC  UTA  POR       104.0       113.0                POR             UTA
3          64.0             CNC  UTA  POR       104.0       113.0                UTA             POR
4          62.0             INC  UTA  POR       104.0       113.0                POR             UTA

Story

Close basketball games usually end with intentional fouls.

In [27]:
fig

Out[27]:

Model

The foul call probability varies over time.

In [30]:
import theano

seconds_left_ = theano.shared(seconds_left)

with pm.Model() as nba_model:
    # random-walk prior: the log-odds of a foul call drift over time
    β = pm.GaussianRandomWalk('β', tau=PRIOR_PREC, init=pm.Normal.dist(0., 10.),
                              shape=n_sec)


Transform the log-odds to probabilities and specify the observation model.

In [31]:
with nba_model:
    p = pm.math.sigmoid(β[seconds_left_])
    fc = pm.Bernoulli('fc', p, observed=foul_called)


Inference

In [34]:
nba_trace = sample_from_model(nba_model)

Auto-assigning NUTS sampler...
Initializing NUTS using ADVI...
Average Loss = 2,955.1: 100%|██████████| 50000/50000 [00:53<00:00, 937.33it/s]
Finished [100%]: Average Loss = 2,955.1
100%|██████████| 2000/2000 [02:33<00:00, 13.07it/s]

• Automatic sampler assignment (manually configurable; see the sketch below)
• Automatic initialization of HMC/NUTS using ADVI
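
For example, the automatic assignment can be overridden by passing a step method explicitly; a minimal sketch using Metropolis in place of NUTS:

with nba_model:
    # override the automatically assigned sampler
    manual_trace = pm.sample(1000, step=pm.Metropolis())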

Model diagnostics — posterior predictions

To sample from the posterior predictive distribution, we change the value of the Theano shared variable seconds_left_.

In [36]:
pp_sec = np.arange(n_sec)
seconds_left_.set_value(pp_sec)

with nba_model:
    pp_nba_trace = pm.sample_ppc(nba_trace, N_PP_SAMPLES)

100%|██████████| 1000/1000 [00:06<00:00, 147.16it/s]

In [38]:
fig

Out[38]:

Model diagnostics — sampling

The folk theorem [of statistical computing] is this: When you have computational problems, often there's a problem with your model.

In [39]:
pm.energyplot(nba_trace).legend(bbox_to_anchor=(0.6, 1));


Model selection

PyMC3 can calculate:

• Deviance information criterion (DIC)
• Bayesian predictive information criterion (BPIC)
• Widely applicable/Watanabe-Akaike information criterion (WAIC)
In [41]:
with nba_model:
    nba_waic = pm.waic(nba_trace)

print(nba_waic)

WAIC_r(WAIC=5834.1378643060807, WAIC_se=79.294920797807535, p_WAIC=16.602451)


Story — take two

The trailing team has more incentive to foul than the leading team.

In [45]:
fig

Out[45]:

Model and inference

There is a base rate at which fouls are called.

In [46]:
with pm.Model() as trailing_model:
    α = pm.Normal('α', 0., 10.)


The trailing team is more likely to commit fouls as the game nears its end.

In [48]:
seconds_left_.set_value(seconds_left)
committing_trailing_ = theano.shared(1. * (score_diff < 0))

with trailing_model:
    β = pm.GaussianRandomWalk('β', tau=PRIOR_PREC, init=pm.Normal.dist(0., 10.),
                              shape=n_sec)
    η = α + committing_trailing_ * β[seconds_left_]
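
The observation step is not shown here; presumably, as in nba_model, η passes through a sigmoid into a Bernoulli likelihood. A sketch under that assumption:

with trailing_model:
    # assumed by analogy with nba_model above
    p = pm.math.sigmoid(η)
    fc = pm.Bernoulli('fc', p, observed=foul_called)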

In [50]:
trailing_trace = sample_from_model(trailing_model, quiet=True)


Model diagnostics — posterior predictions

In [54]:
fig

Out[54]:

Model selection

In [59]:
fig

Out[59]:
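
The WAIC values behind such a comparison can be computed per model; a minimal sketch, where trailing_waic is a hypothetical name computed exactly as nba_waic was above (lower WAIC indicates better expected predictive performance):

with trailing_model:
    trailing_waic = pm.waic(trailing_trace)

print(nba_waic.WAIC, trailing_waic.WAIC)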

Story — take three

Larger score differences incentivize the trailing team to foul sooner and more aggressively.

In [62]:
fig

Out[62]:

Model and inference

There is a base rate at which fouls are called, and the trailing team is more likely to commit fouls as the game nears its end.

In [63]:
seconds_left_.set_value(seconds_left)
committing_trailing_ = theano.shared(1. * (score_diff < 0))

with pm.Model() as diff_model:
    α = pm.Normal('α', 0., 10.)
    β = pm.GaussianRandomWalk('β', tau=PRIOR_PREC, init=pm.Normal.dist(0., 10.),
                              shape=n_sec)


Teams trailing by more points foul more aggressively and sooner.

In [64]:
score_diff_ = theano.shared(score_diff)

with diff_model:
    γ = pm.Normal('γ', 0., 10.)
    θ = pm.Normal('θ', 0., 10.)
    λ = pm.math.sigmoid(γ + θ * score_diff_)

    η = α + committing_trailing_ * λ * β[seconds_left_]
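
Again the likelihood is omitted from the slides; presumably η feeds a sigmoid and a Bernoulli observation as before. A sketch under that assumption:

with diff_model:
    # assumed by analogy with the earlier models
    p = pm.math.sigmoid(η)
    fc = pm.Bernoulli('fc', p, observed=foul_called)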

In [66]:
diff_trace = sample_from_model(diff_model, quiet=True)


Model diagnostics

Posterior predictive inference

In [71]:
fig

Out[71]: