Probabilistic Programming in Python with PyMC3¶

#ODSC East, Boston • May 5, 2017 • @AustinRochford ¶

About Me¶

@AustinRochford • austinrochford.com • github.com/AustinRochford ¶

austin.rochford@gmail.com • arochford@monetate.com ¶

PyMC3 contributor • Principal Data Scientist at Monetate Labs ¶

Probabilistic Programming¶

Write a program to simulate the data generating process
Language runtime/software library performs inference

Data Science — inference enables storytelling

Probabilistic Programming — storytelling enables inference

Enter: Bayes theorem¶

Benefits of probabilistic programming¶

Model-agnostic inference algorithms
Quantification of uncertainty
Incorporation of domain knowledge
Modularity

The Monty Hall Problem¶

Initially, we have no information about which door the prize is behind.

In [9]:

import pymc3 as pm

with pm.Model() as monty_model:
    prize = pm.DiscreteUniform('prize', 0, 2)

If we choose door one:

	Monty can open
Prize behind	Door 1	Door 2	Door 3
Door 1	No	Yes	Yes
Door 2	No	No	Yes
Door 2	No	Yes	No

In [10]:

from theano import tensor as tt

with monty_model:
    p_open = pm.Deterministic('p_open',
                              tt.switch(tt.eq(prize, 0),
                                        # it is behind the first door
                                        np.array([0., 0.5, 0.5]),
                              tt.switch(tt.eq(prize, 1),
                                        # it is behind the second door
                                        np.array([0., 0., 1.]),
                                        # it is behind the third door
                                        np.array([0., 1., 0.]))))

Monty opened the third door, revealing a goat.

In [11]:

with monty_model:
    opened = pm.Categorical('opened', p_open, observed=2)

Should we switch our choice of door?

In [12]:

with monty_model:
    monty_trace = pm.sample(1000, init=None, random_seed=SEED)
    
monty_df = pm.trace_to_dataframe(monty_trace)

Assigned Metropolis to prize
100%|██████████| 1000/1000 [00:00<00:00, 2850.75it/s]

In [13]:

monty_df.head()

Out[13]:

	prize	p_open__1	p_open__2
0	1	0.0	1.0
1	0	0.5	0.5
2	0	0.5	0.5
3	0	0.5	0.5
4	0	0.5	0.5

In [14]:

monty_df.prize.mean()

Out[14]:

0.67300000000000004

In [16]:

fig

Out[16]:

From the PyMC3 documentation:

PyMC3 is a Python package for Bayesian statistical modeling and Probabilistic Machine Learning which focuses on advanced Markov chain Monte Carlo and variational fitting algorithms. Its flexibility and extensibility make it applicable to a large suite of problems.

PyMC3 features¶

Uses Theano as a computational backend
- Automated differentiation, dynamic C compilation, GPU integration
Implements Hamiltonian Monte Carlo and No-U-Turn sampling
High-level GLM (R-like syntax) and Gaussian process specification
Provides a framework for operator variational inference
- Implements ADVI and SVGD

In [22]:

HTML(mcmc_video)

Out[22]:

Statistical workflow¶

Tell a story about how the data may have been generated
Encode that story as a probabilistic model
Perform inference
Check model diagnostics
Compare models
Iterate (GOTO 1)

Statistical LEGOs¶

Case study: NBA foul calls¶

Last two minute report¶

In [24]:

nba_df.head()

Out[24]:

	seconds_left	review_decision	away	home	score_away	score_home	disadvantaged_team	committing_team
0	111.0	INC	UTA	POR	104.0	113.0	POR	UTA
1	102.0	CNC	UTA	POR	104.0	113.0	UTA	POR
2	98.0	CC	UTA	POR	104.0	113.0	POR	UTA
3	64.0	CNC	UTA	POR	104.0	113.0	UTA	POR
4	62.0	INC	UTA	POR	104.0	113.0	POR	UTA

Story¶

Close basketball games usually end with intentional fouls.

In [27]:

fig

Out[27]:

Model¶

The foul call probability varies over time.

In [30]:

seconds_left_ = theano.shared(seconds_left)

with pm.Model() as nba_model:
    β = pm.GaussianRandomWalk('β', tau=PRIOR_PREC, init=pm.Normal.dist(0., 10.),
                              shape=n_sec)

Transform logodds to probabilities, specify observation model.

In [31]:

with nba_model:
    p = pm.math.sigmoid(β[seconds_left_])
    fc = pm.Bernoulli('fc', p, observed=foul_called)

Inference¶

In [34]:

nba_trace = sample_from_model(nba_model)

Auto-assigning NUTS sampler...
Initializing NUTS using ADVI...
Average Loss = 2,955.1: 100%|██████████| 50000/50000 [00:53<00:00, 937.33it/s] 
Finished [100%]: Average Loss = 2,955.1
100%|██████████| 2000/2000 [02:33<00:00, 13.07it/s]

Automatic sampler assignment (manually configurable)
Automatic initialization of HMC/NUTS using ADVI

Model diagnostics — posterior predictions

To sample from the posterior predictive distribution, we change the value of the Theano shared variable seconds_left_.

In [36]:

pp_sec = np.arange(n_sec)
seconds_left_.set_value(pp_sec)

with nba_model:
    pp_nba_trace = pm.sample_ppc(nba_trace, N_PP_SAMPLES)

100%|██████████| 1000/1000 [00:06<00:00, 147.16it/s]

In [38]:

fig

Out[38]:

Model diagnostics — sampling

The folk theorem [of statistical computing] is this: When you have computational problems, often there’s a problem with your model.

— Andrew Gelman

In [39]:

pm.energyplot(nba_trace).legend(bbox_to_anchor=(0.6, 1));

Model selection¶

PyMC3 can calculate:

Deviance information criterion (DIC)
Bayesian predictive information criterion (BPIC)
Widely applicable/Watanabe-Akaike information criterion (WAIC)

In [41]:

with nba_model:
    nba_waic = pm.waic(nba_trace)
    
print(nba_waic)

WAIC_r(WAIC=5834.1378643060807, WAIC_se=79.294920797807535, p_WAIC=16.602451)

Story — take two

The trailing team has more incentive to foul than the leading team.

In [45]:

fig

Out[45]:

Model and inference¶

There is a base rate at which fouls are called.

In [46]:

with pm.Model() as trailing_model:
    α = pm.Normal('α', 0., 10.)

The trailing team is more likely to commit fouls as the game nears its end.

In [48]:

seconds_left_.set_value(seconds_left)
committing_trailing_ = theano.shared(1. * (score_diff < 0))

with trailing_model:
    β = pm.GaussianRandomWalk('β', tau=PRIOR_PREC, init=pm.Normal.dist(0., 10.),
                              shape=n_sec)
    η = α + committing_trailing_ * β[seconds_left_]

In [50]:

trailing_trace = sample_from_model(trailing_model, quiet=True)

Model diagnostics — posterior prediction

In [54]:

fig

Out[54]:

Model selection¶

In [59]:

fig

Out[59]:

Story — take three

Larger score differences incentivize the trailing team to foul sooner and more aggressively.

In [62]:

fig

Out[62]:

Model and inference¶

There is a base rate at which fouls are called, and the trailing team is more likely to commit fouls as the game nears its end.

In [63]:

seconds_left_.set_value(seconds_left)
committing_trailing_ = theano.shared(1. * (score_diff < 0))

with pm.Model() as diff_model:
    α = pm.Normal('α', 0., 10.)
    β = pm.GaussianRandomWalk('β', tau=PRIOR_PREC, init=pm.Normal.dist(0., 10.),
                              shape=n_sec)

The teams trailing by more points foul more agressively and sooner.

In [64]:

score_diff_ = theano.shared(score_diff)

with diff_model:
    γ = pm.Normal('γ', 0., 10.)
    θ = pm.Normal('θ', 0., 10.)
    λ = pm.math.sigmoid(γ + θ * score_diff_)
    
    η = α + committing_trailing_ * λ * β[seconds_left_]

In [66]:

diff_trace = sample_from_model(diff_model, quiet=True)

Model diagnostics¶

Posterior predictive inference¶

In [71]:

fig

Out[71]:

Model selection¶

In [74]:

fig

Out[74]:

Getting started with PyMC3¶

Contributing¶

Probabilistic Programming Ecosystem¶

Probabilistic Programming System	Language	Discrete Variable Support	Automatic Differentiation/Hamiltonian Monte Carlo	Variational Inference
PyMC3	Python	Yes	Yes	Yes
Stan	C++, R, Python, ...	No	Yes	Yes
Edward	Python, ...	Yes	Yes	Yes
BUGS	Standalone program, R	Yes	No	No
JAGS	Standalone program, R	Yes	No	No

Thank you!¶

The Jupyter notebook used to generate these slides is available here.

Probabilistic Programming in Python with PyMC3¶

#ODSC East, Boston • May 5, 2017 • @AustinRochford¶

About Me¶

@AustinRochford • austinrochford.com • github.com/AustinRochford¶

austin.rochford@gmail.com • arochford@monetate.com¶

PyMC3 contributor • Principal Data Scientist at Monetate Labs¶

Probabilistic Programming¶

Data Science — inference enables storytelling

Probabilistic Programming — storytelling enables inference

Enter: Bayes theorem¶

Benefits of probabilistic programming¶

The Monty Hall Problem¶

PyMC3 features¶

Statistical workflow¶

Statistical LEGOs¶

Case study: NBA foul calls¶

Last two minute report¶

Story¶

Model¶

Inference¶

Model diagnostics — posterior predictions

Model diagnostics — sampling

Model selection¶

Story — take two

Model and inference¶

Model diagnostics — posterior prediction

Model selection¶

Story — take three

Model and inference¶

Model diagnostics¶

Posterior predictive inference¶

Model selection¶

Getting started with PyMC3¶

Contributing¶

Probabilistic Programming Ecosystem¶

Thank you!¶

#ODSC East, Boston • May 5, 2017 • @AustinRochford ¶

@AustinRochford • austinrochford.com • github.com/AustinRochford ¶

austin.rochford@gmail.com • arochford@monetate.com ¶

PyMC3 contributor • Principal Data Scientist at Monetate Labs ¶