Initially, we have no information about which door the prize is behind.
```python
import numpy as np
import pymc3 as pm

with pm.Model() as monty_model:
    # a priori, the prize is equally likely to be behind any of the three doors
    prize = pm.DiscreteUniform('prize', 0, 2)
```
If we choose door one, the doors Monty can open are:

| Prize behind | Door 1 | Door 2 | Door 3 |
|---|---|---|---|
| Door 1 | No | Yes | Yes |
| Door 2 | No | No | Yes |
| Door 3 | No | Yes | No |
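Applying Bayes' rule to this table gives the answer analytically. With the prize behind door one, Monty opens door three with probability 1/2; behind door two, with probability 1; behind door three, with probability 0. So

$$ P(\textrm{prize behind two} \mid \textrm{Monty opens three}) = \frac{1 \cdot \frac{1}{3}}{\frac{1}{2} \cdot \frac{1}{3} + 1 \cdot \frac{1}{3} + 0 \cdot \frac{1}{3}} = \frac{2}{3}. $$

The model below encodes the same table, so sampling from it should recover this answer.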
```python
from theano import tensor as tt

with monty_model:
    p_open = pm.Deterministic(
        'p_open',
        tt.switch(tt.eq(prize, 0),
                  np.array([0., 0.5, 0.5]),  # it is behind the first door
                  tt.switch(tt.eq(prize, 1),
                            np.array([0., 0., 1.]),   # it is behind the second door
                            np.array([0., 1., 0.])))  # it is behind the third door
    )
```
Monty opened the third door, revealing a goat.
```python
with monty_model:
    opened = pm.Categorical('opened', p_open, observed=2)
```
Should we switch our choice of door?
```python
with monty_model:
    # MONTY_SAMPLE_KWARGS is defined in the notebook's setup (not shown)
    monty_trace = pm.sample(1000, **MONTY_SAMPLE_KWARGS)

monty_df = pm.trace_to_dataframe(monty_trace)
```
```
Multiprocess sampling (3 chains in 3 jobs)
Metropolis: [prize]
Sampling 3 chains: 100%|██████████| 4500/4500 [00:01<00:00, 3643.12draws/s]
```
```python
monty_df['prize'].head()
```
```
0    0
1    0
2    0
3    0
4    1
Name: prize, dtype: int64
```
```python
ax = (monty_df['prize']
        .value_counts(normalize=True, ascending=True)
        .plot(kind='bar', color='C0'))

ax.set_xlabel("Door");
# pct_formatter is defined in the notebook's setup (not shown)
ax.yaxis.set_major_formatter(pct_formatter);
ax.set_ylabel("Probability of prize");
```
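We can also read the posterior switch probability directly off the samples. Doors are 0-indexed in the model, so door two is index 1:

```python
# posterior probability that the prize is behind door two, i.e. that
# switching wins; this should be close to the analytic answer of 2/3
(monty_df['prize'] == 1).mean()
```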
From the PyMC3 documentation:

> PyMC3 is a Python package for Bayesian statistical modeling and Probabilistic Machine Learning which focuses on advanced Markov chain Monte Carlo and variational fitting algorithms. Its flexibility and extensibility make it applicable to a large suite of problems.
```python
N = 5000

x, y = np.random.uniform(0, 1, size=(2, N))
```
```python
# points that land inside the quarter circle of radius one
in_circle = x**2 + y**2 <= 1
```
Since the points are uniform on the unit square, the fraction that falls inside the quarter circle estimates its area, π/4, so four times that fraction estimates π.

```python
4 * in_circle.mean()
```
```
3.1335999999999999
```
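The Monte Carlo standard error of this estimate shrinks like $1/\sqrt{N}$. A rough sketch of that error, treating each point as a Bernoulli draw with success probability π/4:

```python
# the estimate is 4 times the mean of N Bernoulli draws, so its standard
# error is 4 * sqrt(p * (1 - p) / N)
p_hat = in_circle.mean()
4 * np.sqrt(p_hat * (1 - p_hat) / N)
```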
Question: Is (not) committing and/or drawing fouls a measurable player skill?
```python
orig_df.head(n=2).T
```

| play_id | 20150301CLEHOU-0 | 20150301CLEHOU-1 |
|---|---|---|
| period | Q4 | Q4 |
| seconds_left | 112 | 103 |
| call_type | Foul: Shooting | Foul: Shooting |
| committing_player | Josh Smith | J.R. Smith |
| disadvantaged_player | Kevin Love | James Harden |
| review_decision | CNC | CC |
| away | CLE | CLE |
| home | HOU | HOU |
| date | 2015-03-01 00:00:00 | 2015-03-01 00:00:00 |
| score_away | 103 | 103 |
| score_home | 105 | 105 |
| disadvantaged_team | CLE | HOU |
| committing_team | HOU | CLE |
$$ \operatorname{log-odds}(\textrm{Foul}) \ \sim \textrm{Season factor} + \left(\textrm{Disadvantaged skill} - \textrm{Committing skill}\right) $$
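The model below uses a `hierarchical_normal` helper whose definition is not shown. Judging from the `Δ_θ`, `σ_θ`, `Δ_b`, and `σ_b` parameters in the sampler output further down, it builds a non-centered hierarchical normal prior; a minimal sketch consistent with those names (the prior scale on σ here is an assumption):

```python
def hierarchical_normal(name, shape, μ=0.):
    # non-centered parameterization: sample standardized offsets Δ and a
    # shared scale σ, then shift and rescale; this tends to sample better
    # than drawing from Normal(μ, σ) directly
    Δ = pm.Normal('Δ_{}'.format(name), 0., 1., shape=shape)
    σ = pm.HalfNormal('σ_{}'.format(name), 2.5)

    return pm.Deterministic(name, μ + Δ * σ)
```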
```python
with pm.Model() as irt_model:
    β_season = pm.Normal('β_season', 0., 2.5, shape=n_season)
    θ = hierarchical_normal('θ', n_player)  # disadvantaged skill
    b = hierarchical_normal('b', n_player)  # committing skill

    p = pm.math.sigmoid(
        β_season[season] + θ[player_disadvantaged] - b[player_committing]
    )

    obs = pm.Bernoulli(
        'obs', p,
        observed=df['foul_called'].values
    )
```
```python
with irt_model:
    step = pm.Metropolis()
    met_trace = pm.sample(5000, step=step, **SAMPLE_KWARGS)
```
```
Multiprocess sampling (3 chains in 3 jobs)
CompoundStep
>Metropolis: [σ_b]
>Metropolis: [Δ_b]
>Metropolis: [σ_θ]
>Metropolis: [Δ_θ]
>Metropolis: [β_season]
Sampling 3 chains: 100%|██████████| 16500/16500 [01:33<00:00, 176.67draws/s]
The gelman-rubin statistic is larger than 1.4 for some parameters. The sampler did not converge.
The estimated number of effective samples is smaller than 200 for some parameters.
```
(Figure: trace plots of the Metropolis samples.)
```python
max(np.max(var_stats) for var_stats in pm.gelman_rubin(met_trace).values())
```
```
3.1038666861280597
```
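The warning about effective sample size can be checked the same way; assuming this version of PyMC3, `pm.effective_n` returns a dict of per-parameter arrays just like `pm.gelman_rubin`:

```python
# smallest effective sample size across all parameters; the warning above
# indicates that some fall below 200
min(np.min(var_stats) for var_stats in pm.effective_n(met_trace).values())
```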
This model has

```python
n_param = n_season + 2 * n_player

n_param
```
```
948
```

parameters. Random-walk Metropolis explores posteriors of this dimension very slowly, which motivates gradient-based samplers such as NUTS.
$$\frac{d}{dx} \left(x^3\right) = 3 x^2$$
```python
from theano.printing import pprint

x = tt.dscalar('x')
x.tag.test_value = 0.

y = x**3

# Theano builds the gradient graph symbolically (automatic differentiation)
pprint(tt.grad(y, x))
```
```
'((fill((x ** TensorConstant{3}), TensorConstant{1.0}) * TensorConstant{3}) * (x ** (TensorConstant{3} - TensorConstant{1})))'
```
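To confirm that this symbolic expression really is $3x^2$, we can compile the gradient graph with `theano.function` and evaluate it:

```python
from theano import function

# compile the gradient graph into a callable and evaluate it at x = 2
grad_y = function([x], tt.grad(y, x))
grad_y(2.)  # array(12.0), which is 3 * 2**2
```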
```python
with irt_model:
    nuts_trace = pm.sample(500, **SAMPLE_KWARGS)
```
```
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (3 chains in 3 jobs)
NUTS: [σ_b, Δ_b, σ_θ, Δ_θ, β_season]
Sampling 3 chains: 100%|██████████| 3000/3000 [02:02<00:00, 24.59draws/s]
```
```python
nuts_gr_stats = pm.gelman_rubin(nuts_trace)

max(np.max(var_stats) for var_stats in nuts_gr_stats.values())
```
```
1.0069298341016355
```

This is comfortably below the conventional threshold of 1.1, so, unlike Metropolis, NUTS has converged.