# The HMC Revolution is Open Source¶

## Who am I?¶

### arochford@monetate.com • austin.rochford@gmail.com¶

• Monetate: founded 2008, web optimization and personalization SaaS
• Observed 5B impressions and \$4.1B in revenue during Cyber Week 2017


## Probabilistic Programming¶

### The Monty Hall Problem¶

Initially, we have no information about which door the prize is behind.

In [8]:
import pymc3 as pm

with pm.Model() as monty_model:
    prize = pm.DiscreteUniform('prize', 0, 2)


If we choose door one:

| Prize behind | Monty can open Door 1 | Monty can open Door 2 | Monty can open Door 3 |
|---|---|---|---|
| Door 1 | No | Yes | Yes |
| Door 2 | No | No | Yes |
| Door 3 | No | Yes | No |
In [9]:
import numpy as np
from theano import tensor as tt

with monty_model:
    p_open = pm.Deterministic(
        'p_open',
        tt.switch(tt.eq(prize, 0),
                  np.array([0., 0.5, 0.5]),           # it is behind the first door
                  tt.switch(tt.eq(prize, 1),
                            np.array([0., 0., 1.]),   # it is behind the second door
                            np.array([0., 1., 0.])))  # it is behind the third door
    )


Monty opened the third door, revealing a goat.

In [10]:
with monty_model:
    opened = pm.Categorical('opened', p_open, observed=2)


Should we switch our choice of door?

In [13]:
with monty_model:
    monty_trace = pm.sample(1000, **MONTY_SAMPLE_KWARGS)

monty_df = pm.trace_to_dataframe(monty_trace)

Multiprocess sampling (2 chains in 2 jobs)
Metropolis: [prize]
100%|██████████| 1500/1500 [00:01<00:00, 1321.68it/s]

In [14]:
monty_df['prize'].head()

Out[14]:
0    0
1    0
2    0
3    0
4    1
Name: prize, dtype: int64
In [15]:
ax = (monty_df['prize']
          .value_counts(normalize=True, ascending=True)
          .plot(kind='bar', color='b'))

ax.set_xlabel("Door");
ax.yaxis.set_major_formatter(pct_formatter);
ax.set_ylabel("Probability of prize");
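
The sampled frequencies can be checked against the exact posterior, which follows directly from Bayes' rule and the table above (this check is a sketch, not part of the original notebook):

```python
import numpy as np

prior = np.array([1/3, 1/3, 1/3])        # prize equally likely behind each door
# P(Monty opens door 3 | prize behind door i), given we chose door 1
likelihood = np.array([0.5, 1.0, 0.0])

posterior = prior * likelihood
posterior /= posterior.sum()
print(posterior)  # P(door 2) = 2/3, so switching doubles our chance of winning
```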


#### Probabilistic programming is not new¶

• BUGS (1989)
• JAGS (2007)

From the PyMC3 documentation:

> PyMC3 is a Python package for Bayesian statistical modeling and Probabilistic Machine Learning which focuses on advanced Markov chain Monte Carlo and variational fitting algorithms. Its flexibility and extensibility make it applicable to a large suite of problems.

### Monte Carlo Methods¶

In [16]:
N = 5000

x, y = np.random.uniform(0, 1, size=(2, N))

In [18]:
fig

Out[18]: [figure: the uniform samples scattered on the unit square]
In [19]:
in_circle = x**2 + y**2 <= 1

In [21]:
fig

Out[21]: [figure: samples colored by whether they fall inside the quarter circle]
In [22]:
4 * in_circle.mean()

Out[22]:
3.1336
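
The estimate tightens at the familiar O(N^-1/2) Monte Carlo rate as the sample size grows; a quick sketch (numpy only):

```python
import numpy as np

rng = np.random.default_rng(0)

for n in (1_000, 100_000, 1_000_000):
    x, y = rng.uniform(0, 1, size=(2, n))
    est = 4 * (x**2 + y**2 <= 1).mean()  # area of quarter circle is pi/4
    print(n, est, abs(est - np.pi))      # error shrinks roughly like 1/sqrt(n)
```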

## Case Study: NBA Foul Calls¶

Question: Is (not) committing and/or drawing fouls a measurable player skill?

In [26]:
orig_df.head(n=2).T

Out[26]:
play_id 20150301CLEHOU-0 20150301CLEHOU-1
period Q4 Q4
seconds_left 112 103
call_type Foul: Shooting Foul: Shooting
committing_player Josh Smith J.R. Smith
review_decision CNC CC
away CLE CLE
home HOU HOU
date 2015-03-01 00:00:00 2015-03-01 00:00:00
score_away 103 103
score_home 105 105
committing_team HOU CLE

### Model outline¶

$$\textrm{Foul}\ \sim \textrm{Season factor} + \left(\textrm{Disadvantaged skill} - \textrm{Committing skill}\right)$$

In [36]:
with pm.Model() as irt_model:
    β_season = pm.Normal('β_season', 0., 2.5, shape=n_season)

    θ = hierarchical_normal('θ', n_player)  # disadvantaged-player skill
    b = hierarchical_normal('b', n_player)  # committing-player skill

    # linear predictor from the model outline above; season,
    # player_disadvantaged, and player_committing are integer
    # index arrays (names assumed, not shown in this excerpt)
    p = pm.math.sigmoid(
        β_season[season] + θ[player_disadvantaged] - b[player_committing]
    )

    obs = pm.Bernoulli(
        'obs', p,
        observed=df['foul_called'].values
    )
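
The `hierarchical_normal` helper is not shown in this excerpt. Judging from the free variables the sampler reports below (`Δ_θ`, `σ_θ_log__`), it likely builds a non-centered hierarchical normal: offsets Δ ∼ N(0, 1), a half-normal scale σ, and values x = σ · Δ. A minimal numpy illustration of that reparameterization (names are assumptions):

```python
import numpy as np

rng = np.random.default_rng(42)

# non-centered parameterization: instead of sampling x ~ N(0, σ²)
# directly, sample standard-normal offsets Δ and scale them by σ
sigma = np.abs(rng.normal())   # half-normal group scale
delta = rng.normal(size=5)     # standard-normal offsets, Δ ~ N(0, 1)
x = sigma * delta              # implied hierarchical values, x ~ N(0, σ²)

print(sigma, x)
```

Decoupling the offsets from the scale this way removes the funnel-shaped posterior geometry that makes centered hierarchical models hard to sample.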


#### Metropolis-Hastings Inference¶

In [39]:
with irt_model:
    step = pm.Metropolis()
    met_trace = pm.sample(20000, step=step, **SAMPLE_KWARGS)

Multiprocess sampling (4 chains in 4 jobs)
CompoundStep
>Metropolis: [σ_b_log__]
>Metropolis: [Δ_b]
>Metropolis: [σ_θ_log__]
>Metropolis: [Δ_θ]
>Metropolis: [β_season]
100%|██████████| 20500/20500 [07:51<00:00, 43.49it/s]
The gelman-rubin statistic is larger than 1.4 for some parameters. The sampler did not converge.
The estimated number of effective samples is smaller than 200 for some parameters.

In [42]:
trace_fig

Out[42]:
In [44]:
fig

Out[44]:
In [45]:
trace_fig

Out[45]:
In [46]:
max(np.max(var_stats) for var_stats in pm.gelman_rubin(met_trace).values())

Out[46]:
2.8404570911579374

### The Curse of Dimensionality¶

This model has

In [47]:
n_param = n_season + 2 * n_player
n_param

Out[47]:
948

parameters

In [50]:
fig

Out[50]:
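
The figure for this cell did not survive the export; the standard illustration of the curse of dimensionality is that standard-normal probability mass concentrates in a thin shell of radius ≈ √d, far from the mode at the origin, which is why naive random-walk proposals fail in 948 dimensions. A numpy sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

for d in (1, 10, 100, 1000):
    # 10,000 draws from a d-dimensional standard normal
    radii = np.linalg.norm(rng.standard_normal((10_000, d)), axis=1)
    # typical samples sit near radius sqrt(d), not near the mode at 0
    print(d, round(radii.mean(), 2))
```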

## Hamiltonian Monte Carlo Inference¶

### Automating calculus¶

$$\frac{d}{dx} \left(x^3\right) = 3 x^2$$

In [51]:
x = tt.dscalar('x')
x.tag.test_value = 0.

y = x**3

In [52]:
from theano.printing import pprint

pprint(tt.grad(y, x))

Out[52]:
'((fill((x ** TensorConstant{3}), TensorConstant{1.0}) * TensorConstant{3}) * (x ** (TensorConstant{3} - TensorConstant{1})))'
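
The symbolic expression above simplifies to 3x². A quick finite-difference check of that derivative (plain numpy, independent of Theano):

```python
import numpy as np

def f(x):
    return x**3

x = 2.0
h = 1e-6
numeric = (f(x + h) - f(x - h)) / (2 * h)  # central difference
symbolic = 3 * x**2                        # d/dx x^3 = 3x^2
print(numeric, symbolic)                   # agree to ~1e-9
```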

### Case Study Continued: NBA Foul Calls¶

In [53]:
with irt_model:
    nuts_trace = pm.sample(500, **SAMPLE_KWARGS)

Auto-assigning NUTS sampler...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [σ_b_log__, Δ_b, σ_θ_log__, Δ_θ, β_season]
100%|██████████| 1000/1000 [02:00<00:00,  8.30it/s]

In [54]:
nuts_gr_stats = pm.gelman_rubin(nuts_trace)
max(np.max(var_stats) for var_stats in nuts_gr_stats.values())

Out[54]:
1.00204597168463
In [57]:
fig

Out[57]: