Talks

The Hamiltonian Monte Carlo Revolution is Open Source: Probabilistic Programming with PyMC3

Open Data Science Conference West • November 2, 2018 • Slides • Jupyter Notebook

Open Data Science Conference East • May 3, 2018 • Jupyter Notebook

In the last ten years, there have been a number of advancements in the study of Hamiltonian Monte Carlo and variational inference algorithms that have enabled effective Bayesian statistical computation for much more complicated models than were previously feasible. These algorithmic advancements have been accompanied by a number of open source probabilistic programming packages that make them accessible to the general engineering, statistics, and data science communities. PyMC3 is one such package written in Python and supported by NumFOCUS. This talk will give an introduction to probabilistic programming with PyMC3, with a particular emphasis on how open source probabilistic programming makes Bayesian inference algorithms near the frontier of academic research accessible to a wide audience.

Two Years of Bayesian Bandits for E-Commerce

PyData NYC 2018 • October 18, 2018 • Slides • Jupyter Notebook

Tom Tom Founders Festival Applied Machine Learning Conference • April 12, 2018 • Slides • Jupyter Notebook

Data Philly • April 2, 2018 • Slides • Jupyter Notebook

Boston Bayesians • January 22, 2018 • Slides • Jupyter Notebook

Abstract: At Monetate, we’ve deployed Bayesian bandits (both noncontextual and contextual) to help our clients optimize their e-commerce sites since early 2016. This talk is an overview of the lessons we’ve learned from deploying real-time Bayesian machine learning systems at scale and from building a data product on top of these systems that is accessible to non-technical users (marketers). This talk will cover:

  • The place of multi-armed bandits in the A/B testing industry,
  • Thompson sampling and the basic theory of Bayesian bandits,
  • Bayesian approaches for accommodating nonstationarity in bandit feedback,
  • User experience challenges in driving adoption of these technologies by nontechnical marketers.

We will focus primarily on noncontextual bandits and give a brief overview of these problems in the contextual setting as time permits.
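For readers unfamiliar with Thompson sampling in the noncontextual setting, the following is a minimal sketch and not the system described in the talk. It assumes binary (convert / no convert) feedback, a uniform Beta(1, 1) prior on each arm, and simulated conversion rates; the function name `thompson_sampling` and its parameters are hypothetical.

```python
import random


def thompson_sampling(true_rates, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling against simulated arms.

    true_rates holds the (hypothetical) conversion rate of each arm and is
    only used to simulate feedback; the algorithm never reads it directly.
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    successes = [0] * n_arms
    failures = [0] * n_arms

    for _ in range(n_rounds):
        # Draw one plausible conversion rate per arm from its Beta posterior
        # (a Beta(1, 1) prior updated with the observed counts).
        draws = [
            rng.betavariate(1 + successes[i], 1 + failures[i])
            for i in range(n_arms)
        ]
        # Play the arm whose sampled rate is largest.
        arm = max(range(n_arms), key=draws.__getitem__)

        # Simulate binary feedback and update that arm's counts.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1

    return successes, failures


successes, failures = thompson_sampling([0.1, 0.3, 0.6], n_rounds=2000)
pulls = [s + f for s, f in zip(successes, failures)]
print(pulls)
```

Because each arm is played with the probability that it is the best arm under the current posterior, traffic shifts toward the stronger arm as evidence accumulates while weaker arms are still sampled occasionally.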

PyData NYC • November 27, 2017 • Slides • Jupyter Notebook

Abstract: Since 2015, the NBA has released a detailed report of foul calls and non-calls that occur in the final two minutes of close games. This talk is a case study in using open source Python packages to analyze these reports in order to understand the relationship between game dynamics, player abilities, and foul calls. Our main goal is to quantify the relationship between player ability and foul calls. Since intentional fouls are a ubiquitous part of the NBA endgame, this data set also contains rich information about the relationship between game dynamics and intentional fouls for us to model.

Open Source Bayesian Inference in Python with PyMC3

FOSSCON • August 25, 2017 • Slides • Jupyter Notebook

Abstract: In the last ten years, there have been a number of advancements in the study of Hamiltonian Monte Carlo algorithms that have enabled effective Bayesian statistical computation for much more complicated models than were previously feasible. These algorithmic advancements have been accompanied by a number of open source probabilistic programming packages that make them accessible to programmers and statisticians. PyMC3 is one such package written in Python and supported by NumFOCUS. This workshop will give an introduction to probabilistic programming with PyMC3. No preexisting knowledge of Bayesian statistics is necessary; a working knowledge of Python will be helpful.

Empowering Marketers with Bionic AI

phlAI • August 15, 2017 • With David Brussin, Founder & Chief Product Officer, Monetate

Abstract: Marketers have for many years worked to use data to improve the business outcomes from the experiences they deliver. Statistical discipline, and then AI, have markedly improved the ability to drive these improvements. As we have entered what Forrester calls ‘the age of the customer,’ customer expectations have in some ways begun to exceed competitive pressures in marketing, leading to a desire to align business outcomes more directly with customer outcomes. In this talk, we will focus on the use of AI in empowering marketers to provide each of their individual customers with better experiences. AI has been previously used to automate actions taken by humans, often enabling new scale. Solutions that replace human creative input altogether are frequently imagined, but hardly imminent. We will survey, from marketer, customer, and data scientist perspectives, this progression in marketing, resulting in new ‘bionic’ techniques that combine marketer creativity with machine-driven scale.

Probabilistic Programming in Python with PyMC3

ODSC East 2017 • May 5, 2017 • Slides • Jupyter Notebook

Abstract: Probabilistic programming is a paradigm in which the programmer specifies a generative probability model for observed data and the language/software library infers the distributions of unobserved quantities. By separating model specification from inference, probabilistic programming allows the modeler to “tell the story” of how the data were generated and then perform inference without explicitly developing an inference algorithm. This separation makes inference more accessible for many complex models. PyMC3 is a Python package for probabilistic programming built on top of Theano that provides advanced sampling and variational inference algorithms and is undergoing rapid development. This talk will give an introduction to probabilistic programming using PyMC3 and will conclude with a brief overview of the wider probabilistic programming ecosystem.
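The separation of model specification from inference can be illustrated without any probabilistic programming library. The sketch below is not PyMC3 code: it uses only the Python standard library, a coin-flip model with a uniform prior, and a random-walk Metropolis sampler standing in for the more advanced samplers the abstract mentions. The names `log_joint` and `metropolis` and the data are hypothetical.

```python
import math
import random


def log_joint(p, flips):
    """Model specification: uniform prior on p, Bernoulli likelihood per flip."""
    if not 0.0 < p < 1.0:
        return -math.inf  # outside the support of the prior
    heads = sum(flips)
    tails = len(flips) - heads
    return heads * math.log(p) + tails * math.log(1.0 - p)


def metropolis(log_prob, init, n_samples, step=0.2, seed=0):
    """Generic inference: random-walk Metropolis over one scalar parameter.

    The sampler only needs a function returning an unnormalized log density,
    so it knows nothing about the coin-flip model above.
    """
    rng = random.Random(seed)
    current, current_lp = init, log_prob(init)
    samples = []
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_prob(proposal)
        # Accept with probability min(1, exp(proposal_lp - current_lp)).
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


# Hypothetical data: 14 heads and 6 tails.
flips = [1] * 14 + [0] * 6
samples = metropolis(lambda p: log_joint(p, flips), init=0.5, n_samples=20000)
kept = samples[2000:]  # discard an initial warm-up segment
posterior_mean = sum(kept) / len(kept)
print(round(posterior_mean, 2))
```

For this model the posterior is available in closed form as Beta(15, 7), with mean 15/22 ≈ 0.68, which gives a check on the sampler. Swapping in a different `log_joint` changes the model without touching `metropolis`, which is the division of labor a probabilistic programming package automates.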

Variational Inference in Python

PyData DC 2016 • October 8, 2016 • Slides • Jupyter Notebook

Abstract: Bayesian inference has proven to be a valuable approach to many machine learning problems. Unfortunately, many interesting Bayesian models do not have closed-form posterior distributions. Simulation via the family of Markov chain Monte Carlo (MCMC) algorithms is the most popular method of approximating the posterior distribution for these analytically intractable models. While these algorithms (appropriately used) are guaranteed to converge to the posterior given sufficient time, they are often difficult to scale to large data sets and hard to parallelize. Variational inference is an alternative approach that approximates intractable posteriors through optimization instead of simulation. By restricting the class of approximating distributions, variational inference allows control of the tradeoff between computational complexity and accuracy of the approximation. Variational inference can also take advantage of stochastic and parallel optimization algorithms to scale to large data sets. One drawback of variational inference is that in its most basic form, it can require a lot of model-specific manual calculations. Recent mathematical advances in black box variational inference (BBVI) and automatic differentiation variational inference (ADVI) along with advances in open source computational frameworks such as Theano and TensorFlow have made variational inference more accessible to non-specialists. This talk will begin with an introduction to variational inference, BBVI, and ADVI, then illustrate some of the software packages (PyMC3 and Edward) that make these variational inference algorithms available to Python developers.

An Introduction to Probabilistic Programming

DataPhilly Meetup • July 13, 2016 • Slides • Jupyter Notebook

Abstract: Probabilistic programming is a paradigm in which the programmer specifies a generative probability model for observed data and the language/software library infers the (approximate) values/distributions of unobserved parameters. By separating the task of model specification from inference, probabilistic programming allows the modeler to “tell the story” of how the data were generated without explicitly developing an inference algorithm. This separation makes inference in many complex models more accessible.

This talk will give an introduction to probabilistic programming in Python using PyMC3 and will also give a brief overview of the wider probabilistic programming ecosystem.

Bayesian Optimization with Gaussian Processes

DataPhilly Meetup • February 18, 2016 • Slides • Jupyter Notebook

Abstract: Bayesian optimization is a technique for finding the extrema of functions which are expensive, difficult, or time-consuming to evaluate. It has many applications to optimizing the hyperparameters of machine learning models, optimizing the inputs to real-world experiments and processes, etc. This talk will introduce the Gaussian process approach to Bayesian optimization, with sample code in Python.
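As a rough illustration of the Gaussian process approach (this is not the sample code from the talk), the sketch below maximizes a one-dimensional function using only the Python standard library. It assumes a zero-mean GP with a squared-exponential kernel, a fixed length scale, a small noise term for numerical stability, and an upper confidence bound acquisition rule evaluated on a grid; other acquisition functions such as expected improvement are equally common. All function names, the test objective, and the parameter values are hypothetical.

```python
import math


def rbf(a, b, length_scale=0.5):
    """Squared-exponential covariance between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def gp_posterior(xs, ys, candidates, noise=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP at each candidate."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    alpha = solve(K, ys)  # K^{-1} y, shared by every candidate
    result = []
    for x in candidates:
        k_star = [rbf(x, a) for a in xs]
        v = solve(K, k_star)  # K^{-1} k_*
        mean = sum(k * a for k, a in zip(k_star, alpha))
        var = max(1.0 - sum(k * w for k, w in zip(k_star, v)), 0.0)
        result.append((mean, math.sqrt(var)))
    return result


def bayes_opt(f, lo, hi, n_iter=8, kappa=2.0, grid_size=51):
    """Maximize f on [lo, hi] with a GP surrogate and a UCB acquisition rule."""
    grid = [lo + (hi - lo) * i / (grid_size - 1) for i in range(grid_size)]
    xs = [lo, hi]  # start from the two endpoints
    ys = [f(x) for x in xs]
    for _ in range(n_iter):
        posterior = gp_posterior(xs, ys, grid)
        # Upper confidence bound: favor high predicted mean or high uncertainty.
        scores = [mean + kappa * sd for mean, sd in posterior]
        x_next = grid[max(range(grid_size), key=scores.__getitem__)]
        xs.append(x_next)
        ys.append(f(x_next))  # the only "expensive" call per iteration
    best = max(range(len(ys)), key=ys.__getitem__)
    return xs[best], ys[best]


def objective(x):
    """Stand-in for an expensive function; its maximum is at x = 1.3."""
    return -(x - 1.3) ** 2


best_x, best_y = bayes_opt(objective, 0.0, 2.0)
print(best_x, best_y)
```

The objective is evaluated only ten times in total (two endpoints plus eight iterations); all other work is done on the cheap surrogate, which is the point of the technique when each evaluation is costly.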