I discovered Cam Davidson-Pilon’s book, “Probabilistic Programming and Bayesian Methods for Hackers,” a year ago, but never got around to finishing it until today. If you’ve been holding out for a third-party opinion, hopefully this blog post will scratch that itch for you.
It’s not your average bedtime read, but because it seeks to help you understand the intuition behind Bayesian methods before drilling into the math, it’s fairly digestible. And even if you haven’t had much exposure to statistics, the visualizations are sufficient to give you a sense of what’s going on.
The book is organized into six chapters. The first two are introductions to a couple of important probability distributions and to the benefits of probabilistic programming with PyMC3, a library that fits probabilistic models using Markov chain Monte Carlo and other inference algorithms. In the second chapter, you’ll learn about the privacy algorithm, which deliberately introduces noise into the answers collected when trying to detect how many students cheated on an exam; you can then model that noise in the final prediction. Interesting stuff.
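To make the privacy algorithm concrete, here is a minimal simulation of the scheme as I understand it: each student flips a coin and answers honestly on heads; on tails they flip again and say “yes” on heads and “no” on tails, regardless of the truth. The function names, the 20% cheating rate and the simple algebraic inversion are my own assumptions for illustration; the book builds a full probabilistic model rather than inverting the formula.

```python
import random


def run_privacy_survey(true_cheat_rate, n_students, rng):
    """Simulate the coin-flip interviews and return the number of "yes" answers."""
    yes_count = 0
    for _ in range(n_students):
        cheated = rng.random() < true_cheat_rate
        if rng.random() < 0.5:
            # First flip is heads: the student answers honestly.
            said_yes = cheated
        else:
            # First flip is tails: a second flip decides the answer.
            said_yes = rng.random() < 0.5
        yes_count += said_yes
    return yes_count


def estimate_cheat_rate(yes_count, n_students):
    """Invert P(yes) = 0.5 * p + 0.25 and clip the result to [0, 1]."""
    observed = yes_count / n_students
    return min(1.0, max(0.0, 2.0 * (observed - 0.25)))


rng = random.Random(0)  # fixed seed so the run is repeatable
n_students = 100_000
yes_count = run_privacy_survey(0.2, n_students, rng)
estimate = estimate_cheat_rate(yes_count, n_students)
print(f"{yes_count} of {n_students} said yes; estimated cheating rate: {estimate:.3f}")
```

No individual answer reveals whether that student cheated, yet the overall rate is still recoverable because the noise has a known structure.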
Chapter three had a cool visual representation of how observing new data reshapes the surface of a Bayesian prior. The book described the effect of new data as a “pulling and stretching [of] the fabric of the prior surface”. I thought this was a really nice mental viz, at least while it was feasible in two and three dimensions. There was another nice mental picture in the book to describe the traces created by the Markov chain Monte Carlo process. A trace was described as a sequence of pebbles, each of which came from a particular mountain with a particular height. One could glean information about the mountain that a pebble had come from, and the higher the mountain peak, the higher the posterior probability. The goal of the sampling algorithm (such as Metropolis-Hastings) is to approximate the entire distribution using the information contained in the collected pebbles.
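Here is a rough sketch of how such a trace of pebbles can be collected. It is a plain random-walk Metropolis sampler (the special case of Metropolis-Hastings with a symmetric proposal), not the sampler PyMC3 uses internally, and the one-dimensional “mountain” peaked at 3, the step size and the burn-in length are all assumptions I picked for illustration.

```python
import math
import random
import statistics


def log_mountain(x):
    """Unnormalized log density of a normal "mountain" peaked at 3 (hypothetical target)."""
    return -0.5 * (x - 3.0) ** 2


def metropolis(log_density, n_steps, step_size=1.0, start=0.0, seed=0):
    """Random-walk Metropolis: return the trace of visited positions."""
    rng = random.Random(seed)
    current = start
    trace = []
    for _ in range(n_steps):
        proposal = current + rng.gauss(0.0, step_size)
        log_ratio = log_density(proposal) - log_density(current)
        # Always accept uphill moves; accept downhill moves with probability exp(log_ratio).
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            current = proposal
        trace.append(current)
    return trace


trace = metropolis(log_mountain, n_steps=20_000)
kept = trace[2_000:]  # drop the early steps taken before reaching the peak
print(f"mean ~ {statistics.fmean(kept):.2f}, std dev ~ {statistics.pstdev(kept):.2f}")
```

The sampler only ever compares heights at two points, so it never needs the normalizing constant of the distribution; the histogram of the kept pebbles is what approximates the mountain.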
Chapter four explained the Law of Large Numbers, which I think Cam also touched on in his PyData 2015 talk. The Law of Large Numbers basically states that the average of a sequence of random draws from a distribution converges to the expected value of that distribution as the number of draws grows. From this chapter I learned that you cannot lean on this law with small datasets, because the average of a small sample can still be far from the expected value. In this chapter I also learned that when ranking items by their posterior distributions, sorting by the mean is a bad idea because it ignores the uncertainty of the distributions. The book taught me to sort by the 95% least plausible value instead (the value that the true parameter has only a 5% chance of falling below), which preserves that uncertainty.
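Both lessons are easy to check with a small simulation. The sketch below is my own, not code from the book: the exponential distribution, the sample sizes, and the two hypothetical items with their vote counts are assumptions, and the posterior for an upvote ratio is taken to be a Beta distribution under a uniform prior.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable


def average_of_draws(n):
    """Average of n draws from an exponential distribution whose expected value is 2."""
    return statistics.fmean([rng.expovariate(0.5) for _ in range(n)])


# Part 1: averages of small samples scatter far more than averages of large ones.
small_means = [average_of_draws(5) for _ in range(200)]
large_means = [average_of_draws(2_000) for _ in range(200)]
spread_small = statistics.pstdev(small_means)
spread_large = statistics.pstdev(large_means)
print(f"spread of averages: n=5 -> {spread_small:.2f}, n=2000 -> {spread_large:.2f}")


def posterior_summary(ups, downs, n_samples=10_000):
    """Posterior mean and 5th percentile of an upvote ratio under a uniform prior."""
    samples = sorted(rng.betavariate(1 + ups, 1 + downs) for _ in range(n_samples))
    return statistics.fmean(samples), samples[int(0.05 * n_samples)]


# Part 2: hypothetical vote counts (upvotes, downvotes) for two items.
votes = {"few votes": (3, 0), "many votes": (150, 50)}
summaries = {name: posterior_summary(*counts) for name, counts in votes.items()}
by_mean = max(summaries, key=lambda name: summaries[name][0])
by_lower_bound = max(summaries, key=lambda name: summaries[name][1])
print(f"best by mean: {by_mean}; best by least plausible value: {by_lower_bound}")
```

The item with three upvotes and nothing else wins on the posterior mean, but its posterior is so wide that its 5th percentile sits well below that of the item with 200 votes, so the least plausible value ranks the well-tested item first.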
Chapters five and six were good natural next steps to push the reader out of the introductory phase and into more practical application. Chapter five touched on loss functions, introducing the Bayesian point estimate, which is the single value that minimizes the expected loss over the posterior. I especially loved how simple it was to set up a Bayesian linear regression with just seven lines of PyMC3 code. The chapter also showed how to solve the Showcase problem on The Price is Right using Bayesian methods.
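The idea of picking a point estimate by minimizing expected loss can be sketched in a few lines. Everything specific here is an assumption of mine rather than the book’s example: the stand-in posterior (a normal with mean 100 and standard deviation 15), the made-up asymmetric loss, and the brute-force search over whole-number guesses.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable
# Stand-in posterior samples for an unknown price (hypothetical: normal, mean 100, sd 15).
posterior_samples = [rng.gauss(100.0, 15.0) for _ in range(5_000)]


def loss(guess, true_value):
    """Made-up asymmetric loss: overshooting costs four times as much as undershooting."""
    if guess > true_value:
        return 4.0 * (guess - true_value)
    return true_value - guess


def expected_loss(guess, samples):
    """Average the loss over posterior samples to approximate the expected loss."""
    return statistics.fmean([loss(guess, value) for value in samples])


candidates = range(50, 151)
best_guess = min(candidates, key=lambda guess: expected_loss(guess, posterior_samples))
posterior_mean = statistics.fmean(posterior_samples)
print(f"posterior mean: {posterior_mean:.1f}, guess minimizing expected loss: {best_guess}")
```

Because overshooting is penalized more heavily, the best guess lands noticeably below the posterior mean, which is the whole point: the loss function, not just the posterior, decides the answer.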
Chapter six, which touched on the importance of choosing a good prior, left me wanting to learn more, especially after such a lucid explanation of a Bayesian solution to the Multi-Armed Bandits problem. In this problem you are faced with N different slot machines (bandits), each with an unknown payout probability, and you pull one machine per round, trying to maximize your winnings by finding the best machine in as few rounds as possible. I’d like to expand on this problem for a different application, and hopefully I will have a blog post up about it in the future.
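As a taste of the Bayesian approach, here is a minimal sketch of the strategy as I understand it: keep a Beta posterior over each machine’s win rate, draw one sample from every posterior each round, and pull the machine whose sample is highest. The function name, the three payout probabilities and the uniform prior are assumptions for illustration.

```python
import random


def bayesian_bandits(true_win_probs, n_rounds, seed=0):
    """Play n_rounds, choosing each round by sampling from the Beta posteriors."""
    rng = random.Random(seed)
    n_arms = len(true_win_probs)
    wins = [0] * n_arms
    pulls = [0] * n_arms
    for _ in range(n_rounds):
        # Draw one plausible win rate per machine from its posterior (uniform prior).
        draws = [
            rng.betavariate(1 + wins[i], 1 + pulls[i] - wins[i])
            for i in range(n_arms)
        ]
        arm = draws.index(max(draws))
        pulls[arm] += 1
        if rng.random() < true_win_probs[arm]:
            wins[arm] += 1
    return wins, pulls


# Hypothetical machines: the third one pays out most often.
wins, pulls = bayesian_bandits([0.2, 0.5, 0.8], n_rounds=2_000)
print(f"pulls per machine: {pulls}, total winnings: {sum(wins)}")
```

Machines that look bad still get sampled occasionally while their posteriors are wide, so exploration fades out on its own as the evidence piles up.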
Overall, “Probabilistic Programming and Bayesian Methods for Hackers” was a joy to work through, and I would recommend it to anyone who is interested in learning about any or all of the topics I touched upon in this blog post. I recommend it even more to people who have heard of probabilistic programming but have never gotten around to actually applying it to a particular problem.
Things I Liked:
- the visual representation of the “pulling” and “stretching” of the priors as new data comes in.
- the introduction of probability distributions on an as-needed basis, which lets you brush up when necessary instead of drowning you in material that has no immediate application.
- the example of using the 95% least plausible value to preserve the uncertainty of a particular result.
- the focus on teaching an intuitive understanding of how Bayesian methods work before diving into the math.
- that there were some more exercises I could work through to practice PyMC3 as I learned it (though I am aware that there are lots of tutorials for this elsewhere).