Michael Choi

Reservoir sampling and Gumbel max trick in Python

Jupyter notebook is here!

Recently I read about reservoir sampling and the Gumbel max trick on Twitter. This is my own attempt to reproduce some of the basic results from scratch.

Formal references:

Lost Relatives of the Gumbel Trick (ICML 2017) GitHub

A* Sampling (NIPS 2014)

Perturbation, Optimization and Statistics (NIPS 2013 workshop)

Blog references:

Estimating means in a finite universe

Gumbel Machinery

Algorithms Every Data Scientist Should Know: Reservoir Sampling

Priority Sampling

The Gumbel-Max Trick for Discrete Distributions

Priority Sampling…