Gumbel Softmax

Categorical Reparameterization with Gumbel-Softmax (sampling categorical variables in neural networks)

  • Background:
    • Discrete variables are important in neural networks. E.g. discrete variables have been used to learn probabilistic latent representations that correspond to distinct semantic classes.
  • Contributions
    • Gumbel-Softmax, a continuous distribution on the simplex that can approximate categorical samples.
    • This paper provides a simple, differentiable approximate sampling mechanism for categorical variables that can be integrated into neural networks and trained using standard back-propagation.

The Gumbel-Softmax Distribution

For a more comprehensive explanation of why the Gumbel-Softmax distribution can approximate the categorical distribution, I would like to refer to this website.
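The core mechanics are short enough to sketch directly. Below is a minimal NumPy version (the function name and constants are illustrative, not from the paper): draw Gumbel(0, 1) noise via the inverse transform `g = -log(-log(u))`, add it to the log-probabilities, and apply a temperature-scaled softmax.

```python
import numpy as np

def sample_gumbel_softmax(probs, tau, rng=None):
    """Draw one Gumbel-Softmax sample on the probability simplex.

    probs: categorical class probabilities (positive, sums to 1)
    tau:   temperature; small tau -> near one-hot, large tau -> near uniform
    """
    rng = rng or np.random.default_rng()
    # Gumbel(0, 1) noise via inverse transform sampling: g = -log(-log(u))
    u = rng.uniform(size=len(probs))
    g = -np.log(-np.log(u))
    # Softmax over (log pi_i + g_i) / tau
    logits = (np.log(probs) + g) / tau
    e = np.exp(logits - logits.max())  # subtract max for numerical stability
    return e / e.sum()

y = sample_gumbel_softmax(np.array([0.1, 0.6, 0.3]), tau=0.5)
```

Every sample lies on the simplex (non-negative, sums to 1); as `tau` shrinks, samples concentrate near the vertices, i.e. one-hot vectors.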

Gumbel Softmax Estimator

For learning, there is a tradeoff between small temperatures, where samples are close to one-hot but the variance of the gradients is large, and large temperatures, where samples are smooth but the variance of the gradients is small.

In practice, we start at a high temperature and anneal to a small but non-zero temperature.
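One simple way to realize this is an exponential decay clamped at a non-zero floor. The constants below are hypothetical; the paper treats the schedule as a hyperparameter.

```python
import numpy as np

def anneal_temperature(step, tau_init=1.0, tau_min=0.5, rate=1e-4):
    """Exponentially decay the temperature toward a small non-zero floor.

    tau_init, tau_min, and rate are illustrative values; in practice
    they are tuned per task.
    """
    return max(tau_min, tau_init * np.exp(-rate * step))
```

The floor `tau_min` keeps gradients usable late in training: annealing all the way to zero would recover exact one-hot samples but blow up gradient variance.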

Incorporates Noise: Gumbel-Softmax incorporates Gumbel noise into the input, which allows the model to explore a variety of outputs, making it stochastic as opposed to the deterministic nature of Softmax.
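The contrast is easy to demonstrate: with fixed logits, plain softmax always returns the same vector, while adding fresh Gumbel noise before the softmax yields a different point on the simplex each draw (a small illustrative sketch, not code from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
logits = np.array([2.0, 1.0, 0.1])

def softmax(x):
    e = np.exp(x - x.max())  # shift by max for numerical stability
    return e / e.sum()

# Deterministic: identical logits always give the identical softmax output.
det1, det2 = softmax(logits), softmax(logits)

# Stochastic: fresh Gumbel noise per draw changes the output.
g1 = -np.log(-np.log(rng.uniform(size=3)))
g2 = -np.log(-np.log(rng.uniform(size=3)))
sto1, sto2 = softmax((logits + g1) / 0.5), softmax((logits + g2) / 0.5)
```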