Kullback-Leibler (KL) divergence is a type of statistical distance: a measure of how much an approximating distribution differs from a true probability distribution.

In other words, it quantifies the difference between what our model believes and what’s actually true.

Discrete cases:
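
For a true distribution P and an approximating distribution Q over the same discrete set of outcomes (the symbols P, Q, and \mathcal{X} are notation assumed here), the standard definition is:

$$
D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{x \in \mathcal{X}} P(x) \log \frac{P(x)}{Q(x)}
$$

A minimal Python sketch of this sum follows. It assumes both distributions are given as equal-length lists of probabilities, and it uses the natural logarithm, so the result is in nats. The function name `kl_divergence` and the example values are illustrative only.

```python
import math

def kl_divergence(p, q):
    """Return D_KL(P || Q) in nats for two discrete distributions."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue  # usual convention: a term with P(x) = 0 contributes 0
        if q_i == 0:
            return math.inf  # P puts mass where Q puts none
        total += p_i * math.log(p_i / q_i)
    return total

p = [0.5, 0.5]  # "true" distribution (illustrative values)
q = [0.9, 0.1]  # model's approximation (illustrative values)

print(kl_divergence(p, q))  # divergence of Q from P
print(kl_divergence(q, p))  # swapping the arguments gives a different value
```

The two printed values differ, which shows that KL divergence is not symmetric: which distribution counts as "true" matters.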

Continuous cases:
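
For continuous random variables the sum becomes an integral over probability densities. Writing p for the density of the true distribution and q for the density of the approximating one (notation assumed here), the standard definition is:

$$
D_{\mathrm{KL}}(P \,\|\, Q) = \int_{-\infty}^{\infty} p(x) \log \frac{p(x)}{q(x)} \, dx
$$

In both cases the divergence is non-negative, and it equals zero only when the two distributions agree (almost everywhere, in the continuous case).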