How neural networks learn.

Gradient Descent is a powerful optimization algorithm used to train Machine Learning models with the objective of minimizing the Cost Function.

Cost Function

Analogy

The way you train a computer to do better is essentially by insulting it: while training the network, you tell it what the output should have been, compare that to what it actually returned, and tell it to do better. I loved this analogy from the 3Blue1Brown video loool

Mathematically

You add up the squares of the differences between the trash output activations and the values you want them to have. This is called the cost of a single training example

Eg: C = (a_1 - y_1)^2 + (a_2 - y_2)^2 + … + (a_n - y_n)^2, where a_j is the j-th output activation and y_j is the value you wanted it to be
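
A minimal sketch of that sum in Python (the function name and the sample numbers are mine, not from the video):

```python
def single_example_cost(output_activations, desired_output):
    """Squared-error cost of one training example:
    add up (a_j - y_j)^2 over every output neuron."""
    return sum((a - y) ** 2 for a, y in zip(output_activations, desired_output))

# Hypothetical output layer that should light up only its last neuron
activations = [0.1, 0.8, 0.2, 0.4]   # what the (trash) network returned
desired     = [0.0, 0.0, 0.0, 1.0]   # what it should have been
print(single_example_cost(activations, desired))  # ~1.05 -> pretty high, do better
```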

Cost is high when the network doesn’t seem like it knows what it’s doing

Cost is small when the network classifies correctly

Average cost is the average of the costs over all the training examples; it's a measure of how lousy the network is and how "bad" the computer should "feel"
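
A sketch of that average, assuming `network` is a callable mapping an input to its output activations and `training_data` is a list of (input, desired_output) pairs (both names are assumptions of mine, reusing the hypothetical single_example_cost above):

```python
def average_cost(network, training_data):
    """Mean per-example cost over the whole training set:
    one number for how lousy the network currently is."""
    costs = [single_example_cost(network(x), y) for x, y in training_data]
    return sum(costs) / len(costs)
```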

When we say a network is "learning", all we really mean is that we're minimizing the cost function
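
A toy sketch of what that minimization looks like with gradient descent, using a made-up one-weight cost C(w) = (w - 2)^2 whose minimum sits at w = 2 (the real cost lives over thousands of weights, but the step rule is the same):

```python
def cost(w):
    return (w - 2) ** 2               # toy cost function, minimized at w = 2

def cost_gradient(w):
    return 2 * (w - 2)                # dC/dw, the slope at the current weight

w = 10.0                              # arbitrary starting weight
learning_rate = 0.1
for _ in range(100):
    w -= learning_rate * cost_gradient(w)  # step downhill along the slope
print(w)  # ~2.0: the cost has been driven to (nearly) its minimum
```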