Q-Learning

Q-learning is a reinforcement learning algorithm that lets an agent learn which decisions are best by interacting with its environment.

The agent builds a Q-table that stores Q-values. Each Q-value estimates how good it is to take a given action in a given state, measured by the expected future reward.

Over time, the agent updates this table using the rewards it receives and the value of the best action available in the next state.
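
As a rough illustration of the idea, here is a minimal Python sketch of a Q-table and a single update to it. The problem size (3 states, 2 actions), the helper names best_action and update, and the parameters alpha (a learning rate) and gamma (a discount on future reward) are all assumptions made for this example, not part of any particular library.

```python
# Hypothetical toy problem: 3 states, 2 actions (sizes chosen for illustration only).
N_STATES, N_ACTIONS = 3, 2

# Q-table: one row per state, one entry per action, initialised to zero.
q_table = [[0.0] * N_ACTIONS for _ in range(N_STATES)]


def best_action(q_table, state):
    """Return the index of the action with the highest Q-value in `state`."""
    row = q_table[state]
    return max(range(len(row)), key=lambda a: row[a])


def update(q_table, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Nudge Q[state][action] toward reward + gamma * (best Q-value in next_state).

    alpha and gamma are assumed values picked for the example.
    """
    target = reward + gamma * max(q_table[next_state])
    q_table[state][action] += alpha * (target - q_table[state][action])


# One observed transition: in state 0, action 1 gave reward 1.0 and led to state 2.
update(q_table, state=0, action=1, reward=1.0, next_state=2)
```

After this single step, the entry for state 0 and action 1 has moved from 0.0 to 0.5, so best_action(q_table, 0) now picks action 1. Repeating such updates over many transitions is what gradually fills in the table.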

Update Rule

Q-learning uses a temporal-difference update: