Q-learning is a reinforcement learning algorithm in which an agent learns which actions to take by interacting with its environment and observing the rewards it receives.

The agent builds a Q-table that stores Q-values. Each Q-value estimates how good it is to take a given action in a given state, measured by the expected future reward.

Over time, the agent updates this table using the reward it receives and the value of the best action available in the next state.
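
As a minimal sketch of such a table, assuming a small problem where states and actions are numbered from 0, the Q-table can be held as one row per state and one column per action. The sizes and the names `n_states`, `n_actions`, `q_table`, and `best_action` are illustrative choices, not part of any library.

```python
# Hypothetical sizes, chosen only for illustration.
n_states = 4
n_actions = 2

# One row per state, one column per action; every estimate starts at 0.0.
q_table = [[0.0 for _ in range(n_actions)] for _ in range(n_states)]


def best_action(q_table, state):
    """Return the index of the action with the highest Q-value in `state`.

    Ties are broken in favor of the lowest action index.
    """
    row = q_table[state]
    return max(range(len(row)), key=lambda a: row[a])


# Pretend the agent has already learned that action 1 is better in state 2.
q_table[2][1] = 0.5
print(best_action(q_table, 2))
```

A nested list keeps the sketch dependency-free; for larger problems an array type would likely be a more practical choice.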

Update Rule

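Q-learning uses a temporal difference update:

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$$

  • $s$: current state
  • $a$: action taken
  • $s'$: next state
  • $r$: reward received
  • $\alpha$: learning rate
  • $\gamma$: discount factor
  • $\max_{a'} Q(s', a')$: value of the best next action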
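
The update can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a reference implementation: the Q-table is a nested list indexed as `q_table[state][action]`, the function name `q_update` and the default values for the learning rate `alpha` and discount factor `gamma` are hypothetical, and terminal states are not treated specially.

```python
def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update in place and return the new Q-value."""
    # Value of the best action available in the next state.
    best_next = max(q_table[next_state])
    # Target: immediate reward plus discounted value of the best next action.
    td_target = reward + gamma * best_next
    # Temporal difference error: gap between the target and the current estimate.
    td_error = td_target - q_table[state][action]
    # Move the current estimate a fraction alpha toward the target.
    q_table[state][action] += alpha * td_error
    return q_table[state][action]


# Two states, two actions, with made-up starting values.
q = [[0.0, 0.0], [1.0, 2.0]]
new_value = q_update(q, state=0, action=1, reward=1.0, next_state=1,
                     alpha=0.5, gamma=0.9)
print(new_value)
```

With these numbers the target is 1.0 + 0.9 × 2.0 = 2.8, and moving halfway from 0 toward it gives a new Q-value of about 1.4. Only the entry for the state and action actually taken changes; the rest of the table is untouched.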