Q-learning is a reinforcement learning algorithm in which an agent learns to make good decisions by interacting with its environment.
The agent builds a Q-table that stores Q-values. Each Q-value estimates how good it is to take a given action in a given state, measured by the expected future reward.
Over time, the agent updates this table using the rewards it receives and the estimated value of the best action available in the next state.
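As a rough illustration of the Q-table described above, the sketch below stores one row per state and one column per action, and looks up the highest-valued action for a state. The table sizes, the sample values, and the helper name `best_action` are assumptions made for this example only.

```python
# Hypothetical problem size, chosen only for illustration.
n_states, n_actions = 4, 2

# Q-table: one row per state, one column per action, all values start at 0.
q_table = [[0.0] * n_actions for _ in range(n_states)]

# Pretend the agent has already learned something about state 1.
q_table[1] = [0.2, 0.7]


def best_action(q_table, state):
    """Return the index of the action with the highest Q-value in `state`.

    Ties are broken in favour of the lowest action index.
    """
    row = q_table[state]
    return max(range(len(row)), key=row.__getitem__)
```

With these sample values, `best_action(q_table, 1)` picks action 1, since 0.7 is the larger Q-value in that row.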
Update Rule
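Q-learning uses a temporal difference update:

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$$

Here $s$ is the current state, $a$ is the action taken, $s'$ is the next state, $r$ is the reward received, $\alpha$ is the learning rate, $\gamma$ is the discount factor, and $\max_{a'} Q(s', a')$ is the value of the best next action.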
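A single temporal difference update can be sketched in a few lines. This is a minimal example that assumes a tabular Q-function stored as a list of lists; the function name `q_update` and the default values `alpha=0.1` and `gamma=0.9` are illustrative choices, not values given in the text.

```python
def q_update(q_table, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update in place and return the new Q(s, a).

    Implements Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)).
    alpha (learning rate) and gamma (discount factor) defaults are assumptions.
    """
    best_next = max(q_table[s_next])        # value of the best next action
    td_target = r + gamma * best_next       # reward plus discounted future value
    td_error = td_target - q_table[s][a]    # gap between target and current estimate
    q_table[s][a] += alpha * td_error       # move the estimate toward the target
    return q_table[s][a]


# Tiny made-up table: 2 states, 2 actions.
q_table = [[0.0, 0.0],
           [0.5, 1.0]]

new_value = q_update(q_table, s=0, a=1, r=1.0, s_next=1)
```

In this example the target is 1.0 + 0.9 × 1.0 = 1.9, so the entry for state 0 and action 1 moves from 0 to about 0.19, one tenth of the way toward the target.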