Behavior cloning (BC) loss is the supervised-learning objective used to train a policy to imitate expert demonstrations.
Given a dataset of expert demonstrations

$$\mathcal{D} = \{(s_i, a_i)\}_{i=1}^{N},$$

behavior cloning trains a policy $\pi_\theta(a \mid s)$ to reproduce the expert's action in each state.
Discrete Actions
For discrete actions, the BC loss is usually cross entropy / negative log-likelihood:

$$\mathcal{L}_{\text{BC}}(\theta) = -\mathbb{E}_{(s, a) \sim \mathcal{D}}\left[\log \pi_\theta(a \mid s)\right]$$
This is the same objective as training a classifier to put high probability on the expert action.
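A minimal numpy sketch of this objective (the function name and array shapes are illustrative, not from any library): the policy outputs unnormalized logits over actions, and the loss is the average negative log-probability assigned to the expert's actions.

```python
import numpy as np

def bc_loss_discrete(logits, expert_actions):
    """Cross-entropy / negative log-likelihood BC loss.

    logits: (batch, num_actions) unnormalized policy scores
    expert_actions: (batch,) integer indices of the expert's actions
    """
    # Numerically stable log-softmax over the action dimension
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Negative log-probability the policy assigns to each expert action
    nll = -log_probs[np.arange(len(expert_actions)), expert_actions]
    return nll.mean()

# Example: two states, three possible actions
logits = np.array([[2.0, 0.5, -1.0],
                   [0.0, 0.0, 0.0]])
expert_actions = np.array([0, 2])
loss = bc_loss_discrete(logits, expert_actions)
```

Minimizing this loss pushes the policy to put high probability on the expert action in each state, exactly like training a classifier.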
Continuous Actions
For continuous actions, the BC loss is often the mean squared error between the predicted action and the expert action:

$$\mathcal{L}_{\text{BC}}(\theta) = \mathbb{E}_{(s, a) \sim \mathcal{D}}\left[\lVert \pi_\theta(s) - a \rVert^2\right]$$

where:
- $\pi_\theta(s)$ = action predicted by the learned policy
- $a$ = expert action
- $s$ = state or observation
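The continuous-action case reduces to a plain regression loss; a small numpy sketch (names and values are illustrative):

```python
import numpy as np

def bc_loss_continuous(pred_actions, expert_actions):
    """Mean squared error between predicted and expert actions."""
    return np.mean((pred_actions - expert_actions) ** 2)

pred = np.array([[0.1, -0.2], [0.5, 0.0]])   # policy outputs pi_theta(s)
expert = np.array([[0.0, 0.0], [0.4, 0.1]])  # expert actions a
loss = bc_loss_continuous(pred, expert)
```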
Intuition
Behavior cloning treats imitation learning as supervised learning:
- input: state / observation
- label: expert action
- model: policy
- loss: how different the policy’s action is from the expert’s action
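The supervised-learning framing above can be sketched end to end. This toy example (all data and the linear policy are invented for illustration) fits a policy to synthetic expert demonstrations by gradient descent on the MSE BC loss:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic expert data: actions are a fixed linear function of states
states = rng.normal(size=(256, 4))          # inputs: states / observations
W_expert = rng.normal(size=(4, 2))
actions = states @ W_expert                 # labels: expert actions

# Linear policy pi_W(s) = s @ W, trained on the BC (MSE) loss
W = np.zeros((4, 2))
lr = 0.1
for _ in range(200):
    pred = states @ W
    grad = 2 * states.T @ (pred - actions) / len(states)  # d(MSE)/dW
    W -= lr * grad

final_loss = np.mean((states @ W - actions) ** 2)
```

Because the demonstrations here are exactly linear in the state, the learned policy recovers the expert's mapping; with a neural-network policy the loop is the same, only the model and optimizer change.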
Limitation
BC only learns from the states seen in the expert dataset. If the policy makes a mistake and enters an unfamiliar state, errors can compound because the model may not know how to recover.
This is called distribution shift or covariate shift.