RA-BC stands for Reward-Aligned Behavior Cloning.

Standard behavior cloning weights every training example equally, whereas RA-BC assigns each example a weight based on how good its action or action chunk is according to a reward model.

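One way to write the objective is:

  $$\mathcal{L}_{\text{RA-BC}}(\theta) = \frac{\sum_i w_i \, \ell_i(\theta)}{\sum_i w_i + \epsilon}$$

  • $w_i$ is the RA-BC weight for example $i$
  • $\ell_i(\theta)$ is the ordinary behavior cloning loss of the policy on example $i$
  • $\epsilon$ is a small constant that keeps the denominator nonzero
  • a larger $w_i$ means example $i$ has more influence during training
  • examples with higher reward get larger weights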
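
The weighting described above can be sketched as a weighted average of per-example losses. This is a minimal illustration, not the SARM implementation: the function name `weighted_bc_loss`, the normalization by the sum of weights, and the `eps` guard are all assumptions, and the per-example losses are toy numbers standing in for whatever BC loss the policy uses.

```python
from typing import Sequence


def weighted_bc_loss(per_example_losses: Sequence[float],
                     weights: Sequence[float],
                     eps: float = 1e-8) -> float:
    """Weighted average of per-example BC losses (hypothetical helper)."""
    if len(per_example_losses) != len(weights):
        raise ValueError("losses and weights must have the same length")
    # Each loss contributes in proportion to its weight.
    weighted_sum = sum(w * l for w, l in zip(weights, per_example_losses))
    # eps keeps the division safe if every weight is zero (an assumption).
    return weighted_sum / (sum(weights) + eps)


# Toy numbers: three examples, the second one judged unhelpful.
losses = [0.5, 2.0, 1.0]
uniform = weighted_bc_loss(losses, [1.0, 1.0, 1.0])     # plain BC: simple mean
reweighted = weighted_bc_loss(losses, [1.0, 0.0, 1.0])  # second example ignored
```

With equal weights this reduces (up to `eps`) to the plain mean BC loss, and setting a weight to zero removes that example from the update.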

Intuition

  • good demonstrations should matter more than weak demonstrations
  • the policy should copy actions that actually move the task forward
  • this helps when the dataset has mixed-quality trajectories

In SARM

In SARM, the reward model predicts stage-aware rewards for long-horizon robot manipulation.

Those predicted rewards are then turned into RA-BC weights, so the policy focuses more on action chunks that better align with task progress.
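
As a rough illustration of turning predicted rewards into weights, the sketch below scores each action chunk by how much predicted progress it gains and rescales that gain into [0, 1]. The mapping is an assumption chosen for illustration (a linear rescale around the batch mean and standard deviation, then clipping), and the name `progress_to_weights` and the sample numbers are hypothetical. It should not be read as the exact rule SARM uses.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def progress_to_weights(progress_start: Sequence[float],
                        progress_end: Sequence[float],
                        eps: float = 1e-6) -> List[float]:
    """Map per-chunk progress gains to weights in [0, 1] (hypothetical rule)."""
    # Progress gained over each chunk, according to the reward model.
    deltas = [end - start for start, end in zip(progress_start, progress_end)]
    mu = mean(deltas)
    sigma = pstdev(deltas)
    low = mu - 2.0 * sigma   # gains at or below this get weight 0
    span = 4.0 * sigma + eps  # gains at or above low + span get weight 1
    return [min(1.0, max(0.0, (d - low) / span)) for d in deltas]


# Toy predicted progress at the start and end of three action chunks.
start = [0.10, 0.40, 0.70]
end = [0.30, 0.35, 0.72]
weights = progress_to_weights(start, end)
# The chunk that advances the task most gets the largest weight;
# the chunk that loses progress gets the smallest.
```

The resulting weights can be plugged into a weighted BC loss so that chunks with more predicted progress dominate the gradient.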