RA-BC stands for Reward-Aligned Behavior Cloning.
Standard behavior cloning treats every training example equally; RA-BC instead assigns each example a weight based on how good its action (or action chunk) is according to a reward model.
One way to write the objective is:

L(θ) = − Σ_i w_i · log π_θ(a_i | s_i)

- w_i is the RA-BC weight for example i
- a larger w_i means that example has more influence during training
- examples with better predicted reward get larger w_i
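The weighted objective above can be sketched in a few lines. This is a minimal illustration, not a reference implementation; the normalization of the weights is a common but hypothetical choice, since the text does not specify one.

```python
import numpy as np

def ra_bc_loss(log_probs, weights):
    """Reward-aligned BC loss: weighted negative log-likelihood.

    log_probs: log pi_theta(a_i | s_i) for each demonstrated action, shape (N,)
    weights:   RA-BC weights w_i >= 0, shape (N,)
    """
    log_probs = np.asarray(log_probs, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Normalize weights so the loss scale stays comparable to plain BC
    # (an assumed convention, not taken from the source text).
    w = weights / weights.sum()
    # Weighted negative log-likelihood: L = -sum_i w_i * log pi(a_i | s_i)
    return -np.sum(w * log_probs)
```

With uniform weights this reduces to ordinary behavior cloning's mean negative log-likelihood; skewing the weights toward high-reward examples pulls the loss toward fitting those examples first.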
Intuition
- good demonstrations should matter more than weak demonstrations
- the policy should copy actions that actually move the task forward
- this helps when the dataset has mixed-quality trajectories
In SARM
The reward model predicts stage-aware rewards for long-horizon robot manipulation. Those predicted rewards are then turned into RA-BC weights, so the policy focuses more on action chunks that better align with task progress.
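One simple way to turn predicted rewards into weights is a softmax-style exponential with a temperature. This is a plausible sketch, not SARM's actual formula, which the text does not give; the `temperature` parameter is a hypothetical knob controlling how sharply the weighting favors high-reward chunks.

```python
import numpy as np

def rewards_to_weights(rewards, temperature=1.0):
    """Map predicted chunk rewards to RA-BC weights.

    Uses an exponential (softmax-style) transform: higher predicted
    reward -> larger weight. Lower temperature -> sharper weighting.
    This scheme is an assumption for illustration, not SARM's exact rule.
    """
    r = np.asarray(rewards, dtype=float)
    # Subtract the max reward for numerical stability before exponentiating.
    w = np.exp((r - r.max()) / temperature)
    # Normalize so the weights sum to 1 across the batch.
    return w / w.sum()
```

In the limit of high temperature this approaches uniform weights (plain behavior cloning); as temperature drops, training concentrates on the action chunks the reward model rates as making the most task progress.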