For each stage SARM looks across the whole training dataset and estimates what fraction of an episode’s total duration that stage usually takes (on average)
That average fraction is
This is a global statistic computed from many demonstrations of the same task
eg:
- “Reach Shirt” usually takes
- “Grasp Shirt” usually takes
- “Fold Shirt” usually takes
Then for a frame in stage 2 (“fold shirt”) with
The normalized progress is 0.61 in