Paper Link: https://arxiv.org/pdf/2603.25038
This paper is very interesting. This is not a comprehensive, detailed overview of the paper, just the parts I was interested in.
AirVLA builds on the pi0 architecture
Policy Architecture
The architecture is the same as the original pi0 policy architecture, with the addition of Real-Time Chunking
Physics-Aware Guidance for Action Generation
Rather than taking the output of the VLA at face value, the paper introduces a “physics loss”
The physics loss is used to guide the model toward better actions. It seems to mostly account for items held by the gripper, whose weight causes the drone to sag
This loss doesn’t update the model’s weights; it is applied at inference time, during action generation. The reason they provide is the gap between the manipulation data the VLA was pre-trained on and the payload-sensitive dynamics of flight. My theory is also that this physics loss isn’t being propagated back to the weights because you can’t really infer payload weight (precisely enough, at least) from visuals, and they don’t want to teach the model something wrong.
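A minimal sketch of what inference-time physics guidance could look like: take gradient steps on the sampled action chunk against a differentiable physics loss, without ever touching model weights. The function names, the gradient-descent form, and the toy sag penalty below are my assumptions, not the paper's exact formulation.

```python
import numpy as np

def physics_guided_actions(actions, physics_loss_grad, step_size=0.1, n_steps=5):
    """Nudge a sampled action chunk toward lower physics loss at inference time.

    actions: (H, D) action chunk sampled from the VLA.
    physics_loss_grad: function mapping an (H, D) chunk to the gradient of a
        differentiable physics loss (hypothetical; the paper's form differs).
    The model's weights are never updated; only the generated actions move.
    """
    guided = actions.copy()
    for _ in range(n_steps):
        guided -= step_size * physics_loss_grad(guided)
    return guided

# Toy illustration: penalize vertical sag below a reference altitude z_ref,
# mimicking the payload-sag issue the guidance is meant to correct.
def sag_loss_grad(actions, z_ref=1.0, z_dim=2):
    grad = np.zeros_like(actions)
    sag = np.minimum(actions[:, z_dim] - z_ref, 0.0)  # nonzero only below z_ref
    grad[:, z_dim] = 2.0 * sag                        # d/dz of sag**2
    return grad
```

The key design point is that guidance is purely a sampling-time correction, so a wrong physics term degrades one action chunk rather than corrupting the policy permanently.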
General Tracking-Error Guidance
Reference trajectory
They do not tell us how the reference trajectory is calculated
The parameters in their Eq. (4) are not learned, they are chosen
- D represents the action axes (change in x, y, z, yaw, etc.)
- H represents the number of future actions output by the model (the action horizon)
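To make the D and H shapes concrete, here is a sketch of a tracking-error term over an (H, D) action chunk against a reference trajectory. The function name and the plain mean-squared form are my assumptions; the paper's Eq. (4) may weight terms differently, and how the reference is computed is not specified.

```python
import numpy as np

def tracking_error(actions, reference):
    """Mean squared tracking error between a predicted action chunk and a
    reference trajectory.

    actions:   (H, D) future actions output by the model.
    reference: (H, D) reference trajectory (its construction is not given
               in the paper).
    Averages over the horizon H and the D action axes (x, y, z, yaw, ...).
    """
    assert actions.shape == reference.shape
    return np.mean((actions - reference) ** 2)
```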
Payload-Aware Vertical Guidance
Did not look too deeply into this; not really interested in it at the moment
Gaussian Splat Data Pipeline
Collecting aerial manipulation demonstrations is time-consuming and costly
Their pipeline consists of: reconstructing a static environment as a Gaussian Splat -> isolating gripper visuals (to prevent observation bias) -> coupling a drone dynamics model with the assets to synthesize diverse, physics-feasible training trajectories (covering normal navigation & recovery behaviors)
Drone Dynamics Model
The state consists of:
- position in the world frame
- velocity in the world frame
- orientation as a quaternion relating the body and world frames
Controls are:
- normalized thrust
- body angular velocity
This dynamics model is used to simulate the drone when synthesizing training data
The real-world equivalent of this dynamics model would be a physical autopilot on a drone taking the outputs of the policy and converting them to motor outputs
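The state and controls above map onto a standard quadrotor kinematics model. A sketch of one integration step, assuming mass-normalized thrust along the body z-axis and Euler integration (the paper may use a richer model and integrator):

```python
import numpy as np

def quat_mul(q1, q2):
    """Hamilton product of two quaternions in (w, x, y, z) order."""
    w1, x1, y1, z1 = q1
    w2, x2, y2, z2 = q2
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def quat_rotate(q, v):
    """Rotate vector v from the body frame to the world frame by unit quaternion q."""
    qv = np.concatenate([[0.0], v])
    q_conj = q * np.array([1.0, -1.0, -1.0, -1.0])
    return quat_mul(quat_mul(q, qv), q_conj)[1:]

def step(p, v, q, thrust, omega, dt=0.01, g=9.81):
    """One Euler step of a generic quadrotor model (sketch, not the paper's exact model).

    State: p position, v velocity (world frame), q body-to-world quaternion.
    Controls: mass-normalized thrust along body z, body angular velocity omega.
    """
    accel = quat_rotate(q, np.array([0.0, 0.0, thrust])) - np.array([0.0, 0.0, g])
    p = p + dt * v
    v = v + dt * accel
    dq = 0.5 * quat_mul(q, np.concatenate([[0.0], omega]))  # quaternion kinematics
    q = q + dt * dq
    return p, v, q / np.linalg.norm(q)                      # renormalize the quaternion
```

A quick sanity check on such a model is hover: with level attitude, thrust equal to g, and zero body rates, the state should not drift.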
Domain-Randomized Data Synthesis
To make sure the policy learns to handle more than just ideal flights, this step generates extra synthetic training trajectories so the policy also sees messy, near-failure approaches
Perturb the start state
For each nominal rollout, sample a new initial state
Randomize the terminal hover target
Do not always hover at exactly the same final point; define the goal with a randomized exit position after the gate
Force near-gate recovery behavior
Insert an extra waypoint between the start and the gate
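The three randomization steps above can be sketched as one sampling routine. The Gaussian noise model, the scale values, and the function name are illustrative assumptions, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def randomize_rollout_spec(nominal_start, nominal_goal, gate_pos,
                           start_noise=0.3, goal_noise=0.2, recover_noise=0.5):
    """Sample one perturbed trajectory specification (sketch; noise scales
    and distributions here are illustrative, not the paper's values).

    Returns a perturbed start state, an extra near-gate recovery waypoint,
    and a randomized terminal hover / exit goal.
    """
    start = nominal_start + rng.normal(scale=start_noise, size=3)     # perturb the start state
    goal = nominal_goal + rng.normal(scale=goal_noise, size=3)        # randomize the terminal hover target
    recovery_wp = gate_pos + rng.normal(scale=recover_noise, size=3)  # force near-gate recovery behavior
    return start, recovery_wp, goal
```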
Generate controls and simulate
They build a plausible motion plan, then use the drone dynamics model to simulate a physically reasonable trajectory and controls
Given the waypoint targets, compute a control sequence that makes the drone follow them under the drone dynamics; that produces the synthetic rollout:
- the synthetic state trajectory
- the synthetic control sequence
- Build a path that goes through the sampled waypoints
- At each step, choose controls that try to follow that path
- Feed those controls into the drone dynamics
- Integrate forward to get the next state
- Repeat until you have a full rollout
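The loop above can be sketched end to end. To stay self-contained I substitute a point-mass double integrator with a PD controller for the full quadrotor dynamics; the controller gains and tolerances are illustrative assumptions:

```python
import numpy as np

def simulate_rollout(waypoints, dt=0.02, kp=4.0, kd=3.0, tol=0.1, max_steps=5000):
    """Follow a sequence of 3D waypoints, producing (states, controls).

    Sketch of the synthesis loop: pick controls that track the current
    waypoint, feed them through the dynamics, integrate forward, repeat.
    Point-mass dynamics stand in for the paper's quadrotor model.
    """
    p = waypoints[0].astype(float).copy()
    v = np.zeros(3)
    states, controls = [np.concatenate([p, v])], []
    for target in waypoints[1:]:
        for _ in range(max_steps):
            if np.linalg.norm(target - p) < tol:
                break                              # waypoint reached, move on
            u = kp * (target - p) - kd * v         # controls that follow the path
            p = p + dt * v                         # integrate dynamics forward
            v = v + dt * u
            states.append(np.concatenate([p, v]))
            controls.append(u)
    return np.array(states), np.array(controls)
```

Running it on a few sampled waypoints yields a full synthetic rollout of states and controls, which is what gets rendered through the Gaussian Splat to make training observations.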