Paper Link: https://arxiv.org/pdf/2603.25038

This paper is very interesting. These are not comprehensive notes on the whole paper, just the parts I was interested in.

AirVLA builds on the pi0 architecture

Policy Architecture

The architecture is the same as the original pi0 policy architecture, with the addition of Real-Time Chunking

Physics-Aware Guidance for Action Generation

Rather than taking the output of the VLA at face value, the paper introduces a “physics loss”

The physics loss is used to guide the model toward better actions; it seems this is mostly to account for items held by the gripper, whose weight causes the drone to sag

This loss doesn’t update the model’s weights; it is applied at inference time, during generation. The reason they give is the gap between the manipulation data the VLA was pre-trained on and the payload-sensitive dynamics of flight. My theory is also that this physics loss isn’t propagated back into the weights because you can’t correlate payload weight precisely enough from visuals alone, and they don’t want to teach the model something wrong.
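The paper's exact loss isn't reproduced in my notes, but the general mechanism (steering the sampler with the gradient of a physics loss, without ever touching the model weights) can be sketched. Everything here is my own illustrative stand-in: a hypothetical sag penalty on the z-component of the action chunk, and a generic denoising-style sampling step standing in for pi0's flow-matching sampler.

```python
import numpy as np

def physics_loss(actions, ref_z):
    # Hypothetical sag penalty: squared error between the commanded
    # z values in the action chunk and a reference altitude
    return np.sum((actions[:, 2] - ref_z) ** 2)

def physics_grad(actions, ref_z):
    # Analytic gradient of the loss above w.r.t. the actions
    g = np.zeros_like(actions)
    g[:, 2] = 2.0 * (actions[:, 2] - ref_z)
    return g

def guided_step(actions, denoise_fn, ref_z, scale=0.1):
    # One sampling step: the model's own update, then a guidance nudge
    # down the physics-loss gradient; model weights are never modified
    actions = denoise_fn(actions)
    return actions - scale * physics_grad(actions, ref_z)
```

Repeating `guided_step` over the sampler's schedule yields an action chunk that trades off the model's prior against the physics penalty.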

General Tracking-Error Guidance

Reference trajectory

The paper does not say how the reference trajectory is calculated

Equation (4):

  • The weighting coefficients are not learned; they are chosen by hand
  • D represents the change along each axis (change in x, y, z, yaw, etc.)
  • H represents the horizon of future actions output by the model
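Without the exact form of equation (4), a plausible sketch of a tracking-error guidance loss is a weighted squared error between the model's predicted per-axis changes and the reference trajectory, summed over the horizon. The shapes and the per-axis weights here are my assumptions, not the paper's.

```python
import numpy as np

def tracking_error_guidance(pred_actions, ref_traj, weights):
    """Hypothetical sketch of a tracking-error loss over an action chunk.

    pred_actions: (H, D) future actions from the model (deltas per axis)
    ref_traj:     (H, D) reference trajectory deltas
    weights:      (D,) hand-chosen (not learned) per-axis weights
    """
    err = pred_actions - ref_traj      # (H, D) per-step tracking error
    return np.sum(weights * err ** 2)  # weighted squared error over horizon
```

As with the physics loss, the gradient of this scalar w.r.t. `pred_actions` would be what nudges the sampler, not a weight update.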

Payload-Aware Vertical Guidance

Did not look too closely into this; not really interested in it at the moment

Gaussian Splat Data Pipeline

3D Gaussian Splatting

Collecting aerial manipulation demonstrations is time-consuming and costly

Their pipeline consists of: reconstructing a static environment as a Gaussian Splat -> isolating gripper visuals (to prevent observation bias) -> coupling a drone dynamics model with the splat assets to synthesize diverse, physics-feasible training trajectories (covering both normal navigation and recovery behaviors)

Drone Dynamics Model

The state is:

  • position in the world frame
  • velocity in the world frame
  • orientation as a quaternion relating the body and world frames

Controls are:

  • normalized thrust
  • body angular velocity

This dynamics model is what simulates the drone when synthesizing data

The real-world equivalent of this dynamics model would be a physical autopilot on the drone taking the outputs of the policy and converting them to motor commands
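A minimal sketch of the standard quadrotor model implied by that state/control split (position, velocity, quaternion; thrust plus body rates). The Euler integration, Hamilton quaternion convention, and treating thrust as mass-normalized acceleration are my assumptions, not details from the paper.

```python
import numpy as np

def quat_mul(q1, q2):
    # Hamilton product, quaternions as [w, x, y, z]
    w1, x1, y1, z1 = q1
    w2, x2, y2, z2 = q2
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def rotate(q, v):
    # Rotate vector v from the body frame into the world frame
    qv = np.concatenate([[0.0], v])
    q_conj = q * np.array([1.0, -1.0, -1.0, -1.0])
    return quat_mul(quat_mul(q, qv), q_conj)[1:]

def step(p, v, q, thrust, omega, dt=0.01, g=9.81):
    """One Euler step: state (p, v, q), controls (thrust, body rates).
    `thrust` is mass-normalized, i.e. in acceleration units."""
    # Body-frame thrust rotated into the world frame, minus gravity
    acc = rotate(q, np.array([0.0, 0.0, thrust])) - np.array([0.0, 0.0, g])
    p = p + v * dt
    v = v + acc * dt
    # Quaternion kinematics: q_dot = 0.5 * q ⊗ [0, omega]
    q = q + 0.5 * quat_mul(q, np.concatenate([[0.0], omega])) * dt
    return p, v, q / np.linalg.norm(q)
```

Rolling `step` forward with a control sequence is exactly the "simulate under the drone dynamics" step the data pipeline relies on.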

Domain-Randomized Data Synthesis

To make sure the policy learns to handle more than just ideal flights, this step synthesizes extra training trajectories so the policy also sees messy, near-failure approaches

Perturb the start state

For each nominal rollout, sample a new initial state

Randomize the terminal hover target

Do not always hover at exactly the same final point; instead, sample the goal hover point

Randomize where the drone exits after the gate

Force near-gate recovery behavior

Insert an extra waypoint between the start and the gate

Generate controls and simulate

They build a plausible motion plan, then use the drone dynamics model to simulate a physically reasonable trajectory and controls

Given the waypoint targets, compute a control sequence that makes the drone follow them under the drone dynamics; this produces a synthetic rollout consisting of:

  • the synthetic state trajectory
  • the synthetic control sequence
  1. Build a path that goes through the sampled waypoints
  2. At each timestep, choose controls that try to follow that path
  3. Feed those controls into the drone dynamics
  4. Integrate forward to get the next state
  5. Repeat until you have the full rollout
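The steps above can be sketched end-to-end. Everything here is my own illustrative stand-in, not the paper's method: a point-mass model replaces the full quadrotor dynamics, a simple PD controller does the path following, and the noise scales, gains, and waypoint placement are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

def synthesize_rollout(start, gate, hover, pos_noise=0.3, dt=0.05, steps=400):
    """Hypothetical sketch of the domain-randomized synthesis loop:
    1. perturb the start state, 2. randomize the terminal hover target,
    3. insert an extra waypoint before the gate to force recovery behavior,
    4. track the waypoints with a simple PD controller,
    5. integrate the (point-mass) dynamics to get (states, controls)."""
    p = start + rng.normal(0.0, pos_noise, 3)     # perturbed initial position
    v = np.zeros(3)
    goal = hover + rng.normal(0.0, pos_noise, 3)  # randomized hover target
    detour = (p + gate) / 2 + rng.normal(0.0, pos_noise, 3)  # recovery waypoint
    waypoints, wp_idx = [detour, gate, goal], 0

    states, controls = [], []
    for _ in range(steps):
        target = waypoints[wp_idx]
        if np.linalg.norm(target - p) < 0.2 and wp_idx < len(waypoints) - 1:
            wp_idx += 1                      # advance to the next waypoint
        u = 2.0 * (target - p) - 1.5 * v     # PD tracking control (acceleration)
        v = v + u * dt                       # semi-implicit Euler integration
        p = p + v * dt
        states.append(p.copy())
        controls.append(u.copy())
    return np.array(states), np.array(controls)
```

Swapping the point-mass update for a real quadrotor model (and the PD controller for whatever planner the paper uses) would recover the actual pipeline shape: waypoints in, physically simulated rollout out.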