Paper Link: https://arxiv.org/abs/2412.16346

SOUS VIDE (Scene Optimized Understanding via Synthesized Visual Inertial Data from Experts) is a behavior cloning pipeline that trains a drone navigation policy entirely in simulation, and the resulting policy is capable of zero-shot sim2real transfer

Flying in Gaussian Splats (FiGS)

3D Gaussian Splatting

In this paper, the GSplats are generated from short video recordings (2-3 mins) captured by walking through the scene with a handheld camera. A set of training images is extracted from the video, and the open-source tool Nerfstudio is used to train the GSplat model

Given a camera pose (p, q), where p is the position and q is the orientation as a quaternion, the resulting model can render a photorealistic image from a virtual camera at any pose covered by the training images
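
To make the pose (p, q) concrete, here is a minimal sketch of turning it into a 4x4 camera-to-world matrix, the form most renderers expect. The function names are hypothetical, and the (w, x, y, z) quaternion ordering is an assumption; the notes do not state which convention the paper or Nerfstudio uses, so check before reusing this.

```python
import math


def quat_to_rotation(q):
    """Convert a quaternion to a 3x3 rotation matrix (nested lists).

    Assumes (w, x, y, z) ordering; this ordering is an assumption.
    """
    w, x, y, z = q
    # Normalize so that slightly non-unit inputs still give a valid rotation.
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def camera_to_world(p, q):
    """Build a 4x4 homogeneous camera-to-world transform from pose (p, q)."""
    R = quat_to_rotation(q)
    return [
        R[0] + [p[0]],
        R[1] + [p[1]],
        R[2] + [p[2]],
        [0.0, 0.0, 0.0, 1.0],
    ]


# Example: camera 1.5 units up, with identity orientation.
T = camera_to_world((0.0, 0.0, 1.5), (1.0, 0.0, 0.0, 0.0))
```

A renderer would take a matrix like T (plus camera intrinsics, not shown here) and return the image seen from that pose.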

Drone Dynamics Model

Drone state:

  • Where it is: position p
  • How fast it’s moving: velocity v
  • Which way it is tilted / rotated: orientation q
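
The state listed above can be sketched as a small container. This is only an illustration, not the paper's dynamics model: the class name, the symbol v for velocity, the (w, x, y, z) quaternion ordering, and the position-only Euler step are all assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class DroneState:
    """Hypothetical container for the drone state described in the notes."""
    p: tuple  # position (x, y, z)
    v: tuple  # velocity (vx, vy, vz); the symbol v is an assumption
    q: tuple  # orientation quaternion, assumed (w, x, y, z) ordering

    def normalized(self):
        """Return a copy whose quaternion has unit length."""
        n = math.sqrt(sum(c * c for c in self.q))
        return DroneState(self.p, self.v, tuple(c / n for c in self.q))

    def as_vector(self):
        """Flatten to a 10-element list: 3 position, 3 velocity, 4 quaternion."""
        return [*self.p, *self.v, *self.q]


def step_position(state, dt):
    """Advance only the position by one explicit Euler step: p <- p + v * dt.

    Velocity and orientation are left unchanged; a full dynamics model would
    also update them from thrust and body rates, which is not shown here.
    """
    p = tuple(pi + vi * dt for pi, vi in zip(state.p, state.v))
    return DroneState(p, state.v, state.q)


# Example: hovering height of 1.0, moving forward along x.
s0 = DroneState(p=(0.0, 0.0, 1.0), v=(1.0, 0.0, 0.0), q=(1.0, 0.0, 0.0, 0.0))
s1 = step_position(s0, dt=0.5)
```

Keeping the quaternion normalized matters because only unit quaternions represent rotations; numerical integration tends to drift away from unit length.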