Paper Link: https://arxiv.org/abs/2412.16346
SOUS VIDE (Scene Optimized Understanding via Synthesized Visual Inertial Data from Experts) is a behavior cloning pipeline that trains drone navigation policies entirely in simulation; the resulting policies are capable of zero-shot sim2real transfer
Flying in Gaussian Splats (FiGS)
In this paper the authors generate GSplats from short video recordings (2-3 mins) taken by walking through the scene with a handheld camera. From the video they extract a set of training images and use the open-source tool Nerfstudio to train the GSplat model
Given a camera pose (p, q), where p is the position and q the orientation in quaternion form, the resulting model can generate a photorealistic image from a virtual camera at any pose covered by the training images
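To make the pose notation concrete, here is a minimal sketch of how a pose (p, q) could be turned into the 4x4 homogeneous transform that renderers typically consume. The function name pose_to_matrix, the (w, x, y, z) quaternion ordering, and the camera-to-world interpretation are assumptions for illustration; they are not taken from the paper or from Nerfstudio.

```python
import math

def pose_to_matrix(p, q):
    """Build a 4x4 homogeneous transform from position p and quaternion q.

    Assumptions (not from the paper): q is ordered (w, x, y, z) and the
    result is read as a camera-to-world transform.
    """
    w, x, y, z = q
    norm = math.sqrt(w * w + x * x + y * y + z * z)
    if norm == 0.0:
        raise ValueError("quaternion must be non-zero")
    # Normalize so the rotation block is a proper rotation matrix.
    w, x, y, z = w / norm, x / norm, y / norm, z / norm
    px, py, pz = p
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y), px],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x), py],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y), pz],
        [0.0, 0.0, 0.0, 1.0],
    ]

# Example: a camera at (1, 2, 3) with no rotation (identity quaternion).
T = pose_to_matrix((1.0, 2.0, 3.0), (1.0, 0.0, 0.0, 0.0))
```

The rotation occupies the upper-left 3x3 block and the position the last column. A real renderer may expect a different axis convention or quaternion ordering, so check before reusing this.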
Drone Dynamics Model
Drone state:
- Where it is: position p
- How fast it’s moving: velocity v
- Which way it is tilted / rotated: orientation q
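The three parts of the state can be bundled into a small container, sketched below. The class name DroneState, the symbol v for velocity, the (w, x, y, z) quaternion ordering, and the flattened 10-element layout are all assumptions for illustration, not the paper's implementation.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class DroneState:
    """Hypothetical container for the drone state described above."""
    p: tuple  # where it is: position (x, y, z)
    v: tuple  # how fast it is moving: velocity (symbol v is an assumption)
    q: tuple  # which way it is rotated: quaternion, assumed (w, x, y, z)

    def normalized(self):
        """Return a copy whose quaternion has unit length."""
        norm = math.sqrt(sum(c * c for c in self.q))
        if norm == 0.0:
            raise ValueError("quaternion must be non-zero")
        return DroneState(self.p, self.v, tuple(c / norm for c in self.q))

    def as_vector(self):
        """Flatten to a 10-element list: 3 position, 3 velocity, 4 orientation."""
        return [*self.p, *self.v, *self.q]

# Example: hovering 1 m up, at rest, with an un-normalized identity rotation.
state = DroneState((0.0, 0.0, 1.0), (0.0, 0.0, 0.0), (2.0, 0.0, 0.0, 0.0)).normalized()
```

Keeping the quaternion normalized matters because only unit quaternions represent pure rotations.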