My analysis of the code for the paper "SINGER: An Onboard Generalist Vision-Language Navigation Policy for Drones"

SINGER builds on top of SousVide, which builds on top of FiGS, which builds on top of acados.

So, fun times.

The code is also very interesting: the README seems to be outdated, or it references the wrong repository.

Semantic Scene Generation

SINGER claims that each scene has a semantic representation, so it's searchable with a user query. The purpose of this is to run RRT* from the target object (described by the query) and generate trajectories to the free space around the target, which produces synthetic training data.
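To make the idea concrete for myself, here is a minimal 2D sketch of RRT* rooted at a target point and grown outward into free space. This is not SINGER's code. The names `rrt_star`, `path_from_root`, and `is_free_demo`, the disc obstacle, and all parameter values are assumptions made up for illustration. In the real pipeline, the root would presumably come from the semantic query and the collision check from the scene representation.

```python
import math
import random


def rrt_star(root, is_free, bounds, n_iter=300, step=0.5, radius=1.0, seed=0):
    """Grow an RRT* tree from `root`. Returns (nodes, parent, cost).

    Simplification: when a node is rewired, the stored costs of its
    descendants are not updated, so `cost` is an upper bound on path length.
    """
    rng = random.Random(seed)
    (xmin, xmax), (ymin, ymax) = bounds
    nodes, parent, cost = [root], {0: None}, {0: 0.0}

    def edge_free(a, b, n_checks=10):
        # Sample points along the segment a -> b and test each one.
        return all(
            is_free((a[0] + (b[0] - a[0]) * t / n_checks,
                     a[1] + (b[1] - a[1]) * t / n_checks))
            for t in range(n_checks + 1)
        )

    for _ in range(n_iter):
        sample = (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
        nearest = min(range(len(nodes)), key=lambda i: math.dist(nodes[i], sample))
        d = math.dist(nodes[nearest], sample)
        if d == 0.0:
            continue
        # Steer: move at most `step` from the nearest node toward the sample.
        s = min(step, d) / d
        base = nodes[nearest]
        new = (base[0] + (sample[0] - base[0]) * s,
               base[1] + (sample[1] - base[1]) * s)
        if not edge_free(base, new):
            continue
        near = [i for i in range(len(nodes)) if math.dist(nodes[i], new) <= radius]
        # Choose the neighbor that gives the cheapest collision-free connection.
        best, best_cost = nearest, cost[nearest] + math.dist(base, new)
        for i in near:
            c = cost[i] + math.dist(nodes[i], new)
            if c < best_cost and edge_free(nodes[i], new):
                best, best_cost = i, c
        idx = len(nodes)
        nodes.append(new)
        parent[idx], cost[idx] = best, best_cost
        # Rewire: reroute neighbors through the new node if that is cheaper.
        for i in near:
            c = best_cost + math.dist(new, nodes[i])
            if c < cost[i] and edge_free(new, nodes[i]):
                parent[i], cost[i] = idx, c
    return nodes, parent, cost


def path_from_root(idx, nodes, parent):
    """Return the waypoints from the root (the target) out to node `idx`."""
    path = []
    while idx is not None:
        path.append(nodes[idx])
        idx = parent[idx]
    return path[::-1]


def is_free_demo(p):
    # Hypothetical scene: one disc obstacle of radius 1 centered at (2, 2).
    return math.dist(p, (2.0, 2.0)) > 1.0


target = (0.0, 0.0)  # stand-in for the queried object's location
nodes, parent, cost = rrt_star(target, is_free_demo, ((-5.0, 5.0), (-5.0, 5.0)))
# Every tree node yields one candidate trajectory between the target and free space.
trajectories = [path_from_root(i, nodes, parent) for i in range(1, len(nodes))]
```

Each entry of `trajectories` starts at the target and ends at a free-space point, so reversing one gives an approach path toward the object. Rooting the tree at the target means a single tree yields many such paths.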

The paper does not convey very well how each scene gets its semantic representation, and the codebase runs you in circles until it references a method that's not in the code.