Ayush Garg

LocateAnything

Jun 04, 2026, 1 min read

Paper Link: https://research.nvidia.com/labs/lpr/locate-anything/LocateAnything.pdf

LocateAnything is a generative vision-language model (VLM) for localization.

This paper’s main contributions are:

An early exploration of multi-token prediction for VLM-based detection and grounding, via Parallel Box Decoding
A hybrid decoding policy that detects unreliable parallel blocks and re-decodes only the problematic block with next-token prediction (NTP)
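
To make the hybrid policy concrete, here is a minimal sketch of its control flow. It is not the paper's implementation. It assumes that one block holds the tokens of a single box, and that a block counts as unreliable when its lowest per-token confidence falls below a threshold; this note does not record the paper's actual reliability test. The names `hybrid_decode`, `parallel_step`, `ntp_step`, and the toy stand-ins for the model are all hypothetical.

```python
from typing import Callable, List, Tuple

Block = List[int]  # assumption: one block = the tokens of one box


def hybrid_decode(
    num_blocks: int,
    parallel_step: Callable[[List[Block]], Tuple[Block, List[float]]],
    ntp_step: Callable[[List[Block], Block], int],
    block_size: int = 4,
    threshold: float = 0.5,  # assumed reliability cutoff, not from the paper
) -> Tuple[List[Block], List[bool]]:
    """Decode blocks in parallel, falling back to NTP for unreliable ones."""
    decoded: List[Block] = []
    redecoded: List[bool] = []
    for _ in range(num_blocks):
        # Fast path: predict every token of the block in one step.
        tokens, confidences = parallel_step(decoded)
        unreliable = min(confidences) < threshold
        if unreliable:
            # Slow path: re-decode only this block, one token at a time.
            tokens = []
            for _ in range(block_size):
                tokens.append(ntp_step(decoded, tokens))
        decoded.append(list(tokens))
        redecoded.append(unreliable)
    return decoded, redecoded


# Toy stand-ins for the model, used only to exercise the control flow.
def toy_parallel(prefix: List[Block]) -> Tuple[Block, List[float]]:
    i = len(prefix)
    # Pretend the second block contains one low-confidence token.
    confidences = [0.9, 0.2, 0.9, 0.9] if i == 1 else [0.9] * 4
    return [10 * i + k for k in range(4)], confidences


def toy_ntp(prefix: List[Block], partial: Block) -> int:
    return 100 + len(partial)


boxes, flags = hybrid_decode(3, toy_parallel, toy_ntp)
print(boxes)
print(flags)
```

In this toy run only the second block takes the sequential path, so the extra cost of NTP is paid once rather than for the whole sequence.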

Architecture

Created by Ayush Garg using Quartz, © 2026
