Running VLAs at Real-time Speed

Paper Link: https://arxiv.org/abs/2510.26742v1

The paper reports an inference latency of 27.3 ms with 2 input views.

Eliminating the CPU overhead

Neural network inference is driven by Python code that launches the CUDA kernels. The Python side adds significant overhead when the number of kernels is large (1000+ in pi 0).
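To get a feel for why the launch count matters, the back-of-the-envelope sketch below multiplies the number of kernel launches by a per-launch CPU cost. The 1000-kernel count and the 27.3 ms latency come from this note; the per-launch cost of 10 microseconds is purely an illustrative assumption, not a measured value, and the function name `launch_overhead_ms` is hypothetical.

```python
def launch_overhead_ms(num_kernels: int, per_launch_us: float) -> float:
    """Total CPU-side launch cost, in milliseconds, for one forward pass."""
    return num_kernels * per_launch_us / 1000.0


NUM_KERNELS = 1000          # lower bound quoted for pi 0 in this note
PER_LAUNCH_US = 10.0        # ASSUMPTION: illustrative per-launch Python cost
REPORTED_LATENCY_MS = 27.3  # latency quoted in this note for 2 input views

overhead_ms = launch_overhead_ms(NUM_KERNELS, PER_LAUNCH_US)
share = overhead_ms / REPORTED_LATENCY_MS

print(f"launch overhead: {overhead_ms:.1f} ms "
      f"({share:.0%} of a {REPORTED_LATENCY_MS} ms budget)")
```

Under that assumed per-launch cost, launching alone would take 10 ms per forward pass, which is why removing it matters at this latency scale. Plug in a measured per-launch cost to get a realistic figure.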

Several ahead-of-time (AOT) and just-in-time (JIT) compilation techniques are available to remove this overhead.
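The common idea behind these techniques is to pay the Python-side cost once and then replay a prebuilt sequence of launches. The sketch below is a pure-Python toy model of that capture-and-replay pattern, not the API of any real tool: `ToyGraph`, `build_model`, and the `dispatch_steps` counter are hypothetical names made up for illustration, and plain Python closures stand in for GPU kernels.

```python
from typing import Callable, List

Kernel = Callable[[List[float]], None]


class ToyGraph:
    """Records a fixed sequence of 'kernels' once, then replays it."""

    def __init__(self) -> None:
        self.kernels: List[Kernel] = []

    def launch(self, kernel: Kernel) -> None:
        # During capture, a launch only records the kernel.
        self.kernels.append(kernel)

    def replay(self, buf: List[float]) -> None:
        # Replay runs the recorded kernels without the driver logic.
        for kernel in self.kernels:
            kernel(buf)


def scale(factor: float) -> Kernel:
    def kernel(buf: List[float]) -> None:
        for i in range(len(buf)):
            buf[i] *= factor
    return kernel


def add(offset: float) -> Kernel:
    def kernel(buf: List[float]) -> None:
        for i in range(len(buf)):
            buf[i] += offset
    return kernel


dispatch_steps = 0  # counts how often the Python "driver" logic runs


def build_model(launch: Callable[[Kernel], None]) -> None:
    """Python driver: decides which kernels to launch, and in what order."""
    global dispatch_steps
    for _layer in range(3):
        dispatch_steps += 1  # stands in for per-layer Python overhead
        launch(scale(2.0))
        launch(add(1.0))


graph = ToyGraph()
build_model(graph.launch)          # capture: the driver logic runs once
steps_after_capture = dispatch_steps

buf = [1.0, 2.0]                   # static buffer, reused across replays
graph.replay(buf)
result_first = list(buf)

buf[:] = [0.0, 0.5]                # new input written into the same buffer
graph.replay(buf)
result_second = list(buf)

print(result_first, result_second, dispatch_steps)
```

The driver runs only during capture, so `dispatch_steps` does not grow no matter how many times the graph is replayed. Writing new inputs into the same `buf` mirrors a constraint of real graph replay such as CUDA Graphs, where the captured launches read and write fixed buffers.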

Ayush Garg

