SPEAR-1: Scaling Beyond Robot Demonstrations via 3D Understanding

Nikolay Nikolov, Giuliano Albanese, Sombit Dey, Aleksandar Yanev, Luc Van Gool, Jan-Nico Zaech, Danda Pani Paudel

SPEAR-1 addresses limitations of robot imitation learning by fusing 3D perception with language-based control. It introduces a 3D-aware vision-language model (SPEAR-VLM) that jointly reasons about 3D scene geometry and human language instructions. This model powers a vision-language-action model that can solve household manipulation and navigation tasks using 20× fewer demonstrations than current state-of-the-art models require. The system outperforms models such as π₀-FAST, demonstrating that 3D-aware language grounding helps robots generalize better to new objects and scenes.

More information is available on our website: https://spear.insait.ai/