
Alexander Spiridonov, Jan‑Nico Zaech, Nikolay Nikolov, Luc Van Gool, Danda Pani Paudel

MotoVLA reduces the dependency of generalist robot manipulation on action‑labelled demonstrations. It enables the use of unlabelled human and robot videos to learn object manipulation skills. This is achieved by extracting dense 3D point clouds around the hand or gripper from video data…