
Asen Nachkov, Jan‑Nico Zaech, Danda Pani Paudel, Xi Wang, Luc Van Gool
This work introduces Differentiable Simulation for Search (DSS) to address the challenge of planning safe and efficient trajectories for autonomous vehicles. DSS uses a differentiable simulator (Waymax) as both a dynamics model and a critic, enabling gradient‑based search over action sequences. Unlike imitation‑learning…

Liming Kuang, Yordanka Velikova, Mahdi Saleh, Jan‑Nico Zaech, Danda Pani Paudel, Benjamin Busam
ConceptPose is the first training‑free and model‑free framework for estimating 6‑DoF object poses from images. The method uses off‑the‑shelf vision–language models to extract concept maps from saliency maps. These concept maps provide semantic constraints and guide a differentiable 6‑DoF optimization that does…

Yang Miao, Jan‑Nico Zaech, Xi Wang, Fabien Despinoy, Danda Pani Paudel, Luc Van Gool
LangHOPS introduces a multimodal framework for open‑vocabulary object‑part segmentation. The system models hierarchical part relationships using structured prompts processed with large language models. The approach improves zero‑shot part segmentation performance on PartImageNet and ADE20K benchmarks and achieves better generalization to previously…

Nikolay Nikolov, Giuliano Albanese, Sombit Dey, Aleksandar Yanev, Luc Van Gool, Jan-Nico Zaech, Danda Pani Paudel
SPEAR‑1 addresses limitations of robot imitation learning by fusing 3D perception with language‑based control. It introduces a 3D‑aware vision–language model (SPEAR‑VLM) that jointly reasons about 3D scene geometry and human language instructions. This model powers a Vision‑Language‑Action model that…

Alexander Spiridonov, Jan‑Nico Zaech, Nikolay Nikolov, Luc Van Gool, Danda Pani Paudel
MotoVLA reduces the dependency of generalist robot manipulation on action‑labelled demonstrations by learning object manipulation skills from unlabelled human and robot videos. This is achieved by extracting dense 3D point clouds around the hand or gripper from video data…

Anna-Maria Halacheva, Jan-Nico Zaech, Xi Wang, Danda Pani Paudel, Luc Van Gool
We present GaussianVLM, the first 3D VLM operating on Gaussian splats. Each Gaussian in the scene is enriched with language features, forming a dense, scene-centric representation. A novel dual sparsifier reduces ~40k language-augmented Gaussians to just 132 tokens, retaining task-relevant and location-relevant information.…

Sombit Dey, Jan-Nico Zaech, Nikolay Nikolov, Luc Van Gool, Danda Pani Paudel
International Conference on Robotics and Automation, ICRA 2025
Recent progress in large language models and access to large-scale robotic datasets have sparked a paradigm shift in robotics models, transforming them into generalists able to adapt to various tasks, scenes, and robot modalities.…

Jiahuan Cheng, Jan-Nico Zaech, Luc Van Gool, Danda Pani Paudel
Gaussian Splatting is a widely adopted approach for 3D scene representation, offering efficient, high-quality reconstruction and rendering. A key reason for its success is the simplicity of representing scenes with sets of Gaussians, making it interpretable and adaptable. To enhance understanding beyond visual representation, recent…

Anna-Maria Halacheva, Yang Miao, Jan-Nico Zaech, Xi Wang, Luc Van Gool, Danda Pani Paudel
International Conference on Computer Vision, ICCV 2025
3D scene understanding is a long-standing challenge in computer vision and a key component in enabling mixed reality, wearable computing, and embodied AI. Providing a solution to these applications requires a multifaceted approach that…

Jan-Nico Zaech, Martin Danelljan, Tolga Birdal, Luc Van Gool
IEEE Conference on Computer Vision and Pattern Recognition 2024 (CVPR)
Adiabatic quantum computing (AQC) is a promising approach for discrete and often NP-hard optimization problems. Current AQCs allow problems of research interest to be implemented, which has sparked the development of quantum representations for many computer…