ConceptPose: Training‑Free Zero‑Shot Object Pose Estimation Using Concept Vectors

Liming Kuang, Yordanka Velikova, Mahdi Saleh, Jan‑Nico Zaech, Danda Pani Paudel, Benjamin Busam

ConceptPose is the first training‑free and model‑free framework for estimating 6‑DoF object poses from images. The method uses off‑the‑shelf vision–language models to derive concept maps from saliency maps. These concept maps provide semantic constraints that guide a differentiable 6‑DoF pose optimization, so no training data is required. Experiments show that ConceptPose outperforms the previous best zero‑shot baselines by a large margin on multiple datasets and generalizes to unseen objects.
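
To make the idea of semantic constraints guiding pose estimation concrete, the following is a minimal sketch and not the ConceptPose implementation. It assumes each 3D point already carries a concept vector, matches points across two views by cosine similarity of those vectors, and then recovers the rigid transform with the closed‑form Kabsch alignment, used here as a simple stand‑in for the differentiable optimization described above. The function names (`match_by_concept`, `kabsch`, `estimate_pose`) and the array layout are hypothetical.

```python
import numpy as np


def match_by_concept(feat_src, feat_dst):
    """For each source point, return the index of the destination point
    whose concept vector has the highest cosine similarity."""
    a = feat_src / np.linalg.norm(feat_src, axis=1, keepdims=True)
    b = feat_dst / np.linalg.norm(feat_dst, axis=1, keepdims=True)
    return np.argmax(a @ b.T, axis=1)


def kabsch(src, dst):
    """Least-squares rigid transform (R, t) such that dst ~= src @ R.T + t."""
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    # 3x3 cross-covariance of the centered point sets.
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


def estimate_pose(pts_src, feat_src, pts_dst, feat_dst):
    """Assumed pipeline: concept-vector matching followed by Kabsch alignment."""
    idx = match_by_concept(feat_src, feat_dst)
    return kabsch(pts_src, pts_dst[idx])
```

Calling `estimate_pose(pts_src, feat_src, pts_dst, feat_dst)` with `(N, 3)` point arrays and `(N, D)` concept‑vector arrays returns a rotation matrix and a translation vector. A closed‑form solve like this assumes clean one‑to‑one matches; an iterative, differentiable optimization can instead weight or refine noisy correspondences.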