A benchmark for the evaluation of RGB-D SLAM systems

Published on Dec 24, 2012 in IROS (Intelligent Robots and Systems)
· DOI: 10.1109/IROS.2012.6385773
Jürgen Sturm
Estimated H-index: 1
(TUM: Technische Universität München),
Nikolas Engelhard
Estimated H-index: 7
(University of Freiburg)
+ 2 Authors
Daniel Cremers
Estimated H-index: 98
(TUM: Technische Universität München)
In this paper, we present a novel benchmark for the evaluation of RGB-D SLAM systems. We recorded a large set of image sequences from a Microsoft Kinect with highly accurate and time-synchronized ground truth camera poses from a motion capture system. The sequences contain both the color and depth images in full sensor resolution (640 × 480) at video frame rate (30 Hz). The ground-truth trajectory was obtained from a motion-capture system with eight high-speed tracking cameras (100 Hz). The dataset consists of 39 sequences that were recorded in an office environment and an industrial hall. The dataset covers a large variety of scenes and camera motions. We provide sequences for debugging with slow motions as well as longer trajectories with and without loop closures. Most sequences were recorded from a handheld Kinect with unconstrained 6-DOF motions but we also provide sequences from a Kinect mounted on a Pioneer 3 robot that was manually navigated through a cluttered indoor environment. To stimulate the comparison of different approaches, we provide automatic evaluation tools both for the evaluation of drift of visual odometry systems and the global pose error of SLAM systems. The benchmark website [1] contains all data, detailed descriptions of the scenes, specifications of the data formats, sample code, and evaluation tools.
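The abstract mentions 30 Hz images, 100 Hz ground-truth poses, and tools that measure pose error. As a rough illustration of what such an evaluation involves, the sketch below matches estimated poses to ground-truth poses by timestamp and then computes a root-mean-square position error. It is not the benchmark's own tooling: the function names `associate` and `translational_rmse`, the 0.02 s tolerance, and the dictionary-based trajectory layout are all assumptions made for this example, and the error is only meaningful if both trajectories are already expressed in the same coordinate frame (no alignment step is performed here).

```python
import bisect
import math


def associate(query_stamps, reference_stamps, max_difference=0.02):
    """Match each query timestamp to its nearest reference timestamp.

    reference_stamps must be sorted in ascending order. A pair is kept only
    if the two timestamps differ by at most max_difference seconds
    (hypothetical default, chosen for illustration).
    """
    matches = []
    for t in query_stamps:
        i = bisect.bisect_left(reference_stamps, t)
        # The nearest neighbour sits just before or just after the insertion point.
        candidates = reference_stamps[max(i - 1, 0):i + 1]
        if not candidates:
            continue
        nearest = min(candidates, key=lambda r: abs(r - t))
        if abs(nearest - t) <= max_difference:
            matches.append((t, nearest))
    return matches


def translational_rmse(estimated, ground_truth, matches):
    """Root-mean-square position error over matched timestamp pairs.

    estimated and ground_truth map timestamp -> (x, y, z).
    Assumes both trajectories share one coordinate frame.
    """
    squared_errors = [
        sum((a - b) ** 2 for a, b in zip(estimated[t_est], ground_truth[t_gt]))
        for t_est, t_gt in matches
    ]
    if not squared_errors:
        raise ValueError("no matched poses to evaluate")
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

A caller would pass the estimated timestamps as `query_stamps` and the sorted ground-truth timestamps as `reference_stamps`, then feed the resulting pairs to `translational_rmse`. Estimated poses with no ground-truth sample within the tolerance are simply skipped, which is one possible policy among several.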
Jan 1, 2014 in ISER (International Symposium on Experimental Robotics)
#1 Peter Henry (UW: University of Washington) H-Index: 14
#2 Michael Krainin (UW: University of Washington) H-Index: 10
Last. Dieter Fox (UW: University of Washington) H-Index: 117
RGB-D cameras are novel sensing systems that capture RGB images along with per-pixel depth information. In this paper we investigate how such cameras can be used in the context of robotics, specifically for building dense 3D maps of indoor environments. Such maps have applications in robot navigation, manipulation, semantic mapping, and telepresence. We present RGB-D Mapping, a full 3D mapping system that utilizes a novel joint optimization algorithm combining visual features and shape-based ali...
719 Citations
Jun 16, 2012 in CVPR (Computer Vision and Pattern Recognition)
#1 Andreas Geiger (KIT: Karlsruhe Institute of Technology) H-Index: 57
#2 Philip Lenz (KIT: Karlsruhe Institute of Technology) H-Index: 6
Last. Raquel Urtasun (Toyota Technological Institute at Chicago) H-Index: 92
Today, visual recognition systems are still rarely employed in robotics applications. Perhaps one of the main reasons for this is the lack of demanding benchmarks that mimic such scenarios. In this paper, we take advantage of our autonomous driving platform to develop novel challenging benchmarks for the tasks of stereo, optical flow, visual odometry/SLAM and 3D object detection. Our recording platform is equipped with four high resolution video cameras, a Velodyne laser scanner and a state-of-t...
6,121 Citations
May 14, 2012 in ICRA (International Conference on Robotics and Automation)
#1 Felix Endres (University of Freiburg) H-Index: 9
#2 Jurgen Hess (University of Freiburg) H-Index: 8
Last. Wolfram Burgard (University of Freiburg) H-Index: 117
We present an approach to simultaneous localization and mapping (SLAM) for RGB-D cameras like the Microsoft Kinect. Our system concurrently estimates the trajectory of a hand-held Kinect and generates a dense 3D model of the environment. We present the key features of our approach and evaluate its performance thoroughly on a recently published dataset, including a large set of sequences of different scenes with varying camera speeds and illumination conditions. In particular, we evaluate the acc...
555 Citations
Dec 5, 2011 in IROS (Intelligent Robots and Systems)
#1 François Pomerleau (ETH Zurich) H-Index: 23
#2 Stéphane Magnenat (ETH Zurich) H-Index: 24
Last. Roland Siegwart (ETH Zurich) H-Index: 111
The increasing number of ICP variants leads to an explosion of algorithms and parameters. This renders difficult the selection of the appropriate combination for a given application. In this paper, we propose a state-of-the-art, modular, and efficient implementation of an ICP library. We took advantage of the recent availability of fast depth cameras to demonstrate one application example: a 3D pose tracker running at 30 Hz. For this application, we show the modularity of our ICP library by opti...
80 Citations
Nov 1, 2011 in ICCV (International Conference on Computer Vision)
#1 Frank Steinbrucker (TUM: Technische Universität München) H-Index: 4
#2 Jürgen Sturm (TUM: Technische Universität München) H-Index: 32
Last. Daniel Cremers (TUM: Technische Universität München) H-Index: 98
We present an energy-based approach to visual odometry from RGB-D images of a Microsoft Kinect camera. To this end we propose an energy function which aims at finding the best rigid body motion to map one RGB-D image into another one, assuming a static scene filmed by a moving camera. We then propose a linearization of the energy function which leads to a 6×6 normal equation for the twist coordinates representing the rigid body motion. To allow for larger motions, we solve this equation in a coa...
273 Citations
Nov 1, 2011 in ICCV (International Conference on Computer Vision)
#1 Jan Smisek (CTU: Czech Technical University in Prague) H-Index: 8
#2 Michal Jancosek (CTU: Czech Technical University in Prague) H-Index: 8
Last. Tomas Pajdla (CTU: Czech Technical University in Prague) H-Index: 52
We analyze Kinect as a 3D measuring device, experimentally investigate depth measurement resolution and error properties and make a quantitative comparison of Kinect accuracy with stereo reconstruction from SLR cameras and a 3D-TOF camera. We propose Kinect geometrical model and its calibration procedure providing an accurate calibration of Kinect 3D measurement and Kinect cameras. We demonstrate the functionality of Kinect calibration by integrating it into an SfM pipeline where 3D measurements...
290 Citations
#1 Richard Newcombe (Imperial College London) H-Index: 19
#2 Shahram Izadi (Microsoft) H-Index: 70
Last. Andrew Fitzgibbon (Microsoft) H-Index: 72
We present a system for accurate real-time mapping of complex and arbitrary indoor scenes in variable lighting conditions, using only a moving low-cost depth camera and commodity graphics hardware. We fuse all of the depth data streamed from a Kinect sensor into a single global implicit surface model of the observed scene in real-time. The current sensor pose is simultaneously obtained by tracking the live depth frame relative to the global model using a coarse-to-fine iterative closest point (I...
2,892 Citations
Jun 20, 2011 in CVPR (Computer Vision and Pattern Recognition)
#1 Sid Yingze Bao (UM: University of Michigan) H-Index: 7
#2 Silvio Savarese (UM: University of Michigan) H-Index: 83
Conventional rigid structure from motion (SFM) addresses the problem of recovering the camera parameters (motion) and the 3D locations (structure) of scene points, given observed 2D image feature points. In this paper, we propose a new formulation called Semantic Structure From Motion (SSFM). In addition to the geometrical constraints provided by SFM, SSFM takes advantage of both semantic and geometrical properties associated with objects in the scene (Fig. 1). These properties allow us to recov...
125 Citations
May 9, 2011 in ICRA (International Conference on Robotics and Automation)
#1 Rainer Kümmerle (University of Freiburg) H-Index: 19
#2 Giorgio Grisetti (University of Freiburg) H-Index: 36
Last. Wolfram Burgard (University of Freiburg) H-Index: 117
Many popular problems in robotics and computer vision including various types of simultaneous localization and mapping (SLAM) or bundle adjustment (BA) can be phrased as least squares optimization of an error function that can be represented by a graph. This paper describes the general structure of such problems and presents g2o, an open-source C++ framework for optimizing graph-based nonlinear error functions. Our system has been designed to be easily extensible to a wide range of problems and ...
1,511 Citations
#1 Simon Baker (Microsoft) H-Index: 70
#2 Daniel Scharstein (Middlebury College) H-Index: 27
Last. Richard Szeliski (Microsoft) H-Index: 112
The quantitative evaluation of optical flow algorithms by Barron et al. (1994) led to significant advances in performance. The challenges for optical flow algorithms today go beyond the datasets and evaluation methods proposed in that paper. Instead, they center on problems associated with complex natural scenes, including nonrigid motion, real sensor noise, and motion discontinuities. We propose a new set of benchmarks and evaluation methods for the next generation of optical flow algorithms. T...
1,753 Citations
Cited By 1804
#1 Fan Yingchun H-Index: 2
#2 Qichi Zhang H-Index: 1
Last. Hong Han (Xidian University) H-Index: 3
#1 Peng Jin (BIT: Beijing Institute of Technology) H-Index: 4
#2 Shaoli Liu (BIT: Beijing Institute of Technology) H-Index: 7
Last. Reinhard Klein (University of Bonn) H-Index: 49
In recent years, addressing ill-posed problems by leveraging prior knowledge contained in databases on learning techniques has gained much attention. In this paper, we focus on complete three-dimensional (3D) point cloud reconstruction based on a single red-green-blue (RGB) image, a task that cannot be approached using classical reconstruction techniques. For this purpose, we used an encoder-decoder framework to encode the RGB information in latent space, and to predict the 3D structure of the c...
#1 Mert Gurturk (YTU: Yıldız Technical University)
#2 Abdullah Yusefi H-Index: 1
Last. Andrea Masiero (UniFI: University of Florence) H-Index: 1
Visual Simultaneous Localization and Mapping (VSLAM) and Visual Odometry (VO) are fundamental problems to be properly tackled for enabling autonomous and effective movements of vehicles/robots supported by vision-based positioning systems. This study presents a publicly shared dataset for SLAM investigations: a dataset collected at the Yildiz Technical University (YTU) in an outdoor area by an acquisition system mounted on a terrestrial vehicle. The acquisition system includes...
Active stereo systems are widely used in the robotics industry due to their low cost and high quality depth maps. These depth sensors, however, suffer from stereo artefacts and do not provide dense depth estimates. In this work, we present the first self-supervised depth completion method for active stereo systems that predicts accurate dense depth maps. Our system leverages a feature-based visual inertial SLAM system to produce motion estimates and accurate (but sparse) 3D landmarks. The 3D lan...
In this paper, we revisit the problem of local optimization in RANSAC. Once a so-far-the-best model has been found, we refine it via Dual Principal Component Pursuit (DPCP), a robust subspace learning method with strong theoretical support and efficient algorithms. The proposed DPCP-RANSAC has far fewer parameters than existing methods and is scalable. Experiments on estimating two-view homographies, fundamental and essential matrices, and three-view homographic tensors using large-scale dataset...
In recent years, visual SLAM has made great progress, but in complex scenes, especially rotating scenes, the mapping error increases significantly and the SLAM system easily loses track. In this article, we propose an InterpolationSLAM framework, which is a visual SLAM framework based on ORB-SLAM2. InterpolationSLAM is robust in rotating scenes for Monocular and RGB-D configurations. By detecting the rotation and performing interpolation processing at the rotated p...
#1 Burhan Ölmez (METU: Middle East Technical University)
#2 T.E. Tuncer (METU: Middle East Technical University) H-Index: 8
In this paper, a novel approach is presented to estimate the metric scale (MSC) and roll and pitch angles of a platform by using distance sensors in a monocular visual odometry setup. A state-of-the-art visual odometry algorithm Semi-Direct Visual Odometry (SVO) [1] is used to obtain sparse three dimensional (3D) point cloud which is then matched with the measurements obtained from the distance sensors for the estimation process. Metric scale with Kalman (MSCwK) filt...
#1 Yuki Fujimura (Kyoto University) H-Index: 2
#2 Motoharu Sonogashira (Kyoto University) H-Index: 2
Last. Masaaki Iiyama (Kyoto University) H-Index: 7
We propose a learning-based multi-view stereo (MVS) method in scattering media, such as fog or smoke, with a novel cost volume, called the dehazing cost volume. Images captured in scattering media are degraded due to light scattering and attenuation caused by suspended particles. This degradation depends on scene depth; thus, it is difficult for traditional MVS methods to evaluate photometric consistency because the depth is unknown before three-dimensional (3D) reconstruction...
Oct 1, 2021 in ICRA (International Conference on Robotics and Automation)
#1 Huaiyang Huang (HKUST: Hong Kong University of Science and Technology) H-Index: 6
#2 Yuxiang Sun (PolyU: Hong Kong Polytechnic University) H-Index: 13
Last. Ming Liu (HKUST: Hong Kong University of Science and Technology) H-Index: 52