Evaluation of local detectors and descriptors for fast feature matching

Published on Nov 1, 2012 in ICPR (International Conference on Pattern Recognition)
Ondrej Miksik
Estimated H-index: 17
Krystian Mikolajczyk
Estimated H-index: 47
Local feature detectors and descriptors are widely used in many computer vision applications, and various methods have been proposed during the past decade. There have been a number of evaluations focused on various aspects of local features, matching accuracy in particular; however, there have been no comparisons considering the accuracy and speed trade-offs of recent extractors such as BRIEF, BRISK, ORB, MRRID, MROGH and LIOP. This paper provides a performance evaluation of recent feature detectors and compares their matching precision and speed in a randomized kd-trees setup, as well as an evaluation of binary descriptors with efficient computation of Hamming distance.
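To make the Hamming-distance matching mentioned in the abstract concrete, here is a minimal sketch in Python. It is not the paper's implementation: the function names `hamming` and `match`, the byte-string representation of descriptors, and the toy data are all assumptions chosen for illustration. The distance is the number of set bits in the XOR of two descriptors.

```python
def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length binary descriptors."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    # XOR leaves a 1 wherever the bits differ; counting the 1s gives the distance.
    x = int.from_bytes(a, "big") ^ int.from_bytes(b, "big")
    return bin(x).count("1")


def match(queries, database):
    """Brute-force matching: (index, distance) of the closest database entry per query."""
    results = []
    for q in queries:
        dists = [hamming(q, d) for d in database]
        best = min(range(len(database)), key=dists.__getitem__)
        results.append((best, dists[best]))
    return results


# Toy 16-bit descriptors (hypothetical data, for illustration only).
database = [bytes([0b11110000, 0b00001111]), bytes([0b10101010, 0b01010101])]
queries = [bytes([0b11110001, 0b00001111])]
print(match(queries, database))  # [(0, 1)]
```

Optimized implementations typically replace the bit counting with a hardware population-count instruction, which is a large part of why binary descriptors are cheap to compare.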
#1 Bin Fan (CAS: Chinese Academy of Sciences), H-Index: 25
#2 Fuchao Wu (CAS: Chinese Academy of Sciences), H-Index: 21
Last. Zhanyi Hu (CAS: Chinese Academy of Sciences), H-Index: 27
view all 3 authors...
This paper proposes a novel method for interest region description which pools local features based on their intensity orders in multiple support regions. Pooling by intensity orders is not only invariant to rotation and monotonic intensity changes, but also encodes ordinal information into a descriptor. Two kinds of local features are used in this paper, one based on gradients and the other on intensities; hence, two descriptors are obtained: the Multisupport Region Order-Based Gradient Histogr...
151 Citations
Nov 6, 2011 in ICCV (International Conference on Computer Vision)
#1 Zhenhua Wang (CAS: Chinese Academy of Sciences), H-Index: 16
#2 Bin Fan (CAS: Chinese Academy of Sciences), H-Index: 25
Last. Fuchao Wu (CAS: Chinese Academy of Sciences), H-Index: 21
view all 3 authors...
This paper presents a novel method for feature description based on intensity order. Specifically, a Local Intensity Order Pattern (LIOP) is proposed to encode the local ordinal information of each pixel, and the overall ordinal information is used to divide the local patch into subregions, over which the LIOPs are accumulated. Therefore, both local and overall intensity ordinal information of the local patch are captured by the proposed LIOP descriptor so as to make it a highly d...
265 Citations
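The idea of encoding "local ordinal information" can be illustrated with a small Python sketch. This is only the ordinal-encoding step under stated assumptions: the abstract above is truncated, so the neighbour sampling, weighting and subregion pooling of the actual LIOP descriptor are not reproduced, and the names `order_pattern` and `pattern_index` are hypothetical. The sketch maps the rank order of a few sampled intensities to a permutation index and shows that the index is unchanged by a monotonically increasing intensity change.

```python
from itertools import permutations


def order_pattern(values):
    """Indices of the samples listed in order of increasing intensity.

    Ties are broken by sample position because Python's sort is stable.
    """
    return tuple(sorted(range(len(values)), key=lambda i: values[i]))


def pattern_index(values):
    """Position of the order pattern among all permutations (lexicographic order)."""
    perms = list(permutations(range(len(values))))  # generated in lexicographic order
    return perms.index(order_pattern(values))


samples = [30, 10, 20, 40]                 # intensities of 4 sampled neighbours (toy data)
brighter = [2 * v + 5 for v in samples]    # a monotonically increasing intensity change

print(order_pattern(samples))              # (1, 2, 0, 3)
print(pattern_index(samples) == pattern_index(brighter))  # True
```

With N samples there are N! possible patterns, so a histogram of pattern indices over a region has N! bins per region in this simplified view.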
Nov 6, 2011 in ICCV (International Conference on Computer Vision)
#1 Ethan Rublee (Willow Garage), H-Index: 3
#2 Vincent Rabaud (Willow Garage), H-Index: 9
Last. Gary Bradski (Willow Garage), H-Index: 34
view all 4 authors...
Feature matching is at the base of many computer vision problems, such as object recognition or structure from motion. Current methods rely on costly descriptors for detection and matching. In this paper, we propose a very fast binary descriptor based on BRIEF, called ORB, which is rotation invariant and resistant to noise. We demonstrate through experiments how ORB is two orders of magnitude faster than SIFT, while performing as well in many situations. The efficiency is tested on several re...
4,935 Citations
Nov 6, 2011 in ICCV (International Conference on Computer Vision)
#1 Stefan Leutenegger (ETH Zurich), H-Index: 32
#2 Margarita Chli (ETH Zurich), H-Index: 23
Last. Roland Siegwart (ETH Zurich), H-Index: 111
view all 3 authors...
Effective and efficient generation of keypoints from an image is a well-studied problem in the literature and forms the basis of numerous Computer Vision applications. Established leaders in the field are the SIFT and SURF algorithms which exhibit great performance under a variety of image transformations, with SURF in particular considered as the most computationally efficient amongst the high-performance methods to date. In this paper we propose BRISK, a novel method for keypoint detection, d...
2,146 Citations
Sep 5, 2010 in ECCV (European Conference on Computer Vision)
#1 Michael Calonder (EPFL: École Polytechnique Fédérale de Lausanne), H-Index: 7
#2 Vincent Lepetit (EPFL: École Polytechnique Fédérale de Lausanne), H-Index: 70
Last. Pascal Fua (EPFL: École Polytechnique Fédérale de Lausanne), H-Index: 100
view all 4 authors...
We propose to use binary strings as an efficient feature point descriptor, which we call BRIEF. We show that it is highly discriminative even when using relatively few bits and can be computed using simple intensity difference tests. Furthermore, the descriptor similarity can be evaluated using the Hamming distance, which is very efficient to compute, instead of the L2 norm as is usually done. As a result, BRIEF is very fast both to build and to match. We compare it against SURF and U-SURF on st...
2,395 Citations
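The "simple intensity difference tests" described in the BRIEF abstract can be sketched in a few lines of Python. This is a simplified illustration, not the authors' implementation: the uniform random sampling of test locations, the helper names `make_tests` and `brief_like`, and the synthetic patch are assumptions, and any pre-smoothing of the patch is omitted.

```python
import random


def make_tests(n_bits, patch_size, seed=0):
    """Fixed random pairs of pixel locations; the same pairs must be reused for every patch."""
    rng = random.Random(seed)

    def point():
        return (rng.randrange(patch_size), rng.randrange(patch_size))

    return [(point(), point()) for _ in range(n_bits)]


def brief_like(patch, tests):
    """Pack one bit per test: 1 if the first pixel is darker than the second."""
    bits = 0
    for (r1, c1), (r2, c2) in tests:
        bits = (bits << 1) | (1 if patch[r1][c1] < patch[r2][c2] else 0)
    return bits


# Synthetic 8x8 patch (toy data) and a uniformly brightened copy of it.
patch = [[(r * 7 + c * 13) % 200 for c in range(8)] for r in range(8)]
brighter = [[v + 10 for v in row] for row in patch]

tests = make_tests(n_bits=32, patch_size=8)
d1 = brief_like(patch, tests)
d2 = brief_like(brighter, tests)
print(f"{d1:032b}")
print(d1 == d2)  # True: comparisons are unaffected by adding a constant brightness
```

Because each bit depends only on which of two pixels is darker, the resulting bit strings can be compared with the Hamming distance rather than the L2 norm.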
Jun 13, 2010 in CVPR (Computer Vision and Pattern Recognition)
#1 Jun Wang (Columbia University), H-Index: 210
#2 Sanjiv Kumar (Google), H-Index: 51
Last. Shih-Fu Chang (Columbia University), H-Index: 118
view all 3 authors...
Large scale image search has recently attracted considerable attention due to easy availability of huge amounts of data. Several hashing methods have been proposed to allow approximate but highly efficient search. Unsupervised hashing methods show good performance with metric distances but, in image search, semantic similarity is usually given in terms of labeled pairs of images. There exist supervised hashing methods that can handle such semantic similarity but they are prone to overfitting whe...
556 Citations
Jan 1, 2009 in VISAPP (International Conference on Computer Vision Theory and Applications)
#1 Marius Muja (UBC: University of British Columbia), H-Index: 10
#2 David G. Lowe (UBC: University of British Columbia), H-Index: 54
For many computer vision problems, the most time consuming component consists of nearest neighbor matching in high-dimensional spaces. There are no known exact algorithms for solving these high-dimensional problems that are faster than linear search. Approximate algorithms are known to provide large speedups with only minor loss in accuracy, but many such algorithms have been published with only minimal guidance on selecting an algorithm and its parameters for any given problem. In this paper, w...
2,160 Citations
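As background for the nearest-neighbour search discussed in this abstract, the following Python sketch builds a single kd-tree and performs an exact nearest-neighbour query. It is deliberately not the randomized kd-tree forest or the automatic configuration described by the authors; the names `build`, `sqdist` and `nearest` and the random test data are assumptions for illustration. Approximate variants generally obtain their speedup by limiting how many nodes are visited instead of applying the exact pruning test used here.

```python
import random


def build(points, depth=0):
    """Build a kd-tree as nested tuples (point, axis, left, right), or None when empty."""
    if not points:
        return None
    axis = depth % len(points[0])            # cycle through the dimensions
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                   # the median point becomes the node
    return (points[mid], axis,
            build(points[:mid], depth + 1),
            build(points[mid + 1:], depth + 1))


def sqdist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(node, q, best=None):
    """Exact nearest neighbour of q; returns (squared_distance, point)."""
    if node is None:
        return best
    point, axis, left, right = node
    d = sqdist(point, q)
    if best is None or d < best[0]:
        best = (d, point)
    diff = q[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, q, best)
    # Visit the far side only if the splitting plane is closer than the best match so far.
    if diff * diff < best[0]:
        best = nearest(far, q, best)
    return best


rng = random.Random(0)
pts = [tuple(rng.random() for _ in range(4)) for _ in range(200)]
tree = build(pts)
query = (0.5, 0.5, 0.5, 0.5)
dist2, match_point = nearest(tree, query)
print(dist2 == min(sqdist(p, query) for p in pts))  # True: agrees with linear search
```

In high-dimensional descriptor spaces the pruning test rarely succeeds, so an exact kd-tree search approaches the cost of linear search, which is consistent with the abstract's motivation for approximate methods.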
#1 Herbert Bay (ETH Zurich), H-Index: 11
#2 Andreas Ess (ETH Zurich), H-Index: 16
Last. Luc Van Gool (ETH Zurich), H-Index: 127
view all 4 authors...
This article presents a novel scale- and rotation-invariant detector and descriptor, coined SURF (Speeded-Up Robust Features). SURF approximates or even outperforms previously proposed schemes with respect to repeatability, distinctiveness, and robustness, yet can be computed and compared much faster. This is achieved by relying on integral images for image convolutions; by building on the strengths of the leading existing detectors and descriptors (specifically, using a Hessian matrix-based mea...
9,538 Citations
Dec 26, 2007 in ICCV (International Conference on Computer Vision)
#1 Krystian Mikolajczyk (University of Surrey), H-Index: 47
#2 Jiri Matas (CTU: Czech Technical University in Prague), H-Index: 84
In this paper we propose to transform an image descriptor so that nearest neighbor (NN) search for correspondences becomes the optimal matching strategy under the assumption that inter-image deviations of corresponding descriptors have Gaussian distribution. The Euclidean NN in the transformed domain corresponds to the NN according to a truncated Mahalanobis metric in the original descriptor space. We provide theoretical justification for the proposed approach and show experimentally that the tr...
98 Citations
May 7, 2006 in ECCV (European Conference on Computer Vision)
#1 Edward Rosten (University of Cambridge), H-Index: 17
#2 Tom Drummond (University of Cambridge), H-Index: 39
Where feature points are used in real-time frame-rate applications, a high-speed feature detector is necessary. Feature detectors such as SIFT (DoG), Harris and SUSAN are good methods which yield high quality features, however they are too computationally intensive for use in real-time applications of any complexity. Here we show that machine learning can be used to derive a feature detector which can fully process live PAL video using less than 7% of the available processing time. By comparison...
2,930 Citations
Cited By (217)
#1 Ho-Gun Ha (DGIST: Daegu Gyeongbuk Institute of Science and Technology), H-Index: 4
#2 Kyunghwa Jung (DGIST: Daegu Gyeongbuk Institute of Science and Technology), H-Index: 2
Last. Jaesung Hong (DGIST: Daegu Gyeongbuk Institute of Science and Technology), H-Index: 18
view all 5 authors...
The C-arm X-ray system is a common intraoperative imaging modality used to observe the state of a fractured bone in orthopedic surgery. Using C-arm, the bone fragments are aligned during surgery, and their lengths and angles with respect to the entire bone are measured to verify the fracture reduction. Since the field-of-view of the C-arm is too narrow to visualize the entire bone, a panoramic X-ray image is utilized to enlarge it by stitching multiple images. To achieve X-ray image stitching wi...
#1 Guan Jinchao (Chang'an University)
#2 Xu Yang (Chang'an University), H-Index: 27
Last. Can Jin (Hefei University of Technology), H-Index: 8
view all 6 authors...
Automated pavement distress detection based on 2D images is facing various challenges. To efficiently complete the crack and pothole segmentation in a practical environment, an automated pixel-level pavement distress detection framework integrating stereo vision and deep learning is developed in this study. Based on the multi-view stereo imaging system, multi-feature pavement image datasets containing color images, depth images and color-depth overlapped images are established...
3 Citations
#1 Junfeng Jing, H-Index: 11
#2 Tian Gao, H-Index: 1
Last. Changming Sun (CSIRO: Commonwealth Scientific and Industrial Research Organisation), H-Index: 28
view all 5 authors...
Interest point detection is one of the most fundamental and critical problems in computer vision and image processing. In this paper, we carry out a comprehensive review on image feature information (IFI) extraction techniques for interest point detection. To systematically introduce how the existing interest point detection methods extract IFI from an input image, we propose a taxonomy of the IFI extraction techniques for interest point detection. According to this taxonomy, we discuss differen...
#1 Utsav Kumar Malviya (Government Engineering College, Sreekrishnapuram)
#2 Rahul Gupta (Government Engineering College, Sreekrishnapuram)
#1 Sina Lotfian (UCF: University of Central Florida), H-Index: 1
#2 Hassan Foroosh (UCF: University of Central Florida), H-Index: 27
We propose a new hierarchical method to match keypoints by exploiting information across multiple scales. Traditionally, a single scale is detected for each keypoint and the matching is done at that specific scale. We replace this approach with matching across scale-space. The holistic information from higher scales is used for early rejection of candidates that are far away in the feature space. The more localized and finer details of lower scales are then used to decide between remainin...
The computer vision system is the technology that deals with identifying and detecting the objects of a particular class in digital images and videos. Local feature detection and description play an essential role in many computer vision applications like object detection, object classification, etc. The accuracy of these applications depends on the performance of local feature detectors and descriptors used in the methods. Over the past decades, new algorithms and techniques have been introduce...
2 Citations
#1 Z. Y. Wang (TUT: Taiyuan University of Technology), H-Index: 1
#2 Li Zhipeng (TUT: Taiyuan University of Technology)
Last. Gaowei Yan (TUT: Taiyuan University of Technology)
view all 4 authors...
As key technologies of image processing, image feature extraction and matching are widely used in face recognition, image stitching, and visual SLAM. Among the available methods, the ORB algorithm is widely adopted because of its advantage in real-time processing. However, the matching accuracy of the feature points extracted by the ORB algorithm is still a concern for further applications. To address the problem, an improved ORB algorithm based on affine transformation is proposed. After FAST feature points are dete...
#1 Vassileios Balntas (University of Oxford), H-Index: 2
#2 Karel Lenc (University of Oxford), H-Index: 14
Last. Krystian Mikolajczyk (Imperial College London), H-Index: 47
view all 6 authors...
In this paper, a novel benchmark is introduced for evaluating local image descriptors. We demonstrate limitations of the commonly used datasets and evaluation protocols, that lead to ambiguities and contradictory results in the literature. Furthermore, these benchmarks are nearly saturated due to the recent improvements in local descriptors obtained by learning from large annotated datasets. To address these issues, we introduce a new large dataset suitable for training and testing modern descri...
3 Citations
#1 Mahdi Saleh (TUM: Technische Universität München), H-Index: 2
#2 Shervin Dehghani (TUM: Technische Universität München)
Last. Federico Tombari (TUM: Technische Universität München), H-Index: 43
view all 5 authors...
3D Point clouds are a rich source of information that enjoy growing popularity in the vision community. However, due to the sparsity of their representation, learning models based on large point clouds is still a challenge. In this work, we introduce Graphite, a GRAPH-Induced feaTure Extraction pipeline, a simple yet powerful feature transform and keypoint detector. Graphite enables intensive down-sampling of point clouds with keypoint detection accompanied by a descriptor. We construct a generi...
Oct 1, 2020 in ICIP (International Conference on Image Processing)
#2 Omer Om J (Intel), H-Index: 2
Last. Sreenivas Subramoney (Intel), H-Index: 9
view all 5 authors...
Many emerging applications of Visual SLAM running on resource constrained hardware platforms impose very aggressive pose accuracy requirements and highly demanding latency constraints. To achieve the required pose accuracy under constrained compute budget, real-time SLAM implementations have to work with few but highly repeatable and invariant features. While many state-of-the-art techniques, proposed for selecting good features to track, do address some of these concerns, they are computational...