Learning patterns of activity using real-time tracking

Published on Aug 1, 2000 in IEEE Transactions on Pattern Analysis and Machine Intelligence
· DOI :10.1109/34.868677
Chris Stauffer (MIT: Massachusetts Institute of Technology), W.E.L. Grimson (MIT: Massachusetts Institute of Technology)
Abstract
Our goal is to develop a visual monitoring system that passively observes moving objects in a site and learns patterns of activity from those observations. For extended sites, the system will require multiple cameras. Thus, key elements of the system are motion tracking, camera coordination, activity classification, and event detection. In this paper, we focus on motion tracking and show how one can use observed motion to learn patterns of activity in a site. Motion segmentation is based on an adaptive background subtraction method that models each pixel as a mixture of Gaussians and uses an online approximation to update the model. The Gaussian distributions are then evaluated to determine which are most likely to result from a background process. This yields a stable, real-time outdoor tracker that reliably deals with lighting changes, repetitive motions from clutter, and long-term scene changes. While a tracking system is unaware of the identity of any object it tracks, the identity remains the same for the entire tracking sequence. Our system leverages this information by accumulating joint co-occurrences of the representations within a sequence. These joint co-occurrence statistics are then used to create a hierarchical binary-tree classification of the representations. This method is useful for classifying sequences, as well as individual instances of activities in a site.
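The adaptive background subtraction described in the abstract can be sketched for a single grayscale pixel. This is a minimal illustration, not the authors' implementation: K, ALPHA, MATCH_SIGMA, and T are illustrative parameter choices, and rho is simplified to a constant (the paper scales the learning rate by the Gaussian likelihood).

```python
import math

# Illustrative sketch of a per-pixel mixture-of-Gaussians background model.
K = 3              # Gaussians per pixel (assumed value)
ALPHA = 0.05       # learning rate (assumed value)
MATCH_SIGMA = 2.5  # a sample matches a Gaussian within 2.5 standard deviations
T = 0.7            # fraction of total weight assumed to model the background

def update_pixel(model, x):
    """model: list of dicts {w, mu, var}; x: new intensity sample.
    Updates the mixture online and returns True if x is background."""
    # Order by w/sigma so likely background components come first.
    model.sort(key=lambda g: g["w"] / math.sqrt(g["var"]), reverse=True)
    matched = None
    for g in model:
        if abs(x - g["mu"]) <= MATCH_SIGMA * math.sqrt(g["var"]):
            matched = g
            break
    # Decay all weights; reinforce the matched component.
    for g in model:
        g["w"] = (1 - ALPHA) * g["w"] + (ALPHA if g is matched else 0.0)
    if matched is None:
        # Replace the least probable component with a new, wide Gaussian.
        model[-1] = {"w": 0.05, "mu": float(x), "var": 900.0}
    else:
        rho = ALPHA  # simplified; the paper scales ALPHA by the likelihood
        matched["mu"] += rho * (x - matched["mu"])
        matched["var"] += rho * ((x - matched["mu"]) ** 2 - matched["var"])
    total = sum(g["w"] for g in model)
    for g in model:
        g["w"] /= total
    # The first components whose cumulative weight exceeds T are background.
    cum, background = 0.0, []
    for g in model:
        cum += g["w"]
        background.append(g)
        if cum > T:
            break
    return any(g is matched for g in background)

model = [{"w": 1.0 / K, "mu": 100.0 + 10 * i, "var": 225.0} for i in range(K)]
for _ in range(50):
    update_pixel(model, 100.0)         # a static background pixel
print(update_pixel(model, 100.0))      # steady value -> background (True)
print(update_pixel(model, 250.0))      # sudden bright object -> foreground (False)
```

Because weights decay multiplicatively, transient objects never accumulate enough weight to enter the background set, while persistent scene changes eventually do, which is what makes the tracker adapt to lighting and long-term changes.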
References (23)
Nuria Oliver (Microsoft), Barbara Rosario (University of California, Berkeley), Alex Pentland (MIT: Massachusetts Institute of Technology)
We describe a real-time computer vision and machine learning system for modeling and recognizing human behaviors in a visual surveillance task. The system deals particularly with detecting when interactions between people occur and classifying the type of interaction. Examples of interesting interaction behaviors include following another person, altering one's path to meet another, and so forth. Our system combines top-down with bottom-up information in a closed feedback loop, with both comp...
1,506 Citations
Lily Lee (MIT: Massachusetts Institute of Technology), Raquel Romano (MIT: Massachusetts Institute of Technology), Gideon Stein (MIT: Massachusetts Institute of Technology)
Monitoring of large sites requires coordination between multiple cameras, which in turn requires methods for relating events between distributed cameras. This paper tackles the problem of automatic external calibration of multiple cameras in an extended scene, that is, full recovery of their 3D relative positions and orientations. Because the cameras are placed far apart, brightness or proximity constraints cannot be used to match static features, so we instead apply planar geometric constraints...
370 Citations
Jianbo Shi (CMU: Carnegie Mellon University), Jitendra Malik (University of California, Berkeley)
We propose a novel approach for solving the perceptual grouping problem in vision. Rather than focusing on local features and their consistencies in the image data, our approach aims at extracting the global impression of an image. We treat image segmentation as a graph partitioning problem and propose a novel global criterion, the normalized cut, for segmenting the graph. The normalized cut criterion measures both the total dissimilarity between the different groups as well as the total similar...
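The normalized-cut criterion mentioned above can be illustrated without the full spectral machinery. The 4-node weight matrix below is made up for the example: two tightly connected pairs joined by weak edges, so the balanced split should score better (lower) than isolating a single node.

```python
# Pure-Python sketch of the normalized-cut criterion (Shi & Malik).
# W is a hypothetical symmetric similarity matrix over 4 nodes.
W = [
    [0.0, 0.9, 0.1, 0.0],
    [0.9, 0.0, 0.0, 0.1],
    [0.1, 0.0, 0.0, 0.9],
    [0.0, 0.1, 0.9, 0.0],
]

def ncut(W, A):
    """Normalized cut of the partition (A, B) with B = V \\ A:
    Ncut(A, B) = cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V)."""
    n = len(W)
    B = [i for i in range(n) if i not in A]
    cut = sum(W[i][j] for i in A for j in B)
    assoc_A = sum(W[i][j] for i in A for j in range(n))
    assoc_B = sum(W[i][j] for i in B for j in range(n))
    return cut / assoc_A + cut / assoc_B

balanced = ncut(W, {0, 1})  # splits along the two weak 0.1 edges
lonely = ncut(W, {0})       # isolates node 0, cutting its strong edge
print(balanced, lonely)
print(balanced < lonely)    # the balanced split has the lower (better) Ncut
```

Because both terms normalize the cut by each side's total association, the criterion penalizes cutting off small isolated groups, unlike the plain minimum cut.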
12.8k Citations
Robert T. Collins (CMU: Carnegie Mellon University), Alan J. Lipton (CMU: Carnegie Mellon University), ..., Lambert Ernest Wixson (11 authors)
Under the three-year Video Surveillance and Monitoring (VSAM) project (1997‐1999), the Robotics Institute at Carnegie Mellon University (CMU) and the Sarnoff Corporation developed a system for autonomous Video Surveillance and Monitoring. The technical approach uses multiple, cooperative video sensors to provide continuous coverage of people and vehicles in a cluttered environment. This final report presents an overview of the system, and of the technical accomplishments that have been achieved.
1,248 Citations
Jun 23, 1999 in CVPR (Computer Vision and Pattern Recognition)
Chris Stauffer (MIT: Massachusetts Institute of Technology), W.E.L. Grimson
A common method for real-time segmentation of moving regions in image sequences involves "background subtraction", or thresholding the error between an estimate of the image without moving objects and the current image. The numerous approaches to this problem differ in the type of background model used and the procedure used to update the model. This paper discusses modeling each pixel as a mixture of Gaussians and using an on-line approximation to update the model. The Gaussian distributions o...
6,398 Citations
Nuria Oliver (MIT: Massachusetts Institute of Technology), Barbara Rosario (MIT: Massachusetts Institute of Technology), Alex Pentland (MIT: Massachusetts Institute of Technology)
We describe a real-time computer vision and machine learning system for modeling and recognizing human behaviors in a visual surveillance task. The system is particularly concerned with detecting when interactions between people occur, and classifying the type of interaction. Examples of interesting interaction behaviors include following another person, altering one's path to meet another, and so forth. Our system combines top-down with bottom-up information in a closed feedback loop, with both...
149 Citations
Alan J. Lipton (CMU: Carnegie Mellon University), Hironobu Fujiyoshi (CMU: Carnegie Mellon University), R.S. Patil
This paper describes an end-to-end method for extracting moving targets from a real-time video stream, classifying them into predefined categories according to image-based properties, and then robustly tracking them. Moving targets are detected using the pixel-wise difference between consecutive image frames. A classification metric is applied to these targets with a temporal consistency constraint to classify them into three categories: human, vehicle or background clutter. Once classified targets...
1,043 Citations
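The two-frame differencing step this reference describes can be sketched in a few lines. Frames here are small grayscale grids (lists of lists) and THRESH is an illustrative value, not the paper's:

```python
# Hedged sketch of pixel-wise two-frame differencing for target detection.
THRESH = 25  # assumed intensity threshold

def moving_mask(prev, curr, thresh=THRESH):
    """Mark pixels whose absolute inter-frame difference exceeds thresh."""
    return [
        [1 if abs(c - p) > thresh else 0 for p, c in zip(prow, crow)]
        for prow, crow in zip(prev, curr)
    ]

frame0 = [[10, 10, 10],
          [10, 10, 10],
          [10, 10, 10]]
frame1 = [[10, 10, 10],
          [10, 200, 10],   # a small bright target enters the centre
          [10, 10, 10]]
print(moving_mask(frame0, frame1))
# centre pixel flagged: [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
```

Unlike the mixture-of-Gaussians model, plain frame differencing only flags pixels that change between consecutive frames, which is why the paper pairs it with a temporal consistency constraint for tracking.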
Jun 23, 1998 in CVPR (Computer Vision and Pattern Recognition)
W.E.L. Grimson (MIT: Massachusetts Institute of Technology), Chris Stauffer, ..., L. Lee (4 authors)
We describe a vision system that monitors activity in a site over extended periods of time. The system uses a distributed set of sensors to cover the site, and an adaptive tracker detects multiple moving objects in the sensors. Our hypothesis is that motion tracking is sufficient to support a range of computations about site activities. We demonstrate using the tracked motion data to calibrate the distributed sensors, to construct rough site models, to classify detected objects, to learn common ...
533 Citations
Ismail Haritaoglu (UMD: University of Maryland, College Park), David Harwood, Larry S. Davis
W4 is a real-time visual surveillance system for detecting and tracking people and monitoring their activities in an outdoor environment. It operates on monocular grayscale video imagery, or on video imagery from an infrared camera. Unlike many of the systems for tracking people, W4 makes no use of color cues; instead, W4 employs a combination of shape analysis and tracking to locate people and their parts (head, hands, feet, torso) and to create models of people's appearance s...
608 Citations
Larry S. Davis (UMD: University of Maryland, College Park), Sandor Fejes (UMD: University of Maryland, College Park), ..., Michael J. Black (PARC) (6 authors)
W4 is a real-time visual surveillance system for detecting and tracking people and monitoring their activities in an outdoor environment. It operates on monocular grayscale video imagery, or on video imagery from an IR camera. Unlike many of the systems for tracking people, W4 makes no use of color cues. Instead, W4 employs a combination of shape analysis and tracking to create models of people's appearance so that they can be tracked through interactions such as occlusions. W4 is capable of simultan...
32 Citations
Cited By (3,190)
Fei-Yue Wang (CAS: Chinese Academy of Sciences)
Change detection (CD) is an important vision task for autonomous landing of unmanned aerial vehicles (UAVs) on water. The high-density photoreceptors and lateral inhibition mechanisms of eagle eyes have inspired a novel biological computational method for change detection based on their structure and properties. We call this method "STabCD"; it ensures spatiotemporal distribution consistency to achieve foreground acquisition, noise reduction, and background adaptability. The...
To address the challenging portrait video matting problem more precisely, existing works typically apply some matting priors that require additional user efforts to obtain, such as annotated trimaps or background images. In this work, we observe that instead of asking the user to explicitly provide a background image, we may recover it from the input video itself. To this end, we first propose a novel background restoration module (BRM) to recover the background image dynamically from the input ...
Xing Li (NU: Northeastern University), Zijiang Zhu (Guangdong University of Foreign Studies), ..., Yi Hu (Guangdong University of Foreign Studies) (5 authors)
Human action recognition is a key component in modern artificial intelligent systems that greatly enhance the manipulation of various robots, such as rehabilitation robots and industrial robots. Existing action recognition algorithms mainly depend on a predefined spatial sequence code book, which may fail to discover discriminative spatial–temporal features to mimic robots. In this paper, we propose to engineer the spatial–temporal action features that can deeply encode the similarity o...
Fakhri Alam Khan, Muhammad Nawaz (Center for Excellence in Education), ..., Fawad Qayum (University of Malakand) (5 authors)
Background subtraction, the most cited algorithm for foreground detection, faces the major problem of choosing a proper threshold value at run time. In the proposed algorithm, motion, the primary component of the foreground detection process, is used to set an effective threshold value at run time for background subtraction. For this purpose, the smooth histogram peaks and valleys of the motion were analyzed, which reflect the high- and slow-motion areas of the moving object(s) in t...
1 Citation
Sudip Subedi (FIU: Florida International University), Nipesh Pradhananga (FIU: Florida International University), Hazal Ergun (FIU: Florida International University)
1 Citation
Understanding and representing traffic patterns are key to detecting anomalies in the maritime domain. To this end, we propose a novel graph-based traffic representation and association scheme to cluster trajectories of vessels using automatic identification system (AIS) data. We utilize the (un)clustered data to train a recurrent neural network (RNN)-based evidential regression model, which can predict a vessel's trajectory at future timesteps with its corresponding prediction uncertainty. This...
Jun Liu (WUT: Wuhan University of Technology), ..., Jingpan Bai (WUT: Wuhan University of Technology) (3 authors)
Human action recognition is a key component in modern artificial intelligent systems, such as sport analysis, video surveillance and human–computer interaction (HCI). Existing action recognition algorithms mainly depend on a predefined spatial sequence code book, which may fail to discover discriminative spatial–temporal features. In this paper, we propose to engineer the spatial–temporal action features that can deeply encode the similarity of within-class human actions and dissimilari...
We propose an algorithm for detecting low-contrast objects in different target environments for application in an optoelectronic system. The algorithm makes it possible to detect low-contrast objects in a complex environment in real time, accounting for the relative movement of the camera and the object.