Date

2012

Scene-model-based segmentation of video into foreground and background structure has long been an important and ongoing research topic in image processing and computer vision. Segmentation of complex video scenes into binary foreground/background label images is often the first step in a wide range of video processing applications, including surveillance, traffic monitoring, people tracking, activity recognition, and event detection.


A wide range of scene modeling techniques have been proposed for identifying foreground pixels or regions in surveillance video. Broadly speaking, the purpose of a scene model is to characterize the distribution of features in an image block or pixel over time. In the majority of cases, the scene model is used to represent the distribution of background features (background modeling) and the distribution of foreground features is assumed to be uniform or Gaussian. In other cases, the model characterizes the distribution of foreground and background values and the segmentation is performed by maximum likelihood.


Pixel-level scene models characterize the distributions of spatiotemporally localized image features centered about each pixel location in video over time. Individual video frames are segmented into foreground and background regions based on a comparison between pixel-level features from within the frame under segmentation and the appropriate elements of the scene model at the corresponding pixel location. Prominent pixel-level scene models include the single Gaussian, the Gaussian mixture model (GMM), and kernel density estimation (KDE).
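The simplest of these, the single Gaussian, can be illustrated with a short sketch. This is not code from the dissertation; it is a minimal illustration, assuming a grayscale frame, a per-pixel running mean and variance, and a hypothetical threshold parameter `k` on the number of standard deviations a pixel may deviate before being labeled foreground.

```python
import numpy as np

def segment_single_gaussian(frame, mean, var, k=2.5, alpha=0.01):
    """Label pixels foreground when they deviate more than k standard
    deviations from a per-pixel single-Gaussian background model, then
    blindly update the model with a running average (learning rate alpha).
    All parameter names and values here are illustrative assumptions."""
    foreground = np.abs(frame - mean) > k * np.sqrt(var)
    # Running-average update of the background statistics.
    mean = (1 - alpha) * mean + alpha * frame
    var = (1 - alpha) * var + alpha * (frame - mean) ** 2
    return foreground, mean, var
```

A GMM generalizes this by maintaining several weighted Gaussian modes per pixel, and KDE replaces the parametric form with a kernel sum over recent samples.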


Recently reported advancements in scene modeling techniques have largely exploited local coherency in natural imagery by integrating neighborhood information into nonparametric pixel-level scene models. The earliest scene models inadvertently made use of neighborhood information because they modeled images at the block level. As the resolution of the scene models progressed, textural image features such as the spatial derivative, local binary pattern (LBP), or wavelet coefficients were employed to provide neighborhood-level structural information in the pixel-level models. In the most recent case, Barnich and Van Droogenbroeck proposed the Visual Background Extractor (ViBe), where neighborhood-level information is incorporated into the scene model in the learning step. In ViBe, the learning function is distributed over a small region such that new background information is absorbed at both the pixel and neighborhood level.
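The ViBe-style distributed learning step can be sketched as follows. This is a simplified illustration of the published algorithm's structure, not the dissertation's own method: each pixel keeps N background samples, a pixel is background when it matches enough samples within a radius, and a background pixel stochastically refreshes one sample at its own location and at a random neighbor. The parameter names (`radius`, `min_matches`, `subsample`) are assumptions standing in for the paper's notation.

```python
import random
import numpy as np

def vibe_update(model, frame, radius=20, min_matches=2, subsample=16):
    """ViBe-style classification and stochastic update (illustrative).
    model: (N, H, W) array of N background samples per pixel.
    Returns the foreground mask and updates `model` in place."""
    n, h, w = model.shape
    matches = (np.abs(model - frame[None]) < radius).sum(axis=0)
    background = matches >= min_matches
    for y, x in zip(*np.nonzero(background)):
        # With probability 1/subsample, replace a random sample here...
        if random.randrange(subsample) == 0:
            model[random.randrange(n), y, x] = frame[y, x]
        # ...and, independently, at a random 8-connected neighbor,
        # which diffuses background information into the neighborhood.
        if random.randrange(subsample) == 0:
            ny = min(max(y + random.choice((-1, 0, 1)), 0), h - 1)
            nx = min(max(x + random.choice((-1, 0, 1)), 0), w - 1)
            model[random.randrange(n), ny, nx] = frame[y, x]
    return ~background
```

The neighbor update is the key design choice: it lets a revealed background region be absorbed from its edges inward, at the cost of possibly eroding small static foreground objects.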


In this dissertation, I present a nonparametric pixel-level scene model based on several recently reported stochastic video segmentation algorithms. I propose new stochastic techniques for updating scene models over time that are focused on the incorporation of neighborhood-level features into the model learning process, and I demonstrate the effectiveness of the system on a wide range of challenging visual tasks. Specifically, I propose a model maintenance policy that is based on the replacement of outliers within each nonparametric pixel-level model through kernel density estimation (KDE), and a neighborhood diffusion procedure in which information sharing between adjacent models having significantly different shapes is discouraged. Quantitative results are compared using the well-known percentage correct classification (PCC) metric and a new probability correct classification (PrCC) metric, where the underlying models are scrutinized prior to application of a final segmentation threshold. In all cases considered, the superiority of the proposed model with respect to the existing state-of-the-art techniques is well established.
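Of the two metrics, PCC has a standard closed form that can be stated briefly; the sketch below assumes the usual definition (fraction of pixels whose binary label agrees with the ground truth) and is not taken from the dissertation. The PrCC metric, being newly proposed there, is not reproduced here.

```python
import numpy as np

def pcc(segmentation, ground_truth):
    """Percentage correct classification: the fraction of pixels whose
    binary foreground/background label agrees with the ground truth.
    Assumes the standard definition (TP + TN) / total pixels."""
    segmentation = np.asarray(segmentation, dtype=bool)
    ground_truth = np.asarray(ground_truth, dtype=bool)
    return float((segmentation == ground_truth).mean())
```

Because background pixels typically dominate surveillance frames, PCC can reward models that under-segment foreground, which is one motivation for evaluating the model's per-pixel probabilities before thresholding.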

Keywords

Computer vision, Image processing, Video surveillance, Pattern recognition systems
