We present a foveated object detector (FOD) as a biologically inspired alternative to the sliding window (SW) method. Similar to the human visual system, the FOD has higher resolution at the fovea and lower resolution at the visual periphery. Consequently, more computational resources are allocated at the fovea and relatively fewer at the periphery. The FOD processes the entire scene, uses retino-specific object detection classifiers to guide eye movements, aligns its fovea with regions of interest in the input image, and integrates observations across multiple fixations. Our method uses a recent model of peripheral pooling regions found at the V1 layer of the human visual system. We assessed various eye movement strategies on the PASCAL VOC 2007 dataset and show that the FOD performs on par with the SW detector while bringing significant computational cost savings.
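As a rough illustration of why foveation saves computation, the sketch below tiles one radial axis with pooling regions whose width grows with eccentricity (distance from the fixation point). The linear growth rule, the function names and the parameter values (`slope`, `fovea_width`) are assumptions made for this example only; they are not taken from the FOD itself.

```python
def pooling_width(eccentricity, slope=0.5, fovea_width=1.0):
    """Width of a pooling region at a given eccentricity.

    Assumption: width grows linearly with eccentricity and never drops
    below the (hypothetical) foveal width.
    """
    return max(fovea_width, slope * eccentricity)


def tile_radius(max_eccentricity, slope=0.5, fovea_width=1.0):
    """Start positions of pooling regions covering [0, max_eccentricity)."""
    starts = []
    pos = 0.0
    while pos < max_eccentricity:
        starts.append(pos)
        # Step by the local region width, so regions get coarser outward.
        pos += pooling_width(pos, slope, fovea_width)
    return starts


foveated = tile_radius(16.0)
uniform = tile_radius(16.0, slope=0.0)  # slope 0 keeps every region at foveal width
print(len(foveated), "foveated regions vs", len(uniform), "uniform regions")
```

With these illustrative settings, the foveated tiling needs fewer regions than the uniform one to cover the same extent, which is the intuition behind allocating fewer resources to the periphery.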
Joint work with Prof Miguel Eckstein.
The goal is to develop and explore computational models of human eye movements in various visual search tasks. Peer-reviewed VSS abstracts:
Convolutional deep neural nets have emerged as a highly effective approach for machine vision, but there are a number of open issues regarding training (e.g., a large number of model parameters to be learned, and a number of manually tuned algorithm parameters) and interpretation (e.g., geometric interpretations of neurons at various levels of the hierarchy). In this paper, our goal is to explore alternative convolutional architectures that are easier to interpret and simpler to implement. In particular, we investigate a framework that combines a front end based on known neuroscientific findings about the visual pathway with unsupervised feature extraction based on clustering. Supervised classification, using a generic radial basis function (RBF) support vector machine (SVM), is applied at the end. We obtain competitive classification results on standard image databases, beating the state of the art for NORB (uniform-normalized) and approaching it for MNIST.
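The clustering-plus-RBF idea can be sketched in a few lines. The example below is only a toy under stated assumptions: plain Lloyd's k-means stands in for the clustering step, each input is encoded by its distances to the learned centroids, and `rbf_kernel` is the standard kernel an RBF SVM would evaluate on those features. The neuro-inspired front end and the SVM solver are omitted, and all names and data are hypothetical.

```python
import math


def kmeans(points, k, iters=10):
    """Plain Lloyd's k-means; centroids start at the first k points (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = [sum(c) / len(members) for c in zip(*members)]
    return centroids


def encode(patch, centroids):
    """Unsupervised feature vector: distance from the patch to each centroid."""
    return [math.dist(patch, c) for c in centroids]


def rbf_kernel(x, y, gamma=1.0):
    """Standard RBF kernel exp(-gamma * ||x - y||^2)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq)


# Toy "patches": two well-separated groups in 2-D.
patches = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0)]
centroids = kmeans(patches, k=2)
features = [encode(p, centroids) for p in patches]

# Patches from the same group should look more similar under the kernel.
same_group = rbf_kernel(features[0], features[2])
different_group = rbf_kernel(features[0], features[1])
print(same_group > different_group)
```

In a full pipeline, the kernel values would be passed to an SVM trainer; the sketch stops at the feature and kernel stage to stay self-contained.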
This work aims to obtain statistics, in the form of a probabilistic model, of the geometric, topological and photometric structure of natural images. The image structure is represented by its segmentation graph, derived from a low-level hierarchical multiscale image segmentation. We first estimate the statistics of a number of segmentation graph properties from a large number of images. Our estimates confirm some findings reported in past work and provide some new ones. We then obtain a Markov random field based model of the segmentation graph which subsumes the observed statistics. To demonstrate the value of the model and the statistics, we show how its use as a prior impacts three applications: image classification, semantic image segmentation and object detection.
This paper presents a new method for the viewpoint-invariant pedestrian recognition problem. We use a metric learning framework to obtain a robust metric for large margin nearest neighbor classification with rejection (i.e., the classifier returns no match if all neighbors are beyond a certain distance). The rejection condition necessitates the use of a uniform threshold for the maximum allowed distance for deeming a pair of images a match. In order to handle the rejection case, we propose a novel cost similar to that of the Large Margin Nearest Neighbor (LMNN) method and call our approach Large Margin Nearest Neighbor with Rejection (LMNN-R). Our method achieves significant improvement over previously reported results on the standard Viewpoint Invariant Pedestrian Recognition (VIPeR) dataset.
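The rejection rule at test time can be illustrated with a short sketch. It assumes a linear map `L` has already been learned (here it is simply given), measures distance as the Euclidean norm after applying `L`, and returns no match when the nearest gallery item is farther than a single global threshold. The function names, the toy data and the threshold value are hypothetical, and the LMNN-R training cost itself is not shown.

```python
import math


def apply_linear(L, x):
    """Multiply the matrix L (list of rows) by the vector x."""
    return [sum(l_ij * x_j for l_ij, x_j in zip(row, x)) for row in L]


def match_with_rejection(query, gallery, L, threshold):
    """Nearest-neighbor match under the metric ||L(x - y)||, with rejection.

    Returns the label of the closest gallery item, or None if even the
    closest item is farther than `threshold`.
    """
    q = apply_linear(L, query)
    best_label, best_dist = None, math.inf
    for label, vec in gallery:
        d = math.dist(q, apply_linear(L, vec))
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label if best_dist <= threshold else None


identity = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for a learned transform
gallery = [("person_a", (0.0, 0.0)), ("person_b", (3.0, 4.0))]

print(match_with_rejection((0.5, 0.0), gallery, identity, threshold=1.0))
print(match_with_rejection((10.0, 10.0), gallery, identity, threshold=1.0))
```

The first query lies within the threshold of `person_a` and is matched; the second is far from every gallery item and is rejected. Because one threshold is applied to all pairs, the learned metric has to make distances comparable across identities, which is what motivates the uniform threshold described above.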
We present a new algorithm for low-level multiscale segmentation of images. The algorithm is designed to detect image regions regardless of their shapes, sizes, and levels of interior homogeneity, by doing a multiscale analysis without assuming any prior models of region geometry. As in previous work, a region is modeled as a homogeneous set of connected pixels surrounded by ramp discontinuities. A new transform, called the ramp transform, is described, which is used to detect ramp discontinuities and seeds for all regions in an image. The algorithm does not require any major user-supplied parameters, yet it provides all naturally occurring segmentations, which are not known in advance, are perceptually valid, and are organized in a hierarchy. It has been shown to outperform other available algorithms on a low-level segmentation benchmark.