Volume 3, Number 9, Abstract 43, Page 43a doi:10.1167/3.9.43 http://journalofvision.org/3/9/43/ ISSN 1534-7362
Ecological statistics of grouping by similarity
Charless C Fowlkes
Department of Electrical Engineering and Computer Science, University of California, Berkeley, USA
David R Martin
Jitendra Malik
Abstract

Goal: Wertheimer (1923) proposed visual similarity as a key grouping factor, but a precise definition has proved elusive. We formalize similarity by designing a function W(i,j) whose value is the probability that a pair of points i and j belong to the same visual group. Our goal is to learn an optimal functional form for W(i,j) based on brightness, texture, and color measurements, and to quantify the relative power of these cues. Methods: A large dataset (~1000) of natural images, each segmented by multiple human observers (~10), provides the ground truth S(i,j) for pairs of pixels: S(i,j) = 1 if the pair lies in the same segment and 0 otherwise. S(i,j) serves as the target function for W(i,j). We consider both region and boundary cues for computing W(i,j). Region cues are based on brightness, color, and texture differences between image patches at i and j, each characterized by histograms of the outputs of V1-like mechanisms. Texture is represented by oriented filter responses, and color by the a* and b* features of CIE L*a*b* space. Boundary cues are incorporated by looking for the presence of an "intervening contour", that is, a large gradient (in brightness, texture, or color) along the straight line connecting the two pixels. The parameters of the patch and gradient features are calibrated using the human-segmented images. Performance was evaluated on a separate test set using precision-recall curves as well as the mutual information between W(i,j) and S(i,j) for various cues. Results: For brightness, gradients yield better results than patch differences. For color, however, patches outperform gradients. Texture is the single most powerful cue, with both patches and gradients carrying significant independent information. The mutual information between S(i,j) and W(i,j) using all similarity cues is 0.19 nats, just 0.06 nats short of that between different human subjects. The proximity of the two pixels does not add any information beyond that provided by the similarity cues.
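
The two quantities named in the Methods can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the nearest-pixel line sampling, the use of the maximum gradient along the line, and the plug-in estimator over discretized W(i,j) values are all assumptions made for illustration only.

```python
import math
from collections import Counter


def intervening_contour(gradient, p, q, samples=32):
    """Largest gradient magnitude sampled on the straight line from p to q.

    gradient: 2D list (rows x cols) of non-negative gradient magnitudes.
    p, q: (row, col) pixel coordinates inside the grid.
    Assumption: the cue is summarized by the maximum along the line,
    sampled at nearest-pixel positions.
    """
    (r0, c0), (r1, c1) = p, q
    best = 0.0
    for k in range(samples + 1):
        t = k / samples
        r = int(round(r0 + t * (r1 - r0)))
        c = int(round(c0 + t * (c1 - c0)))
        best = max(best, gradient[r][c])
    return best


def mutual_information_nats(w_bins, s_labels):
    """Plug-in estimate of I(W; S) in nats from paired discrete samples.

    w_bins: discretized W(i,j) values, one per pixel pair (hypothetical binning).
    s_labels: ground-truth S(i,j) values (0 or 1) for the same pairs.
    """
    n = len(w_bins)
    joint = Counter(zip(w_bins, s_labels))
    count_w = Counter(w_bins)
    count_s = Counter(s_labels)
    mi = 0.0
    for (w, s), count in joint.items():
        p_ws = count / n
        # p(w,s) / (p(w) p(s)) simplifies to count * n / (count_w * count_s)
        mi += p_ws * math.log(count * n / (count_w[w] * count_s[s]))
    return mi


# Toy usage: a 5x5 gradient map with a vertical edge in column 2.
toy_gradient = [[0.9 if col == 2 else 0.0 for col in range(5)] for _ in range(5)]
crossing = intervening_contour(toy_gradient, (2, 0), (2, 4))   # line crosses the edge
same_side = intervening_contour(toy_gradient, (2, 0), (2, 1))  # line stays left of it
```

A large value from intervening_contour suggests the two pixels lie in different segments, so a W(i,j) built on this cue would decrease as the value grows. The natural logarithm is used so that the estimate is in nats, the unit reported in the Results.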

History
Received August 22, 2003; published October 22, 2003
Citation
Fowlkes, C. C., Martin, D. R., & Malik, J. (2003). Ecological statistics of grouping by similarity [Abstract]. Journal of Vision, 3(9):43, 43a, http://journalofvision.org/3/9/43/, doi:10.1167/3.9.43.
Keywords
None