Volume 2, Number 4, Article 5, Pages 324-353 doi:10.1167/2.4.5 http://journalofvision.org/2/4/5/ ISSN 1534-7362
Ecological statistics of Gestalt laws for the perceptual organization of contours
James H. Elder
Centre for Vision Research, York University, Toronto, Canada
Richard M. Goldberg
User Centred Design Laboratory, IBM Canada Ltd., Toronto, Canada
Abstract

Although numerous studies have measured the strength of visual grouping cues for controlled psychophysical stimuli, little is known about the statistical utility of these various cues for natural images. In this study, we conducted experiments in which human participants traced perceived contours in natural images. These contours are automatically mapped to sequences of discrete tangent elements detected in the image. By examining relational properties between pairs of successive tangents on these traced curves, and between randomly selected pairs of tangents, we are able to estimate the likelihood distributions required to construct an optimal Bayesian model for contour grouping. We employed this novel methodology to investigate the inferential power of three classical Gestalt cues for contour grouping: proximity, good continuation, and luminance similarity. The study yielded a number of important results: (1) these cues, when appropriately defined, are approximately uncorrelated, suggesting a simple factorial model for statistical inference; (2) moderate image-to-image variation of the statistics indicates the utility of general probabilistic models for perceptual organization; (3) these cues differ greatly in their inferential power, proximity being by far the most powerful; and (4) statistical modeling of the proximity cue indicates a scale-invariant power law in close agreement with prior psychophysics.




History
Received August 13, 2002; published August 21, 2002
Citation
Elder, J. H. & Goldberg, R. M. (2002). Ecological statistics of Gestalt laws for the perceptual organization of contours. Journal of Vision, 2(4):5, 324-353, http://journalofvision.org/2/4/5/, doi:10.1167/2.4.5.
Keywords
perceptual organization, computational modeling, natural image statistics, image coding, contours, edges, proximity, good continuation, similarity


1. Introduction
Perceptual grouping is the problem of aggregating primitive image features that project from a common structure in the visual scene. Nearly 50 years ago, Brunswik and Kamiya (1953) suggested that the classical Gestalt principles of perceptual grouping should be quantitatively related to the statistics of the natural world and presented some informal data on the subject. Here we apply this idea to the perceptual problem of contour organization.
We define visual contour grouping as the problem of integrating the local luminance edges or oriented curve tangents that lie on a common luminance boundary in the image. An important property of contours is their one-dimensionality: a curve can be defined as a function of one parameter (e.g., arc length). This implies an ordering of points on the curve. Thus for the perceptual organization of contours, we must recover not just an aggregation but an ordered sequence of discrete edges or tangents. Without this ordering, it is not possible to define many useful higher-order properties (e.g., curvature, closure, concavity, and convexity).
We pose the problem of contour grouping as a problem of probabilistic inference: the goal of the computation is to compute highly probable sequences of local curve elements. Properties of these local elements may serve as useful cues for deciding which elements should be grouped together, and in what order. Here we consider three such properties: proximity, good continuation, and similarity (in brightness and contrast). These are probabilistic cues: each can provide evidence for a certain grouping of elements, but none can provide a decision with 100% confidence. In order to understand how these cues may be used to optimize the accuracy of perceptual grouping decisions, we must quantitatively characterize the statistics of these cues in natural images. To estimate these statistics, we employ an advanced interactive software tool that allows human observers to rapidly trace the contours they perceive in natural images. These data can then be compared to past and future psychophysical studies to determine the degree to which the human perceptual organization system is tuned to the statistics of the natural world.
In summary, our contributions are:
We develop a Bayesian model for the probabilistic combination of multiple grouping cues (proximity, good continuation, and luminance similarity) to determine local (pair-wise) groupings between contour elements.
We demonstrate how the required probability distributions can be estimated from natural images using advanced software tools.
We show that, when properly defined, these three cues are approximately independent.
We formally characterize the inferential power of each of these three cues, and show that they differ dramatically in their importance for perceptual organization.
We examine the variation in statistics over our sample of images to evaluate the utility of incorporating knowledge of these statistical models for general visual inference.
We report a striking agreement between the statistics of the proximity cue in natural images and prior psychophysical results on the role of proximity in the perception of dot lattice stimuli (Oyama, 1961).
We use these statistics to develop a parametric, generative model for natural image contours that can be used for both analysis and synthesis.
Our findings have important implications for human vision. There exists a wealth of psychophysical data for perceptual grouping phenomena. Qualitative and quantitative models have been constructed to account for these data. However, to answer the question of why these cues act and interact in a specific way requires a detailed understanding of the visual world to which our visual systems have been tuned by evolution and/or learning. Our hope is that this work will contribute to that understanding.
In the remainder of this section, we discuss previous psychophysical studies and computational models of contour grouping, and very recent attempts to characterize relevant statistics. In Sections 2-4, we develop the computational framework within which our study is conducted. In Sections 5 and 6, we report the methods and results for our study. Implications of these results are discussed in Sections 7 and 8.
1.1 Psychophysics of Contour Grouping
The early phenomenological demonstrations of the Gestalt psychologists led to the identification of distinct “principles” or cues to perceptual organization. There are several of these that apply to contour organization, including proximity, good continuation, and similarity (Wertheimer, 1923/1938; Koffka, 1935). Many of the Gestalt demonstrations showed that perceived organizations were determined by the cooperative or competitive interaction between these cues. However, these demonstrations are qualitative, and it has been noted that to predict perceived organizations in novel displays, quantitative models of the relative strength of these cues are required (Hochberg, 1974; Kubovy & Holcombe, 1998).
1.1.1 Proximity
The cue of proximity is perhaps the most fundamental of the Gestalt grouping laws (Kubovy & Holcombe, 1998), and early quantitative studies suggest that the human visual system is extremely sensitive to proximity cues. Uttal, Bunnel and Corwin (1970) found the detection of dotted lines in random dot fields to depend strongly on the density of dots along the line. Barlow (1978) found psychophysical efficiencies of up to 50% for the detection of dot density changes in two-dimensional random dot displays.
These two studies raise the question of whether the law of proximity in the grouping of local elements sampling a one-dimensional contour can be considered as simply a limiting case of the action of proximity in two-dimensional texture grouping (Zucker, Stevens, & Sander, 1983). The fact that Barlow (1978) found no evidence for greater efficiency in detecting regions of higher dot density when these regions were elongated supports this idea.¹ On the other hand, more recent experiments by Elder and Zucker (1998a) demonstrate that the perceptual organization of fragmented figures is very sensitive to whether dots are placed on the boundary or in the interior of fragmented figures.
Oyama (1961) made what is probably the first attempt to quantify the law of proximity. Employing a regular rectangular dot lattice, he measured the proportion of time observers experienced vertical versus horizontal organizations as a function of the relative vertical and horizontal spacing. He found that the ratio of durations could be accurately modeled as a power law of the ratio of distances. The data also indicate a significant bias toward vertical organizations.
Kubovy and colleagues (Kubovy & Wagemans, 1995; Kubovy & Holcombe, 1998) have recently modified and elaborated this technique. Instead of a power law, they modeled their results using an exponential model of dot spacing, scaled by the minimum dot spacing present in the display.² They found evidence that the strength of the proximity cue increases with increasing stimulus duration (from 100 msec to 200 msec). They also found evidence for scale invariance in their experiments: equal scaling of the horizontal and vertical separations had little effect on their results.
Based on a number of experiments, Zucker and Davis (1988) have proposed that the perceptual organization of dotted contours changes abruptly at a dot:space ratio of roughly 1:5. They found that contours sampled more densely than this generate a number of classical illusions that sparsely sampled contours fail to generate. However, no evidence for such a threshold is found in Kubovy’s scaling results.
Elder and Zucker (1994) conducted a series of visual search experiments in which the target and distractors were fragmented outline shapes. They found that the psychophysical effects of the fragmentation could be characterized by the $L_2$ norm (sum of squares) of the gaps in the figures. This is consistent with a probabilistic model for contour grouping in which gaps are considered independent and the probability of grouping follows a half-Gaussian distribution. Elder and Zucker (1996a) later used this model in a computer vision algorithm for grouping closed contours in natural images. Gaussian distributions have also been used to model the proximity cue in the perception of clusters in two-dimensional random dot displays (van Oeffelen & Vos, 1982). Compton and Logan (1993) tested the Gaussian model for proximity against an exponential model, but could not statistically discriminate between them.
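The link between a sum-of-squares measure of the gaps and a half-Gaussian grouping probability can be made explicit. As a sketch only, using notation introduced here for illustration (gap lengths $g_1, \dots, g_n$ and a scale constant $\sigma$, neither taken from the source), suppose each gap is bridged independently with probability proportional to a half-Gaussian in its length:

```latex
p(\text{group}) \;\propto\; \prod_{i=1}^{n} \exp\!\left(-\frac{g_i^{2}}{2\sigma^{2}}\right)
\;=\; \exp\!\left(-\frac{1}{2\sigma^{2}} \sum_{i=1}^{n} g_i^{2}\right)
```

Under these assumptions the grouping probability depends on the gaps only through their sum of squares, which is why such a model would predict the psychophysical dependence described above.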
To summarize, the evidence suggests that proximity acts as a powerful, possibly scale-invariant cue for the perceptual organization of textures and contours. However, there is little agreement about the best quantitative description of the proximity cue: power law, exponential, and Gaussian models have been proposed and supported by various psychophysical data. An understanding of the statistics of the proximity cue for natural image contours might help toward focusing and eventually resolving this debate.
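The three candidate forms can be compared on the property that the psychophysics emphasizes, namely scale invariance. The following is a minimal sketch, not a fit to any data: the function names, the exponent `alpha`, and the constants `scale` and `sigma` are hypothetical values chosen for illustration.

```python
import math

def power_law(d, alpha=2.0):
    """Relative grouping strength falling off as a power of distance d."""
    return d ** (-alpha)

def exponential(d, scale=1.0):
    """Relative grouping strength falling off exponentially with distance d."""
    return math.exp(-d / scale)

def half_gaussian(d, sigma=1.0):
    """Relative grouping strength following a half-Gaussian in distance d."""
    return math.exp(-d * d / (2.0 * sigma * sigma))

def strength_ratio(model, d_near, d_far):
    """How strongly the nearer spacing is favoured over the farther one."""
    return model(d_near) / model(d_far)

# Compare a display with spacings (1, 2) against the same display magnified 10x.
for name, model in [("power law", power_law),
                    ("exponential", exponential),
                    ("half-Gaussian", half_gaussian)]:
    small = strength_ratio(model, 1.0, 2.0)
    large = strength_ratio(model, 10.0, 20.0)
    print(f"{name:14s} ratio at 1x: {small:.4g}   ratio at 10x: {large:.4g}")
```

Only the power law gives the same preference at both magnifications when the constants are held fixed. The exponential and Gaussian forms become scale invariant only if their constants are rescaled with the display, as in the exponential model scaled by the minimum dot spacing described above.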
1.1.2 Good Continuation
Less is known about the quantitative nature of the Gestalt law of good continuation. Certainly we now have objective evidence for the action of this law. For example, Beck, Rosenfeld, and Ivry (1989) found longer reaction times for detection of a straight arrangement of elements in a random array when the line elements were laterally jittered. But the action of the good continuation cue in isolation from other cues has best been illustrated by the experiments of Field, Hayes, and Hess (1993). In these experiments, observers were asked to detect the presence of a curvilinear sequence of oriented elements in a random element field. Care was taken to equate the density of elements along and around the contour with the density in the whole display.³ Detection performance was found to decline for more wandering contours, and to rapidly descend to chance when the local elements themselves were jittered in their orientation relative to the path. These results suggest a powerful role for the cue of good continuation in the absence of a cue of proximity.
1.1.3 Brightness and Contrast Similarity
Even less is known about the quantitative role of brightness and contrast similarity in contour grouping. An early dot lattice experiment by Hochberg and Hardy (1960) showed that proximity ratios of up to 2 can be overcome by intensity cues, and Earle (1999) has found evidence for a role of contrast similarity in the grouping of dots leading to the perception of Glass patterns. Contrast reversals between dot pairs are known to completely eliminate the perception of Glass patterns (Glass & Switkes, 1976).
1.2 Contour Grouping in Computer Vision
The computational problem of contour organization has been approached in many different ways. Cocircularity constraints have been applied within an iterative discrete relaxation framework to refine local curve estimates using global contour information (Zucker, Hummel, & Rosenfeld, 1977; Parent & Zucker, 1989). Energy minimization methods for approximating contours with spline models have been extensively investigated (Kass, Witkin, & Terzopoulos, 1987; David & Zucker, 1990). Multi-scale smoothness criteria have been used to impose an organization on image curves (Lowe, 1989; Saund, 1990; Dudek & Tsotsos, 1997), sequential methods for tracking contours within a Bayesian framework have been developed (Cox, Rehg, & Hingorani, 1993), and parallel methods for computing local “saliency” measures based on contour smoothness and total arc length have been studied (Sha’ashua & Ullman, 1988; Freeman, 1992; Alter, 1995; Williams & Jacobs, 1997).
While computational models for contour grouping generally exploit only geometric cues (proximity, good continuation), Elder and Zucker (1996a) have recently developed a probabilistic framework for contour grouping that also exploits luminance cues. In this work, maximum likelihood contours are estimated using a shortest path (Dijkstra’s) algorithm. A similar approach has recently been taken by Crevier (1999), who extends the framework to allow grouping of circular arcs as well as straight contour segments, and attempts to relax the strict assumption of independence between the grouping of segment pairs.
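The connection between maximum likelihood contours and shortest paths can be sketched as follows. If the pairwise grouping probabilities are treated as independent, the probability of a chain is the product of its link probabilities, so maximizing it is the same as minimizing the sum of negative log probabilities, which Dijkstra's algorithm does. This is an illustrative sketch under that assumption, not the implementation of Elder and Zucker (1996a); the function name, the graph layout, and the probabilities are hypothetical.

```python
import heapq
import math

def most_probable_path(link_prob, start, goal):
    """Dijkstra's algorithm on -log(probability) costs.

    link_prob maps each element to {successor: probability}, with every
    probability in (0, 1] so that all costs are non-negative.
    Returns (path, probability), or (None, 0.0) if goal is unreachable.
    """
    best = {start: 0.0}          # lowest cost found so far for each element
    prev = {}                    # back-pointers for path recovery
    heap = [(0.0, start)]
    done = set()
    while heap:
        cost, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        if node == goal:
            break
        for nxt, p in link_prob.get(node, {}).items():
            if p <= 0.0:
                continue         # impossible link
            new_cost = cost - math.log(p)
            if new_cost < best.get(nxt, math.inf):
                best[nxt] = new_cost
                prev[nxt] = node
                heapq.heappush(heap, (new_cost, nxt))
    if goal not in best:
        return None, 0.0
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    path.reverse()
    return path, math.exp(-best[goal])

# Hypothetical pairwise grouping probabilities between four elements.
links = {"a": {"b": 0.9, "c": 0.5}, "b": {"d": 0.8}, "c": {"d": 0.9}, "d": {}}
path, prob = most_probable_path(links, "a", "d")
print(path, prob)
```

Here the chain through "b" wins because 0.9 × 0.8 exceeds 0.5 × 0.9, even though the final link through "c" is individually the stronger one.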
In general, these techniques are capable of grouping edge points into extended chains. However, the goal of computing complete bounding contours has proven to be more elusive. Although approaches using global grouping cues such as convexity (Jacobs, 1996; Huttenlocher & Wayner, 1992) and closure (Elder & Zucker, 1996a; Mahamud, Thornber, & Williams, 1999) have yielded limited success, the general problem of computing the complete bounding contour of an object of arbitrary shape in a complex natural image remains essentially unsolved. One possible way of improving probabilistic models for contour grouping is to ground the models in the actual statistics of grouping cues for natural image contours.
1.3 Statistics of Contour Grouping
The connection between human visual processing and the statistics of the natural visual world has been recognized for nearly 50 years. Attneave (1954) and Barlow (1961) suggested that early visual processing may be designed primarily to reduce the statistical redundancy in natural images, thereby increasing coding efficiency. This idea has led to demonstrations that simple principles of redundancy reduction and sparseness of coding predict early visual filters that are qualitatively similar to the receptive field properties of early visual neurons (Srinivasan, Laughlin, & Dubs, 1982; Field, 1987; Atick & Redlich, 1992; Olshausen & Field, 1996; Olshausen & Field, 1997).⁴
The original proposals of Attneave (1954) and Barlow (1961) have led to considerable progress in our understanding of early visual processing in terms of linear transformations that reduce the redundancy or increase the sparseness or statistical independence of neural responses. However, there have been relatively few attempts to understand higher-level visual problems, such as perceptual organization, in terms of the statistics of the natural visual world. This is somewhat surprising because just prior to Attneave and Barlow’s proposals, Brunswik and Kamiya (1953) proposed that the classical Gestalt principles of perceptual organization should be quantitatively related to the statistics of the natural world, and presented some data on the proximity of parallel contours in natural images. However, their suggestion remained largely untested until 1998, when Kruger (1998) first reported data on the second-order spatial statistics of Gabor filter responses to natural images, and we first reported the results of this study (Elder & Goldberg, 1998a; 1998b). More recently, there has been an interesting study of natural image statistics relevant to the problem of image segmentation (Martin, Fowlkes, Tal, & Malik, 2001), and two studies of natural image statistics relevant to the perceptual organization of contours (Geisler, Perry, Super, & Gallogly, 2001; Sigman, Cecchi, Gilbert, & Magnasco, 2001). We discuss the studies relevant to contours in more detail below.
Kruger (1998) examined the second-order “co-occurrence” spatial statistics of Gabor filter responses to natural images. Filter responses were nonlinearly normalized and thresholded prior to statistical analysis. Kruger found statistical evidence for colinearity and parallelism relations in these second-order spatial statistics.
Sigman et al. (2001) also examined the co-occurrence statistics of oriented filter responses to natural images. They reported long-range correlations that adhered to the geometric principle of cocircularity. In a related study, Geisler et al. (2001) measured the co-occurrence statistics of oriented edge elements in natural images, and related these to human performance in detecting sampled contours in cluttered displays. They proposed a simple model for grouping based in part on these statistics, and found that their model was to some degree consistent with the psychophysical data.
The correlations in the joint statistics of oriented edge elements observed in these studies reveal interesting parallel, colinear, and cocircular structure. Characterization of this statistical structure could be useful for understanding key aspects of early visual processing. Just as earlier statistical studies predicted early visual filters that are qualitatively similar to the receptive field properties of early visual neurons, these later studies may help us to relate natural image statistics to more complex aspects of neural coding, including the spatial nonlinearities in complex cells and lateral interactions between neurons in early visual cortex (Gilbert & Wiesel, 1989). An understanding of this second-order statistical structure is also useful for image processing applications, including image compression and image denoising (Simoncelli, 1997).
However, we will argue that these statistics are not sufficient to understand the statistical basis for the perceptual organization of contours. The heart of the matter is that perceptual grouping is not simply the problem of detecting correlations. Rather, the problem is to integrate the sequences of elements that project from common structures in the scene. In particular, contour grouping is the problem of integrating those edge elements that lie on a common luminance boundary. There are other reasons one might predict statistical correlations between edge elements: between parallel elements in texture flows, for example, or between colinear elements on different components of a regular texture. Because the statistics reported in these studies result from a mixture of these effects, they cannot be used directly to understand contour grouping per se.
In our study, we use human observers to trace the contours in natural images, and thus obtain pairs of contour elements that we know should be directly grouped. At the same time, we randomly sample the image to obtain pairs of tangent elements that should not be grouped. From the perspective of probabilistic inference, it is vital to have statistics for both (contour and random) events: it is the ratio of the likelihood distributions for these two events that determines the posterior probability for contour grouping (Section 4). Since these two events are not distinguished in the co-occurrence statistics collected in other studies, these statistics are insufficient for the probabilistic inference of contours.
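The role of the likelihood ratio can be illustrated with Bayes' rule in odds form: the posterior odds of grouping equal the prior odds multiplied by the ratio of the contour and random likelihoods. The sketch below also assumes the cues are independent, so that their likelihood ratios simply multiply (the factorial model suggested in the abstract). The function name and all numerical values are hypothetical and are not measurements from this study.

```python
import math

def posterior_grouping(likelihood_ratios, prior_contour):
    """Posterior probability that two tangents are successive on a contour.

    likelihood_ratios: one value p(cue | contour) / p(cue | random) per cue,
                       combined as if the cues were independent.
    prior_contour:     prior probability of the contour hypothesis, in (0, 1).
    """
    log_odds = math.log(prior_contour / (1.0 - prior_contour))
    for ratio in likelihood_ratios:
        log_odds += math.log(ratio)
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical ratios for proximity, good continuation, and luminance similarity.
p = posterior_grouping([50.0, 4.0, 1.5], prior_contour=0.01)
print(f"posterior = {p:.3f}")
```

A cue whose likelihood ratio is 1 leaves the posterior unchanged, which is why the contour likelihood alone, without the random likelihood, cannot tell us how informative a cue is.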
In addition to collecting co-occurrence statistics, Geisler et al. (2001) used a form of contour tracing in order to distinguish the statistics relating contour elements on a common contour from those relating elements on different contours. (Our study was first reported [Elder & Goldberg, 1998a; 1998b] two years before the first report of the work of Geisler et al. [Geisler, Super, & Gallogly, 2000]). However, their technique differs from ours in one crucial respect. Their traces indicate which elements are perceived to lie on a common contour, but they do not provide any information about the ordering of the elements along the contour. This is important because a defining property of contours is their one-dimensionality: a contour may be parameterized by a single real variable, e.g., $t$. This imposes an ordering on the local elements of a curve. For example, the point at parameter value $t_2$ lies between the points at parameter values $t_1$ and $t_3$ if and only if $t_1 < t_2 < t_3$. These properties are essential for defining higher order properties of curves, e.g., curvature, closure, concavity, and convexity.
To maintain this one-dimensional characteristic in a discrete encoding, a contour must be represented as an ordered sequence of local elements. In the study of Geisler et al., contours are represented not as ordered sequences but as unordered sets of oriented elements, and their statistics relate arbitrary pairs of tangents on a contour. These statistics therefore do not reflect the fundamental one-dimensional topological property of contours.
In contrast, we define the problem of contour grouping as the recovery of sequences of tangents projecting from the contours of a scene. Participants trace sequences of tangents defining the contours they perceive in natural images. From these data, we can derive statistics for tangent pairs that are successive components of a common contour. These statistics thus inform us about the cues to inferring the sequence of tangents defining perceived contours.
We model these sequences as Markov chains. The Markov approximation captures the local nature of the physical processes that give rise to these contours, and is consistent with the monotonically decreasing nature of the autocorrelation function of natural images, the spatiotopic structure of early visual cortex, and the well-studied psychophysical principle of proximity. Application of the Markov approximation allows us to understand the problem of contour grouping by characterizing the statistics of local grouping between successive elements comprising a contour.
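In symbols, a first-order Markov approximation factors the probability of a whole sequence into local terms. The notation below ($t_1, \dots, t_n$ for the ordered tangents of one contour) is introduced here for illustration and is not taken from the source:

```latex
p(t_1, t_2, \dots, t_n) \;=\; p(t_1) \prod_{i=1}^{n-1} p(t_{i+1} \mid t_i)
```

Each factor involves only a pair of successive tangents, which is why, under this approximation, pairwise statistics of successive elements are sufficient to characterize the model.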
Our model reflects the fact that the statistical dependencies between neighboring tangents on a contour are much stronger than those between distant tangents on a contour. (Here the terms “neighboring” and “distant” refer to ordinal distance in the chain, not Cartesian distance in the image.) In the approach of Geisler et al. (2001), the power of the strong statistics relating neighboring tangents is diluted with the weak statistics relating distant tangents. This leads to substantial differences between our statistics and their statistics, as we shall see.
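The difference between the two kinds of statistics can be seen in a toy example. The sketch below represents a contour as an ordered list of points (a simplification adopted here; real tangents also carry orientation and length) and compares the pairs obtained from an ordered sequence with those obtained from an unordered set. All names and coordinates are hypothetical.

```python
import math
from itertools import combinations

def successive_pairs(contour):
    """Pairs of neighbouring elements in an ordered contour."""
    return list(zip(contour, contour[1:]))

def all_pairs(contour):
    """Every pair of elements on the contour, ignoring order."""
    return list(combinations(contour, 2))

def mean_gap(pairs):
    """Mean Euclidean distance between the two members of each pair."""
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)

# Five equally spaced elements along a straight contour.
contour = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
print("successive pairs:", len(successive_pairs(contour)),
      "mean gap:", mean_gap(successive_pairs(contour)))
print("all pairs:       ", len(all_pairs(contour)),
      "mean gap:", mean_gap(all_pairs(contour)))
```

Even on this short contour, pooling all pairs doubles the mean separation, and the effect grows with contour length. Statistics gathered this way therefore understate how tightly successive elements are related.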
It should be noted that there are potential disadvantages of contour-tracing methods. Compared with the measurement of ordinary second-order (co-occurrence) statistics, tracing methods introduce additional possible sources of error and bias. This may include biases of the participants doing the tracing, and errors caused by the software that manages the tracing process. A nice aspect of the Geisler et al. study is that they use both methods, and compare the results of the two. Of course, because we expect significant differences even without error, it is not possible to verify either method with this comparison. Our view is that we have no real choice but to use human-traced contours, because it is not possible to get the required statistics for the Bayesian inference of contours without ground-truth data, and human traces are the best available approximation to ground truth data.
Another distinction of our study is the multiplicity of grouping cues explored, and the rigorous manner in which they are compared. It has long been understood that perceptual organization is determined by the simultaneous action of several factors or principles (Wertheimer, 1923/1938; Koffka, 1935). How are these different factors combined? What is the relative importance of these factors in determining the perceived organization? Other studies have investigated two grouping factors (proximity and good continuation) but have not quantified the relative importance or independence of each individually, and photometric cues have been completely ignored.
Here we investigate three of the classical Gestalt principles (proximity, good continuation, and luminance similarity) for the organization of local curve elements into extended contours. We investigate these properties separately so that we may estimate their relative inferential power, but we also study to what degree they provide independent information for contour grouping, and how they can be optimally combined.
2. Local Contour Representation
To measure the statistics of contour grouping cues in natural images, we must first be able to detect and represent the local contour elements. In other studies, edges have been detected using fixed-scale filters followed by simple nonlinearities. For example, Kruger (1998) used fixed-scale oriented Gabor filters, followed by a point nonlinearity and thresholding. Sigman et al. (2001) used fixed-scale steerable quadrature-pair filters to measure the local oriented energy, followed by a threshold. Geisler et al. (2001) used a two-stage filtering process. In a first stage, potential edge locations were identified as the zero-crossings in the response of a nonoriented log Gabor function. The local energy at these locations was then measured using oriented quadrature-pair log Gabor filters, and a threshold was applied.
Although the filters used in these edge detection techniques all bear some resemblance to the receptive fields of early visual neurons, they are likely to be gross oversimplifications of the cortical processing involved in edge detection. Neurons in primary visual cortex are extremely diverse in their receptive field properties, even within a single class of cell. For example, foveal simple cells in primary visual cortex of macaque range in peak spatial frequency from roughly 0.5 cycles per degree (cpd) to more than 16 cpd, spatial frequency bandwidth from roughly 0.4 octaves to more than 2.6 octaves, orientation bandwidth from less than 10 deg to more than 180 deg, and receptive field height:width ratios from roughly 1:1 to 16:1 (DeValois, Albrecht, & Thorell, 1982; Parker & Hawken, 1988).
Many psychophysical studies using adaptation, masking, and subthreshold summation techniques have demonstrated that early visual processing results from the activity of multiple mechanisms with different spatial frequency tunings (Campbell & Robson, 1968; Wilson & Bergen, 1979; Watson, 1982; Watt & Morgan, 1984; Wilson & Gelb, 1984; Watson, 2000). Recent work suggests that psychophysical edge detection also requires mechanisms over a broad range of orientation bandwidths, as is suggested by the physiological data (Sachs & Elder, 2000).
In this work, we use a multi-scale edge detection method developed for computer vision applications (Elder & Zucker, 1998b). In some respects, this method is more biologically plausible than the methods used by Kruger (1998), Sigman et al. (2001), and Geisler et al. (2001). Most critically, filters varying over a range of spatial frequencies (scales) are employed, similar to the range found in the early visual cortex of primates. The adaptive filter selection method has been found to predict human visual acuity for blurred edges (Elder & Zucker, 1996b) and human detection efficiency for windowed edges in noise (Sachs & Elder, 2000).
However, our goal here is not to propose a model for human visual edge detection, nor to prefilter the images through a simple model of early visual cortex. Rather, our goal is to reliably detect and locally represent the contours projecting from luminance transitions in the scene. Only if this is achieved will we be accurate in our estimates of contour statistics in natural images. Further, if we hope to eventually measure the degree to which the human visual system is tuned to the statistics of natural images, it is vital that we do not corrupt our measurements of natural images with biases induced by our simplistic models of cortical processing. Such a circular procedure would undermine the significance of any links we may discover.
What leads us to choose the multi-scale edge detection algorithm of Elder and Zucker (1998b) is thus not its biological plausibility, but its performance in detecting edges over the broad range of blur, contrast, and clutter found in natural images. The multi-scale nature of the algorithm is crucial to achieving this.
One important advantage of this representation is that we can invert it to reconstruct an approximation of the original image, and thus can subjectively and objectively measure any information lost or distortions introduced (Elder, 1999). We have conducted detailed studies to show that this representation is perceptually nearly complete, with minimal loss of information.
Although our local edge computation yields an accurate representation of the image, these edges do not provide an optimal basis for contour grouping. Due primarily to spatial discretization, the edge map representation of a contour is jagged and noisy. In order to avoid these problems, we employ a second level of representation in which the contour is locally represented by tangents with position, orientation, and length represented by real numbers (Elder & Zucker, 1996a). These two stages are described in more detail below.
2.1 Edge Computation
We model local edges as Gaussian-blurred step discontinuities in image intensity (Figure 1). The model consists of 5 parameters (Elder & Zucker, 1998b):
Location (to the nearest pixel in this implementation)
Orientation
Blur scale
Asymptotic intensity on the bright side of the edge
Asymptotic intensity on the dark side of the edge
Figure 1. Local Gaussian-blurred edge model and derivative-based detection/estimation mechanisms.
Detection of edges and estimation of model parameters are based on measurement of the gradient of the intensity function using steerable first derivative of Gaussian filters (Freeman & Adelson, 1991; Perona, 1995), and on estimation of the locations of zero-crossings and extrema of the second derivative using steerable second derivative of Gaussian filters, steered in the gradient direction. While the zero-crossing of the second derivative localizes the edge, the separation of the 2nd derivative extrema in the gradient direction is used to estimate the blur scale of the edge. Estimates of the image intensity at the 2nd derivative extrema are used to estimate the mean intensity and the magnitude of the intensity change at the edge. These are then used to estimate the asymptotic intensities on either side of the edge (Figure 1).
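The estimation steps just described can be illustrated on an idealized one-dimensional profile. For a step blurred by a Gaussian, the second derivative has its extrema one blur scale on either side of the edge, so their separation is twice the blur scale, and the intensities at those two points span a known fraction of the full step. The sketch below makes simplifying assumptions that a real implementation would not: it is one-dimensional, noise free, uses finite differences rather than Gaussian derivative filters, and so ignores the smoothing contributed by the filters themselves. All function and variable names are hypothetical.

```python
import math

def blurred_step(x, i_dark, i_bright, sigma):
    """Ideal edge: a step from i_dark to i_bright blurred by a Gaussian of scale sigma."""
    cdf = 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))
    return i_dark + (i_bright - i_dark) * cdf

def estimate_edge(profile, dx):
    """Recover (blur scale, dark intensity, bright intensity) from a sampled profile."""
    # Second derivative by central differences; d2[j] belongs to profile[j + 1].
    d2 = [(profile[i - 1] - 2.0 * profile[i] + profile[i + 1]) / (dx * dx)
          for i in range(1, len(profile) - 1)]
    j_max = max(range(len(d2)), key=d2.__getitem__)
    j_min = min(range(len(d2)), key=d2.__getitem__)
    # The two extrema sit one blur scale either side of the edge.
    sigma = abs(j_max - j_min) * dx / 2.0
    a, b = profile[j_max + 1], profile[j_min + 1]
    mean = 0.5 * (a + b)
    # Intensities at +/- sigma span a fraction erf(1/sqrt(2)) (about 0.683) of the step.
    change = abs(a - b) / math.erf(1.0 / math.sqrt(2.0))
    return sigma, mean - 0.5 * change, mean + 0.5 * change

# Synthetic edge: dark side 50, bright side 150, blur scale 2, sampled every 0.05 units.
dx = 0.05
xs = [-10.0 + dx * i for i in range(401)]
profile = [blurred_step(x, 50.0, 150.0, 2.0) for x in xs]
sigma_hat, dark_hat, bright_hat = estimate_edge(profile, dx)
print(sigma_hat, dark_hat, bright_hat)
```

The mean and the magnitude of the change are computed first and the two asymptotic intensities are derived from them, mirroring the order of the steps in the text.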
A major obstacle to reliable edge detection is the scale problem: how to choose the scale of local estimation filters in order to prevent false positives and distortion due to noise, while minimizing distortion caused by neighboring image structure. Our method for edge detection solves this problem with an adaptive scale space technique called local scale control (Elder & Zucker, 1998b). This technique selects, at each point in the image, the minimum reliable scale for local estimation. At this scale, hypotheses concerning the sign of response of a linear filter at each point can be tested with statistical reliability. This means in turn that zero-crossings can be reliably detected and localized. This theory of scale selection has been shown to accurately predict human psychophysical performance in edge localization and blur estimation tasks (Elder & Zucker, 1996b). An example of the edge map produced by this algorithm is shown in Figure 2b.
Because our interest is to estimate the properties of grouping cues for the actual contours in an image, it is important that our local elements (edges) be accurate. Otherwise, we may simply be measuring artifacts of our edge detection methodology. Because we lack ground truth for the actual location of contours in the image, a direct estimate of the accuracy of our edges is not available. We must therefore consider indirect methods.
We have recently reported a method for inverting our edge representation to compute an estimate of the original image from which the edge map was computed (Elder, 1999). Using this algorithm, we have shown our edge representation to be both objectively and subjectively accurate for a wide variety of images. Figure 2c shows the reconstruction of the image in Figure 2a from the computed edge representation.
Figure 2. a. Example contour traced by a human participant. b. Edge map from which contours are defined. c. Reconstruction of image from edge representation. d. Portion of tangent map. Each colored line segment represents a distinct tangent.
Although our local edge computation yields an accurate representation of the image, these edges do not provide an optimal basis for contour grouping. The problem is illustrated in Figure 3. Due primarily to spatial discretization, the edge map representation of a contour is jagged and noisy. Even when a set of edges is known to be generated by the same contour, it is difficult to specify an appropriate ordering on the edge pixels, and tracing a contour through any particular ordering yields a curve corrupted by high-curvature wiggles due to the discretization.
Figure 3. Observable geometric cues between edge pixel tangents are dominated by artifact introduced by the spatial discretization of the image.
2.2 Tangent Representation
In order to avoid these problems, we employ a second level of representation in which the contour is locally represented by tangents with position, orientation, and length represented by real numbers (Elder & Zucker, 1996a). These tangents, not constrained by the discrete pixel grid, and often averaging over multiple, roughly colinear edges, provide a much more accurate basis for one-dimensional perceptual grouping. We stress that these computations are not intended as a model for biological visual processing. Rather, they are intended simply to provide an accurate estimate of local contour information.
To construct the tangent representation, each local edge in the image generates a tangent line passing through the edge pixel in the estimated tangent direction. The tangent estimates that are 8-connected to the local edge, which lie within an article021.gif-neighbourhood of the local tangent line, and whose gradient directions are compatible with that of the local edge, are identified with the extended tangent model. For this study, we use article022.gif pixels. Gradient direction compatibility is determined based on the known level of sensor noise, using a first-order noise propagation model.
A greedy algorithm is used to select the subset of tangents that will represent the image contours. Given a connected set of local edges, the longest line segment that faithfully models a subset of these is determined. This subset is then subtracted from the original set. This process is repeated for the connected subsets thus created until all local edges have been modeled. Luminance estimates for the edge pixels modeled by each tangent are averaged, and each tangent is thus represented as a 6-element vector (see footnote 5):
x position
y position
Length
Orientation
Luminance on light side of tangent
Luminance on dark side of tangent
By convention, the spatial component (first four elements) of each tangent vector is represented as a 90 deg counterclockwise rotation from the gradient direction. The (x,y) position represents the location of the base of the vector in the image. A portion of the tangent map computed for the image in Figure 2a is shown in Figure 2d.
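The greedy selection of tangents described in this section can be sketched as follows. This is a simplified illustration, not the authors' algorithm: it assumes the connected edges are already ordered along a chain, it tests fit by perpendicular distance to the chord joining a run's endpoints, and the tolerance `eps` and all function names are hypothetical. The actual neighbourhood size and the gradient-compatibility test are not reproduced here.

```python
import numpy as np


def fits_line(points, eps):
    """True if every point of the run lies within eps of the chord joining its endpoints."""
    p0, p1 = points[0], points[-1]
    chord = p1 - p0
    length = np.hypot(chord[0], chord[1])
    if length == 0.0:
        return bool(np.all(points == p0))
    rel = points - p0
    # Perpendicular distance from each point to the chord (2-D cross product).
    dist = np.abs(chord[0] * rel[:, 1] - chord[1] * rel[:, 0]) / length
    return bool(np.all(dist <= eps))


def longest_run(points, eps):
    """Return (start, stop) of the longest contiguous run that fits a line."""
    n = len(points)
    for size in range(n, 0, -1):                 # try the longest runs first
        for start in range(0, n - size + 1):
            if fits_line(points[start:start + size], eps):
                return start, start + size
    raise ValueError("empty chain")


def greedy_segments(points, eps=1.0):
    """Greedily cover an ordered edge chain with straight segments (endpoint pairs)."""
    segments = []
    stack = [np.asarray(points, dtype=float)]
    while stack:
        chain = stack.pop()
        if len(chain) == 0:
            continue
        lo, hi = longest_run(chain, eps)
        segments.append((chain[lo], chain[hi - 1]))
        stack.append(chain[:lo])                 # remaining edges before the run
        stack.append(chain[hi:])                 # remaining edges after the run
    return segments


if __name__ == "__main__":
    corner = [(x, 0) for x in range(6)] + [(5, y) for y in range(1, 5)]
    for a, b in greedy_segments(corner, eps=0.5):
        print(a, "->", b)
```

The exhaustive search over runs is cubic in the chain length, which is acceptable for a sketch but would need a more efficient fit test for whole images.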
3. A Probabilistic Model for Tangent Grouping
The set of tangents article023.gif computed from an image may be enumerated: article024.gif. The set article025.gif of possible contours may then be represented as tangent sequences: A sequence of tangents article026.gif if and only if
article027.gif(1)
This definition restricts the mapping to be injective (tangents cannot be repeated in the sequence), with the exception that the first and last tangents may be the same, in which case the contour is closed. For the purposes of this study, we will restrict our attention to contours for which contrast polarity does not reverse along the contour; thus, tangents in a contour are linked “tip-to-tail.”
We assume that there exists a correct organization of the image article028.gif. Correctness may be defined in terms of objective ground truth, e.g., the contours that bound the objects in a scene. Unfortunately, except perhaps for highly simplified artificial or synthetic scenes, objective ground truth is difficult to obtain. Because our interest is in the perceptual organization of typical natural images, we elect in this study to define correctness in terms of human perception, i.e., a contour is correct if it is what a human observer perceives. If this aspect of human perception is close to veridical, then our study reveals aspects of how contours in the natural world appear in images. If not, we can at least say that our measurements reveal aspects of the information likely used by the human visual system to group contours.
A visual system may use a number of observable properties article029.gif to decide on the correctness of a hypothesized contour. Here we examine properties corresponding to the classical Gestalt cues of proximity, good continuation, and similarity. Knowing these properties article029.gif influences the probability article030.gif that a particular contour article031.gif is correct.
Here we are interested in how properties article032.gif defined on pairs of sequential tangents article033.gif may influence the probability that a contour article034.gif is correct. The local property article035.gif may, e.g., represent the distance between the two tangents or a measure of the curvature of the best continuant between the two tangents. Note that these local properties do not embody many important aspects of global geometry and topology. In general, the visual system may also apply one or more global constraints that only a subset of contours may satisfy, e.g., closure, simplicity (no self-intersections), and completeness. However, here we focus only on characterizing the statistics of local cues for grouping.
Using Bayes’ theorem, the probability article036.gif that a particular contour c is correct may be written as
article037.gif(2)
where
article038.gif(3)
In general, the prior ratio reflects the expected number of tangents in the contours of the image and can be modeled fairly easily. However, because contours can be many tangents in length, the likelihood distributions are in general of very high dimension; in order to model the statistics of contour grouping, some simplifying approximations must be made. Here we model contours as Markov chains (Mumford, 1992; Elder & Zucker, 1996a; Williams & Jacobs, 1997), so that tangent grouping is pair-wise independent. In particular, we will assume that only the grouping cues article039.gif directly relating tangents on the hypothesized contour c depend upon the hypothesis, and that these are conditionally independent. Then
article040.gif(4)
and the likelihood ratio can be computed as a product of local likelihood ratios. Note that this model makes no assumption that local tangent groupings are unique.
The intuition behind this Markov approximation is that the strongest statistics lie in the relations between directly successive tangents on the contour, so these should be modeled directly. The weaker statistics relating more distant tangents are captured approximately through the Markov structure.
Depending on the nature of the global constraints, it may be possible to compute maximally probable contours using an efficient shortest-path computation on a directed graph representing the Markov network. In prior work (Elder & Zucker, 1996a), we developed an algorithm for computing closed contours using these approximations. While in the present work we are interested principally in the problem of estimating local probability distributions, we will employ interactive grouping software that makes use of these approximations in order to rapidly infer contour segments between tangents selected by human observers (Section 5).
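The shortest-path idea can be sketched with Dijkstra's algorithm. This is an illustrative sketch only: it assumes each directed link between tangents carries a pairwise grouping probability in (0, 1], and it uses the negative log of that probability as the link cost, so that the minimum-cost path is the chain with the largest product of link probabilities. The graph layout `pair_prob` and the function name are hypothetical and are not taken from the cited algorithm.

```python
import heapq
from math import log


def most_probable_path(pair_prob, start, goal):
    """Return the tangent sequence from start to goal maximizing the product of link probabilities.

    pair_prob: dict mapping tangent a -> {tangent b: probability that b directly follows a}.
    """
    cost = {start: 0.0}
    parent = {}
    done = set()
    heap = [(0.0, start)]
    while heap:
        c, a = heapq.heappop(heap)
        if a in done:
            continue
        done.add(a)
        if a == goal:
            break
        for b, p in pair_prob.get(a, {}).items():
            new_cost = c - log(p)                # -log p >= 0 because p <= 1
            if new_cost < cost.get(b, float("inf")):
                cost[b] = new_cost
                parent[b] = a
                heapq.heappush(heap, (new_cost, b))
    if goal not in cost:
        return None                              # goal is unreachable
    path = [goal]
    while path[-1] != start:
        path.append(parent[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    toy = {0: {1: 0.9, 2: 0.5}, 1: {2: 0.9}, 2: {}}
    print(most_probable_path(toy, 0, 2))
```

In the toy graph, the two-link route has probability 0.81, which exceeds the direct link's 0.5, so the longer route is preferred.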
4. Defining the Cues
Here we focus on local statistics, and so in the following we will consider contours consisting of just two tangents article041.gif. Then we have
article042.gif(5)
where
article043.gif(6)
The prior ratio article044.gif is approximately equal to the probability that two arbitrarily selected tangents are grouped. If we assume that pair-wise groupings are typically (but not always) unique, then article044.gif is approximately equal to the reciprocal of the number of tangents in the image.
The likelihood ratio article045.gif represents the ratio of the likelihood of the observables given that article046.gif and article047.gif are directly grouped to the likelihood given that they are not. We will refer to these likelihoods throughout the paper as the contour and random likelihoods, respectively.
Figure 4. Observable data relating two tangents. See text for details.
In this study we consider three observable cues that we expect to be most influential on the probability of grouping: proximity, good continuation, and similarity. As a first order approximation, we use a rectilinear model of completion between two tangents article049.gif (Figure 4):
Proximity: A function of the length article050.gif of the straight-line interpolant (gap).
Good continuation: A function of the two orientation changes article051.gif and article052.gif induced by the interpolation.
Similarity: A function of the differences in estimated image intensities article053.gif and article054.gif between the two tangents. In this study, we consider only grouping that preserves the contrast polarity of the contour.
If we can approximate distinct cues (proximity, good continuation, etc.) as independent when conditioned upon grouping hypotheses, the contour and random likelihoods can be factored (see footnote 6). Given article055.gif distinct cues article056.gif relating tangents article046.gif and article047.gif, we then have:
article057.gif(7)
It is these likelihoods that we wish to estimate in the present study.
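To make the role of these quantities concrete, the following sketch combines per-cue likelihoods into a posterior grouping probability for a pair of tangents. It assumes the factored form described above, approximates the prior ratio by the reciprocal of the number of tangents as stated in the text, and converts posterior odds to a probability with the standard Bayesian identity. The function name and the numerical inputs are hypothetical.

```python
def grouping_posterior(contour_likelihoods, random_likelihoods, n_tangents):
    """Posterior probability that two tangents are directly grouped.

    contour_likelihoods: per-cue likelihoods given that the pair is grouped.
    random_likelihoods:  per-cue likelihoods given that the pair is not grouped.
    n_tangents:          number of tangents in the image (sets the prior ratio).
    """
    likelihood_ratio = 1.0
    for p_contour, p_random in zip(contour_likelihoods, random_likelihoods):
        likelihood_ratio *= p_contour / p_random   # cues assumed conditionally independent
    prior_ratio = 1.0 / n_tangents                 # approximation stated in the text
    odds = likelihood_ratio * prior_ratio          # posterior odds of grouping
    return odds / (1.0 + odds)


if __name__ == "__main__":
    # Hypothetical likelihood values for proximity, good continuation, and similarity.
    print(grouping_posterior([0.30, 0.02, 0.05], [0.001, 0.003, 0.02], n_tangents=2000))
```

When the cues carry no information (likelihood ratio of 1), the posterior falls back to approximately the prior, 1 / (n_tangents + 1).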
5. Methods
5.1 Participants
Five unpaid participants, all undergraduate or graduate students of vision science, participated in the experiment. All had normal or corrected-to-normal vision. The participants were aware of the goals of the study.
5.2 Apparatus
Experiments were conducted on a Pentium workstation with a Sony Trinitron display. Proprietary software, discussed in detail in Section 5.4, was employed to display the images and allow participants to trace perceived contours.
5.3 Stimuli
Nine arbitrarily selected natural grayscale images were employed (Figure 5). An attempt was made to include images of diverse subjects and settings (e.g., people, objects, animals; indoor, outdoor).
Figure 5. Images used for our experiments.
5.4 Software Tool for Interactive Contour Grouping
Our goal was to estimate the probability distributions for the observable grouping cues available in the contours perceived by human observers. To do this, we needed a method for translating observer percepts into tangent sequences: somehow observers must be able to trace the contours they see, and each trace must be mapped to a sequence of tangents.
In order to allow participants to accurately and efficiently trace contours, we employed a software package called Interactive Contour Editor (ICE), previously developed for a demonstration of contour-based image editing technology (Elder & Goldberg, 2001). ICE represents an image by information at its edges, and then allows the image to be modified by direct editing of the contours. This technology uses previously developed algorithms for reconstructing images from our edge representation (Elder, 1999). In order to allow users to efficiently manipulate contours, ICE provides an interactive contour grouping mechanism based on the tangent representation and independence approximations described in Sections 3 and 4 (Elder & Zucker, 1996a). The likelihood distributions employed are generally Gaussian, with parameters chosen using a combination of common sense and trial and error (see footnote 7).
Rather than requiring experimental participants to painstakingly trace each tangent of a contour in sequence, we used the grouping feature of ICE as a kind of “power-assist” to accelerate the process. This approach has a number of advantages:
Accurate estimation of probability distributions requires a large amount of data. Using ICE, participants can group a long sequence of tangents with a relatively small number of mouse clicks, allowing the required quantity of data to be collected quickly.
The increase in efficiency reduces observer fatigue, and thus may improve the quality of data.
ICE turns approximately positioned mouse clicks into selections of the nearest tangent, and allows the observer to group contours in chunks. These capabilities eliminate the need for zooming and unzooming the image, which can cause the observer to lose global perspective and can introduce errors into the data.
A potential disadvantage of the methodology is that ICE may itself introduce errors into the data by selecting groupings that are not perceived by the observer. This problem was largely avoided by provision of an “undo” mechanism that allowed participants to delete groupings they had not intended to make. Participants were instructed to use the ICE grouping tool to full advantage, but to be constantly vigilant for such errors and to correct them immediately. Typically the errors made by ICE were “blunders” that were difficult to miss.
The graphical user interface (GUI) for ICE is shown in Figure 6c. Both the working image and edge map are displayed. For this study, we made use only of the features of ICE that allow contour tangents in a natural image to be selected and grouped as a sequence.
Figure 6. a. Example of incorrect path computed by ICE. b. The correct path is computed when the participant selects more closely spaced tangents. c. Graphical user interface for ICE.
Participants select contours by clicking on either the image or the edge map. Grouping is initiated by clicking near a contour. This click initiates a nearest neighbor search in the area of the mouse click to find the nearest edge point. The coordinates of the nearest edge point are used to index the tangent map and thus obtain the index of the tangent corresponding to the edge point. The selected tangent is highlighted in color on both edge and image displays. When the user clicks near a second edge point, a terminating tangent index is similarly obtained.
These two tangent indices form input to a graph algorithm that determines the most probable sequence of tangents connecting the two selected tangents, under the independence approximations discussed in Sections 3 and 4 (Elder & Zucker, 1996a).
In append path mode, subsequent mouse selections will append maximum likelihood contour segments to the previously computed path. In replace path mode, a third mouse selection will deselect the previous path and begin a new path at the selected edge point.
Figure 6 shows an example of this interactive grouping procedure. Selected tangents are indicated by bow tie markers. Because the grouping algorithm is imperfect, selecting two tangents that are too distant may lead to a nonsense path (Figure 6a). In such cases, the participant may undo the path and, with ICE in append path mode, select a sequence of more closely spaced points along the contour that the algorithm can more easily connect (Figure 6b).
5.5 Procedure
Each participant was instructed to use the ICE software to trace all of the contours they perceived in each of the natural images. Images were presented in a random order. Participants could select tangents by clicking in either the image or the edge map. Participants were instructed to try to group complete contours, but not to group multiple contours together. Participants were also instructed to consider not only the contours bounding objects, but also contours arising from reflectance changes, shading and shadows.
Unlike Geisler et al. (2001), we did not force the participants to trace all automatically detected contours in the image. Thus the potential exists that participants may have traced only the more salient contours, even though they were instructed to trace all contours they perceived, and this may lead to bias in the statistics. The difficulty in forcing the participants to trace all detected contours is that, depending upon the characteristics of the monitor, there may be some contours they simply cannot see due to blur, low contrast, and clutter. Geisler et al. get around this by imposing an arbitrary response threshold on edge detection filters and thus suppressing low-contrast edges, but, of course, this could also introduce bias in the contours that are traced.
The experiment produced a total of 16,222 tangent pairs perceived to be directly grouped. Many of these tangent pairs were selected by more than one participant: thus only 7,476 of these pairs were unique. We considered the set of tangent pair samples selected by each observer to be an independent random sample from a common underlying population, and therefore used the full set of 16,222 pairs in estimating the contour distributions.
In order to estimate the random likelihood distributions, we randomly sampled 10,000 pairs of tangents from the same set of 9 images. No attempt was made to avoid tangent pairs that are perceived to be grouped, because such pairs form an insignificant proportion of the total number of tangent pairs present in an image (see footnote 8).
5.6 Modeling of Distributions
As a first-order approximation, we will assume that the grouping cues are mutually independent in the contour and random conditions, i.e., when conditioned upon article062.gif or article063.gif. Thus we are interested in modeling the individual marginal distributions for each of the cues.
We wish to estimate the likelihood distributions for each of the Gestalt cues article064.gif, given tangents that are successive elements of the same contour (article065.gif), and for tangents that are not (article066.gif). In this way we hope to quantify our understanding of the classical Gestalt laws. For example, intuitively we expect that the distance between two tangents known to be successive elements of the same underlying contour will tend to be smaller than the distance between random tangents, but we are seeking a more complete quantitative description of this intuition in the form of these two likelihood distributions.
6. Results
6.1 Proximity Cue
Figure 7a shows a log-log plot of the contour likelihood distribution article067.gif, where article068.gif is the separation between tangents. A scatterplot of the empirical distribution is shown in blue. Note that while we would expect the true likelihood distribution to decrease monotonically as a function of the distance between tangents, the data are nonmonotonic, peaking at roughly article069.gif pixels. We believe the falloff observed for small gaps is because of small random errors in our algorithm’s localization of tangent endpoints; we discuss this below.
For gaps greater than 2 pixels, the data appear roughly linear in log-log coordinates. In other words, the contour likelihood distribution for the proximity cue follows a power law:
article070.gif(8)
Figure 7. a. Estimated likelihood distribution article075.gif for the proximity cue between two tangents known to be successive components of a common contour. b. Sample standard deviation of tangent separation along contours, as a function of sample size. The lack of convergence suggests that the standard deviation is undefined. c. Estimated likelihood distribution article076.gif for the proximity cue between two tangents selected randomly from the image. d. Practical (with estimation noise) and theoretical (without estimation noise) estimates of the posterior distribution article077.gif for tangent grouping based on the proximity cue.
A maximum likelihood estimate of the underlying power law is shown in magenta in Figure 7a. Bootstrapping to estimate standard errors, we estimate the power law parameters to be:
article078.gif
Thus the gaps along a contour follow a power law, with a minimum distance between tangent endpoints of 1.4 pixels (roughly the distance between diagonally adjacent pixels), and an exponent of 2.92. While the mean separation is 2.9 pixels, the standard deviation and higher-order moments are undefined: for example, the model predicts that the sample standard deviation of the distance between tangents along a contour, estimated from observed data, will increase without bound as the sample size grows. Figure 7b shows that our data do indeed exhibit this behavior.
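The stated moments can be checked with a short sketch. It assumes the power law is a density proportional to r^(-a) for r at or above the minimum gap (a Pareto form), which is consistent with the reported mean but is an assumption about the exact parameterization. Under that form the mean is r_min (a - 1) / (a - 2), and the variance is finite only for a > 3. The function names are hypothetical.

```python
import numpy as np

R_MIN = 1.4       # minimum gap in pixels, from the text
EXPONENT = 2.92   # power-law exponent, from the text


def power_law_mean(r_min, a):
    """Mean of a density proportional to r**(-a) on r >= r_min (requires a > 2)."""
    if a <= 2.0:
        raise ValueError("mean is undefined for exponents <= 2")
    return r_min * (a - 1.0) / (a - 2.0)


def sample_gaps(n, r_min, a, rng):
    """Draw n gaps by inverse-CDF sampling of the assumed power law."""
    u = 1.0 - rng.random(n)                      # uniform on (0, 1]
    return r_min * u ** (-1.0 / (a - 1.0))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gaps = sample_gaps(1_000_000, R_MIN, EXPONENT, rng)
    print("model mean (pixels):", power_law_mean(R_MIN, EXPONENT))
    # With a < 3 the sample standard deviation is not expected to settle as n grows.
    for n in (10**3, 10**4, 10**5, 10**6):
        print(n, gaps[:n].std())
```

With the reported parameters, the model mean evaluates to roughly 2.9 pixels, matching the value quoted above.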
To model the complete distribution of the proximity law, including small gaps, we assume that this power law is corrupted by noise caused by small errors in localizing tangent endpoints. We have observed that these localization errors can be as large as article079.gif pixel in both horizontal and vertical directions. Modeling these random errors as independent and uniformly distributed, we can generate samples of the resulting noisy power law. Such a sample is shown in green in Figure 7a: the striking similarity between the real and simulated data provides strong support for the model.
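A minimal simulation of this noisy power law might look as follows. It is a sketch under stated assumptions: gaps are drawn from a Pareto-form power law, gap directions are isotropic, and each of the two endpoints receives independent uniform errors in x and y. The error bound `max_err` is a free parameter here, and the value used in the demo is purely illustrative rather than the value used in the study.

```python
import numpy as np


def sample_gaps(n, r_min, a, rng):
    """Inverse-CDF samples from a density proportional to r**(-a) on r >= r_min."""
    u = 1.0 - rng.random(n)                      # uniform on (0, 1]
    return r_min * u ** (-1.0 / (a - 1.0))


def noisy_gaps(n, r_min, a, max_err, rng):
    """Return (true_gap, observed_gap) after perturbing both endpoints."""
    true_gap = sample_gaps(n, r_min, a, rng)
    phi = rng.uniform(0.0, 2.0 * np.pi, n)       # isotropic gap direction (assumption)
    gap_vec = true_gap[:, None] * np.column_stack((np.cos(phi), np.sin(phi)))
    err_a = max_err * rng.uniform(-1.0, 1.0, (n, 2))   # error at the first endpoint
    err_b = max_err * rng.uniform(-1.0, 1.0, (n, 2))   # error at the second endpoint
    observed = np.linalg.norm(gap_vec + err_b - err_a, axis=1)
    return true_gap, observed


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # max_err = 0.5 is a hypothetical value chosen only for illustration.
    true_gap, observed = noisy_gaps(100_000, 1.4, 2.92, 0.5, rng)
    print("fraction of observed gaps below the minimum:", np.mean(observed < 1.4))
```

Because the noise can shorten as well as lengthen a gap, some observed gaps fall below the model's minimum distance, which is the qualitative effect invoked to explain the falloff at small gaps.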
The random likelihood distribution article080.gif for the proximity cue is shown in Figure 7c. To model this distribution, we assumed that tangents are uniformly distributed over the image. We then computed the exact distribution based on this assumption for a article081.gif pixel image. Distributions for square images of different sizes are obtained by simply scaling this distribution. Although our images were of various sizes and generally were not square, for the purposes of this study we approximated the images as article082.gif pixels. The resulting model is seen to fit the data well.
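The text computes this distribution exactly; as a rough check, it can also be approximated by Monte Carlo sampling. The sketch below assumes point-like tangents placed uniformly and independently in a square image. The side length `side` is a free parameter, and the value in the demo is illustrative only. The function names are hypothetical.

```python
import numpy as np


def random_pair_distances(n_pairs, side, rng):
    """Distances between pairs of points drawn uniformly from a side x side square."""
    a = rng.uniform(0.0, side, (n_pairs, 2))
    b = rng.uniform(0.0, side, (n_pairs, 2))
    return np.linalg.norm(a - b, axis=1)


def distance_density(distances, side, bins=50):
    """Histogram estimate of the random-condition likelihood of separation."""
    return np.histogram(distances, bins=bins,
                        range=(0.0, side * np.sqrt(2.0)), density=True)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    side = 256.0                                 # hypothetical image size in pixels
    d = random_pair_distances(200_000, side, rng)
    density, edges = distance_density(d, side)
    print("mean distance / side:", d.mean() / side)
```

Distances scale linearly with the side length, so a distribution computed for one square size can be rescaled to another, as the text notes. For a unit square the mean distance is known to be about 0.5214, which provides a quick sanity check.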
Having models of the likelihood distributions for both contour and random conditions, and knowing the number of tangents in each image, we can use Equation 5 to compute the posterior probability article077.gif as a function of the tangent separation. Figure 7d shows the posterior with and without the noise introduced by the tangent computation. It can be seen that for small separations, the grouping probability is very high.
Figure 8. Coding the good continuation cue. All data are drawn from the contour condition. a. Scatterplot showing negative correlation of the two interpolation angles. b. Linear recoding into parallelism and cocircularity cues results in a more independent code. c. The parallelism cue is approximately uncorrelated with the proximity cue. d. The cocircularity cue is correlated with the proximity cue. See text for details.
Figure 9. Recoding the interpolation angles into a sum and difference leads to an intuitive representation of good continuation.
6.2 Good Continuation Cue
Using our first-order model of contour continuation (Figure 4), the grouping of two tangents generates two interpolation angles article088.gif. We find that these two angles are strongly anticorrelated in the contour condition (Figure 8a). This suggests a recoding of the angles into sum (article089.gif) and difference (article090.gif) cues. This encoding appears to be close to the principal component basis for the good continuation cue: the new variables are approximately uncorrelated in the contour condition (Figure 8b).
There are four advantages to this new representation of the good continuation cue. First, it appears to be close to the principal components of the data in the contour condition. Second, when less than 180 deg in absolute value, these two new variables have a very natural meaning, in terms of intuition and in terms of the literature (Figure 9). The sum variable represents parallelism: the two tangents are parallel if and only if article091.gif, and article092.gif increases monotonically in absolute value as the tangents become less parallel. The difference variable represents cocircularity: the two tangents are cocircular if and only if article093.gif, and article094.gif increases monotonically in absolute value as the tangents become less cocircular. Note also that two tangents are colinear if and only if both article095.gif and article093.gif. Because for roughly 94% of our data for the contour condition both variables are less than 180 deg in absolute value, we will generally refer to article092.gif and article090.gif as parallelism and cocircularity cues, respectively.
In the 6% of cases where either article096.gif or article090.gif is greater than 180 deg in absolute value, we cannot think of them as measuring parallelism or cocircularity. This is because neither geometric property embodies the sense in which the contour is being traversed. Clearly this is an important constraint in contour grouping, and the article096.gif and article090.gif variables do take this into account.
This representation of good continuation is different from that used by Geisler et al. (2001). In their representation, while one angle represents parallelism, the other angle has no obvious intuitive meaning, and we find in our own data that these two angles are highly correlated.
A third advantage of the parallelism and cocircularity encoding of the good continuation cue is that sources of error in measuring these two variables are quite different, and it is useful to separate these out. We discuss this at length below. The final advantage is that the parallelism cue will turn out to be much stronger in inferential power than the cocircularity cue (Section 7), and this supports our general goal of constructing representations that concentrate the greatest predictive power in the smallest number of variables.
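The recoding of the two interpolation angles can be illustrated with a short sketch. The conventions here are assumptions, since the defining figure is not reproduced in this text: both angles are taken as signed turns in the same rotational sense, the first from the direction of the leading tangent to the straight-line interpolant and the second from the interpolant to the direction of the following tangent. The sign of the difference is arbitrary, and the function names are hypothetical.

```python
from math import atan2, degrees


def wrap_deg(angle):
    """Wrap an angle in degrees to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def good_continuation_cues(head_a, dir_a_deg, tail_b, dir_b_deg):
    """Return (parallelism, cocircularity) in degrees for tangent a followed by tangent b.

    head_a, tail_b: (x, y) endpoints joined by the straight-line interpolant.
    dir_a_deg, dir_b_deg: tangent directions in degrees.
    """
    gap_dir = degrees(atan2(tail_b[1] - head_a[1], tail_b[0] - head_a[0]))
    theta_1 = wrap_deg(gap_dir - dir_a_deg)      # turn from tangent a onto the interpolant
    theta_2 = wrap_deg(dir_b_deg - gap_dir)      # turn from the interpolant onto tangent b
    parallelism = theta_1 + theta_2              # zero when the tangents are parallel
    cocircularity = theta_2 - theta_1            # zero when the tangents are cocircular
    return parallelism, cocircularity


if __name__ == "__main__":
    # Two tangents on the unit circle, traversed counterclockwise, 60 deg apart.
    print(good_continuation_cues((1.0, 0.0), 90.0, (0.5, 3 ** 0.5 / 2), 150.0))
```

The sum is deliberately not re-wrapped, so it can exceed 180 deg in absolute value, which corresponds to the minority of cases discussed above in which the variables no longer read as parallelism or cocircularity.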
The parallelism cue (as represented by its standard deviation) is also very nearly uncorrelated with the proximity cue (Figure 8c). It is thus appropriate to consider the marginal statistics of the parallelism cue. Figure 10a shows the likelihood distribution for the parallelism cue in the contour condition (blue curve). The distribution is kurtotic (kurtosis = 16.9). To model this distribution, we employed a generalized Laplacian distribution that has been used in the past to model kurtotic wavelet response histograms (Mallat, 1989; Simoncelli & Adelson, 1996; Simoncelli, 1999; see footnote 9):
article097.gif(9)
This distribution is symmetric and unimodal. article098.gif is the standard deviation and article099.gif controls the kurtosis. If article100.gif the distribution is Gaussian. If article101.gif the distribution has positive kurtosis. If article102.gif, the distribution has negative kurtosis, approaching a uniform distribution as article103.gif. To model an empirical distribution, we determine the generalized Laplacian distribution with matching standard deviation and kurtosis. Given a target kurtosis, the required article104.gif is found using standard nonlinear optimization techniques.
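The fitting step can be sketched as follows. The parameterization is an assumption, since Equation 9 is not reproduced in this text: the density is taken to be proportional to exp(-|x/s|^alpha), and kurtosis is taken as excess kurtosis (zero for a Gaussian), which agrees with the description of alpha = 2 as the Gaussian case. Bisection stands in for the unspecified nonlinear optimization, and the function names are hypothetical.

```python
from math import lgamma, exp, sqrt


def excess_kurtosis(alpha):
    """Excess kurtosis of a density proportional to exp(-|x/s|**alpha)."""
    log_ratio = lgamma(5.0 / alpha) + lgamma(1.0 / alpha) - 2.0 * lgamma(3.0 / alpha)
    return exp(log_ratio) - 3.0


def solve_alpha(target, lo=0.2, hi=20.0, iterations=200):
    """Find the exponent whose excess kurtosis equals `target` by bisection."""
    if not (excess_kurtosis(hi) < target < excess_kurtosis(lo)):
        raise ValueError("target kurtosis outside the searchable range")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if excess_kurtosis(mid) > target:        # kurtosis decreases as alpha grows
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def scale_for_std(sigma, alpha):
    """Scale parameter s that gives the density a standard deviation of sigma."""
    return sigma * sqrt(exp(lgamma(1.0 / alpha) - lgamma(3.0 / alpha)))


def density(x, sigma, alpha):
    """Normalized generalized Laplacian density with standard deviation sigma."""
    s = scale_for_std(sigma, alpha)
    norm = alpha / (2.0 * s * exp(lgamma(1.0 / alpha)))
    return norm * exp(-abs(x / s) ** alpha)


if __name__ == "__main__":
    for name, sigma, kurt in (("parallelism", 42.1, 16.9), ("cocircularity", 76.8, 3.86)):
        alpha = solve_alpha(kurt)
        print(name, "alpha =", round(alpha, 3), "density at 0 =", density(0.0, sigma, alpha))
```

Under these assumptions, the kurtosis values reported for the two cues should map to exponents close to those listed in Table 1.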
The generalized Laplacian model for the parallelism cue is shown in red in Figure 10a. The model parameters for the parallelism cue in the contour condition are listed in Table 1.
Figure 10. Statistical distributions for the good continuation cues. a. Likelihood distribution article109.gif for the parallelism cue in the contour condition. b. Likelihood distribution article110.gif for the cocircularity cue in the contour condition. c. Likelihood distribution for the good continuation cues article111.gif and article111.gif in the random condition. d. Posterior distributions article112.gif and article113.gif for the good continuation cues. See text for details.
Table 1. Generalized Laplacian Parameters for Parallelism and Cocircularity Cues in the Contour Condition
Cue              article114.gif    Kurtosis    article115.gif
Parallelism      42.1 deg          16.9        0.54
Cocircularity    76.8 deg          3.86        0.91
Although the parallelism cue is approximately uncorrelated with the proximity cue, this is not the case for the cocircularity cue (Figure 8d): the standard deviation of the cocircularity cue decreases as the distance between tangents increases. In other words, the cocircularity cue is weaker for more proximal tangents.
We suspected that this observation stems from measurement error. While the parallelism cue depends only on the difference in estimated orientation of the two tangents, the cocircularity cue depends on the orientation article116.gif of the virtual line connecting the relevant endpoints of the tangents (Figure 11). Denoting horizontal and vertical separations of the two tangents as article117.gif and article118.gif respectively, we have article119.gif, and the partial derivatives of article120.gif with respect to article117.gif and article118.gif are
article121.gif(10)
Thus estimation of the cocircularity cue is ill conditioned when the separation between tangents is small.
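The ill-conditioning can be made explicit with a short worked derivation. The symbols are assumptions introduced for illustration: \phi for the orientation of the connecting line, \Delta x and \Delta y for the horizontal and vertical separations, and r for the gap length. Taking the orientation as the arctangent of the separations, standard differentiation gives:

```latex
\phi = \arctan\!\left(\frac{\Delta y}{\Delta x}\right), \qquad
\frac{\partial \phi}{\partial \Delta x} = \frac{-\Delta y}{\Delta x^{2} + \Delta y^{2}}, \qquad
\frac{\partial \phi}{\partial \Delta y} = \frac{\Delta x}{\Delta x^{2} + \Delta y^{2}}

\left\lVert \nabla \phi \right\rVert
  = \sqrt{\left(\frac{\partial \phi}{\partial \Delta x}\right)^{2}
        + \left(\frac{\partial \phi}{\partial \Delta y}\right)^{2}}
  = \frac{1}{\sqrt{\Delta x^{2} + \Delta y^{2}}}
  = \frac{1}{r}
```

To first order, an endpoint localization error of fixed size therefore perturbs the orientation by an amount inversely proportional to the gap length r, so the error grows without bound as the two tangents approach each other.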
To minimize this source of error, we restricted our analysis of the cocircularity cue in the contour condition to tangent separations of 5 pixels or greater, where this small-separation effect is negligible. The data and generalized Laplacian model are shown in Figure 10b. The parameters of the model are listed in Table 1.
Figure 11. The parallelism cue (sum of interpolation angles) can be reexpressed as the difference in orientation of the two tangents. The cocircularity cue, however, depends on the orientation article123.gif of the straight-line interpolant between the tangents.
The generalized Laplacian model can be seen to overestimate the likelihood for nearly parallel tangents (Figure 10a). We believe that a more accurate model may be obtained by modeling the noise in tangent orientation estimation. The effect of independent additive Gaussian noise of standard deviation article124.gif in the two tangent angles is to blur the likelihood distribution with a Gaussian blur kernel of scale article125.gif. We therefore estimate the standard deviation article126.gif of the measurement noise by minimizing the least-squared difference between the data and Gaussian-blurred model for the parallelism and cocircularity cues, obtaining an estimate of article127.gif. The resulting models are shown in green in Figure 10a and 10b.
To determine whether localization error in tangent endpoints could account for the observed correlation between proximity and cocircularity cues, we used our model for endpoint error (uniform noise of article128.gif pixel in article129.gif and article130.gif coordinates, Section 6.1) and our model for the cocircularity cue for tangent separations greater than 5 pixels to simulate cocircularity data for a range of tangent separations. The result, shown in red in Figure 10d, is quite consistent with the observed data, suggesting that were localization error eliminated, the cocircularity cue would be roughly uncorrelated with the proximity cue.
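The kind of simulation described here can be illustrated with a small Monte Carlo sketch. It is an assumption-laden illustration rather than the study's procedure: the function name is hypothetical, the noise half-width is a free parameter (the value used in the study is not recoverable from this text), and the two endpoints are placed on a horizontal line so that the true interpolant orientation is zero.

```python
import numpy as np

def interpolant_orientation_sd(separation, half_width, n_trials=20000, seed=0):
    """S.d. (degrees) of the interpolant orientation under endpoint noise.

    Endpoints sit nominally at (0, 0) and (separation, 0). Each x and y
    coordinate receives independent uniform noise in [-half_width, half_width].
    """
    rng = np.random.default_rng(seed)
    noise = rng.uniform(-half_width, half_width, size=(n_trials, 2, 2))
    p0 = noise[:, 0, :]
    p1 = noise[:, 1, :] + np.array([separation, 0.0])
    d = p1 - p0
    angles = np.arctan2(d[:, 1], d[:, 0])        # true orientation is 0
    return float(np.degrees(np.std(angles)))

# Orientation error for a range of separations (pixels), half_width assumed 0.5.
separations = [1, 2, 5, 10, 20]
spread = [interpolant_orientation_sd(s, half_width=0.5) for s in separations]
```

In this sketch the spread of the interpolant orientation shrinks steadily as the separation grows, which is the qualitative pattern expected if localization error drives the apparent dependence of the cocircularity cue on proximity.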
The likelihood distributions for the good continuation cues in the random condition can be modeled by assuming an isotropic tangent distribution. The resulting model can be seen to fit the data well (Figure 10c). Figure 10d shows the posterior distributions for the two cues. In deriving these distributions, we have attempted to remove errors in estimating tangent orientation and location. These distributions thus represent a “best case” scenario. Note that the parallelism cue appears to be more informative than the cocircularity cue: this will be studied more formally in Section 7.
6.3 Similarity Cue
The tangent representation includes an estimate of the image intensity on either side of each tangent. Differences in these intensities between tangents form a potential cue for contour grouping.
Tangents may be grouped in two ways, so that the polarity of contrast is either preserved or reversed along the contour. We did permit contrast reversals in the contours traced by our participants, and found that roughly 13% of the local groupings involved a contrast reversal. However, on examination it appeared that a number of these contrast reversals were erroneous. The difficulty was that our tracing software did not clearly indicate a contrast reversal to participants, who therefore had no way to detect and correct erroneous reversals. We therefore decided to restrict our analysis to segments of contour where no reversals were indicated.
One obvious way of encoding intensity similarity information is to consider the difference article131.gif in the intensity of the light sides of the two tangents article132.gif as one cue, and the difference article133.gif in the intensity of the dark sides of the two tangents as a second cue.
The problem with this approach is that these two cues are highly correlated in the random condition (Figure 12a) and therefore their joint distribution cannot be accurately approximated by the product of the marginal distributions. By inspection it appears that the first principal component of this joint distribution is roughly the sum of these two differences. This forms a brightness cue article134.gif, measuring the difference between the two tangents article132.gif in the mean luminance of the dark and light sides of the underlying edge. The second principal component then forms a contrast cue article135.gif, measuring the difference in the amplitudes of the intensity steps at the two tangents. This new basis approximately decorrelates the cues for the nongrouped case (Figure 12b).
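The change of basis can be illustrated on synthetic data. The sketch below is not drawn from the study's measurements: the luminance values are simulated, the function names are hypothetical, and the scaling of the brightness and contrast cues (half the sum, and the plain difference, of the light and dark differences) is an assumption, since the original definitions are not recoverable from this text.

```python
import numpy as np

def luminance_cues(light1, dark1, light2, dark2):
    """Return light/dark differences and the rotated brightness/contrast cues."""
    d_light = light1 - light2
    d_dark = dark1 - dark2
    brightness = 0.5 * (d_light + d_dark)   # difference in mean luminance
    contrast = d_light - d_dark             # difference in step amplitude
    return d_light, d_dark, brightness, contrast

def pearson(a, b):
    return float(np.corrcoef(a, b)[0, 1])

# Synthetic "random condition": each tangent has an independent mean
# luminance and an independent step amplitude (illustrative values only).
rng = np.random.default_rng(0)
n = 20000
mean1, mean2 = rng.normal(128, 40, n), rng.normal(128, 40, n)
step1, step2 = rng.uniform(10, 60, n), rng.uniform(10, 60, n)
d_light, d_dark, brightness, contrast = luminance_cues(
    mean1 + step1 / 2, mean1 - step1 / 2,
    mean2 + step2 / 2, mean2 - step2 / 2)

r_light_dark = pearson(d_light, d_dark)            # large: shared mean term
r_bright_contrast = pearson(brightness, contrast)  # near zero in this toy model
```

In this toy model the light and dark differences share the mean-luminance term and are therefore strongly correlated, while the sum and difference isolate the two independent sources of variation.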
Figure 12. a. Dark and light luminance difference cues between randomly selected tangents are strongly correlated. b. Brightness and contrast cues for randomly selected tangents are approximately uncorrelated.
Table 2 lists the Pearson correlations for these various luminance cues in both the grouped and random conditions. Overall, the brightness and contrast cues are less correlated than the light/dark difference cues. However, we must be careful not to assume that these low correlations mean that the cues are independent. Table 3 lists the Pearson correlations for the absolute values of these same cues. Note the high correlation between brightness and contrast cues in the contour condition. Clearly, it would be a mistake to conclude that these two cues are independent.
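The distinction between uncorrelated and independent cues can be seen in a small synthetic example. This is a sketch with made-up data, not a model of the measured cues: two variables share a random scale factor, so their signed values are uncorrelated while their absolute values are not.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50000

# Hypothetical construction: a shared scale makes both values large or small
# together, without favouring any particular sign relationship.
shared_scale = rng.choice([0.5, 3.0], size=n)
x = shared_scale * rng.standard_normal(n)
y = shared_scale * rng.standard_normal(n)

r_signed = float(np.corrcoef(x, y)[0, 1])                 # close to zero
r_abs = float(np.corrcoef(np.abs(x), np.abs(y))[0, 1])    # clearly positive
```

Comparing the two coefficients is the same check that motivates reporting correlations of absolute values alongside the signed correlations.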
These results present us with a dilemma. While the dark/light representation is superior for the contour condition, the brightness/contrast representation is superior for the random condition. One solution is to use different representations for the two conditions; however, then it would be impossible to quantify the inferential power of the individual cues: the most we could do is quantify the power of the two luminance cues taken together.
Table 2. Comparison of Pearson Correlation Coefficients for Two Measures of Luminance Similarity

| Condition | Dark/light | Brightness/contrast |
|---|---|---|
| Contour | 0.12 | -0.06 |
| Random | 0.77 | 0.01 |
Table 3. Comparison of Pearson Correlation Coefficients for Absolute Values of Two Measures of Luminance Similarity

| Condition | Dark/light | Brightness/contrast |
|---|---|---|
| Contour | 0.19 | 0.53 |
| Random | 0.65 | 0.02 |
Because one of the prime purposes of this study is to quantify the inferential power of contour grouping cues, we elect instead to use the brightness/contrast representation. As we shall see, the brightness cue turns out to be a far more powerful cue than the contrast cue, and so this decomposition allows the reduction of the luminance information into a single cue without substantial loss of inferential power.
We found that the contrast of our images varied considerably (standard deviations from 48 to 77 grey levels), and that the statistics of luminance grouping cues co-vary with image contrast (Pearson correlations of 0.72 for the brightness cue and 0.34 for the contrast cue: Figure 13). To increase the reliability, and therefore the inferential power, of the luminance cues, it is useful to normalize them by overall image contrast.
Figure 13. The statistics of both brightness and contrast cues are strongly correlated with the pixel statistics of the image.
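One simple way to carry out such a normalization is sketched below. It assumes that "overall image contrast" means the standard deviation of the image's grey levels, which matches how image contrast is reported above but is not confirmed as the exact normalization used; the function name is hypothetical.

```python
import numpy as np

def normalized_cue(cue, image):
    """Divide a luminance cue by the s.d. of the image's pixel values."""
    contrast = float(np.std(image))
    if contrast == 0.0:
        raise ValueError("image has zero contrast")
    return cue / contrast

# Usage: rescaling all intensities by a common factor leaves the result unchanged.
rng = np.random.default_rng(2)
image = rng.uniform(0, 255, size=(64, 64))
a = normalized_cue(10.0, image)
b = normalized_cue(20.0, 2.0 * image)
```

The design choice is that a cue normalized this way is invariant to a global rescaling of image intensities, which removes one source of image-to-image variation.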
Figure 14. a and b. Estimated likelihood distributions article144.gif and article145.gif for the brightness and contrast cues between two tangents known to be successive components of a common contour. c and d. Estimated likelihood distributions article146.gif and article147.gif for the brightness and contrast cues between two tangents selected randomly from the image. e. Posterior distributions article148.gif and article149.gif for tangent grouping based on the brightness and contrast cues.
Figures 14a and 14b show the contour likelihood distributions article150.gif and article151.gif for the normalized brightness and contrast cues. Both are well modeled by generalized Laplacian distributions. Figures 14c and 14d show the random likelihood distributions article146.gif and article147.gif for these two luminance cues. Unfortunately, the generalized Laplacian models these data less accurately, failing to completely reflect the sharp peaks observed in the data near zero. This result may reflect long-range spatial correlations in intensity values, so that tangents on the same object typically generate much lower luminance cue values than tangents from different objects. The generalized Laplacian parameters for these models are provided in Table 4.
Table 4. Generalized Laplacian Parameters for Similarity Cue Likelihoods

| Distribution | article152.gif | Kurtosis | article153.gif |
|---|---|---|---|
| Normalized brightness cue likelihood (contour) | 0.32 | 4.40 | 0.87 |
| Normalized contrast cue likelihood (contour) | 0.59 | 3.3 | 0.96 |
| Normalized brightness cue likelihood (random) | 1.2 | -0.03 | 2.0 |
| Normalized contrast cue likelihood (random) | 0.87 | 3.3 | 0.97 |
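The relationship between the exponent of a generalized Laplacian and its kurtosis can be checked numerically. The sketch below assumes the density is proportional to exp(-|x/scale|^exponent) and that the last column of Table 4 is this exponent; the column header is not recoverable from this text, so that reading is an assumption, and the function name is hypothetical.

```python
from math import gamma

def generalized_laplacian_excess_kurtosis(exponent):
    """Excess kurtosis of a density proportional to exp(-|x/scale|**exponent).

    Uses the standard moment ratio Gamma(5/a) * Gamma(1/a) / Gamma(3/a)**2,
    minus 3. The result does not depend on the scale parameter.
    """
    g1 = gamma(1.0 / exponent)
    g3 = gamma(3.0 / exponent)
    g5 = gamma(5.0 / exponent)
    return g5 * g1 / g3 ** 2 - 3.0

# Usage: evaluate at the exponents listed in the table (assumed column meaning).
values = {a: generalized_laplacian_excess_kurtosis(a) for a in (0.87, 0.96, 0.97, 2.0)}
```

Under this formula an exponent of 2 (Gaussian) gives an excess kurtosis of 0 and an exponent of 1 (Laplacian) gives 3, so smaller exponents correspond to more sharply peaked, heavier-tailed distributions.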
The derived posterior distribution models article148.gif and article149.gif for these two cues are shown in Figure 14e. It is clear that the brightness cue is a far more powerful cue than the contrast cue.
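A posterior of this kind follows from the two likelihood models by Bayes' rule. The sketch below is illustrative only: the density parametrization, the reading of the Table 4 columns as (scale, exponent), and the prior probability of grouping are all assumptions, and the function names are hypothetical.

```python
from math import exp, gamma

def generalized_laplacian_pdf(x, scale, exponent):
    """Normalized density proportional to exp(-|x/scale|**exponent)."""
    norm = exponent / (2.0 * scale * gamma(1.0 / exponent))
    return norm * exp(-abs(x / scale) ** exponent)

def posterior_contour(cue, contour_params, random_params, prior_contour):
    """Posterior probability that two tangents group, given one cue value."""
    like_contour = generalized_laplacian_pdf(cue, *contour_params)
    like_random = generalized_laplacian_pdf(cue, *random_params)
    numerator = like_contour * prior_contour
    return numerator / (numerator + like_random * (1.0 - prior_contour))

# Usage with the brightness rows of Table 4 read as (scale, exponent),
# and a made-up prior of 0.05.
p_small = posterior_contour(0.0, (0.32, 0.87), (1.2, 2.0), 0.05)
p_large = posterior_contour(2.0, (0.32, 0.87), (1.2, 2.0), 0.05)
```

Because the contour likelihood is much narrower than the random likelihood in this example, the posterior falls as the magnitude of the cue grows.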
7. Discussion
7.1 Cue Independence
We have assumed that the cues under study are mutually independent when conditioned upon article154.gif or article155.gif. As a first step in checking this assumption, we computed the Pearson correlation coefficients between the absolute values of the cues in the contour condition (Table 5). The correlations are small (magnitudes of roughly 0.1 or less) except for the brightness/contrast correlation (0.53). However, the relatively weak inferential power of the contrast cue (see below) suggests that the contrast cue could be omitted from models of contour grouping without substantial loss in inferential power.
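Under conditional independence, evidence from several cues combines by multiplying their likelihood ratios. The following sketch shows that factorial combination in log-odds form; the function name, the likelihood-ratio values, and the prior are hypothetical and are not taken from the study.

```python
from math import exp, log

def combined_posterior(likelihood_ratios, prior_contour):
    """Posterior probability of grouping from independent cue likelihood ratios.

    Each ratio is p(cue | contour) / p(cue | random). Independence given the
    grouping state is assumed, so log ratios simply add to the prior log odds.
    """
    log_odds = log(prior_contour / (1.0 - prior_contour))
    log_odds += sum(log(r) for r in likelihood_ratios)
    return 1.0 / (1.0 + exp(-log_odds))

# Usage: two cues favouring grouping by factors of 2 and 3, with even prior odds.
p = combined_posterior([2.0, 3.0], prior_contour=0.5)
```

If two cues were strongly dependent, as the brightness and contrast cues appear to be in the contour condition, multiplying their ratios would count shared evidence twice, which is one reason to drop the weaker of the two.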
Table 5. Pearson Correlations Between Grouping Principles
Grouping cues
Parallelism
Cocircularity
Brightness
Contrast
Proximity
0.01
-0.10
0.09
0.07
Parallelism
-0.05