Volume 2, Number 4, Article 5, Pages 324-353
doi:10.1167/2.4.5
http://journalofvision.org/2/4/5/
ISSN 1534-7362
Ecological statistics of Gestalt laws for the perceptual organization of contours
James H. Elder
Centre for Vision Research, York University, Toronto, Canada

Richard M. Goldberg
User Centred Design Laboratory, IBM Canada Ltd., Toronto, Canada

Abstract
Although numerous studies have measured the strength of visual grouping cues for controlled psychophysical stimuli, little is known about the statistical utility of these various cues for natural images. In this study, we conducted experiments in which human participants trace perceived contours in natural images. These contours are automatically mapped to sequences of discrete tangent elements detected in the image. By examining relational properties between pairs of successive tangents on these traced curves, and between randomly selected pairs of tangents, we are able to estimate the likelihood distributions required to construct an optimal Bayesian model for contour grouping. We employed this novel methodology to investigate the inferential power of three classical Gestalt cues for contour grouping: proximity, good continuation, and luminance similarity. The study yielded a number of important results: (1) these cues, when appropriately defined, are approximately uncorrelated, suggesting a simple factorial model for statistical inference; (2) moderate image-to-image variation of the statistics indicates the utility of general probabilistic models for perceptual organization; (3) these cues differ greatly in their inferential power, proximity being by far the most powerful; and (4) statistical modeling of the proximity cue indicates a scale-invariant power law in close agreement with prior psychophysics.
History
Received August 13, 2002; published August 21, 2002
Citation
Elder, J. H. & Goldberg, R. M. (2002). Ecological statistics of Gestalt laws for the perceptual organization of contours.
Journal of Vision, 2(4):5, 324-353,
http://journalofvision.org/2/4/5/,
doi:10.1167/2.4.5.
Keywords
perceptual organization, computational modeling, natural image statistics, image coding, contours, edges, proximity, good continuation, similarity
Perceptual grouping is the problem of aggregating
primitive image features that project from a common structure in the visual
scene. Nearly 50 years ago,
Brunswik and Kamiya (1953) suggested
that the classical Gestalt principles of perceptual grouping should be
quantitatively related to the statistics of the natural world and presented some
informal data on the subject. Here we apply this idea to the perceptual problem
of contour organization.
We define visual contour grouping as the problem of
integrating the local luminance edges or oriented curve tangents that lie on a
common luminance boundary in the image. An important property of contours is
their one-dimensionality: a curve can be defined as a function of one parameter
(e.g., arc length). This implies an ordering of points on the curve. Thus for
the perceptual organization of contours, we must recover not just an aggregation
but an ordered sequence of discrete edges or tangents. Without this ordering, it
is not possible to define many useful higher-order properties (e.g., curvature,
closure, concavity, and convexity).
We pose the problem of contour grouping as a problem of
probabilistic inference: the goal of the computation is to compute highly
probable sequences of local curve elements. Properties of these local elements
may serve as useful cues for deciding which elements should be grouped together,
and in what order. Here we consider three such properties: proximity, good
continuation, and similarity (in brightness and contrast). These are
probabilistic cues: each can provide evidence for a certain grouping of
elements, but none can provide a decision with 100% confidence. In order to
understand how these cues may be used to optimize the accuracy of perceptual
grouping decisions, we must quantitatively characterize the statistics of these
cues in natural images. To estimate these statistics, we employ an advanced
interactive software tool that allows human observers to rapidly trace the
contours they perceive in natural images. These data can then be compared to
past and future psychophysical studies to determine the degree to which the
human perceptual organization system is tuned to the statistics of the natural
world.
In summary, our contributions are:
• We develop a Bayesian model for the probabilistic combination of multiple grouping cues (proximity, good continuation, and luminance similarity) to determine local (pair-wise) groupings between contour elements.
• We demonstrate how the required probability distributions can be estimated from natural images using advanced software tools.
• We show that, when properly defined, these three cues are approximately independent.
• We formally characterize the inferential power of each of these three cues, and show that they differ dramatically in their importance for perceptual organization.
• We examine the variation in statistics over our sample of images to evaluate the utility of incorporating knowledge of these statistical models for general visual inference.
• We report a striking agreement between the statistics of the proximity cue in natural images and prior psychophysical results on the role of proximity in the perception of dot lattice stimuli (Oyama, 1961).
• We use these statistics to develop a parametric, generative model for natural image contours that can be used for both analysis and synthesis.
Our findings have important implications for human
vision. There exists a wealth of psychophysical data for perceptual grouping
phenomena. Qualitative and quantitative models have been constructed to account
for these data. However, to answer the question of why these cues act and
interact in a specific way requires a detailed understanding of the visual world
to which our visual systems have been tuned by evolution and/or learning. Our
hope is that this work will contribute to that understanding.
In the remainder of this section, we discuss previous
psychophysical studies and computational models of contour grouping, and very
recent attempts to characterize relevant statistics. In Sections 2-4, we develop
the computational framework within which our study is conducted. In Sections 5
and 6, we report the methods and results for our study. Implications of these
results are discussed in Sections 7 and 8.
1.1 Psychophysics of Contour Grouping
The early phenomenological demonstrations of the
Gestalt psychologists led to the identification of distinct
“principles” or cues to perceptual organization. There are several
of these that apply to contour organization, including proximity, good
continuation, and similarity
(Wertheimer, 1923/1938;
Koffka, 1935). Many of the Gestalt
demonstrations showed that perceived organizations were determined by the
cooperative or competitive interaction between these cues. However, these
demonstrations are qualitative, and it has been noted that to predict perceived
organizations in novel displays, quantitative models of the relative strength of
these cues are required (Hochberg, 1974;
Kubovy & Holcombe, 1998).
The cue of proximity is perhaps the most fundamental of
the Gestalt grouping laws
(Kubovy & Holcombe, 1998), and early
quantitative studies suggest that the human visual system is extremely sensitive
to proximity cues.
Uttal, Bunnel and Corwin (1970) found the
detection of dotted lines in random dot fields to depend strongly on the density
of dots along the line. Barlow (1978)
found psychophysical efficiencies of up to 50% for the detection of dot density
changes in two-dimensional random dot displays.
These two studies raise the question of whether the law
of proximity in the grouping of local elements sampling a one-dimensional
contour can be considered as simply a limiting case of the action of proximity
in two-dimensional texture grouping
(Zucker, Stevens, & Sander, 1983). The
fact that Barlow (1978) found no evidence
for greater efficiency in detecting regions of higher dot density when these
regions were elongated supports this idea. 1
On the other hand, more recent experiments by
Elder and Zucker (1998a) demonstrate that
the perceptual organization of fragmented figures is very sensitive to whether
dots are placed on the boundary or in the interior of fragmented figures.
Oyama (1961) made
what is probably the first attempt to quantify the law of proximity. Employing a
regular rectangular dot lattice, he measured the proportion of time observers
experienced vertical versus horizontal organizations as a function of the
relative vertical and horizontal spacing. He found that the ratio of durations
could be accurately modeled as a power law of the ratio of distances. The data
also indicate a significant bias toward vertical organizations.
Kubovy and colleagues
(Kubovy & Wagemans, 1995;
Kubovy & Holcombe, 1998) have recently
modified and elaborated this technique. Instead of a power law, they modeled
their results using an exponential model of dot spacing, scaled by the minimum
dot spacing present in the display. 2 They
found evidence that the strength of the proximity cue increases with increasing
stimulus duration (from 100 msec to 200 msec). They also found evidence for
scale invariance in their experiments: equal scaling of the horizontal and
vertical separations had little effect on their results.
Based on a number of experiments,
Zucker and Davis (1988) have proposed that
the perceptual organization of dotted contours changes abruptly at a dot:space
ratio of roughly 1:5. They found that contours sampled more densely than this
generate a number of classical illusions that sparsely sampled contours fail to
generate. However, no evidence for such a threshold is found in Kubovy’s
scaling results.
Elder and Zucker (1994)
conducted a series of visual search experiments in which the target and
distractors were fragmented outline shapes. They found that the psychophysical
effects of the fragmentation could be characterized by the
$L_2$ norm (sum of
squares) of the gaps in the figures. This is consistent with a probabilistic
model for contour grouping in which gaps are considered independent and the
probability of grouping follows a half-Gaussian distribution.
Elder and Zucker (1996a) later used this
model in a computer vision algorithm for grouping closed contours in natural
images. Gaussian distributions have also been used to model the proximity cue in
the perception of clusters in two-dimensional random dot displays
(van Oeffelen & Vos, 1982).
Compton and Logan (1993) tested the
Gaussian model for proximity against an exponential model, but could not
statistically discriminate between them.
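The connection between the sum-of-squares measure and a half-Gaussian grouping model can be illustrated with a minimal sketch. It assumes, purely for illustration, that each gap of length g contributes an unnormalized factor exp(-g²/2σ²) and that gaps are independent, so the product over gaps depends only on the sum of squared gaps. The function names and the value of `sigma` are hypothetical and are not taken from Elder and Zucker (1994).

```python
import math

def gap_grouping_strength(gap, sigma):
    """Unnormalized half-Gaussian grouping strength for one gap (assumed form)."""
    return math.exp(-gap ** 2 / (2.0 * sigma ** 2))

def figure_grouping_strength(gaps, sigma):
    """Product of per-gap strengths, treating the gaps as independent."""
    strength = 1.0
    for gap in gaps:
        strength *= gap_grouping_strength(gap, sigma)
    return strength

def sum_of_squares(gaps):
    """Sum of squared gap lengths."""
    return sum(gap ** 2 for gap in gaps)

# Two hypothetical figures with different gaps but the same sum of squares (9).
gaps_a = [1.0, 2.0, 2.0]
gaps_b = [3.0]
sigma = 2.0  # hypothetical scale constant

strength_a = figure_grouping_strength(gaps_a, sigma)
strength_b = figure_grouping_strength(gaps_b, sigma)
```

Under these assumptions the two figures receive the same grouping strength, because the exponents add and only the sum of squares survives in the product.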
To summarize, the evidence suggests that proximity acts
as a powerful, possibly scale-invariant cue for the perceptual organization of
textures and contours. However, there is little agreement about the best
quantitative description of the proximity cue: power law, exponential, and
Gaussian models have been proposed and supported by various psychophysical data.
An understanding of the statistics of the proximity cue for natural image
contours might help toward focusing and eventually resolving this debate.
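To make the distinction between the three candidate forms concrete, the following sketch compares how each responds to a uniform rescaling of two competing distances. The parameter values are hypothetical, and the exponential and Gaussian forms are evaluated with a fixed scale constant; an exponential model whose constant is tied to the minimum spacing in the display, as in the work of Kubovy and colleagues, would behave differently.

```python
import math

def power_law(distance, exponent):
    """Grouping strength falling off as a power of distance."""
    return distance ** (-exponent)

def exponential(distance, scale):
    """Grouping strength falling off exponentially with a fixed scale constant."""
    return math.exp(-distance / scale)

def gaussian(distance, sigma):
    """Grouping strength falling off as a half-Gaussian with fixed sigma."""
    return math.exp(-distance ** 2 / (2.0 * sigma ** 2))

def strength_ratio(model, d_near, d_far, parameter):
    """Relative strength of the nearer grouping over the farther one."""
    return model(d_near, parameter) / model(d_far, parameter)

# Compare a display with spacings (1, 2) to the same display magnified tenfold.
power_small = strength_ratio(power_law, 1.0, 2.0, 2.0)
power_large = strength_ratio(power_law, 10.0, 20.0, 2.0)
exp_small = strength_ratio(exponential, 1.0, 2.0, 1.0)
exp_large = strength_ratio(exponential, 10.0, 20.0, 1.0)
gauss_small = strength_ratio(gaussian, 1.0, 2.0, 1.0)
gauss_large = strength_ratio(gaussian, 10.0, 20.0, 1.0)
```

Only the power law yields the same ratio at both magnifications, which is the sense in which it is scale invariant without any display-dependent normalization.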
Less is known about the quantitative nature of the
Gestalt law of good continuation. Certainly we now have objective evidence for
the action of this law. For example,
Beck, Rosenfeld, and Ivry (1989) found
longer reaction times for detection of a straight arrangement of elements in a
random array when the line elements were laterally jittered. But the action of
the good continuation cue in isolation from other cues has best been illustrated
by the experiments of
Field, Hayes, and Hess (1993). In these
experiments, observers were asked to detect the presence of a curvilinear
sequence of oriented elements in a random element field. Care was taken to
equate the density of elements along and around the contour with the density in
the whole display. 3 Detection performance
was found to decline for more wandering contours, and to rapidly descend to
chance when the local elements themselves were jittered in their orientation
relative to the path. These results suggest a powerful role for the cue of good
continuation in the absence of a cue of proximity.
1.1.3 Brightness and Contrast Similarity
Even less is known about the quantitative role of
brightness and contrast similarity in contour grouping. An early dot lattice
experiment by Hochberg and Hardy (1960)
showed that proximity ratios of up to 2 can be overcome by intensity cues, and
Earle (1999) has found evidence for a role
of contrast similarity in the grouping of dots leading to the perception of
Glass patterns. Contrast reversals between dot pairs are known to completely
eliminate the perception of Glass patterns
(Glass & Switkes, 1976).
1.2 Contour Grouping in Computer Vision
The computational problem of contour organization has
been approached in many different ways. Cocircularity constraints have been
applied within an iterative discrete relaxation framework to refine local curve
estimates using global contour information
(Zucker, Hummel, & Rosenfeld, 1977;
Parent & Zucker, 1989). Energy
minimization methods for approximating contours with spline models have been
extensively investigated
(Kass, Witkin, & Terzopoulos, 1987;
David & Zucker, 1990). Multi-scale
smoothness criteria have been used to impose an organization on image curves
(Lowe, 1989;
Saund, 1990;
Dudek & Tsotsos, 1997), sequential
methods for tracking contours within a Bayesian framework have been developed
(Cox, Rehg, & Hingorani, 1993), and
parallel methods for computing local “saliency” measures based on
contour smoothness and total arc length have been studied
(Sha’ashua & Ullman, 1988;
Freeman, 1992;
Alter, 1995;
Williams & Jacobs, 1997).
While computational models for contour grouping
generally exploit only geometric cues (proximity, good continuation),
Elder and Zucker (1996a) have recently
developed a probabilistic framework for contour grouping that also exploits
luminance cues. In this work, maximum likelihood contours are estimated using a
shortest path (Dijkstra’s) algorithm. A similar approach has recently been
taken by Crevier (1999), who extends the
framework to allow grouping of circular arcs as well as straight contour
segments, and attempts to relax the strict assumption of independence between
the grouping of segment pairs.
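The reduction of maximum likelihood contour estimation to a shortest path problem can be sketched as follows. If the pairwise grouping probabilities are treated as independent, the probability of a chain is their product, so the most probable chain minimizes the sum of negative log probabilities, and Dijkstra's algorithm applies because these costs are non-negative. This is a minimal illustration under those assumptions, not the published implementation; the function name, the graph representation, and the example probabilities are hypothetical.

```python
import heapq
import math

def most_probable_path(pair_prob, start, goal):
    """Return (path, probability) of the most probable chain from start to goal.

    pair_prob maps each element to a dict {successor: probability in (0, 1]}.
    Returns (None, 0.0) if goal cannot be reached.
    """
    best_cost = {start: 0.0}   # lowest known sum of -log(p) to each element
    previous = {}              # back-pointers for path recovery
    frontier = [(0.0, start)]
    visited = set()
    while frontier:
        cost, node = heapq.heappop(frontier)
        if node in visited:
            continue
        visited.add(node)
        if node == goal:
            break
        for successor, p in pair_prob.get(node, {}).items():
            new_cost = cost - math.log(p)
            if new_cost < best_cost.get(successor, math.inf):
                best_cost[successor] = new_cost
                previous[successor] = node
                heapq.heappush(frontier, (new_cost, successor))
    if goal not in best_cost:
        return None, 0.0
    path = [goal]
    while path[-1] != start:
        path.append(previous[path[-1]])
    path.reverse()
    return path, math.exp(-best_cost[goal])

# Hypothetical pairwise grouping probabilities between four elements.
example = {
    "a": {"b": 0.9, "c": 0.5},
    "b": {"d": 0.8},
    "c": {"d": 0.9},
    "d": {},
}
path, probability = most_probable_path(example, "a", "d")
```

In this example the chain a, b, d (probability 0.9 × 0.8) is preferred over a, c, d (probability 0.5 × 0.9).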
In general, these techniques are capable of grouping
edge points into extended chains. However, the goal of computing complete
bounding contours has proven to be more elusive. Although approaches using
global grouping cues such as convexity
(Jacobs, 1996;
Huttenlocher & Wayner, 1992) and
closure (Elder & Zucker, 1996a;
Mahamud, Thornber, & Williams, 1999)
have yielded limited success, the general problem of computing the complete
bounding contour of an object of arbitrary shape in a complex natural image
remains essentially unsolved. One possible way of improving probabilistic models
for contour grouping is to ground the models in the actual statistics of
grouping cues for natural image
contours.
1.3 Statistics of Contour Grouping
The connection between human visual processing and the
statistics of the natural visual world has been recognized for nearly 50 years.
Attneave (1954) and
Barlow (1961) suggested that early visual
processing may be designed primarily to reduce the statistical redundancy in
natural images, thereby increasing coding efficiency. This idea has led to
demonstrations that simple principles of redundancy reduction and sparseness of
coding predict early visual filters that are qualitatively similar to the
receptive field properties of early visual neurons
(Srinivasan, Laughlin, & Dubs, 1982;
Field, 1987;
Atick & Redlich, 1992;
Olshausen & Field, 1996;
Olshausen & Field, 1997).
4
The original proposals of
Attneave (1954) and
Barlow (1961) have led to considerable
progress in our understanding of early visual processing in terms of linear
transformations that reduce the redundancy or increase the sparseness or
statistical independence of neural responses. However, there have been
relatively few attempts to understand higher-level visual problems, such as
perceptual organization, in terms of the statistics of the natural visual world.
This is somewhat surprising because just prior to Attneave and Barlow’s
proposals, Brunswik and Kamiya (1953)
proposed that the classical Gestalt principles of perceptual organization should
be quantitatively related to the statistics of the natural world, and presented
some data on the proximity of parallel contours in natural images. However,
their suggestion remained largely untested until 1998, when
Kruger (1998) first reported data on the
second-order spatial statistics of Gabor filter responses to natural images, and
we first reported the results of this study
(Elder & Goldberg, 1998a;
1998b). More recently, there has been an
interesting study of natural image statistics relevant to the problem of image
segmentation
(Martin, Fowlkes, Tal, & Malik, 2001),
and two studies of natural image statistics relevant to the perceptual
organization of contours
(Geisler, Perry, Super, & Gallogly, 2001;
Sigman, Cecchi, Gilbert, & Magnasco, 2001).
We discuss the studies relevant to contours in more detail below.
Kruger (1998)
examined the second-order “co-occurrence” spatial statistics of
Gabor filter responses to natural images. Filter responses were nonlinearly
normalized and thresholded prior to statistical analysis. Kruger found
statistical evidence for colinearity and parallelism relations in these
second-order spatial statistics.
Sigman et al. (2001)
also examined the co-occurrence statistics of oriented filter responses to
natural images. They reported long-range correlations that adhered to the
geometric principle of cocircularity. In a related study,
Geisler et al. (2001) measured the
co-occurrence statistics of oriented edge elements in natural images, and
related these to human performance in detecting sampled contours in cluttered
displays. They proposed a simple model for grouping based in part on these
statistics, and found that their model was to some degree consistent with the
psychophysical data.
The correlations in the joint statistics of oriented
edge elements observed in these studies reveal interesting parallel, colinear,
and cocircular structure. Characterization of this statistical structure could
be useful for understanding key aspects of early visual processing. Just as
earlier statistical studies predicted early visual filters that are
qualitatively similar to the receptive field properties of early visual neurons,
these later studies may help us to relate natural image statistics to more
complex aspects of neural coding, including the spatial nonlinearities in
complex cells and lateral interactions between neurons in early visual cortex
(Gilbert & Wiesel, 1989). An
understanding of this second-order statistical structure is also useful for
image processing applications, including image compression and image denoising
(Simoncelli, 1997).
However, we will argue that these statistics are not
sufficient to understand the statistical basis for the perceptual organization
of contours. The heart of the matter is that perceptual grouping is not simply
the problem of detecting correlations. Rather, the problem is to integrate the
sequences of elements that project from common structures in the scene. In
particular, contour grouping is the problem of integrating those edge elements
that lie on a common luminance boundary. There are other reasons one might
predict statistical correlations between edge elements: between parallel
elements in texture flows, for example, or between colinear elements on
different components of a regular texture. Because the statistics reported in
these studies result from a mixture of these effects, they cannot be used
directly to understand contour grouping per se.
In our study, we use human observers to trace the
contours in natural images, and thus obtain pairs of contour elements that we
know should be directly grouped. At the same time, we randomly sample the image
to obtain pairs of tangent elements that should not be grouped. From the
perspective of probabilistic inference, it is vital to have statistics for both
(contour and random) events: it is the ratio of the likelihood distributions for
these two events that determines the posterior probability for contour grouping
(Section 4). Since these two events are not
distinguished in the co-occurrence statistics collected in other studies, these
statistics are insufficient for the probabilistic inference of contours.
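The role of the likelihood ratio can be illustrated with Bayes' rule in odds form: the posterior odds of a contour grouping equal the likelihood ratio multiplied by the prior odds. The sketch below also assumes that the cues are independent, so that their likelihood ratios multiply. The function name, the prior, and the ratio values are hypothetical placeholders, not measurements from this study.

```python
def posterior_contour(likelihood_ratios, prior_contour):
    """Posterior probability that two tangents are grouped on a contour.

    likelihood_ratios: one value p(cue | contour) / p(cue | random) per cue,
        combined by multiplication (assumes the cues are independent).
    prior_contour: prior probability of grouping, strictly between 0 and 1.
    """
    combined_ratio = 1.0
    for ratio in likelihood_ratios:
        combined_ratio *= ratio
    prior_odds = prior_contour / (1.0 - prior_contour)
    posterior_odds = combined_ratio * prior_odds
    return posterior_odds / (1.0 + posterior_odds)

# Hypothetical ratios for three cues and a hypothetical prior of 1%.
posterior = posterior_contour([50.0, 2.0, 1.5], 0.01)

# A ratio of 1 carries no evidence, so the posterior equals the prior.
uninformative = posterior_contour([1.0], 0.2)
```

Both likelihoods are needed to form each ratio, which is why statistics for contour pairs alone, or for undifferentiated co-occurrences, cannot supply the posterior.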
In addition to collecting co-occurrence statistics,
Geisler et al. (2001) used a form of
contour tracing in order to distinguish the statistics relating contour elements
on a common contour from those relating elements on different contours. (Our
study was first reported
[Elder & Goldberg, 1998a;
1998b] two years before the first report
of the work of Geisler et al.
[Geisler, Super, & Gallogly, 2000].)
However, their technique differs from ours in one crucial respect. Their traces
indicate which elements are perceived to lie on a common contour, but they do
not provide any information about the ordering of the elements along the
contour. This is important because a defining property of contours is their
one-dimensionality: a contour may be parameterized by a single real variable,
e.g., $t$. This imposes
an ordering on the local elements of a curve. For example, the point
$\mathbf{x}(t_2)$ on the curve
lies between points $\mathbf{x}(t_1)$ and $\mathbf{x}(t_3)$ if and only if
$t_1 < t_2 < t_3$. These
properties are essential for defining higher order properties of curves, e.g.,
curvature, closure, concavity, and convexity.
To maintain this one-dimensional characteristic in a
discrete encoding, a contour must be represented as an ordered sequence of local
elements. In the study of Geisler et al., contours are represented not as
ordered sequences but as unordered sets of oriented elements, and their
statistics relate arbitrary pairs of tangents on a contour. These statistics
therefore do not reflect the fundamental one-dimensional topological property of
contours.
In contrast, we define the problem of contour grouping
as the recovery of sequences of tangents projecting from the contours of a
scene. Participants trace sequences of tangents defining the contours they
perceive in natural images. From these data, we can derive statistics for
tangent pairs that are successive components of a common contour. These
statistics thus inform us about the cues to inferring the sequence of tangents
defining perceived contours.
We model these sequences as Markov chains. The Markov
approximation captures the local nature of the physical processes that give rise
to these contours, and is consistent with the monotonically decreasing nature of
the autocorrelation function of natural images, the spatiotopic structure of
early visual cortex, and the well-studied psychophysical principle of proximity.
Application of the Markov approximation allows us to understand the problem of
contour grouping by characterizing the statistics of local grouping between
successive elements comprising a contour.
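A minimal sketch of what the Markov approximation buys: the score of an ordered sequence decomposes into a sum of terms that each involve only two successive elements, and therefore depends on the order in which the tangents are chained. The pairwise term used here (the negative distance between tangent positions) is a hypothetical stand-in for a log likelihood, chosen only to make the example runnable.

```python
import math

def chain_log_score(tangents, pair_log_score):
    """Sum of pairwise log scores over successive elements of an ordered chain."""
    return sum(pair_log_score(a, b) for a, b in zip(tangents, tangents[1:]))

def proximity_log_score(a, b):
    """Hypothetical pairwise term: negative distance between two positions."""
    return -math.hypot(b[0] - a[0], b[1] - a[1])

# The same four tangent positions, chained in two different orders.
ordered = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
shuffled = [(0.0, 0.0), (2.0, 0.0), (1.0, 0.0), (3.0, 0.0)]

score_ordered = chain_log_score(ordered, proximity_log_score)
score_shuffled = chain_log_score(shuffled, proximity_log_score)
```

An unordered set representation cannot distinguish these two cases, whereas the chain score favors the sequence that respects the ordering along the contour.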
Our model reflects the fact that the statistical
dependencies between neighboring tangents on a contour are much stronger than
those between distant tangents on a contour. (Here the terms
“neighboring” and “distant” refer to ordinal distance in
the chain, not Cartesian distance in the image.) In the approach of
Geisler et al. (2001), the power of the
strong statistics relating neighboring tangents is diluted with the weak
statistics relating distant tangents. This leads to substantial differences
between our statistics and their statistics, as we shall see.
It should be noted that there are potential
disadvantages of contour-tracing methods. Compared with the measurement of
ordinary second-order (co-occurrence) statistics, tracing methods introduce
additional possible sources of error and bias. These may include biases of the
participants doing the tracing, and errors caused by the software that manages
the tracing process. A nice aspect of the Geisler et al. study is that they use
both methods, and compare the results of the two. Of course, because we expect
significant differences even without error, it is not possible to verify either
method with this comparison. Our view is that we have no real choice but to use
human-traced contours, because it is not possible to get the required statistics
for the Bayesian inference of contours without ground-truth data, and human
traces are the best available approximation to ground truth data.
Another distinction of our study is the multiplicity of
grouping cues explored, and the rigorous manner in which they are compared. It
has long been understood that perceptual organization is determined by the
simultaneous action of several factors or principles
(Wertheimer, 1923/1938;
Koffka, 1935). How are these different
factors combined? What is the relative importance of these factors in
determining the perceived organization? Other studies have investigated two
grouping factors (proximity and good continuation) but have not quantified the
relative importance or independence of each individually, and photometric cues
have been completely ignored.
Here we investigate three of the classical Gestalt
principles (proximity, good continuation, and luminance similarity) for the
organization of local curve elements into extended contours. We investigate
these properties separately so that we may estimate their relative inferential
power, but we also study to what degree they provide independent information for
contour grouping, and how they can be optimally combined.
2. Local Contour Representation
To measure the statistics of contour grouping cues in
natural images, we must first be able to detect and represent the local contour
elements. In other studies, edges have been detected using fixed-scale filters
followed by simple nonlinearities. For example,
Kruger (1998) used fixed-scale oriented
Gabor filters, followed by a point nonlinearity and thresholding.
Sigman et al. (2001) used fixed-scale
steerable quadrature-pair filters to measure the local oriented energy, followed
by a threshold. Geisler et al. (2001)
used a two-stage filtering process. In a first stage, potential edge locations
were identified as the zero-crossings in the response of a nonoriented log Gabor
function. The local energy at these locations was then measured using oriented
quadrature-pair log Gabor filters, and a threshold was applied.
Although the filters used in these edge detection
techniques all bear some resemblance to the receptive fields of early visual
neurons, they are likely to be gross oversimplifications of the cortical
processing involved in edge detection. Neurons in primary visual cortex are
extremely diverse in their receptive field properties, even within a single
class of cell. For example, foveal simple cells in primary visual cortex of
macaque range in peak spatial frequency from roughly 0.5 cycles per degree (cpd)
to more than 16 cpd, spatial frequency bandwidth from roughly 0.4 octaves to
more than 2.6 octaves, orientation bandwidth from less than 10 deg to more than
180 deg, and receptive field height:width ratios from roughly 1:1 to 16:1
(DeValois, Albrecht, & Thorell, 1982;
Parker & Hawken, 1988).
Many psychophysical studies using adaptation, masking,
and subthreshold summation techniques have demonstrated that early visual
processing results from the activity of multiple mechanisms with different
spatial frequency tunings
(Campbell & Robson, 1968;
Wilson & Bergen, 1979;
Watson, 1982;
Watt & Morgan, 1984;
Wilson & Gelb, 1984;
Watson, 2000). Recent work suggests that
psychophysical edge detection also requires mechanisms over a broad range of
orientation bandwidths, as is suggested by the physiological data
(Sachs & Elder, 2000).
In this work, we use a multi-scale edge detection
method developed for computer vision applications
(Elder & Zucker, 1998b). In
some respects, this method is more biologically plausible than the methods used
by Kruger (1998),
Sigman et al. (2001), and
Geisler et al. (2001). Most critically, filters
varying over a range of spatial frequencies (scales) are employed, similar to
the range found in early visual cortex of primate. The adaptive filter selection
method has been found to predict human visual acuity for blurred edges
(Elder & Zucker, 1996b) and human
detection efficiency for windowed edges in noise
(Sachs & Elder, 2000).
However, our goal here is not to propose a model for
human visual edge detection, nor to prefilter the images through a simple model
of early visual cortex. Rather, our goal is to reliably detect and locally
represent the contours projecting from luminance transitions in the scene. Only
if this is achieved will we be accurate in our estimates of contour statistics
in natural images. Further, if we hope to eventually measure the degree to which
the human visual system is tuned to the statistics of natural images, it is
vital that we do not corrupt our measurements of natural images with biases
induced by our simplistic models of cortical processing. Such a circular
procedure would undermine the significance of any links we may discover.
What leads us to choose the multi-scale edge detection
algorithm of Elder and Zucker (1998b) is
thus not its biological plausibility, but its performance in detecting edges
over the broad range of blur, contrast, and clutter found in natural images. The
multi-scale nature of the algorithm is crucial to achieving this.
One important advantage of this representation is that
we can invert it to reconstruct an approximation of the original image, and thus
can subjectively and objectively measure any information lost or distortions
introduced (Elder, 1999). We have conducted
detailed studies to show that this representation is perceptually nearly
complete, with minimal loss of information.
Although our local edge computation yields an accurate
representation of the image, these edges do not provide an optimal basis for
contour grouping. Due primarily to spatial discretization, the edge map
representation of a contour is jagged and noisy. In order to avoid these
problems, we employ a second level of representation in which the contour is
locally represented by tangents with position, orientation, and length
represented by real numbers
(Elder & Zucker, 1996a). These two
stages are described in more detail below.
We model local edges as Gaussian-blurred step
discontinuities in image intensity (Figure 1).
The model consists of 5 parameters
(Elder & Zucker, 1998b):
• Location (to the nearest pixel in this implementation)
• Orientation
• Blur scale
• Asymptotic intensity on the bright side of the edge
• Asymptotic intensity on the dark side of the edge
Figure 1.
Local Gaussian-blurred edge model and derivative-based detection/estimation
mechanisms.
Detection of edges and estimation of model parameters
are based on measurement of the gradient of the intensity function using
steerable first derivative of Gaussian filters
(Freeman & Adelson, 1991;
Perona, 1995), and on estimation of the
locations of zero-crossings and extrema of the second derivative using steerable
second derivative of Gaussian filters, steered in the gradient direction. While
the zero-crossing of the second derivative localizes the edge, the separation of
the 2nd derivative extrema in the gradient direction is used to estimate the
blur scale of the edge. Estimates of the image intensity at the 2nd derivative
extrema are used to estimate the mean intensity and the
magnitude of the intensity change at the edge. These are then used to estimate the
asymptotic intensities on the bright and dark sides of the edge
(Figure 1).
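The blur estimate described above can be illustrated with a short numerical sketch. It assumes (this is not stated in the text) that the edge blur and the filter scale combine in quadrature, so that the second-derivative extrema lie at plus and minus the combined scale. The function names and the sampling grid are hypothetical.

```python
import math

def second_derivative_response(x, blur, filter_scale, contrast=1.0):
    """Second-derivative-of-Gaussian response to a Gaussian-blurred step edge at x = 0."""
    s2 = blur ** 2 + filter_scale ** 2  # assumption: blurs combine in quadrature
    s = math.sqrt(s2)
    return -contrast * x / (s2 * s * math.sqrt(2.0 * math.pi)) * math.exp(-x * x / (2.0 * s2))

def estimate_blur(xs, responses, filter_scale):
    """Recover the edge blur from the separation of the response extrema."""
    i_max = max(range(len(xs)), key=lambda i: responses[i])
    i_min = min(range(len(xs)), key=lambda i: responses[i])
    half_sep = 0.5 * abs(xs[i_max] - xs[i_min])
    # Remove the known filter scale from the combined scale.
    return math.sqrt(max(half_sep ** 2 - filter_scale ** 2, 0.0))

# Hypothetical one-dimensional profile sampled across the edge.
xs = [0.01 * i for i in range(-1000, 1001)]
responses = [second_derivative_response(x, blur=2.0, filter_scale=1.0) for x in xs]
blur_hat = estimate_blur(xs, responses, filter_scale=1.0)
```

The zero-crossing of the same response sits at x = 0, which is how the edge itself is localized.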
A major obstacle to reliable edge detection is the
scale problem: how to choose the scale of local estimation filters in order to
prevent false positives and distortion due to noise, while minimizing distortion
caused by neighboring image structure. Our method for edge detection solves this
problem with an adaptive scale space technique called local scale control
( Elder & Zucker, 1998b). This
technique selects, at each point in the image, the minimum reliable scale for
local estimation. At this scale, hypotheses concerning the sign of response of a
linear filter at each point can be tested with statistical reliability. This
means in turn that zero-crossings can be reliably detected and localized. This
theory of scale selection has been shown to accurately predict human
psychophysical performance in edge localization and blur estimation tasks
( Elder & Zucker, 1996b). An example of
the edge map produced by this algorithm is shown in
Figure 2b.
Because our interest is to estimate the properties of
grouping cues for the actual contours in an image, it is important that our
local elements (edges) be accurate. Otherwise we may simply be measuring
artifacts of our edge detection methodology. Because we lack ground truth for the
actual location of contours in the image, a direct estimate of the accuracy of
our edges is not available. We must therefore consider indirect methods.
We have recently reported a method for inverting our
edge representation to compute an estimate of the original image from which the
edge map was computed ( Elder, 1999). Using
this algorithm, we have shown our edge representation to be both objectively and
subjectively accurate for a wide variety of images.
Figure 2c shows the reconstruction of the image in
Figure 2a from the computed edge representation.
Figure 2. a. Example contour traced by a human
participant. b. Edge map from which contours are defined. c. Reconstruction of
image from edge representation. d. Portion of tangent map. Each colored line
segment represents a distinct tangent.
Although our local edge computation yields an accurate
representation of the image, these edges do not provide an optimal basis for
contour grouping. The problem is illustrated in
Figure 3. Due primarily to spatial
discretization, the edge map representation of a contour is jagged and noisy.
Even when a set of edges is known to be generated by the same contour, it is
difficult to specify an appropriate ordering on the edge pixels, and tracing a
contour through any particular ordering yields a curve corrupted by high
curvature wiggles due to the discretization.
Figure 3. Observable geometric cues between edge
pixel tangents are dominated by artifacts introduced by the spatial
discretization of the image.
2.2 Tangent Representation
In order to avoid these problems, we employ a second
level of representation in which the contour is locally represented by tangents
with position, orientation, and length represented by real numbers
( Elder & Zucker, 1996a). These
tangents, not constrained by the discrete pixel grid, and often averaging over
multiple, roughly colinear edges, provide a much more accurate basis for
one-dimensional perceptual grouping. We stress that these computations are not
intended as a model for biological visual processing. Rather, they are intended
simply to provide an accurate estimate of local contour information.
To construct the tangent representation, each local
edge in the image generates a tangent line passing through the edge pixel in the
estimated tangent direction. The tangent estimates that are 8-connected to the
local edge, which lie within an  -neighbourhood of the local tangent line, and
whose gradient directions are compatible with that of the local edge, are
identified with the extended tangent model. For this study, we use
 pixels.
Gradient direction compatibility is determined based on the known level of
sensor noise, using a first-order noise propagation model.
A greedy algorithm is used to select the subset of
tangents that will represent the image contours. Given a connected set of local
edges, the longest line segment that faithfully models a subset of these is
determined. This subset is then subtracted from the original set. This process
is repeated for the connected subsets thus created until all local edges have
been modeled. Luminance estimates for the edge pixels modeled by each tangent
are averaged, and each tangent is thus represented as a 6-element
vector: 5
| • | x
position |
| • | y position |
| • | Length |
| • | Orientation |
| • | Luminance on light side
of tangent |
| • | Luminance on dark side
of tangent |
By convention, the spatial component (first four
elements) of each tangent vector is represented as a 90 deg counterclockwise
rotation from the gradient direction. The (x,y) position represents the location
of the base of the vector in the image. A portion of the tangent map computed
for the image in Figure 2a is shown in
Figure 2d.
3. A Probabilistic Model for Tangent Grouping
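The set of tangents $T$ computed from
an image may be enumerated: $T = \{t_1, \ldots, t_N\}$. The set
$C$ of possible
contours may then be represented as tangent sequences: A sequence of tangents
$t_{\alpha_1} \to \cdots \to t_{\alpha_m} \in C$ if and only if
$\alpha_i \in \{1, \ldots, N\}, \quad \alpha_i = \alpha_j \implies \left( i = j \ \text{or} \ \{i, j\} = \{1, m\} \right)$ | (1) |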
This definition restricts the mapping to be injective
(tangents cannot be repeated in the sequence), with the exception that the first
and last tangent may be the same. In this case, the contour is closed. For the
purposes of this study, we will restrict our attention to contours for which
contrast polarity does not reverse along the contour, thus tangents in a contour
are linked “tip-to-tail.”
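The constraint on tangent sequences can be expressed as a small validity check. This is a minimal sketch, assuming tangents are referred to by zero-based integer indices; the function name is hypothetical.

```python
from typing import Sequence

def is_valid_contour(indices: Sequence[int], num_tangents: int) -> bool:
    """Return True if no tangent repeats, except that first and last may coincide."""
    if len(indices) < 2:
        return False
    if any(not (0 <= i < num_tangents) for i in indices):
        return False
    # A closed contour returns to its starting tangent.
    closed = len(indices) > 2 and indices[0] == indices[-1]
    body = indices[:-1] if closed else indices
    return len(set(body)) == len(body)
```

For example, `[0, 1, 2, 0]` is accepted as a closed contour, while `[0, 1, 0, 2]` is rejected because a tangent repeats in the interior of the sequence.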
We assume that there exists a correct organization of
the image $C^\ast \subseteq C$. Correctness
may be defined in terms of objective ground truth, e.g., the contours that bound
the objects in a scene. Unfortunately, except perhaps for highly simplified
artificial or synthetic scenes, objective ground truth is difficult to obtain.
Because our interest is in the perceptual organization of typical natural
images, we elect in this study to define correctness in terms of human
perception, i.e., a contour is correct if it is what a human observer perceives.
If this aspect of human perception is close to veridical, then our study reveals
aspects of how contours in the natural world appear in images. If not, we can at
least say that our measurements reveal aspects of the information likely used by
the human visual system to group contours.
A visual system may use a number of observable
properties $D$ to decide on
the correctness of a hypothesized contour. Here we examine properties
corresponding to the classical Gestalt cues of proximity, good continuation, and
similarity. Knowing these properties $D$ influences the probability
$p(c \in C^\ast \mid D)$ that a
particular contour $c$ is correct.
Here we are interested in how properties
$d_{ij}$ defined on
pairs of sequential tangents $\{t_i, t_j\}$ may influence the probability that a contour
$c$ is correct. The
local property $d_{ij}$ may, e.g.,
represent the distance between the two tangents or a measure of the curvature of
the best continuant between the two tangents. Note that these local properties
do not embody many important aspects of global geometry and topology. In
general, the visual system may also apply one or more global constraints that
only a subset of contours may satisfy, e.g., closure, simplicity (no
self-intersections), and completeness. However, here we focus only on
characterizing the statistics of local cues for grouping.
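Using Bayes’ theorem, the probability
$p(c \in C^\ast \mid D)$ that a
particular contour $c$ is correct (i.e., belongs to the correct organization $C^\ast$) may be written
as $p(c \in C^\ast \mid D) = \dfrac{1}{1 + (LP)^{-1}}$ | (2) |
where $L = \dfrac{p(D \mid c \in C^\ast)}{p(D \mid c \notin C^\ast)}$ is the likelihood ratio and $P = \dfrac{p(c \in C^\ast)}{p(c \notin C^\ast)}$ is the prior ratio. | (3) |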
In general, the prior ratio reflects the expected
number of tangents in the contours of the image and can be modeled fairly
easily. However, because contours can be many tangents in length, the likelihood
distributions are in general of very high dimension; in order to model the
statistics of contour grouping, some simplifying approximations must be made.
Here we model contours as Markov chains
( Mumford, 1992;
Elder & Zucker, 1996a;
Williams & Jacobs, 1997), so that
tangent grouping is pair-wise independent. In particular, we will assume that
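only the grouping cues $d_{\alpha_i \alpha_{i+1}}$ directly relating successive tangents on the hypothesized
contour $c = t_{\alpha_1} \to \cdots \to t_{\alpha_m}$ depend upon the hypothesis, and that these are conditionally
independent. Writing $t_i \to t_j$ for the hypothesis that $t_j$ directly follows $t_i$ on a correct contour, and $t_i \nrightarrow t_j$ for its negation,
the likelihood ratio is then $L = \prod_{i=1}^{m-1} \dfrac{p(d_{\alpha_i \alpha_{i+1}} \mid t_{\alpha_i} \to t_{\alpha_{i+1}})}{p(d_{\alpha_i \alpha_{i+1}} \mid t_{\alpha_i} \nrightarrow t_{\alpha_{i+1}})}$ | (4) |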
and the likelihood ratio can be computed as a
product of local likelihood ratios. Note that this model makes no assumption
that local tangent groupings are unique. The
intuition behind this Markov approximation is that the strongest statistics lie
in the relations between directly successive tangents on the contour, so these
should be modeled directly. The weaker statistics relating more distant tangents
are captured approximately through the Markov structure.
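Under the Markov approximation, the likelihood ratio for a whole contour reduces to a product over successive tangent pairs. A minimal sketch, working in the log domain to avoid underflow on long contours; `local_ratio` is a hypothetical callback returning the local likelihood ratio for a pair of tangent indices.

```python
import math
from typing import Callable, Sequence

def contour_log_likelihood_ratio(
    contour: Sequence[int],
    local_ratio: Callable[[int, int], float],
) -> float:
    """Sum the log local likelihood ratios over successive tangent pairs."""
    return sum(math.log(local_ratio(a, b)) for a, b in zip(contour, contour[1:]))
```

Exponentiating the result recovers the product of local ratios; a contour of a single tangent has no pairs and therefore a log ratio of zero.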
Depending on the nature of the global constraints, it
may be possible to compute maximally probable contours using an efficient
shortest-path computation on a directed graph representing the Markov network.
In prior work ( Elder & Zucker, 1996a),
we developed an algorithm for computing closed contours using these
approximations. While in the present work we are interested principally in the
problem of estimating local probability distributions, we will employ
interactive grouping software that makes use of these approximations in order to
rapidly infer contour segments between tangents selected by human observers
( Section 5).
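The shortest-path idea can be sketched with Dijkstra's algorithm on a directed graph whose edge weights are negative log grouping probabilities. This is an illustration under stated assumptions, not the algorithm of Elder & Zucker (1996a): the graph layout, the function name, and the use of pairwise probabilities in (0, 1] as edge strengths are all hypothetical.

```python
import heapq
import math
from typing import Dict, List, Optional, Tuple

# node -> list of (successor, pairwise grouping probability in (0, 1])
Graph = Dict[int, List[Tuple[int, float]]]

def most_probable_path(graph: Graph, start: int, goal: int) -> Optional[List[int]]:
    """Path maximizing the product of pairwise probabilities (Dijkstra on -log p)."""
    best = {start: 0.0}
    parent: Dict[int, int] = {}
    heap = [(0.0, start)]
    done = set()
    while heap:
        cost, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        if node == goal:
            break
        for succ, prob in graph.get(node, []):
            if prob <= 0.0:
                continue  # impossible grouping
            new_cost = cost - math.log(min(prob, 1.0))  # non-negative weight
            if new_cost < best.get(succ, math.inf):
                best[succ] = new_cost
                parent[succ] = node
                heapq.heappush(heap, (new_cost, succ))
    if goal not in done:
        return None
    path = [goal]
    while path[-1] != start:
        path.append(parent[path[-1]])
    return path[::-1]
```

Because each weight is non-negative, minimizing the summed weights is equivalent to maximizing the product of the pairwise probabilities along the path.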
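Here we focus on local statistics, and so in the
following we will consider contours consisting of just two tangents
$\{t_1, t_2\}$. Writing $t_1 \to t_2$ for the hypothesis that the two tangents are directly grouped, $t_1 \nrightarrow t_2$ for the hypothesis that they are not, and $d_{12}$ for the observables relating them, we have
$p(t_1 \to t_2 \mid d_{12}) = \dfrac{1}{1 + (L_{12} P_{12})^{-1}}$ | (5) |
where $L_{12} = \dfrac{p(d_{12} \mid t_1 \to t_2)}{p(d_{12} \mid t_1 \nrightarrow t_2)}$ and $P_{12} = \dfrac{p(t_1 \to t_2)}{p(t_1 \nrightarrow t_2)}$. | (6) |
The prior ratio
$P_{12}$ is
approximately equal to the probability that two arbitrarily selected tangents
are grouped. If we assume that pair-wise groupings are typically (but not
always) unique, then $P_{12}$ is approximately equal to the reciprocal of the
number of tangents in the image.
The likelihood ratio
$L_{12}$ represents the
ratio of the likelihood of the observables given that
$t_1$ and
$t_2$ are directly
grouped to the likelihood given that they are not. We will refer to these
likelihoods throughout the paper as the
contour and
random likelihoods,
respectively.
Figure 4.
Observable data relating two tangents. See text for details.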
In this study we consider three observable cues that we
expect to be most influential on the probability of grouping: proximity, good
continuation, and similarity. As a first order approximation, we use a
rectilinear model of completion between two tangents
(Figure 4):
| • | Proximity:
A function of the length of the
straight-line interpolant (gap). |
| • | Good
continuation: A function of the
two orientation changes induced by
the interpolation. |
| • | Similarity:
A function of the differences in estimated image intensities between
the two tangents. In this study, we consider only grouping that preserves the
contrast polarity of the contour. |
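If we can approximate distinct cues (proximity, good
continuation, etc.) as independent when conditioned upon grouping hypotheses,
the contour and random likelihoods can be
factored. 6 Given
$n$ distinct cues
$d_{12}^{1}, \ldots, d_{12}^{n}$ relating
tangents $t_1$ and
$t_2$ (with $t_1 \to t_2$ and $t_1 \nrightarrow t_2$ denoting the grouped and ungrouped hypotheses), we then
have: $p(d_{12} \mid t_1 \to t_2) = \prod_{k=1}^{n} p(d_{12}^{k} \mid t_1 \to t_2), \qquad p(d_{12} \mid t_1 \nrightarrow t_2) = \prod_{k=1}^{n} p(d_{12}^{k} \mid t_1 \nrightarrow t_2)$ | (7) |
It is these likelihoods that we wish to
estimate in the present
study.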
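As a sketch of how the factored likelihoods combine into a grouping probability, the function below multiplies per-cue likelihood ratios and applies the odds form of Bayes' theorem, taking the prior ratio to be the reciprocal of the number of tangents as described above. The function name and argument layout are hypothetical.

```python
from typing import Sequence

def pair_posterior(
    contour_likelihoods: Sequence[float],
    random_likelihoods: Sequence[float],
    num_tangents: int,
) -> float:
    """Posterior probability that two tangents are grouped, assuming independent cues."""
    likelihood_ratio = 1.0
    for p_contour, p_random in zip(contour_likelihoods, random_likelihoods):
        likelihood_ratio *= p_contour / p_random
    prior_ratio = 1.0 / num_tangents  # assumes groupings are typically unique
    odds = likelihood_ratio * prior_ratio
    return odds / (1.0 + odds)
```

With this prior, the posterior reaches 0.5 only when the combined likelihood ratio equals the number of tangents in the image, which illustrates how strong the cues must be to overcome the prior.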
Five unpaid participants, all undergraduate or graduate
students of vision science, participated in the experiment. All had normal or
corrected-to-normal vision. The participants were aware of the goals of the
study.
Experiments were conducted on a Pentium workstation
with a Sony Trinitron display. Proprietary software, discussed in detail in
Section 5.4, was employed to display the
images and allow participants to trace perceived contours.
Nine arbitrarily selected natural grayscale images were
employed (Figure 5). An attempt was made to include images
of diverse subjects and settings (e.g., people, objects, animals; indoor,
outdoor).
Figure 5. Images
used for our experiments.
5.4 Software Tool for Interactive Contour Grouping
Our goal was to estimate the probability distributions
for the observable grouping cues available in the contours perceived by human
observers. To do this, we needed a method for translating observer percepts into
tangent sequences: somehow observers must be able to trace the contours they
see, and each trace must be mapped to a sequence of tangents.
In order to allow participants to accurately and
efficiently trace contours, we employed a software package called Interactive
Contour Editor (ICE), previously developed for a demonstration of contour-based
image editing technology
( Elder & Goldberg, 2001). ICE
represents an image by information at its edges, and then allows the image to be
modified by direct editing of the contours. This technology uses previously
developed algorithms for reconstructing images from our edge representation
( Elder, 1999). In order to allow users to
efficiently manipulate contours, ICE provides an interactive contour grouping
mechanism based on the tangent representation and independence approximations
described in Sections 3 and 4
( Elder & Zucker, 1996a). The
likelihood distributions employed are generally Gaussian, with parameters chosen
using a combination of common sense and trial and
error. 7
Rather than requiring experimental participants to
painstakingly trace each tangent of a contour in sequence, we used the grouping
feature of ICE as a kind of “power-assist” to accelerate the
process. This approach has a number of advantages:
| • | Accurate
estimation of probability distributions requires a large amount of data. Using
ICE, participants can group a long sequence of tangents with a relatively small
number of mouse clicks, allowing the required quantity of data to be collected
quickly. |
| • | The increase in
efficiency reduces observer fatigue, and thus may improve the quality of data.
|
| • | ICE turns approximately
positioned mouse clicks into selections of the nearest tangent, and allows the
observer to group contours in chunks. These capabilities eliminate the need for
zooming and unzooming the image, which can cause the observer to lose global
perspective and can introduce errors into the data. |
A potential disadvantage of the methodology is that ICE
may itself introduce errors into the data by selecting groupings that are not
perceived by the observer. This problem was largely avoided by provision of an
“undo” mechanism that allowed participants to delete groupings they
had not intended to make. Participants were instructed to use the ICE grouping
tool to full advantage, but to be constantly vigilant for such errors and to
correct them immediately. Typically the errors made by ICE were
“blunders” that were difficult to miss.
The graphical user interface (GUI) for ICE is shown in
Figure 6c. Both the working image and edge map
are displayed. For this study, we made use only of the features of ICE that
allow contour tangents in a natural image to be selected and grouped as a
sequence.
Figure 6. a. Example of incorrect
path computed by ICE. b. The correct path is computed when the participant
selects more closely spaced tangents. c. Graphical user interface for ICE.
Participants select contours by clicking on either the
image or the edge map. Grouping is initiated by clicking near a contour. This
click initiates a nearest neighbor search in the area of the mouse click to find
the nearest edge point. The coordinates of the nearest edge point are used to
index the tangent map and thus obtain the index of the tangent corresponding to
the edge point. The selected tangent is highlighted in color on both edge and
image displays. When the user clicks near a second edge point, a terminating
tangent index is similarly obtained.
These two tangent indices form input to a graph
algorithm that determines the most probable sequence of tangents connecting the
two selected tangents, under the independence approximations discussed in
Sections 3 and 4
( Elder & Zucker, 1996a).
In append path mode, subsequent mouse selections will
append maximum likelihood contour segments to the previously computed path. In
replace path mode, a third mouse selection will deselect the previous path and
begin a new path at the selected edge point.
Figure 6 shows an
example of this interactive grouping procedure. Selected tangents are indicated
by bow tie markers. Because the grouping algorithm is imperfect, selecting two
tangents that are too distant may lead to a nonsense path
( Figure 6a). In such cases, the participant may
undo the path and, with ICE in append path mode, select a sequence of more
closely spaced points along the contour that the algorithm can more easily
connect ( Figure 6b).
Each participant was instructed to use the ICE software
to trace all of the contours they perceived in each of the natural images.
Images were presented in a random order. Participants could select tangents by
clicking in either the image or the edge map. Participants were instructed to
try to group complete contours, but not to group multiple contours together.
Participants were also instructed to consider not only the contours bounding
objects, but also contours arising from reflectance changes, shading and
shadows.
Unlike
Geisler et al. (2001), we did not force
the participants to trace all automatically detected contours in the image. Thus
the potential exists that participants may have traced only the more salient
contours, even though they were instructed to trace all contours they perceived,
and this may lead to bias in the statistics. The difficulty in forcing the
participants to trace all detected contours is that, depending upon the
characteristics of the monitor, there may be some contours they simply cannot
see due to blur, low contrast, and clutter. Geisler et al. get around this by
imposing an arbitrary response threshold on edge detection filters and thus
suppressing low-contrast edges, but, of course, this could also introduce bias
in the contours that are traced.
The experiment produced a total of 16,222 tangent pairs
perceived to be directly grouped. Many of these tangent pairs were selected by
more than one participant: thus only 7,476 of these pairs were unique. We
considered the set of tangent pair samples selected by each observer to be an
independent random sample from a common underlying population, and therefore
used the full set of 16,222 pairs in estimating the contour distributions.
In order to estimate the random likelihood
distributions, we randomly sampled 10,000 pairs of tangents from the same set of
9 images. No attempt was made to avoid tangent pairs that are perceived to be
grouped, because such pairs form an insignificant proportion of the total number
of tangent pairs present in an image. 8
5.6 Modeling of Distributions
As a first-order approximation, we will assume that the
grouping cues are mutually independent in the contour and random conditions,
i.e., when conditioned upon the tangents being grouped or not grouped. Thus we are interested in modeling the
individual marginal distributions for each of the cues.
We wish to estimate the likelihood distributions for
each of the Gestalt cues, given tangents that are successive elements of
the same contour (the contour condition), and for
tangents that are not (the random condition). In this way we hope to quantify our
understanding of the classical Gestalt laws. For example, intuitively we expect
that the distance between two tangents known to be successive elements of the
same underlying contour will tend to be smaller than the distance between random
tangents, but we are seeking a more complete quantitative description of this
intuition in the form of these two likelihood distributions.
Figure 7a shows a
log-log plot of the contour likelihood distribution
$p(r \mid t_1 \to t_2)$, where
$r$ is the
separation between tangents. A scatterplot of the empirical distribution is
shown in blue. Note that while we would expect the true likelihood distribution
to decrease monotonically as a function of the distance between tangents, the
data are nonmonotonic, peaking at roughly
 pixels. We
believe the falloff observed for small gaps is because of small random errors in
our algorithm’s localization of tangent endpoints; we discuss this
below.
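For gaps greater than 2 pixels, the data appear roughly
linear in log-log coordinates. In other words, the contour likelihood
distribution for the proximity cue follows a power law in the gap $r$, with exponent $a$ and minimum gap $r_0$:
$p(r \mid t_1 \to t_2) = \dfrac{a - 1}{r_0} \left( \dfrac{r}{r_0} \right)^{-a}, \quad r > r_0$ | (8) |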
Figure 7. a. Estimated likelihood
distribution for the
proximity cue between two tangents known to be successive components of a common
contour. b. Sample standard deviation of tangent separation along contours, as a
function of sample size. The lack of convergence suggests that the standard
deviation is undefined. c. Estimated likelihood distribution for the
proximity cue between two tangents selected randomly from the image. d.
Practical (with estimation noise) and theoretical (without estimation noise)
estimates of the posterior distribution for
tangent grouping based on the proximity cue.
A maximum likelihood estimate of the underlying power
law is shown in magenta in Figure 7a.
We used bootstrapping to estimate standard errors for the power law parameters.
The gaps along a contour
follow a power law, with a minimum distance between tangent endpoints of 1.4
pixels (roughly the distance between diagonally adjacent pixels), and an
exponent of 2.92. While the mean separation is 2.9 pixels, the standard
deviation and higher order moments are undefined: for example, the model
predicts that the sample standard deviation of the distance between tangents
along a contour, estimated from observed data, will increase without bound as a
function of the size of the sample. Figure 7b
shows that our data do indeed exhibit this behavior.
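The power law figures quoted above can be checked with a short sketch. It assumes the density is proportional to the gap raised to the power -2.92 above a minimum gap of 1.4 pixels (a Pareto form), and it treats the minimum gap as known when fitting the exponent. The function names are hypothetical, and the closed-form estimator shown is the standard maximum likelihood estimator for this form, not necessarily the procedure used in the study.

```python
import math
import random

def sample_power_law(r0: float, a: float, rng: random.Random) -> float:
    """Draw a gap by inverting the CDF F(r) = 1 - (r / r0) ** (1 - a)."""
    u = rng.random()  # in [0, 1), so 1 - u is never zero
    return r0 * (1.0 - u) ** (-1.0 / (a - 1.0))

def fit_exponent(gaps, r0: float) -> float:
    """Maximum likelihood estimate of the exponent for a known minimum gap."""
    return 1.0 + len(gaps) / sum(math.log(r / r0) for r in gaps)

def power_law_mean(r0: float, a: float) -> float:
    """Mean gap; finite only for a > 2 (the variance needs a > 3)."""
    if a <= 2.0:
        raise ValueError("mean is undefined for a <= 2")
    return r0 * (a - 1.0) / (a - 2.0)
```

Under these assumptions the mean for a minimum gap of 1.4 pixels and an exponent of 2.92 comes out near 2.9 pixels, and because the exponent is below 3 the variance integral diverges, in line with the behaviour described for Figure 7b.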
To model the complete distribution of the proximity
law, including small gaps, we assume that this power law is corrupted by noise
caused by small errors in localizing tangent endpoints. We have observed that
these localization errors can be as large as
 pixel in both
horizontal and vertical directions. Modeling these random errors as independent
and uniformly distributed, we can generate samples of the resulting noisy power
law. Such a sample is shown in green in
Figure 7a: the striking similarity between the
real and simulated data provides strong support for the model.
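A simulation of the noisy power law might look like the following sketch. The maximum endpoint error `max_err` is left as a parameter, and the isotropic choice of gap direction is an assumption made here for illustration; the function name is hypothetical.

```python
import math
import random

def noisy_gap(r0: float, a: float, max_err: float, rng: random.Random) -> float:
    """Sample a power-law gap, then perturb both endpoints with uniform noise."""
    r = r0 * (1.0 - rng.random()) ** (-1.0 / (a - 1.0))
    angle = rng.uniform(0.0, 2.0 * math.pi)  # assumption: gap direction is isotropic
    dx = r * math.cos(angle)
    dy = r * math.sin(angle)
    # Each endpoint is displaced independently in x and y.
    dx += rng.uniform(-max_err, max_err) - rng.uniform(-max_err, max_err)
    dy += rng.uniform(-max_err, max_err) - rng.uniform(-max_err, max_err)
    return math.hypot(dx, dy)
```

With `max_err` set to zero no sample falls below the minimum gap; with a nonzero error some samples do, which is the mechanism proposed for the falloff at small gaps.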
The random likelihood distribution
 for the
proximity cue is shown in Figure 7c. To model
this distribution, we assumed that tangents are uniformly distributed over the
image. We then computed the exact distribution based on this assumption for a
 pixel image.
Distributions for square images of different sizes are obtained by simply
scaling this distribution. Although our images were of various sizes and
generally were not square, for the purposes of this study we approximated the
images as  pixels. The
resulting model is seen to fit the data well.
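For points distributed uniformly over a square, the distribution of pairwise distances has a known closed form (the "square line picking" result). The sketch below implements that density for a unit square and rescales it for a square of arbitrary side, as one way to realize the model described above; treating tangents as points and the function names are assumptions.

```python
import math

def unit_square_distance_pdf(r: float) -> float:
    """Density of the distance between two uniform random points in a unit square."""
    if r <= 0.0 or r >= math.sqrt(2.0):
        return 0.0
    if r <= 1.0:
        return 2.0 * r * (r * r - 4.0 * r + math.pi)
    root = math.sqrt(r * r - 1.0)
    return 2.0 * r * (4.0 * root - (r * r + 2.0 - math.pi) - 4.0 * math.atan(root))

def square_distance_pdf(r: float, side: float) -> float:
    """Same density for a square with the given side length (in pixels)."""
    return unit_square_distance_pdf(r / side) / side
```

Scaling the argument and the density by the side length is what allows a single distribution to serve for square images of different sizes.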
Having models of the likelihood distributions for both
contour and random conditions, and knowing the number of tangents in each image,
we can use Equation 5 to compute the posterior probability
of grouping as a function
of the tangent separation. Figure 7d shows the
posterior with and without the noise introduced by the tangent computation. It
can be seen that for small separations, the grouping probability is very high.
Figure 8. Coding the good continuation cue. All data are
drawn from the contour condition. a. Scatterplot showing negative correlation of
the two interpolation angles. b. Linear recoding into parallelism and
cocircularity cues results in a more independent code. c. The parallelism cue is
approximately uncorrelated with the proximity cue. d. The cocircularity cue is
correlated with the proximity cue. See text for details.
Figure 9.
Recoding the interpolation angles into a sum and difference leads to an
intuitive representation of good continuation.
6.2 Good Continuation Cue
Using our first-order model of contour continuation
(Figure 4), the grouping of two tangents generates two interpolation angles.
We find that
these two angles are strongly anticorrelated in the contour condition
(Figure 8a). This suggests a recoding of the
angles into sum and difference
cues. This
encoding appears to be close to the principal component basis for the good
continuation cue: the new variables are approximately uncorrelated in the
contour condition (Figure 8b).
There are four advantages to this new representation of
the good continuation cue. First, it appears to be close to the principal
components of the data in the contour condition. Second, when less than 180 deg
in absolute value, these two new variables have very natural meaning, in terms
of intuition and in terms of the literature
(Figure 9). The sum variable represents
parallelism: the two tangents are parallel if and only if
the sum is zero, and
its absolute value increases
monotonically as the tangents become less parallel. The
difference variable represents cocircularity: the two tangents are cocircular if
and only if the difference is zero, and
its absolute value increases
monotonically as the tangents become less cocircular. Finally,
note that two tangents are colinear if and only if both
the sum and
the difference are zero. Because for
roughly 94% of our data for the contour condition both variables are less than
180 deg in absolute value, we will generally refer to
the sum and
the difference as parallelism
and cocircularity cues, respectively.
In the 6% of cases where either
the sum or
the difference is greater than
180 deg in absolute value, we cannot think of them as measuring parallelism or
cocircularity. This is because neither geometric property embodies the sense in
which the contour is being traversed. Clearly this is an important constraint in
contour grouping, and the sum and difference variables do take this into account.
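The effect of the sum and difference recoding can be illustrated on synthetic data. The sketch below generates two strongly anticorrelated angles with equal variances (an assumption chosen for the illustration, not a model of the measured data) and shows that their sum and difference are approximately uncorrelated. All names are hypothetical.

```python
import math
import random

def recode(angle_a: float, angle_b: float):
    """Return (parallelism, cocircularity) as the sum and difference of two angles in deg."""
    return angle_a + angle_b, angle_a - angle_b

def pearson(xs, ys) -> float:
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Synthetic anticorrelated angles: a large antisymmetric part plus a small shared part.
rng = random.Random(0)
pairs = []
for _ in range(5000):
    u, v = rng.gauss(0.0, 30.0), rng.gauss(0.0, 10.0)
    pairs.append((u + v, -u + v))
angles_a = [p[0] for p in pairs]
angles_b = [p[1] for p in pairs]
recoded = [recode(a, b) for a, b in pairs]
sums = [s for s, _ in recoded]
diffs = [d for _, d in recoded]
```

The covariance of the sum and difference equals the difference of the two angle variances, so the recoding decorrelates the pair exactly when the two interpolation angles have equal variance.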
This representation of good continuation is different
from that used by Geisler et al. (2001).
In their representation, while one angle represents parallelism, the other angle
has no obvious intuitive meaning, and we find in our own data that these two
angles are highly correlated.
A third advantage of the parallelism and cocircularity
encoding of the good continuation cue is that sources of error in measuring
these two variables are quite different, and it is useful to separate these out. We
discuss this at length below. The final advantage is that the parallelism cue
will turn out to be much stronger in inferential power than the cocircularity
cue ( Section 7), and this supports our
general goal of constructing representations that concentrate the greatest
predictive power in the smallest number of variables.
The parallelism cue (as represented by its standard
deviation) is also very nearly uncorrelated with the proximity cue
( Figure 8c). It is thus appropriate to consider
the marginal statistics of the parallelism cue.
Figure 10a shows the likelihood distribution
for the parallelism cue in the contour condition (blue curve). The distribution
is kurtotic (kurtosis = 16.9). To model this distribution, we employed a
generalized Laplacian distribution that has been used in the past to model
kurtotic wavelet response histograms
( Mallat, 1989;
Simoncelli & Adelson, 1996; Simoncelli, 1999): 9
$p(x) = \dfrac{\beta}{2 s\, \Gamma(1/\beta)} \exp\left( -\left| \dfrac{x}{s} \right|^{\beta} \right), \quad s = \sigma \sqrt{\dfrac{\Gamma(1/\beta)}{\Gamma(3/\beta)}}$ | (9) |
This distribution is symmetric and unimodal.
$\sigma$ is the standard
deviation and $\beta$ controls the
kurtosis. If $\beta = 2$ the
distribution is Gaussian. If $\beta < 2$ the distribution has positive kurtosis. If
$\beta > 2$, the
distribution has negative kurtosis, approaching a uniform distribution as
$\beta \to \infty$. To model an
empirical distribution, we determine the generalized Laplacian distribution with
matching standard deviation and kurtosis. Given a target kurtosis, the required
$\beta$ is found using
standard nonlinear optimization techniques.
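One way to carry out this fit is sketched below. It assumes that kurtosis is reported as excess kurtosis (zero for a Gaussian), which is consistent with the sign conventions stated above, and it uses simple bisection as a stand-in for the unspecified optimization technique. The parameterization of the density by its standard deviation is one common form and is also an assumption; all function names are hypothetical.

```python
import math

def excess_kurtosis(beta: float) -> float:
    """Excess kurtosis of a generalized Laplacian with exponent beta."""
    log_k = math.lgamma(5.0 / beta) + math.lgamma(1.0 / beta) - 2.0 * math.lgamma(3.0 / beta)
    return math.exp(log_k) - 3.0

def beta_from_kurtosis(target: float, lo: float = 0.2, hi: float = 20.0) -> float:
    """Find beta by bisection; excess kurtosis decreases monotonically with beta."""
    if not (excess_kurtosis(hi) <= target <= excess_kurtosis(lo)):
        raise ValueError("target kurtosis outside the searchable range")
    while hi - lo > 1e-10:
        mid = 0.5 * (lo + hi)
        if excess_kurtosis(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def generalized_laplacian_pdf(x: float, sigma: float, beta: float) -> float:
    """Density with standard deviation sigma and exponent beta."""
    s = sigma * math.sqrt(math.gamma(1.0 / beta) / math.gamma(3.0 / beta))
    return beta / (2.0 * s * math.gamma(1.0 / beta)) * math.exp(-abs(x / s) ** beta)
```

Under the excess-kurtosis assumption, the kurtosis values reported in Table 1 (16.9 and 3.86) yield exponents close to the tabulated 0.54 and 0.91.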
The generalized Laplacian model for the parallelism cue
is shown in red in Figure 10a. The model
parameters for the parallelism cue in the contour condition are listed in
Table 1.
Figure 10. Statistical distributions for the good
continuation cues. a. Likelihood distribution for the
parallelism cue in the contour condition. b. Likelihood distribution for the
cocircularity cue in the contour condition. c. Likelihood distribution for the
good continuation cues in the
random condition. d. Posterior distributions for the
good continuation cues. See text for details.
Table 1. Generalized
Laplacian Parameters for Parallelism and Cocircularity Cues in the Contour
Condition
| Cue | Standard deviation ($\sigma$) | Kurtosis | Exponent ($\beta$) |
| Parallelism | 42.1 deg | 16.9 | 0.54 |
| Cocircularity | 76.8 deg | 3.86 | 0.91 |
Although the parallelism cue is approximately
uncorrelated with the proximity cue, this is not the case for the cocircularity
cue ( Figure 8d): the standard deviation of the
cocircularity cue decreases as the distance between tangents increases. In other
words, the cocircularity cue is weaker for more proximal tangents.
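We suspected that this observation stems from
measurement error. While the parallelism cue depends only on the difference in
estimated orientation of the two tangents, the cocircularity cue depends on the
orientation $\phi$ of the virtual
line connecting the relevant endpoints of the tangents
(Figure 11). Denoting horizontal and vertical separations
of the two tangents as $\Delta x$ and $\Delta y$ respectively, we have
$\phi = \arctan(\Delta y / \Delta x)$, and the
partial derivatives of $\phi$ with respect to
$\Delta x$ and
$\Delta y$ are
$\dfrac{\partial \phi}{\partial \Delta x} = -\dfrac{\Delta y}{\Delta x^2 + \Delta y^2}, \qquad \dfrac{\partial \phi}{\partial \Delta y} = \dfrac{\Delta x}{\Delta x^2 + \Delta y^2}$ | (10) |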
Thus estimation of the cocircularity cue is
ill-conditioned when the separation between tangents is
small. To minimize this source of error, we
restricted our analysis of the cocircularity cue in the contour condition to
tangent separations of 5 pixels or greater, where this small-separation effect
is negligible. The data and generalized Laplacian model are shown in
Figure 10b. The parameters of the model are
listed in
Table 1.
Figure 11. The parallelism cue (sum of
interpolation angles) can be reexpressed as the difference in orientation of the
two tangents. The cocircularity cue, however, depends on the orientation
of the
straight-line interpolant between the tangents.
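The ill-conditioning at small separations can be made concrete with a first-order error bound. This sketch computes the partial derivatives of the interpolant orientation with respect to the horizontal and vertical separations, and from them the worst-case angular error for a given endpoint error. The function names are hypothetical, and `atan2` is used in place of a plain arctangent so that a zero horizontal separation is handled.

```python
import math

def interpolant_angle(dx: float, dy: float) -> float:
    """Orientation (radians) of the straight-line interpolant."""
    return math.atan2(dy, dx)

def angle_partials(dx: float, dy: float):
    """Partial derivatives of the orientation with respect to dx and dy."""
    r2 = dx * dx + dy * dy
    return -dy / r2, dx / r2

def worst_case_angle_error(dx: float, dy: float, pixel_err: float) -> float:
    """First-order bound on the orientation error for a given error in dx and dy."""
    d_dx, d_dy = angle_partials(dx, dy)
    return (abs(d_dx) + abs(d_dy)) * pixel_err
```

Both partial derivatives scale as the reciprocal of the separation, so a fixed localization error produces an orientation error ten times larger at a separation of 1 pixel than at 10 pixels.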
The generalized Laplacian model can be seen to
overestimate the likelihood for nearly parallel tangents
( Figure 10a). We believe that a more accurate
model may be obtained by modeling the noise in tangent orientation estimation.
The effect of independent additive Gaussian noise of standard deviation
$\sigma_n$ in the two
tangent angles is to blur the likelihood distribution with a Gaussian blur
kernel of scale $\sqrt{2}\,\sigma_n$. We therefore
estimate the standard deviation $\sigma_n$ of the measurement noise by minimizing the
least-squared difference between the data and Gaussian-blurred model for the
parallelism and cocircularity cues, obtaining an estimate of
 . The resulting
models are shown in green in
Figure 10a and 10b.
To determine whether localization error in tangent
endpoints could account for the observed correlation between proximity and
cocircularity cues, we used our model for endpoint error (uniform noise of
 pixel in
 and
 coordinates,
Section 6.1) and our model for the
cocircularity cue for tangent separations greater than 5 pixels to simulate
cocircularity data for a range of tangent separations. The result, shown in red
in Figure 10d, is quite consistent with the
observed data, suggesting that were localization error eliminated, the
cocircularity cue would be roughly uncorrelated with the proximity cue.
The likelihood distributions for the good continuation
cues in the random condition can be modeled by assuming an isotropic tangent
distribution. The resulting model can be seen to fit the data well
( Figure 10c).
Figure 10d shows the posterior distributions
for the two cues. In deriving these distributions, we have attempted to remove
errors in estimating tangent orientation and location. These distributions thus
represent a “best case” scenario. Note that the parallelism cue
appears to be more informative than the cocircularity cue: this will be studied
more formally in Section 7.
The tangent representation includes an estimate of the
image intensity on either side of each tangent. Differences in these intensities
between tangents form a potential cue for contour grouping.
Tangents may be grouped in two ways, so that the
polarity of contrast is either preserved or reversed along the contour. We did
permit contrast reversals in the contours traced by our participants, and found
that roughly 13% of the local groupings involved a contrast reversal. However,
on examination it appeared that a number of these contrast reversals were
erroneous. The difficulty was that our tracing software did not clearly indicate
a contrast reversal to participants, who therefore had no way to detect and
correct erroneous reversals. We therefore decided to restrict our analysis to
segments of contour where no reversals were indicated.
One obvious way of encoding intensity similarity
information is to treat the difference in the
intensity of the light sides of the two tangents as one cue, and
the difference in the
intensity of the dark sides of the two tangents as a second cue.
The problem with this approach is that these two cues
are highly correlated in the random condition
(Figure 12a), and therefore their joint
distribution cannot be accurately approximated by the product of the marginal
distributions. By inspection it appears that the first principal component of
this joint distribution is roughly the sum of these two differences. This forms
a brightness cue, measuring the
difference between the two tangents in the mean luminance of the dark and light
sides of the underlying edge. The second principal component then forms a
contrast cue, measuring the
difference in the amplitudes of the intensity steps at the two tangents. Using
this new basis results in approximate decorrelation of the cues for the
nongrouped case
(Figure 12b).
Figure 12. a. Dark and light
luminance difference cues between randomly selected tangents are strongly
correlated. b. Brightness and contrast cues for randomly selected tangents are
approximately uncorrelated.
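The change of basis from light/dark differences to brightness/contrast cues can be sketched as follows. The cue definitions follow the verbal description above (difference in mean luminance, and difference in step amplitude). The generative model for random tangents, in which mean luminance and step amplitude are drawn independently, is purely an assumption made for illustration, as are the names `brightness_contrast` and `simulate_random_pairs` and all numeric settings.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def brightness_contrast(light_1, dark_1, light_2, dark_2):
    """Map the light-side and dark-side intensities of two tangents to the two cues."""
    d_light = light_1 - light_2
    d_dark = dark_1 - dark_2
    brightness = 0.5 * (d_light + d_dark)  # difference in mean luminance
    contrast = d_light - d_dark            # difference in step amplitude
    return brightness, contrast

def simulate_random_pairs(n_pairs=5000, seed=1):
    """Correlations for randomly paired tangents under an assumed toy model."""
    rng = random.Random(seed)

    def tangent():
        mean_lum = rng.gauss(128.0, 40.0)   # hypothetical spread of local luminance
        step = abs(rng.gauss(0.0, 30.0))    # hypothetical edge amplitude
        return mean_lum + 0.5 * step, mean_lum - 0.5 * step

    d_light, d_dark, bright, contr = [], [], [], []
    for _ in range(n_pairs):
        l1, k1 = tangent()
        l2, k2 = tangent()
        d_light.append(l1 - l2)
        d_dark.append(k1 - k2)
        b, c = brightness_contrast(l1, k1, l2, k2)
        bright.append(b)
        contr.append(c)
    return pearson(d_light, d_dark), pearson(bright, contr)

r_light_dark, r_bright_contrast = simulate_random_pairs()
print(round(r_light_dark, 2), round(r_bright_contrast, 2))
```

Under this toy model the light and dark differences are strongly correlated because both inherit the luminance difference between the two locations, while the sum and difference coordinates separate that shared term from the amplitude term.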
Table 2 lists the
Pearson correlations for these various luminance cues in both the grouped and
random conditions. Overall, the brightness and contrast cues are less correlated
than the light/dark difference cues. However, we must be careful not to assume
that these low correlations mean that the cues are independent.
Table 3 lists the Pearson correlations for
the absolute values of these same cues. Note the high correlation between
brightness and contrast cues in the contour condition. Clearly, it would be a
mistake to conclude that these two cues are independent.
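The distinction between uncorrelated and independent cues can be made concrete with a toy example. The sketch below is illustrative only and is not a model of the measured data: two variables share a random scale factor, so their signed values are uncorrelated while their absolute values are clearly correlated. The function names and the two scale values are hypothetical.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def shared_scale_pairs(n=20000, seed=2):
    """Draw pairs whose signs are independent but whose magnitudes share a scale."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        scale = rng.choice([0.5, 2.0])  # hypothetical shared factor
        xs.append(scale * rng.gauss(0.0, 1.0))
        ys.append(scale * rng.gauss(0.0, 1.0))
    return xs, ys

xs, ys = shared_scale_pairs()
r_signed = pearson(xs, ys)
r_absolute = pearson([abs(x) for x in xs], [abs(y) for y in ys])
print(round(r_signed, 2), round(r_absolute, 2))
```

A near-zero signed correlation alongside a clearly positive correlation of absolute values is the same pattern reported for the brightness and contrast cues in the contour condition.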
These results present us with a dilemma. While the
dark/light representation is superior for the contour condition, the
brightness/contrast representation is superior for the random condition. One
solution is to use different representations for the two conditions; however,
then it would be impossible to quantify the inferential power of the individual
cues: the most we could do is quantify the power of the two luminance cues taken
together.
Table 2. Comparison of Pearson Correlation Coefficients for Two Measures of Luminance Similarity
| | Dark/light | Brightness/contrast |
| Contour | 0.12 | -0.06 |
| Random | 0.77 | 0.01 |
Table 3. Comparison of Pearson Correlation Coefficients for Absolute Values of Two Measures of Luminance Similarity
| | Dark/light | Brightness/contrast |
| Contour | 0.19 | 0.53 |
| Random | 0.65 | 0.02 |
Because one of the prime purposes of this study is to
quantify the inferential power of contour grouping cues, we elect instead to use
the brightness/contrast representation. As we shall see, the brightness cue
turns out to be a far more powerful cue than the contrast cue, and so this
decomposition allows the reduction of the luminance information into a single
cue without substantial loss of inferential power.
We found that the contrast of our images varied
considerably (standard deviations from 48 to 77 grey levels), and that the
statistics of luminance grouping cues co-vary with image contrast (Pearson
correlations of 0.72 for the brightness cue and 0.34 for the contrast cue:
Figure 13). In order to increase the reliability, and
therefore the inferential power, of the luminance cues, it is useful to normalize
by overall image contrast.
Figure 13. The statistics of both brightness and
contrast cues are strongly correlated with the pixel statistics of the
image.
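A minimal sketch of this normalization is given below. It assumes that overall image contrast is measured as the standard deviation of the grey levels (the quantity quoted above) and that normalization is a simple division. Both the division and the names `image_contrast` and `normalize_cue` are assumptions made for illustration.

```python
import statistics

def image_contrast(pixels):
    """Overall image contrast, taken here as the standard deviation of grey levels."""
    return statistics.pstdev(pixels)

def normalize_cue(cue_value, pixels):
    """Express a luminance cue in units of the image's grey-level standard deviation."""
    return cue_value / image_contrast(pixels)

# Two toy "images" with the same structure but a twofold difference in contrast.
low_contrast = [100, 120, 140, 160]
high_contrast = [70, 110, 150, 190]

# A raw brightness difference that doubles with image contrast...
raw_low, raw_high = 10.0, 20.0

# ...maps to the same normalized cue value in both images.
print(normalize_cue(raw_low, low_contrast), normalize_cue(raw_high, high_contrast))
```

After normalization, cue values from images of different contrast share a common scale, so a single likelihood model can be fit across images.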
Figure 14. a and b. Estimated likelihood distributions for the
brightness and contrast cues between two tangents known to be successive
components of a common contour. c and d. Estimated likelihood distributions for the
brightness and contrast cues between two tangents selected randomly from the
image. e. Posterior distributions for
tangent grouping based on the brightness and contrast cues.
Figures 14a and 14b
show the contour likelihood distributions for the
normalized brightness and contrast cues. Both are well modeled by generalized
Laplacian distributions. Figures 14c and 14d
show the random likelihood distributions for these two
luminance cues. Unfortunately, the generalized Laplacian models these data less
accurately, failing to completely reflect the sharp peaks observed in the data
near zero. This result may reflect long-range spatial correlations in intensity
values, so that tangents on the same object typically generate much lower
luminance cue values than tangents from different objects. The generalized
Laplacian parameters for these models are provided in
Table 4.
Table 4. Generalized Laplacian Parameters for Similarity Cue Likelihoods
| | | Kurtosis | |
| Normalized brightness cue likelihood (contour) | 0.32 | 4.40 | 0.87 |
| Normalized contrast cue likelihood (contour) | 0.59 | 3.3 | 0.96 |
| Normalized brightness cue likelihood (random) | 1.2 | -0.03 | 2.0 |
| Normalized contrast cue likelihood (random) | 0.87 | 3.3 | 0.97 |
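The relation between the shape of a generalized Laplacian and its kurtosis can be sketched as follows. The sketch assumes the common parameterization p(x) ∝ exp(-|x/σ|^α), for which the excess kurtosis depends only on the exponent α. Reading the last numeric column of Table 4 as that exponent is an inference from the numbers rather than something stated in the table, and the function name is hypothetical.

```python
import math

def generalized_laplacian_excess_kurtosis(alpha):
    """Excess kurtosis of a density proportional to exp(-|x / sigma| ** alpha).

    Uses the standard moment formula for this family; the result does not
    depend on the scale parameter sigma.
    """
    g = math.gamma
    return g(5.0 / alpha) * g(1.0 / alpha) / g(3.0 / alpha) ** 2 - 3.0

# Reference points: alpha = 2 is a Gaussian (0), alpha = 1 is a Laplacian (3).
print(generalized_laplacian_excess_kurtosis(2.0))
print(generalized_laplacian_excess_kurtosis(1.0))

# Exponent taken from the first row of Table 4 (assumed column meaning).
print(generalized_laplacian_excess_kurtosis(0.87))
```

Under this reading, an exponent of 0.87 implies an excess kurtosis of roughly 4.4, in line with the kurtosis listed for the brightness cue in the contour condition, and exponents near 2 imply values near zero, as in the random-condition brightness row.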
The derived posterior distribution models for these two
cues are shown in Figure 14e. It is clear that
the brightness cue is a far more powerful cue than the contrast
cue.
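The way a posterior follows from the two likelihood models can be sketched with Bayes' rule. Several things here are assumptions rather than the authors' stated method: the generalized Laplacian is taken to have the form p(x) ∝ exp(-|x/σ|^α), the first and last numeric columns of Table 4 are read as σ and α, and the prior probability of grouping is set to a hypothetical 0.5 purely so that the two cues can be compared on equal footing.

```python
import math

def generalized_laplacian(x, sigma, alpha):
    """Density proportional to exp(-|x / sigma| ** alpha), normalized over the real line."""
    norm = alpha / (2.0 * sigma * math.gamma(1.0 / alpha))
    return norm * math.exp(-abs(x / sigma) ** alpha)

def posterior_grouped(x, contour_params, random_params, prior_grouped):
    """Posterior probability of grouping given cue value x, by Bayes' rule."""
    grouped = generalized_laplacian(x, *contour_params) * prior_grouped
    ungrouped = generalized_laplacian(x, *random_params) * (1.0 - prior_grouped)
    return grouped / (grouped + ungrouped)

# (sigma, alpha) pairs read from Table 4 under the assumed column meanings.
BRIGHTNESS = {"contour": (0.32, 0.87), "random": (1.2, 2.0)}
CONTRAST = {"contour": (0.59, 0.96), "random": (0.87, 0.97)}
PRIOR = 0.5  # hypothetical prior, not taken from the paper

for x in (0.0, 0.5, 1.0, 2.0):
    p_b = posterior_grouped(x, BRIGHTNESS["contour"], BRIGHTNESS["random"], PRIOR)
    p_c = posterior_grouped(x, CONTRAST["contour"], CONTRAST["random"], PRIOR)
    print(x, round(p_b, 3), round(p_c, 3))
```

Under these assumptions the brightness posterior swings over a wider range than the contrast posterior as the cue value moves away from zero, which is one way of seeing why it is the more informative of the two cues.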
We have assumed that the cues under study are mutually
independent when conditioned upon the tangents being grouped or not grouped. As a first step in checking this assumption, we
computed the Pearson correlation coefficients between the absolute values of the
cues in the contour condition ( Table 5). We
see from this calculation that correlations are relatively small (less than 0.1)
except for the brightness/contrast correlation (0.53). However, the relatively
weak inferential power of the contrast cue (see below) suggests that the
contrast cue could be omitted from models of contour grouping without
substantial loss in inferential
power.
Table 5. Pearson Correlations Between Grouping Principles
| Grouping cues | Parallelism | Cocircularity | Brightness | Contrast |
| Proximity | 0.01 | -0.10 | 0.09 | 0.07 |
|
Parallelism
|
|
-0.05
|
| |