Volume 4, Number 12, Article 1, Pages 967-992
doi:10.1167/4.12.1
http://journalofvision.org/4/12/1/
ISSN 1534-7362
Slant from texture and disparity cues: Optimal cue combination
James M. Hillis
Department of Psychology, University of Pennsylvania, Philadelphia, PA, USA
Simon J. Watt
Department of Psychology, University of Wales, Bangor, Wales, UK
Michael S. Landy
Department of Psychology & Center for Neural Science, New York University, New York, NY, USA
Martin S. Banks
Vision Science Program, Department of Psychology, & Wills Neuroscience Institute, University of California, Berkeley, CA, USA
Abstract
How does the visual system combine information from different depth cues to estimate three-dimensional scene parameters? We tested a maximum-likelihood estimation (MLE) model of cue combination for perspective (texture) and binocular disparity cues to surface slant. By factoring the reliability of each cue into the combination process, MLE provides more reliable estimates of slant than would be available from either cue alone. We measured the reliability of each cue in isolation across a range of slants and distances using a slant-discrimination task. The reliability of the texture cue increases as |slant| increases and does not change with distance. The reliability of the disparity cue decreases as distance increases and varies with slant in a way that also depends on viewing distance. The trends in the single-cue data can be understood in terms of the information available in the retinal images and issues related to solving the binocular correspondence problem. To test the MLE model, we measured perceived slant of two-cue stimuli when disparity and texture were in conflict and the reliability of slant estimation when both cues were available. Results from the two-cue study indicate, consistent with the MLE model, that observers weight each cue according to its relative reliability: Disparity weight decreased as distance and |slant| increased. We also observed the expected improvement in slant estimation when both cues were available. With few discrepancies, our data indicate that observers combine cues in a statistically optimal fashion and thereby reduce the variance of slant estimates below that which could be achieved from either cue alone. These results are consistent with other studies that quantitatively examined the MLE model of cue combination. Thus, there is a growing empirical consensus that MLE provides a good quantitative account of cue combination and that sensory information is used in a manner that maximizes the precision of perceptual estimates.
History
Received January 20, 2004; published December 1, 2004
Citation
Hillis, J. M., Watt, S. J., Landy, M. S., & Banks, M. S. (2004). Slant from texture and disparity cues: Optimal cue combination.
Journal of Vision, 4(12):1, 967-992,
http://journalofvision.org/4/12/1/,
doi:10.1167/4.12.1.
Keywords
depth perception, cue combination, stereopsis, Bayesian perception, texture gradient
The fundamental problem in depth perception arises from
the geometry of perspective projection, which reduces the three-dimensional (3D)
coordinates of the visual scene to the 2D coordinates of the retinal images. The
third dimension of space has to be inferred from the 2D images. The visual
system uses several sources of information—“depth cues” such
as disparity, perspective, and motion parallax—to estimate the layout of
the 3D scene. Estimates based on each individual cue are subject to error. By
combining information from several depth cues, the visual system could estimate
3D layout with greater precision across a wider variety of viewing situations
than it could by relying on any one cue alone. To realize this advantage, the
reliability of each depth cue must be factored into the combination rule.
Factoring in reliability is complicated because the reliability of individual
depth cues depends on scene parameters in different ways. Are variations in
depth cue reliability with scene geometry factored into the cue-combination
rule? To examine this question, we compared human slant discrimination ability
based on disparity and texture cues to a model of statistically optimal cue
combination. Slant estimation from texture and disparity is an interesting case
to examine because the reliabilities of disparity and texture cues vary in
different ways with slant and viewing distance. Knill and Saunders (2003) examined the combination of
texture and disparity as a function of slant with an approach similar to the one we
present here. We have expanded their experiments to include surfaces slanted
about a vertical axis and surfaces at multiple viewing distances (see Discussion: Comparison to other studies). We
measured the reliability of slant estimates from each cue in isolation across a
range of slants and distances and used an optimal cue-combination rule to
predict the appearance of two-cue stimuli and the precision of slant estimation
with two-cue stimuli. We then compared these predictions to the results of
two-cue slant discrimination
experiments.
Visual estimates of slant from any depth cue are
subject to error. For example, perceived slant from a given texture gradient
will vary from one instance to another due to the statistical nature of slant
information from texture and errors in the measurement of the gradient (Blake,
Bülthoff, & Sheinberg, 1993;
Cutting & Millard, 1984; Knill,
1998a). When more than one depth cue is
available and informative, one can in principle reduce the uncertainty
associated with any one of the cues by combining across cues (for a review and
derivation of the following results, see Oruç, Maloney, & Landy, 2003).
One approach to optimizing cue combination is
statistical: What cue-combination rule results in an estimator that is unbiased
and has minimum variance? Assume that the observer has unbiased estimates
$\hat{S}_D$ and $\hat{S}_T$ of the slant of
a surface based on disparity and texture cues, respectively. Assume further that
errors in these estimates are uncorrelated and have variances
$\sigma_D^2$ and $\sigma_T^2$. If we combine
the two estimates linearly, the rule that yields the minimum-variance, unbiased
estimate is a weighted average that satisfies (Cochran, 1937)
$$\hat{S} = w_D \hat{S}_D + w_T \hat{S}_T \tag{1}$$
where
$$w_D = \frac{r_D}{r_D + r_T}, \qquad w_T = \frac{r_T}{r_D + r_T} \tag{2}$$
and $r_D$ and $r_T$ are the
reliabilities of the two cues
(e.g., $r_D = 1/\sigma_D^2$). Furthermore,
if errors associated with the individual estimators are Gaussian, no other
(nonlinear) rule has lower variance.
An alternative approach is to apply Bayesian methods (for reviews, see Kersten,
Mamassian, & Yuille, 2004;
Mamassian, Landy, & Maloney, 2002). In the absence of any immediate
consequences to an observer's actions (payoffs and penalties), the maximum
a posteriori (MAP) estimate is typically employed. That is, the observer chooses
a slant estimate $\hat{S}$ that is most
probable given the image data. We assume the image data can be segregated into
those data $I_D$ used to
estimate slant from disparity and $I_T$ used to estimate slant from texture. Thus, we
choose the value of $S$ that
maximizes $P(S \mid I_D, I_T)$. Applying
Bayes' rule, and assuming that the two cues are conditionally independent,
we derive
$$P(S \mid I_D, I_T) \propto P(I_D \mid S)\, P(I_T \mid S)\, P(S) \tag{3}$$
The first two terms on the right side of the
equation are the likelihood functions for each cue characterizing the
probability of observing the image data if
$S$ is the actual
slant. The last term is the prior distribution, which is the probability of
observing $S$ in the scene,
independent of the image data. If the likelihoods and prior are Gaussian, the
MAP estimate has the same form as the minimum variance, linear combination
estimate
$$\hat{S} = w_D \hat{S}_D + w_T \hat{S}_T + w_P \hat{S}_P \tag{4}$$
where
$$w_i = \frac{r_i}{r_D + r_T + r_P}, \qquad i \in \{D, T, P\} \tag{5}$$
Here, $\hat{S}_D$ and $\hat{S}_T$ are the maximum-likelihood estimates the
observer would have made from each cue in isolation (the mean of the respective
Gaussian distributions), and $\hat{S}_P$ is the mean of the prior. The
$r_i$ are the
reliabilities of the respective distributions (likelihoods and prior). If the
prior has large variance relative to the individual cue likelihoods, Equations 4 and 5
reduce to Equations 1 and 2, which also yields the most likely slant to have
caused the current sensory data (i.e., it is the maximum-likelihood estimate or
MLE). For our conditions, the variance of the individual cues is much smaller
than the prior's variance (see Ideal observer
models in Discussion), so we will use Equations 1 and 2
throughout.
By following the strategy described by
Equation 2, the variance of the weighted
average $\hat{S}$ is
$$\sigma_{\hat{S}}^2 = \frac{\sigma_D^2 \sigma_T^2}{\sigma_D^2 + \sigma_T^2} = \frac{1}{r_D + r_T} \tag{6}$$
The variance of
$\hat{S}$ is lower than
the variance of either single-cue estimate.
Many
investigations of sensory cue combination have shown that cue reliability is
taken into account in the estimation process (Backus & Banks, 1999; Banks, Hooge, & Backus, 2001; Battaglia, Jacobs, & Aslin, 2003; Buckley & Frisby, 1993; Frisby, Buckley, & Horsman, 1995; Jacobs, 1999; Körding & Wolpert, 2004; Rogers & Bradshaw, 1995; van Beers, Sittig, & Denier
van der Gon, 1998; van Beers, Wolpert,
& Haggard, 2002; Young, Landy,
& Maloney, 1993). Only five studies,
however, have tested the quantitative predictions of the MLE model expressed by
Equations 1, 2, and 6, to
determine if sensory cue combination is statistically optimal (Alais & Burr,
2004; Ernst & Banks, 2002; Gepshtein & Banks, 2003; Knill & Saunders, 2003; Landy & Kojima, 2001). These five studies measured
reliabilities of individual cues ($r_D$ and $r_T$ in Equations 2
and 6) and empirically tested predictions for
both the appearance and discrimination thresholds for stimuli when both cues
were present (provided by Equations 1, 2, and 6). All
five reported that the combination is quite close to the one predicted by those
equations. In these five experiments, the variances of estimates derived from
single cues were measured by conducting two-interval, forced-choice (2IFC)
discrimination experiments when only one cue was informative. For example, Ernst
and Banks (2002) conducted
size-discrimination experiments for vision alone and haptics alone and then fit
cumulative Gaussians to the two psychometric functions. The variance parameter
of the Gaussians provided estimates of the variances of the underlying visual
and haptic estimators. Equations 1, 2, and 6 were
then successfully used to predict the results of two-cue (visual-haptic)
experiments.
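As a concrete illustration of the combination rule these studies tested, the following minimal Python sketch computes the reliability-based weights, the combined estimate, and its variance. The function name and all numerical values are hypothetical and are not taken from the paper; the sketch assumes independent Gaussian errors and a prior broad enough to be ignored.

```python
def mle_combine(s_d, var_d, s_t, var_t):
    """Combine disparity and texture slant estimates by reliability weighting."""
    # Reliability is the inverse of the estimator's variance.
    r_d, r_t = 1.0 / var_d, 1.0 / var_t
    # Weights are proportional to reliability and sum to one.
    w_d = r_d / (r_d + r_t)
    w_t = r_t / (r_d + r_t)
    # Weighted average of the two single-cue estimates.
    s_hat = w_d * s_d + w_t * s_t
    # Variance of the combined estimate.
    var_hat = 1.0 / (r_d + r_t)
    return s_hat, var_hat, w_d, w_t


# Hypothetical single-cue estimates: disparity SD = 2 deg, texture SD = 4 deg.
s_hat, var_hat, w_d, w_t = mle_combine(s_d=32.0, var_d=4.0, s_t=28.0, var_t=16.0)
```

With these assumed values, the more reliable disparity cue receives a weight of 0.8, and the combined variance (3.2 deg²) is below both single-cue variances.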
The experiments presented here used the strategy of
Ernst and Banks (2002) to ask whether
texture and disparity cues to slant are combined in a statistically optimal
fashion. The reliabilities of texture and disparity cues to slant vary with
viewing geometry in different ways. First, the reliability of texture should
increase with increasing slant because the image changes
associated with a given change in slant increase (Blake et al., 1993; Knill, 1998a). This relationship between reliability
and slant is reflected in human performance (Knill, 1998b). Theoretical and empirical analyses of
the reliability of disparity as a function of slant have not been conducted, but
it is unlikely that it changes significantly (Banks et al., 2001; Knill & Saunders, 2003). Second, because the magnitude of
binocular disparities for a given depth difference decreases as viewing distance
increases, the reliability of slant and curvature estimated from binocular
disparity should decrease as viewing distance is increased. Experiments confirm
that it does (Howard & Rogers, 2002; Ogle, 1950). On theoretical grounds, the reliability of
texture-specified slant, for a fixed retinal-image density, should not change
with distance. If a given textured surface is doubled in size and viewed from
twice the distance, the retinal image is unchanged. Thus, optimal
combination of disparity and texture cues to slant should involve complex
changes in the weights given to the two cues depending on base slant and viewing
distance.
We looked for evidence that the visual system weights
the two cues appropriately across a range of slants and distances. Because the
reliability of the texture cue increases with slant, we expect the texture
weight to increase as slant increases. Because the reliability of disparity
decreases as viewing distance increases, we expect the texture weight to
increase as distance increases. As in the previous studies, we determined the
reliability of the individual cues with 2IFC discrimination experiments. Then,
we measured the apparent slant and slant discrimination performance for two-cue
stimuli. As we shall see, the MLE cue-combination predictions based on the
single-cue experiments (Equations 1, 2, and 6) were
largely in accord with the data from the two-cue experiment.
Four observers participated. Two were not aware of the
experimental hypotheses (ACD and RM). All had normal stereopsis and did not
manifest eye misalignment in normal viewing
situations.
All stimuli were displayed on a custom-designed
stereoscope with two mirrors and two CRTs (one for each eye; see Backus, Banks,
van Ee, & Crowell, 1999). Each
mirror and CRT was attached to an arm that rotated about a vertical axis passing
through the eye's center of rotation. With this arrangement, the eye and
stereoscope arm rotate on a common axis, so when we change the vergence
distance, the mapping between the stimulus array and the retina is unaltered
(for fixed accommodation).
We used anti-aliasing to specify dot position to
subpixel accuracy. To ensure accurate reproduction of visual direction, we
spatially calibrated each CRT to eliminate distortions in the images (for
details, see Backus et al., 1999).
The observer's head position was stabilized using
a bite bar fastened to an adjustable mount. Each observer had a personal mount
so that the vertical axes of rotation of left and right eyes were collinear with
the rotation axes of the two stereoscope arms (for details, see Hillis &
Banks, 2001). The optical distance
between the center of rotation of each eye and the face of the CRT was 40
cm.
Stimuli were virtual planes slanted about a vertical
axis (i.e., tilt = 0 deg). We
independently manipulated two cues to slant: disparity and texture. In
single-cue measurements, we isolated one or the other of the two cues. In
two-cue measurements, both cues were informative, but could have different slant
values. Viewing distance was 19.1, 57.3, or 171.9 cm. Example stimuli are
shown in Figure 1.
Figure 1. Examples of the stimuli. Cross
fuse or divergently fuse to see the appropriate slants. The upper stimulus is an
example of the disparity-alone stimulus. It has a negative slant (right side
near). The lower row provides examples of the texture-alone stimulus when viewed
monocularly and the disparity-texture stimulus when viewed binocularly. The
disparity- and texture-specified slants are positive (right side far).
The texture cue was the perspective projection of
planar patches textured with Voronoi patterns with an average of 64 Voronoi
cells per patch (de Berg, van Kreveld, Overmars, & Schwarzkopf, 2000; Figure
1, bottom panel). The actual number of cells varied from trial to trial
depending on the randomly selected width of the patch (i.e., cells with a
constant average area filled the area of the elliptical patch). Voronoi patterns were generated from a jittered grid of dots. On a frontoparallel plane, a
regular grid of points was defined. Then, each point on the dot grid was
perturbed horizontally and vertically (uniform distribution from
–0.3 to 0.3 deg). The Voronoi
pattern defined by these points was then computed. Finally, the resulting
textured plane was rotated by an amount equal to the texture-defined slant. To
isolate the texture cue, the stimuli were viewed monocularly. The visible
portion of the plane was elliptical with a height of 15 deg. The width on each
presentation was randomly chosen from a uniform distribution from 15 to 20 deg
when the stimulus was frontoparallel. The stimulus was then rotated to the
appropriate slant. Thus, the retinal shape of the stimulus outline was an
unreliable cue to
slant. The disparity cue to slant was the difference between
left- and right-eye projections (calculated for each observer's
interpupillary distance). To isolate the disparity cue, the stimulus was defined
by sparse random dots (Figure 1, top panel).
Each stimulus consisted of 64 dots, with positions randomly drawn from a uniform
distribution (note that the texture gradient specified by the dots was therefore
consistent with a frontoparallel plane). Dot density was
~0.3 dots/deg².
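The texture construction described above (jittered grid on a frontoparallel plane, rotation about a vertical axis, perspective projection) can be sketched in Python as follows. This is an illustrative sketch, not the authors' stimulus code: the grid spacing of 1.8 deg is an assumed value chosen to give roughly 64 seed points, the function names are hypothetical, and the Voronoi tessellation of the seed points is omitted (a routine such as `scipy.spatial.Voronoi` could be applied to them).

```python
import math
import random


def jittered_seeds(width_deg, height_deg=15.0, spacing_deg=1.8,
                   jitter_deg=0.3, rng=random):
    """Jittered-grid seed points (deg) inside an elliptical patch."""
    a, b = width_deg / 2.0, height_deg / 2.0
    n_x = math.ceil(a / spacing_deg)
    n_y = math.ceil(b / spacing_deg)
    seeds = []
    for i in range(-n_x, n_x + 1):
        for j in range(-n_y, n_y + 1):
            # Perturb each grid point horizontally and vertically.
            x = i * spacing_deg + rng.uniform(-jitter_deg, jitter_deg)
            y = j * spacing_deg + rng.uniform(-jitter_deg, jitter_deg)
            if (x / a) ** 2 + (y / b) ** 2 <= 1.0:  # keep points in the ellipse
                seeds.append((x, y))
    return seeds


def project_slanted(seeds_deg, slant_deg, distance_cm):
    """Rotate the plane about a vertical axis and project to one eye.

    Positive slant moves the right side farther away. Returns horizontal
    and vertical image-plane angles in deg.
    """
    s = math.radians(slant_deg)
    projected = []
    for x_deg, y_deg in seeds_deg:
        # Position on the frontoparallel plane, in cm.
        x = distance_cm * math.tan(math.radians(x_deg))
        y = distance_cm * math.tan(math.radians(y_deg))
        # Rotate about the vertical axis through the patch center.
        x_rot = x * math.cos(s)
        z_rot = distance_cm + x * math.sin(s)
        projected.append((math.degrees(math.atan2(x_rot, z_rot)),
                          math.degrees(math.atan2(y, z_rot))))
    return projected
```

Because the patch is specified in visual angle, the projected image in this sketch does not depend on `distance_cm`, which mirrors the argument that the texture cue should be unaffected by viewing distance.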
When both cues were present, disparity and texture
could be consistent ($S_D = S_T$) or they could be in conflict. In the
no-conflict case, homogeneous Voronoi-textured surfaces were projected directly
to the two eyes. In cue-conflict cases, we first calculated a perspective
projection of the texture with slant $S_T$ at the Cyclopean eye (Figure 2, left panel). We then found the
intersections of rays through this Cyclopean projection with a surface patch at
the disparity-specified slant $S_D$ (Figure 2,
middle panel). The markings on this latter surface were then projected to the
left and right eyes to form the two monocular images (Figure 2, right
panel).
Figure 2. Creation of the cue-conflict
stimuli. Left: Perspective projection of a homogeneously textured surface with
the Cyclopean eye as the center of projection. This projection creates the
texture-specified slant, $S_T$. The rays
from the surface toward the eye are used in the next step. Middle: A virtual
surface with the disparity-specified slant, $S_D$, is
created. The rays from the first step are back-projected from the Cyclopean eye
to find their intersections with the disparity-defined surface. They are marked
in the diagram with black points. Right: Viewing the black points binocularly
yields the cue-conflict stimulus containing the texture-specified slant in the
left panel and the disparity-specified slant in the middle panel.
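The three-step construction of the cue-conflict stimulus can be sketched as a ray-plane intersection. The following Python fragment is an illustration under assumed conventions (Cyclopean eye at the origin, viewing along +z, both surfaces passing through the fixation point at distance `d`, positive slant meaning right side far); the function names are hypothetical and this is not the authors' rendering code.

```python
import math


def texture_point(u, v, slant_t_deg, d):
    """3D position of a texture marking at in-plane coordinates (u, v) in cm."""
    s = math.radians(slant_t_deg)
    return (u * math.cos(s), v, d + u * math.sin(s))


def to_disparity_plane(p, slant_d_deg, d):
    """Slide p along its Cyclopean ray onto the disparity-specified plane."""
    s = math.radians(slant_d_deg)
    # Normal of a plane slanted about the vertical axis through (0, 0, d).
    n_x, n_z = -math.sin(s), math.cos(s)
    # Scale factor t such that t * p lies on the plane. The denominator is
    # zero only for rays parallel to the plane, which cannot happen for the
    # small conflicts considered here.
    t = (n_z * d) / (n_x * p[0] + n_z * p[2])
    return (t * p[0], t * p[1], t * p[2])


def eye_directions(q, ipd_cm):
    """Horizontal direction (deg) of point q from the left and right eyes."""
    half = ipd_cm / 2.0
    left = math.degrees(math.atan2(q[0] + half, q[2]))
    right = math.degrees(math.atan2(q[0] - half, q[2]))
    return left, right
```

Because each marking only moves along its own Cyclopean ray, the Cyclopean image (and hence the texture-specified slant) is preserved while the binocular geometry follows the disparity-specified plane.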
Control experiments and procedures to validate single-cue measurements
We went to some lengths to ensure that the single-cue
experiments measured the variances of the disparity and texture estimators in a
fashion appropriate for making two-cue predictions. In this section, we describe
control experiments and methodological procedures used to achieve that
goal.
1. Are disparity-alone measurements affected by monocular slant signals?
To make sure that only binocular information determined
slant discrimination in the disparity-alone case, we conducted two control experiments.
First, to make sure that the stimulus did not provide a
monocular cue to slant, we measured monocular slant-discrimination thresholds at
various slants for the 64-dot stimulus. Observers could not reliably
discriminate anything but large slant changes, and those changes were at least a
factor of 10 larger than the thresholds in the disparity-alone experiment. We
conclude that there is no useful monocular slant information in the 64-dot
random-dot stimulus.
Second, we wanted to make sure that we presented enough
dots in the display for disparity-based thresholds to be as low as possible
while still isolating the disparity estimator. The details of this control
experiment and the results are provided in Appendix
A. We found that threshold decreased as dot number increased from 2 to 32
and then leveled off beyond 32 dots. With 64 dots, disparity-based thresholds
were as low as they could be. The results were simpler for observer JMH than for
ACD: ACD may have given some weight to the texture signal at base slants
different from 0 deg. We will return to this point when we discuss her two-cue
data (in Discussion: Summary of results).
2. Are disparity-alone measurements based on perceived slant or on only the disparity gradient?
To combine two cues for slant, the cues must be
promoted to the same units. Disparity signals alone do not provide a slant
estimate because they must be scaled or normalized for distance (Gårding,
Porrill, Mayhew, & Frisby, 1995).
We were concerned that observers might perform the slant-discrimination task in
the single-cue, disparity-alone case by comparing only the disparity gradients
in the two stimulus intervals. Said another way, they might perform the task
without normalizing the disparity signals into slant estimates. To test the MLE
model, we must acquire valid measures of the reliabilities of single-cue
slant estimates. In the disparity-alone
condition, this means our measure must reflect the process of scaling the
disparity signal into units of slant. If the task in the disparity-alone
condition were done without normalizing the disparity signal, the psychometric
data would not reflect errors introduced by the scaling process (which, within
the framework of weighted-linear cue combination, is essential for combining
disparity and texture signals), and we would underestimate the variance of
disparity-based slant estimates. We therefore looked for evidence that observers
scale the disparity signal for distance in a discrimination task with our
disparity-alone stimuli. Observers performed the slant-discrimination task with
the disparity-alone stimulus, but with the comparison stimuli appearing at
different distances relative to the standard stimulus. We found that most
observers (importantly, JMH and ACD) take distance into account when performing
the slant-discrimination task; that is, they do not perform the task by only
comparing the disparity gradients in the two stimulus intervals. The details of
the experiment and results are described in Appendix
B. The results of this control experiment support the assumption that the
disparity-alone measurements provide valid estimates of the reliability of the
disparity-based slant estimator.
3. Are the estimates of the single-cue reliabilities valid for the two-cue experiment?
The single-cue data were used to specify the
model's parameters (single-cue variances) and the model was then used to
predict the two-cue data. An important assumption is that the appropriate
variances are being measured in the single-cue experiments. One might question
this assumption for the measurements of the disparity estimator's
reliability because a different type of stimulus was used in the single-cue,
disparity-alone condition (random-dot stimuli) than in the two-cue condition
(Voronoi stimuli). This concern cannot be addressed by using Voronoi stimuli in
the disparity-alone condition because such stimuli provide salient texture cues
to slant. We can, however, check the validity of using random-dot stimuli by
comparing single-cue thresholds with those stimuli to two-cue thresholds when
the texture weight is expected to be approximately zero. This check, described
in Results: Just-noticeable differences,
confirmed the validity of our assumption for observer JMH (the only observer for
whom the required data were available).
A similar concern can be raised about using monocular
stimuli in the texture-alone condition to measure texture reliability in the
two-cue experiment. The stimulus in the two-cue experiment is binocular, so the
visual system receives two samples while there is only one sample in the
texture-alone experiment. The two samples will not be the same because the
texture-specified slant at the left eye necessarily differs from the texture
slant at the right eye (see Appendix D).
Thus, the visual system must integrate two texture-gradient signals into one
binocular estimate before combining with the slant estimated from disparity. The
presence of two samples in the two-cue experiment might reduce the uncertainty
associated with the texture cue in a fashion similar to the reduction in
contrast threshold with binocular viewing (Legge, 1984). We can check this by comparing the
monocular single-cue thresholds to binocular two-cue thresholds when the texture
weight is expected to be approximately one. This check, described in Results: Just-noticeable differences, confirmed the
validity of our assumption that the monocular measurements were a valid estimate
of the variance of the texture estimator in the two-cue experiment.
4. Do unmodeled slant cues affect responses?
We wanted to isolate the cues of disparity and texture,
so we had to consider whether other slant cues might be present in the display.
Three cues—the blur gradient, accommodation, and the phosphor grid of the
CRTs—always signaled a slant of 0 deg. If the observer failed to ignore
those conflicting cues, the variances we measured would be higher than the true
variances associated with disparity and texture. To reduce the salience of all
three cues, we placed diffusers on the faces of the CRTs to blur the stimuli
slightly. Blurring the stimuli decreases the probability that the observers used
the blur gradient because the blur gradient is a less reliable depth cue with
blurred as opposed to sharply focused stimuli (Mather & Smith, 2002). Blurring should also decrease the
probability that accommodation was used as a depth cue because humans
accommodate inaccurately if at all to blurred stimuli (Heath, 1956). The diffusers also made the phosphor grid
invisible.
Procedure: Single-cue conditions
To estimate the reliabilities of the texture- and disparity-based slant estimates, we obtained
psychometric functions for texture and disparity presented in isolation at
several base slants (±70, ±60, ±45, ±30, ±15, and 0 deg) and distances
(19.1, 57.3, and 171.9 cm) for two observers (ACD and JMH). The other two
observers participated in a subset of these conditions. We used a 2IFC task with
no feedback. On each trial, the observer indicated which of two
stimuli (one at the base slant and the other at the base slant plus an increment $\Delta S$) had the
greater apparent slant. The stimuli were displayed for 1.5 s with a 0.3-s
interstimulus interval. We used staircases to control the value of $\Delta S$ and four
reversal rules (3-down/1-up, 1-down/3-up, 2-down/1-up, and
1-down/2-up) to sample points along the entire psychometric function. At
least eight staircases were employed for each psychometric function for ACD and
JMH, which corresponds to approximately 350-450 trials per function (each
staircase was terminated after 12 reversals). At least two, but typically more,
staircases were employed for RM and MSB. In each session, at least four
interleaved staircases were run: two base slants (one positive and one negative
to avoid adaptation) with two staircases each. Viewing distance was fixed in
each session.
Procedure: Two-cue conditions
The procedure in the two-cue conditions was the same as
in the single-cue conditions except that a no-conflict stimulus (disparity- and
texture-specified slants equal to one another) and a conflict stimulus
(disparity and texture slants not necessarily equal) were presented on each
trial. Figure 3 depicts the disparity- and
texture-defined slants of the no-conflict and conflict stimuli. In both panels,
the slant specified by the disparity cue is plotted on the abscissa and the
slant specified by the texture cue on the ordinate. The conflict stimulus had
one cue set to a base slant ($S_b$ = ±60, ±30, or 0 deg) and the other cue
was perturbed. The left panel depicts the conflict stimuli when disparity was
perturbed and the right panel shows the stimuli when texture was perturbed. The
perturbed cue had incremental slants of ±10, ±5, or 0 deg relative to the
unperturbed cue, so the conflict was always small. In previous work with quite
similar stimuli, a difference of 10 deg between the disparity- and
texture-specified slants was generally not detectable (Hillis, Ernst, Banks,
& Landy, 2002). The five possible
perturbed cue values are represented along the abscissa and ordinate in the left
and right panels, respectively. On each trial, a conflict stimulus and a
no-conflict stimulus were presented and the observer indicated the one
containing the apparently greater slant. No feedback was given. The value of the
no-conflict stimulus was varied according to staircase procedures to map out
the psychometric function. At least four staircases were run per experimental
session: two conflict conditions for each of two base slants.
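The staircase rules used in both the single-cue and two-cue procedures (for example, 3-down/1-up, terminated after 12 reversals) can be sketched as follows. This is a minimal illustration, not the authors' experiment code: the class and function names, the step size, and the simulated Gaussian observer are all assumptions introduced here.

```python
import math
import random


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


class Staircase:
    """n-down/m-up staircase on a slant increment (deg)."""

    def __init__(self, start, step, n_down, n_up, max_reversals=12):
        self.level, self.step = start, step
        self.n_down, self.n_up = n_down, n_up
        self.max_reversals = max_reversals
        self.run_yes = self.run_no = 0
        self.last_direction = 0
        self.reversals = 0
        self.history = []  # (level, response) pairs for later curve fitting

    @property
    def done(self):
        return self.reversals >= self.max_reversals

    def update(self, said_greater):
        """Record one response and move the level if a rule is met."""
        self.history.append((self.level, said_greater))
        if said_greater:
            self.run_yes, self.run_no = self.run_yes + 1, 0
        else:
            self.run_yes, self.run_no = 0, self.run_no + 1
        direction = 0
        if self.run_yes >= self.n_down:
            direction = -1          # enough "greater" responses: step down
        elif self.run_no >= self.n_up:
            direction = +1          # enough "smaller" responses: step up
        if direction:
            self.run_yes = self.run_no = 0
            if self.last_direction and direction != self.last_direction:
                self.reversals += 1
            self.last_direction = direction
            self.level += direction * self.step


def simulate(staircase, sigma, rng, max_trials=10000):
    """Run a staircase against a simulated observer with estimator SD sigma."""
    while not staircase.done and len(staircase.history) < max_trials:
        # In 2IFC the decision rests on the difference of two noisy
        # estimates, so its SD is sigma * sqrt(2).
        p_greater = phi(staircase.level / (sigma * math.sqrt(2.0)))
        staircase.update(rng.random() < p_greater)
    return staircase
```

Interleaving staircases with different rules (3-down/1-up, 1-down/3-up, and so on) makes them converge on different response probabilities, which is how the procedure samples the whole psychometric function rather than a single point.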
Figure 3. Depiction of the stimulus values
and their manipulation in the two-cue experiment. Both graphs plot the
disparity-specified slant ($S_D$) on the
abscissa and the texture-specified slant ($S_T$) on the
ordinate for the conflict and no-conflict stimuli. The base slant is represented
by the origins ($S_D = S_T = S_b$). The
conflict stimulus either had the disparity-specified slant perturbed from the
base slant by Δ (depicted in the left panel) or the texture-specified
slant perturbed by Δ (right panel). Five different conflicts were
presented for each base slant and those are represented by the blue circles in
the left panel and gray diamonds in the right panel. In the no-conflict
stimulus, the disparity- and texture-specified slants were equal to one another.
The staircase procedure varied the increments added to the base
slant: $S_D = S_T = S_b + \delta$.
Figure 3 also shows
how we determined the point of subjective
equality (PSE), the value of the no-conflict stimulus with the same
average perceived slant as the conflict stimulus. The enlarged bold
symbols—the dark-blue circle in the left panel and black diamond in the
right—represent two particular conflict stimuli. Δ represents the
incremental slant of the perturbed cue in the conflict stimulus, and
δ represents
the increment given to the no-conflict stimulus as the staircase procedure
varies its slant. As
δ is increased
and thereby the slant of the no-conflict stimulus is increased (represented in
the figure by displacement up along the main diagonal
where $S_D = S_T$), the observer
will be increasingly likely to report that it had greater slant than the
conflict stimulus. At some value of
δ, the
no-conflict stimulus will on average have the same apparent slant as the
conflict stimulus; this is the PSE. If the cue weights are constant across small
variations in slant, we can determine the weights from this value of
δ.
Consider first the conflict stimulus. From Equations 1-2 and
the fact that the disparity-defined stimulus slant is $S_D = S_b + \Delta$ (where
$S_b$ is the base
slant, so that $S_T = S_b$), the expected value of the estimated slant of the conflict
stimulus is
$$E(\hat{S}_c) = w_D (S_b + \Delta) + w_T S_b = S_b + w_D \Delta \tag{7}$$
Now consider the no-conflict stimulus. From Equations 1-2 and
the fact that $S_D = S_T = S_b + \delta$, the expected
value of the estimated slant of the no-conflict stimulus is
$$E(\hat{S}_{nc}) = S_b + \delta \tag{8}$$
The conflict and no-conflict stimuli will have
the same perceived slants when $E(\hat{S}_c) = E(\hat{S}_{nc})$, and from Equations
7 and 8, we have
$$w_D = \frac{\delta}{\Delta} \tag{9}$$
Thus, the two-cue experiment yields an estimate of the
PSE from which we can determine the weights given to disparity and texture. The
assumption that the weights are constant for even small variations is
inconsistent with statistically optimal slant estimation, in which the weights
vary as a function of slant. However, given the precision of our measurements
and the rate of change of cue reliability, the fixed local weight assumption
provides a reasonable approximation.
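The step from a measured PSE to cue weights is simple arithmetic and can be sketched in Python. The function names and numbers below are hypothetical; the sketch assumes, as in the text, that the weights sum to one and are locally constant, so that the weight of the perturbed cue is the ratio of the PSE increment δ to the conflict Δ.

```python
def weights_from_pse(delta_pse, conflict, perturbed="disparity"):
    """Return (w_disparity, w_texture) from a PSE increment and conflict size.

    delta_pse: increment of the no-conflict stimulus at the PSE (deg)
    conflict:  increment given to the perturbed cue (deg); must be nonzero
    """
    if conflict == 0:
        raise ValueError("weights are undefined when the conflict is zero")
    w_perturbed = delta_pse / conflict
    if perturbed == "disparity":
        return w_perturbed, 1.0 - w_perturbed
    return 1.0 - w_perturbed, w_perturbed


def predicted_pse_shift(jnd_d, jnd_t, conflict, perturbed="disparity"):
    """MLE prediction of the PSE increment from single-cue JNDs (deg)."""
    # The weight of a cue is proportional to its reliability (1 / variance).
    w_d = jnd_t ** 2 / (jnd_d ** 2 + jnd_t ** 2)
    w = w_d if perturbed == "disparity" else 1.0 - w_d
    return w * conflict


# Hypothetical example: a 10-deg disparity perturbation matched at 7 deg.
w_d, w_t = weights_from_pse(7.0, 10.0, perturbed="disparity")
```

In this hypothetical example the observer's disparity weight is 0.7 and the texture weight is 0.3; comparing such empirical weights with `predicted_pse_shift` is the appearance test of the MLE model.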
We can also plot the percentage of judgments for which
the no-conflict stimulus appeared to have greater slant as a function of its
slant. The slopes of such psychometric functions index the discriminability of
the stimuli (discussed below in Results: Just-noticeable differences).
Specifying the predictions
To quantify the predictions of the MLE model, we need
estimates of the variances of the single-cue estimators
($\sigma_D^2$ and $\sigma_T^2$, or equivalently, the reliabilities $r_D$ and
$r_T$ in Equations 2 and 6). To estimate these variances, we fit the
psychometric data with a cumulative Gaussian using a maximum-likelihood
criterion. The standard deviations of the resulting functions were divided by
$\sqrt{2}$ (because the
psychophysical procedure was 2IFC) to yield estimates of the standard deviations
of the underlying slant estimators (Green & Swets, 1974). We call these just-noticeable
differences (JNDs) because they represent the slant difference that is correctly
discriminated ~76% of the time.
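The fitting step and the division by √2 can be illustrated with a short Python sketch. It uses a coarse grid search in place of whatever optimizer the authors used, and the function names and data layout are assumptions made for illustration only.

```python
import math


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def neg_log_likelihood(mu, sd, data):
    """Binomial negative log-likelihood of a cumulative Gaussian.

    data: iterable of (slant_increment, n_greater, n_trials).
    """
    nll = 0.0
    for x, k, n in data:
        # Clip probabilities so the logarithms stay finite.
        p = min(max(phi((x - mu) / sd), 1e-9), 1.0 - 1e-9)
        nll -= k * math.log(p) + (n - k) * math.log(1.0 - p)
    return nll


def fit_cumulative_gaussian(data, mus, sds):
    """Grid-search maximum-likelihood fit; returns the best (mu, sd)."""
    return min(((m, s) for m in mus for s in sds),
               key=lambda ms: neg_log_likelihood(ms[0], ms[1], data))


def jnd_from_2ifc_sd(sd_fit):
    """SD of the underlying estimator from the SD of a 2IFC psychometric fit."""
    return sd_fit / math.sqrt(2.0)


# Proportion correct for a slant difference of one JND in a 2IFC task.
p_correct_at_jnd = phi(1.0 / math.sqrt(2.0))
```

The last line shows where the ~76% figure comes from: a difference of one estimator SD corresponds to a z-score of 1/√2 on the difference of the two interval estimates.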
Figure 4 shows the JND
estimates for JMH and ACD (JNDs for RM and MSB, whose performance was tested at
only one distance, were similar to those shown here and are plotted in Figure 1S). Each
row of panels represents data from one observer. The left column shows the
texture-alone data: JNDs in units of slant are plotted as a function of the
absolute value of base slant (there was no apparent difference in the results
for positive and negative slants). Different symbols represent data from
different viewing distances. As expected, texture JNDs did not vary
systematically with distance. Also as expected (Knill, 1998a), texture JNDs decreased as the absolute
value of slant increased.
Figure 4. Just-noticeable slant
differences (JNDs) for the single-cue experiment. JNDs are plotted as a function
of slant or horizontal disparity. Different symbol colors represent data for
different viewing distances: 19.1, 57.3, and 171.9 cm. Left and middle columns:
JNDs for texture-alone and disparity-alone, respectively, as a function of the
absolute value of the slant. Error bars are 95% confidence intervals. Right
column: JNDs for disparity alone plotted in terms of
HSR. The ordinate
is the difference between the absolute values of the natural log of
HSR for the base
slant and the just-noticeably different slant. The abscissa is the absolute
value of the natural log of
HSR
of the base slant. Error bars are 95% confidence intervals. The lines in
the left column (texture-alone) and right column (disparity-alone) are
maximum-likelihood curve fits to the data
(  ). The
break in the curve and the upward pointing arrow in JMH's disparity fit
indicate that JNDs go to infinity somewhere between ln(HSR)
of 0.58 and 0.97 (slants of 60 and 70 deg at 19.1 cm).
The middle column of Figure 4 shows the data for the disparity-alone
condition: Slant JNDs are plotted as a function of the absolute value of base
slant (there were no systematic differences for positive and negative slants).
As expected from the viewing geometry (Equations
1 and 2 in Backus et al., 1999), disparity slant JNDs increased
systematically with an increase in viewing distance. JNDs also tended to
decrease with base slant at the medium and far viewing distances (see also Knill
& Saunders, 2003). At the near
viewing distance, JNDs tended to increase with base slant. In fact, as indicated
by the symbol with the yellow star, JMH's thresholds were infinite at base
slants of ±70 deg
and a viewing distance of 19.1 cm because the binocular images could not be
fused in this condition. This difference in the trend between near and far
distances can be understood in terms of the retinal signal to slant.
The right column of Figure 4 plots the same data as the middle column but in
units of relative disparity: specifically, the horizontal-size ratio (HSR)
(Backus et al., 1999). $\mathrm{HSR} = \alpha_L / \alpha_R$, where $\alpha_L$
and $\alpha_R$ are the horizontal angles subtended by a surface
patch in the left and right eyes. Plotted in these units, JNDs do not vary systematically
as a function of viewing distance. This implies that the increase in
slant-discrimination threshold with distance is caused only by the geometric
relationship between distance and disparity and not by greater error in the
calculation of disparity or by greater error in estimates used to scale for
distance (such as vergence; Equation 2 in Backus et al., 1999). JNDs plotted in
these units increase with increasing $|\ln(\mathrm{HSR})|$. This increase may
reflect difficulties in solving the binocular-matching problem as the disparity
gradient (which is linearly related to HSR) increases (Banks, Gepshtein, &
Landy, 2004; Burt & Julesz, 1980). The increase may also reflect the
fact that surfaces with large $|\ln(\mathrm{HSR})|$ contain fewer points near the
Vieth-Müller Circle where stereoacuity is highest. $|\ln(\mathrm{HSR})|$
increases more rapidly as a function of slant at
near distances (indicated by the fact that JMH's data at high base HSRs
all come from the near viewing distance). For example, a change in slant from 60
to 70 deg results in a change in $|\ln(\mathrm{HSR})|$ from 0.58 to 0.97 at
19.1 cm and from 0.06 to 0.1 at 171.9 cm. (We did not plot the point at 70 deg,
$|\ln(\mathrm{HSR})|$ = 0.97, because
thresholds were infinite.) We will return to a discussion of the effects of
distance and base slant in Discussion: Comparison
of observed and expected effects of slant and distance on disparity- and
texture-based JNDs.
To make predictions for the two-cue conditions, we
needed estimates of the variances of the disparity and texture estimators at
slants between the ones for which we have measurements. For the interpolation,
we fit smooth curves to the data (  , where
x is slant and JND is in deg [texture], or x
is $|\ln(\mathrm{HSR})|$ and JND is the difference in $|\ln(\mathrm{HSR})|$ [disparity],
and α and
β are free
parameters). This was done by performing a maximum-likelihood fit to all of the
raw psychometric data for a given condition (texture or disparity), varying
α and
β. The
curves and the data are shown in the left and right columns of Figure 4. The curve fits represent a fit to the
data at all three viewing distances. Thus, they give us a way to estimate
disparity and texture reliability between slants where we have measurements, and
they also allow us to interpolate across distance. While the reliability of the
disparity cue to slant,
HSR,
does not vary systematically with distance, the relationship between
HSR
and slant varies significantly with viewing distance. Figure 5 shows how the reliability of disparity
slant estimates varies with slant and distance, based on the curve fits to
JMH's data. The reliability of the disparity cue to slant decreases as
distance increases and the reliability of the disparity cue varies with base
slant in different ways at different distances. At near distances the disparity
cue is more reliable than the texture cue (and hence, should be given more
weight according to the MLE
model).
Figure 5. JNDs of texture (orange) and
disparity (blue) cues across distance and slant estimated from curve fits to
JMH's single-cue data ( Figure 4).
Given a pair of JND values for texture and disparity,
we can use Equation 2 to calculate optimal
weights. Predicted weights determined solely from the standard deviations of
cumulative Gaussians fitted to single-cue psychometric data are shown in Figure 6 as data points (based on the raw JND
data) and curves (based on the fitted curves in Figure 4). (Similar plots for the other two
observers are shown in Figure 2S.) The
filled circles and blue curves are the predicted disparity weights and the
unfilled diamonds and gray curves are the predicted texture weights. Because
JMH's thresholds were infinite at 70 deg in the disparity-alone condition
at 19.1 cm, the MLE weight given to disparity in this condition is 0. The
curve used to fit JMH's disparity data in Figure 4 does not capture this fact. To
incorporate this fact, we smoothly extrapolated the predicted weights curve so
that the disparity weight reached 0 at 70
deg.
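The weight calculation described above can be sketched in a few lines. The sketch assumes that reliability is the reciprocal of the estimator variance and that both JNDs are expressed in the same units (deg of slant); the function name and the example values are illustrative only.

```python
import math


def mle_weights(jnd_disparity, jnd_texture):
    """Return (w_disparity, w_texture) from single-cue JNDs in the same units.

    Each weight is the cue's share of the summed reliability (1 / variance).
    An infinite JND corresponds to zero reliability and hence zero weight.
    """
    r_d = 0.0 if math.isinf(jnd_disparity) else 1.0 / jnd_disparity ** 2
    r_t = 0.0 if math.isinf(jnd_texture) else 1.0 / jnd_texture ** 2
    if r_d + r_t == 0.0:
        raise ValueError("at least one cue must have a finite JND")
    return r_d / (r_d + r_t), r_t / (r_d + r_t)


# Illustrative values: a precise disparity cue and a coarse texture cue.
print(mle_weights(2.5, 8.0))
# Disparity thresholds that are infinite leave all the weight on texture.
print(mle_weights(float("inf"), 1.5))
```

The second call mirrors the 70-deg, 19.1-cm case, in which an unmeasurable disparity threshold forces the disparity weight to 0.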
Figure 6. Predicted weights for disparity
and texture cues. From left to right, the panels show data from viewing
distances of 19.1, 57.3, and 171.9 cm. The weights were calculated using Equation 2 and the single-cue discrimination
data and curve fits shown in Figure 4. Unfilled
diamonds are predicted weights for the texture cue and filled circles are
predicted weights for the disparity cue. The solid lines are predictions
calculated from the curve fits in Figure 4.
Error bars are 95% confidence intervals.
The predicted weights exhibit two trends. First, with
increasing slant, the texture weight increases and the disparity weight
decreases (data from RM and MSB showed the same trend; Figure 2S). The
reciprocal relationship between the texture and disparity weights occurs because
the weights are constrained to sum to 1. The texture weight becomes relatively
greater than the disparity weight with increasing slant because it becomes a
relatively more reliable estimate ( Figure 4).
Knill and Saunders ( 2003) observed a
similar effect. Second, with increasing distance, disparity weight decreases
(and texture weight increases). Although the reliability of the texture
estimator does not change with distance, its relative reliability increases
because the reliability of the disparity estimator decreases. Individual
differences in disparity and texture estimators (i.e., the single-cue data, Figure 4) are manifest in their predicted weights,
a point we will discuss later.
Points of subjective equality (PSEs)
From the two-cue data, we can derive the weights the
observers actually gave the disparity- and texture-specified slants. Figures 3 and 7
illustrate how this was done. The left panel of Figure 7 shows one observer's psychometric
data for a base slant of 0 deg and viewing distance of 57.3 cm. It plots the
proportion of trials on which the observer indicated that the no-conflict
stimulus appeared to have greater slant (right side farther away) than the
conflict stimulus. Psychometric data from four cue-conflict conditions are
shown. Unfilled diamonds represent data for which the disparity-specified slant
was 0 deg and the texture-specified slant was
-10 (gray) or
+10 deg (black). Filled circles
represent data when the texture slant was 0 deg and the disparity slant was
-10 (light blue) or
+10 deg (dark blue). It is readily
apparent that the texture and disparity cues both affected perceived slant
because perturbing the texture-specified slant affected judgments (shown by the
separation between the gray and black diamonds) and perturbing the
disparity-specified slant also affected judgments (the separation between the
light and dark blue circles). The effect of disparity perturbation was greater
than the effect of texture perturbation, so the weight given to disparity was
larger in this condition. PSEs, the no-conflict stimulus values that appeared on
average to have the same slant as the conflict stimuli, are indicated by the
arrows. The right panel of Figure 7
illustrates how those PSEs were used to determine the empirical weights. If the
perturbed cue (texture for the diamonds and disparity for the circles) were the
sole determinant of perceived slant (meaning that its weight equaled 1; Equation 1), the PSEs would lie along the diagonal
line. If the non-perturbed cue were the sole determinant, the PSEs would fall on
the horizontal line. The relative location of the PSE data between these two
extremes reflects the weight given to the perturbed cue ( Equation 9). In the same format as Figure 7, Figures
8- 10 compare PSE data from the two-cue
conditions (reflecting the weights observers actually gave to the two cues) with
MLE predictions based on the single-cue data ( Equation 1).
Figure 7. Determination of points of
subjective equality (PSEs) from two-cue data. Left panel: one observer's
results for four cue-conflict stimuli with base slant = 0 deg,
Δ = ±10 deg, and distance = 57.3 cm. The conflict stimuli
(disparity-specified slant $S_d$, texture-specified slant $S_t$) are:
$S_d$ = 0, $S_t$ = -10 deg
(unfilled gray diamonds),
$S_d$ = 0, $S_t$ = 10 deg
(unfilled black diamonds),
$S_d$ = -10, $S_t$ = 0 deg
(filled light-blue circles),
and $S_d$ = 10, $S_t$ = 0 deg
(filled dark-blue circles). Data represent the proportion of times the observer
indicated the no-conflict stimulus was more slanted than the conflict stimulus.
Staircase data with fewer than four observations at a given value of the
no-conflict stimulus have been removed for clarity. Curves are
maximum-likelihood fits of cumulative Gaussians (which used all the points,
including the ones removed for clarity). The means of the fits are PSEs, the
value of the no-conflict stimulus that on average had the same apparent slant as
the conflict stimulus. The PSEs for each of the four conflict stimuli are
indicated by the arrows. Right panel: PSEs for the four psychometric functions.
Values of the no-conflict stimulus (indicated by arrows in left panel) are
plotted as a function of the conflict Δ. If perceived slant were
determined by one cue only (meaning its weight = 1), the data would lie on the
diagonal line labeled “Perturbed cue dominant” when that cue was
perturbed and on the horizontal line labeled “Non-perturbed cue
dominant” when the other cue was perturbed. Error bars are 95% confidence
intervals.
Figure 8 shows the
data from observer JMH and Figure 9 the
data from ACD. The columns of panels show data, from left to right, for viewing
distances of 19.1, 57.3, and 171.9 cm. The rows of panels show data, from top to
bottom, for base slants of +60,
+30, 0,
-30, and
-60 deg (indicated by orange numbers).
The abscissa in each panel is the value of the perturbed cue's slant in
the conflict stimulus and the ordinate is the PSE. Figure 10 shows data for RM and MSB at the 57.3-cm
viewing distance. Here the columns of panels correspond to different observers
and the rows are the same as in Figures 8 and 9.
Figure 8. PSE data and predictions for
observer JMH. PSE (slant of the no-conflict stimulus perceived on average as the
same as the conflict stimulus) is
plotted as a function of the value of the perturbed cue in the conflict stimulus.
The left, middle, and right columns are data from viewing distances of 19.1, 57.3,
and 171.9 cm. The rows are for base slants of
–60, –30, 0, 30, and 60 deg. Those base slants are the middle
abscissa value in each panel. Blue filled circles are PSEs when the disparity
cue was perturbed and black unfilled diamonds are PSEs when the texture cue was
perturbed. Blue and gray lines are the predictions based on Equations 1-2 and
the curve fits in Figure 4. Error bars are 95%
confidence intervals.
Figure 9. PSE data and predictions for
observer ACD. Conventions the same as Figure
8.
Figure 10. PSE data and predictions for RM
and MSB at the 57.3-cm viewing distance. Symbol conventions the same as in Figures 8 and 9.
The blue and gray lines are MLE predictions for the
disparity-perturbed and texture-perturbed conditions, respectively. For each
conflict stimulus, the reliability for each cue was computed based on the fitted
curves in Figure 4. The optimal weights were
then computed using Equation 2. These weights,
together with the displayed slants for each cue, were combined using Equation 1 to predict the PSE (i.e., the perceived
slant for the conflict stimulus). The predictions are curved because the
relative reliabilities (and hence the cue weights) change as the perturbation is
changed (Hillis et al., 2002). We used a
shortcut to generate the prediction curves. Specifically, we used the
reliability based on the displayed slant to calculate the weight, rather than
the reliability based on the observer's estimate of slant from each cue
(which varies from trial to trial). Predictions based on a full Monte Carlo
simulation in which weights were calculated separately for each simulated trial
were, however, indistinguishable from
these.
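The shortcut described above can be sketched as follows. The two JND functions are hypothetical stand-ins chosen only to show the shape of the computation; they are not the curves fitted in Figure 4, and both are assumed to return JNDs in deg of slant.

```python
def predicted_pse(s_disparity, s_texture, jnd_d_fn, jnd_t_fn):
    """Predicted perceived slant of a conflict stimulus (weighted average).

    jnd_d_fn and jnd_t_fn map a displayed slant (deg) to a single-cue JND
    (deg). Weights are computed from the displayed slants, as in the shortcut.
    """
    r_d = 1.0 / jnd_d_fn(s_disparity) ** 2
    r_t = 1.0 / jnd_t_fn(s_texture) ** 2
    w_d = r_d / (r_d + r_t)
    return w_d * s_disparity + (1.0 - w_d) * s_texture


def jnd_disparity(slant_deg):
    """Hypothetical disparity JND: constant across slant."""
    return 3.0


def jnd_texture(slant_deg):
    """Hypothetical texture JND: falls as |slant| grows."""
    return 8.0 / (1.0 + abs(slant_deg) / 20.0)


# Texture perturbed around a 30-deg base slant; disparity fixed at 30 deg.
for delta in (-10, -5, 0, 5, 10):
    pse = predicted_pse(30.0, 30.0 + delta, jnd_disparity, jnd_texture)
    print(delta, round(pse, 2))
```

Because the texture weight rises with the texture-specified slant in this sketch, the predicted PSE shifts more for positive than for negative perturbations, which is why the prediction is a curve rather than a straight line.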
The agreement between the PSE data and predictions is
generally excellent. The two main expected trends are observed in the data: The
influence of disparity decreases with increasing distance and with increasing
slant. We will discuss exceptions to the close agreement in Discussion: Summary of the results. We also
plotted the MLE-predicted and actual weights in a similar format to Figures 8- 10.
These plots are shown in Figures 3S-5S.
These plots show that the weights are generally close to the MLE-predicted
weights and that the sums of the weights given to texture and disparity do not
differ from one.
Just-noticeable differences (JNDs)
The estimation model for cues with uncorrelated noises
( Equations 1- 2) produces the least-variable estimate of slant
given the available cues. If observers employ this cue-combination scheme, we
should see improvements in JNDs when both cues are available compared to when
only one cue is available. Equation 6 specifies
the variance of the optimal cue-combined estimator, which is lower than either
of the single-cue estimators. We used the estimates of JNDs from the single-cue
conditions ( Figures 4 and 1S) and Equation 6 to
calculate the predicted JNDs when both cues were available. Figure 11 shows measured and predicted JNDs
for JMH and ACD as a function of base slant for the three distances. The pale
symbols represent the single-cue JNDs: diamonds for texture alone and circles
for disparity alone. The filled red squares are the observed two-cue JNDs and
the shaded red areas contain the 95% confidence intervals for the predictions.
With few exceptions (discussed in Summary of results), the
two-cue data follow the predictions very closely. Importantly, two-cue JNDs are
consistently lower than single-cue JNDs, which shows that the visual system does
benefit from having both cues available. Similar JND plots for RM and MSB are
shown in Figure
6S.
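The predicted two-cue JND can be illustrated with a short sketch. It assumes that Equation 6 has the standard form for independent cues, in which reliabilities (reciprocal variances) add; the function name and the example values are hypothetical.

```python
import math


def combined_jnd(jnd_disparity, jnd_texture):
    """Predicted two-cue JND for independent, optimally combined cues.

    The combined variance is 1 / (1/jnd_d**2 + 1/jnd_t**2). A cue with an
    infinite JND contributes no reliability.
    """
    reliability = 0.0
    for jnd in (jnd_disparity, jnd_texture):
        if not math.isinf(jnd):
            reliability += 1.0 / jnd ** 2
    if reliability == 0.0:
        raise ValueError("at least one cue must have a finite JND")
    return 1.0 / math.sqrt(reliability)


# Illustrative single-cue JNDs of 3 and 4 deg.
print(combined_jnd(3.0, 4.0))
# With an unusable disparity cue the prediction equals the texture JND.
print(combined_jnd(float("inf"), 1.5))
```

The predicted value is never larger than the smaller single-cue JND, and the benefit is greatest when the two single-cue JNDs are similar.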
Earlier we mentioned a test of the assumption that the
reliability of the disparity estimator measured in the single-cue experiment
with random-dot stimuli is a valid estimate of the estimator's reliability
in the two-cue experiment with Voronoi stimuli. We tested the assumption by
examining situations in the two-cue experiment in which the texture weight was
nearly zero. The texture weight was less than 0.15 in three situations, all with
observer JMH: distance = 19.1
cm and base slants of –15,
0, and +15 deg. His two-cue thresholds
in those situations were 2.9, 2.4, and 2.9 deg, respectively ( Figure 11). His single-cue, disparity-alone
thresholds in the same situations were 3.2, 2.6, and 2.1 deg, respectively ( Figure 4). The close correspondence supports our
assumption that the disparity-alone thresholds provided an estimate of the
appropriate reliability for the two-cue experiment.
Figure 11. Predicted and observed JNDs.
The just-noticeable difference in slant (JND) is plotted as a function of base
slant. JNDs are
the sigma parameters for the cumulative normal fits to the psychometric data
divided by $\sqrt{2}$ and
represent our estimates of the standard deviation of the slant estimators.
Filled red squares are observed JNDs when texture and disparity were both
present. Faint gray diamonds are observed JNDs for texture alone ( Figure 4, left) and faint blue circles are
observed JNDs for disparity alone ( Figure 4,
middle). Disparity JNDs for ±70 deg base slant at 19.1 cm for JMH were
infinite (indicated by pale blue symbols with yellow stars). Error bars
represent 95% confidence intervals. Red curves represent 95% confidence
intervals for the predicted JNDs ( Equation 6).
Left, middle, and right panels represent the data from viewing distances of
19.1, 57.3, and 171.9 cm.
By similar reasoning, we can test the assumption that
the reliability of the texture estimator measured in the single-cue experiment
with monocular stimuli is a valid estimate of the estimator's reliability
in the two-cue experiment with binocular stimuli. To generate one slant estimate
from the texture-specified slants at the two eyes, the visual system should
combine the monocular signals in some fashion. The combination could occur in
two ways. (1) The visual system might combine the two eyes' images before
computing slant. This could be done in principle by averaging the visual
directions for each corresponding point in the two images. Then slant would be
computed from the combined Cyclopean image. (2) The visual system might estimate
eye-centered slants before combining. Specifically, it could estimate the slants
from the texture signals received by each eye and then average the two
estimates. These two means of combining the monocular images are geometrically
equivalent and yield the same slant as would be observed at the Cyclopean eye as
long as the coordinate origin is on the Vieth-Müller
Circle. At any rate, averaging the two
eyes' inputs is a reasonable way to form a texture-based slant estimate.
If we assume that the two monocular inputs are equally informative and that
their noises are uncorrelated (perhaps an implausible assumption), the variance
of the combined estimate would be half the variance of either monocular
estimate. In other words, discrimination thresholds based on the texture
information alone would be lower in the binocular than in the monocular case by
a factor of $\sqrt{2}$ (Legge, 1984). We tested this possibility by examining
situations in the two-cue experiment in which the disparity weight was nearly
zero. This occurred for JMH and ACD across all slants at 171.9 cm. It also
occurred for observer JMH at 19.1 cm and base slant = ±70 deg.
JMH's texture-alone JNDs at 171.9 cm for base slants of –45 to
+45 deg (the range of tested slants)
were 2.3–8.0 deg (the lowest values occurring at the greatest slants; Figure 4). His two-cue JNDs at 171.9 cm for base
slants of –45 to
+45 deg ranged from 3.4–5.9 deg
(again the lowest values occurring at the greatest slants; Figure 11). JMH's texture-alone JNDs at 19.1
cm for base slants of –70 and
+70 deg were 1.5 and 1.0 deg,
respectively, and his corresponding two-cue JNDs were 1.4 and 1.4 deg.
ACD's texture-alone JNDs at 171.9 cm ranged from 2.4–5.3 deg and her
two-cue JNDs ranged from 3.3–4.1 deg. Thus, when the disparity weight was
low, the texture-alone thresholds were generally similar to the corresponding
two-cue thresholds. The good correspondence supports our assumption that the
texture-alone thresholds provided an estimate of the appropriate reliability for
the two-cue experiment. It also implies that the slant specified by texture is
not made more reliable by averaging the two eyes' images, perhaps because
the noises are highly
correlated.
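The role of noise correlation in this argument can be made explicit with a short derivation. The symbols $\hat{S}_L$ and $\hat{S}_R$ (monocular texture-based estimates with equal variance $\sigma^2$) and $\rho$ (the correlation between their noises) are introduced here for illustration and do not appear in the original analysis.

```latex
\[
\operatorname{Var}\!\left(\frac{\hat{S}_L + \hat{S}_R}{2}\right)
  = \frac{\sigma^2 + \sigma^2 + 2\rho\,\sigma^2}{4}
  = \frac{(1 + \rho)\,\sigma^2}{2}
\]
```

With $\rho = 0$ the variance is halved and the threshold falls by a factor of $\sqrt{2}$; as $\rho$ approaches 1 the variance approaches $\sigma^2$ and averaging gives no benefit, which is the pattern suggested by the similar texture-alone and two-cue thresholds.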
The generally excellent agreement between observed and
predicted PSEs and JNDs indicates that humans use a statistically optimal
strategy for combining slant information from disparity and texture. There are,
however, three cases in which the data deviated from the predictions.
(1) JMH's PSEs in the two-cue condition at 19.1
cm and base slants of –60 and
+60 deg ( Figure 8). The weight given disparity was lower
than predicted when the absolute value of the perturbed-cue slant was greater
than 60 deg. In the disparity-alone ±70-deg, 19.1-cm conditions, JMH
could not fuse the random dot stimulus (thus, thresholds were infinite). The
same was true in the two-cue condition: Slant judgments were made on diplopic
images, making the task more complicated. Our model does not consider how depth
judgments are made in diplopic conditions. Given this, the discrepancy between
observed and predicted two-cue data is understandable.
(2) ACD's JNDs in the two-cue condition for all
base slants at 19.1 cm and for the larger base slants at 57.3 cm ( Figure 11). Her two-cue thresholds were
consistently lower than predicted. Moreover, ACD gave slightly more weight to
disparity than predicted for base slants of
±30 and
±60 deg at 57.3 cm ( Figure 9). The most obvious explanation for these
discrepancies is that the disparity-alone JNDs ( Figure 4) overestimated the variance of
ACD's disparity estimator in the two-cue experiment. As described in Methods ( Figure
2), ACD may have given some weight to the uninformative texture signal in
the disparity-alone experiment for nonzero base slants. This would have caused
an overestimate of the variance of the disparity estimator whenever the
disparity weight was relatively high in the two-cue experiment (which occurs
when the viewing distance is 19.1 or 57.3 cm) and whenever the base slant
differed significantly from zero.
(3) JMH's and ACD's disparity weights were
higher than predicted at 171.9 cm when the base slant was 0 deg. We think this
small discrepancy is caused by variation in binocular fusion at long distances.
Both observers reported difficulty fusing the random-dot stimulus in the
single-cue experiment when the viewing distance was 171.9 cm (perhaps because of
the conflict between vergence and accommodation). Thus, their thresholds at
171.9 cm may have slightly overestimated the variance of the disparity estimator
at that distance. (ACD also had difficulty fusing the random-dot stimulus at
19.1 cm, which may have contributed to the apparent overestimate of the variance
of the disparity estimator as discussed under #2 above.) Both observers found it
easier to fuse the Voronoi stimulus at 171.9 cm, presumably because that
stimulus provides contours to guide vergence eye movements. The discrepancy is
most likely to show up when the base slant is 0 deg because the disparity weight
is highest in that case. Thus, this discrepancy between predicted and observed
behavior is probably caused by fusion difficulties in the single-cue experiment
at the long distance.
The great majority of the data is consistent with the
MLE predictions and strongly supports the hypothesis that observers combine the
slant cues of disparity and texture in a statistically optimal fashion.
Comparison to other studies
Five studies have examined quantitatively whether cue
combination is statistically optimal (Alais & Burr, 2004; Ernst & Banks, 2002; Gepshtein & Banks, 2003; Knill & Saunders, 2003; Landy & Kojima, 2001). In agreement with our results, all
five found that combination of cues from different sensory modalities (haptics
and vision: Ernst & Banks, 2002;
Gepshtein & Banks, 2003;
audition and vision: Alais & Burr, 2004) or different visual cues (Knill &
Saunders, 2003; Landy & Kojima,
2001) was quite close to MLE
predictions.
Knill and Saunders ( 2003) tested the MLE model for combining
texture and disparity cues to surface slant. Their stimuli were slanted about a
horizontal axis (tilt = 90 deg). Like
us, they took advantage of the fact that the relative reliabilities of texture
and disparity vary naturally with viewing geometry. They reported reasonable
agreement between observed and predicted behavior. We extended their
investigation by examining texture and disparity combination for surfaces
slanted about a vertical axis (tilt = 0 deg) at various distances. Our data
resemble Knill and Saunders' in some respects and differ in others. Our
texture-alone data exhibited a smaller effect of base
slant on JNDs (compare our Figure 4 to their Figure 6). Average texture-alone JNDs in our study
were ~8 and
~1.5 deg at 0 and 70 deg, respectively
(a ratio of 5.3). The corresponding JNDs in Knill and Saunders were
~40 and
~2 deg (a ratio of 20). The fact that
our JNDs were generally lower is undoubtedly because our Voronoi patterns were
more regular than Knill and Saunders'. The differing effect of base slant
is most likely due to differences in how the angular subtense of the stimuli
varied with slant: Ours varied with base slant and theirs was constant. To hold
angular size constant, Knill and Saunders added texture elements as slant
increased, and this adds progressively more information as slant
increases.
We also examined how viewing distance affects the
weights assigned to texture and disparity and found that the weight assignment
is essentially optimal. This result seems to contradict numerous reports of
failures to scale veridically for distance in stereoscopic tasks. For example,
Johnston ( 1991), Johnston, Cumming, and
Parker ( 1993), and Bradshaw,
Glennerster, and Rogers ( 1996) had
observers judge the amount of depth in disparity-defined cylinders, spheres, and
ridges when presented at different distances. Responses were far from veridical,
indicating that depth was overestimated at near and underestimated at far
distances. How could we observe optimal weight changes as a function of
distance, while previous work showed apparent failures to take distance into
account? We think the answer lies in the influence of unmodeled cues. In all but
one of the previous experiments (Experiment 1 in Johnston et al., 1993), the texture gradient specified a
frontoparallel plane. From our analysis, one would expect observers to report
seeing less depth at long distances, not because they failed to take distance
into account, but rather because they gave increasing weight to a signal
specifying that the stimulus is flat. This claim is supported by the observation
that making the texture gradient consistent with the disparity-specified shape
generally makes judgments more veridical (Buckley & Frisby, 1993; Johnston et al., 1993). Furthermore, when the task is to
adjust the shape of a surface until it appears planar and thereby consistent
with the texture-specified shape, observers seem to take distance into account
veridically (Rogers & Bradshaw, 1995).
If distance was not taken into account in scaling
disparities, it is possible that this mis-scaling could be mistaken for a change
in disparity weight with change in distance. This no-scaling hypothesis is
considered and rejected in Appendix C.
Dynamic determination of cue weights
MLE cue combination has the advantage that it produces
the least-variable estimate of slant given the available cues. But it requires
the observer to choose weights based on the reliability of the cues. In the case
of texture, the reliability clearly depends on the slant, which is what the
observer is trying to estimate. Thus, the choice of weights must be made
dynamically, with the possibility of varying weights from trial to trial (or
from location to location within a stimulus, discussed shortly). The model
suggests that on each trial the observer makes an estimate of slant from each
cue and uses the value of slant for each cue, along with other relevant information
(“ancillary cues” such as a distance estimate; Landy, Maloney,
Johnston, & Young, 1995), to determine
that cue's current reliability. The relative reliabilities are then used
to determine the cue weights ( Equation 2),
followed by weighted cue combination ( Equation
1). In our experiments, the slant shown to the observer was selected
randomly before each trial from the set of possible slants within each block.
For performance to approach optimality, the weights must have been determined in
a trial-by-trial dynamic fashion. In a previous study we also had clear-cut
evidence of weights changing from trial to trial (Hillis et al., 2002). The reader may wonder how such
dynamic computation could be accomplished in a biological system without prior
knowledge of the likelihood functions associated with each slant cue. Ernst and
Banks ( 2002) outlined a plausible neural
model that could carry out the computation automatically.
Comparison of observed and expected effects of slant and distance on disparity- and texture-based JNDs
We observed three effects in the single-cue
experiments—a large improvement in discrimination threshold with
decreasing distance with disparity alone, a small improvement in threshold with
increasing slant with disparity alone (see also Knill & Saunders, 2003), and a large improvement in
discrimination threshold with increasing slant with texture alone (Knill, 1998b; Knill & Saunders, 2003). Here, we ask whether the three
observed effects are expected from the slant information in the stimulus.
When the eyes are in forward gaze, as they were in
these experiments, the vergence is
$$\mu = 2\tan^{-1}\!\left(\frac{i}{2d}\right), \qquad (10)$$
where d is viewing distance and i
is the inter-ocular distance. Slant from disparity (for tilt = 0) is given to close
approximation by
$$S \approx -\tan^{-1}\!\left(\frac{1}{\mu}\,\ln \mathrm{HSR}\right) \qquad (11)$$
Thus, errors in the disparity and distance
estimates will both yield errors in the estimated slant. We calculated the
distribution of slant estimates for different viewing conditions under the
assumption that the errors in HSR and μ can
be represented by additive, independent noises. Specifically, we conducted a
Monte Carlo simulation to determine the standard deviation of slant estimates,
$\sigma_{\hat{S}}$, from Equation 11. The noises were Gaussian with mean = 0. We adjusted the noise
standard deviations, $\sigma_{\mathrm{HSR}}$ and $\sigma_{\mu}$, to obtain simulation JNDs similar to the
observed JNDs. The simulation results are displayed in Figure 12. The left panel shows
$\sigma_{\hat{S}}$ as a function
of distance (the curves representing different base slants) and the right panel
shows $\sigma_{\hat{S}}$ as a function
of base slant (the curves representing different distances).
Figure 12. Results of a simulation of slant from disparity estimation. We used a Monte Carlo simulation to calculate the standard deviation of disparity-based slant estimates (Equation 11) for different viewing conditions. We assumed that error in the slant estimates stemmed from noise in HSR and μ (vergence angle) and that these errors (the variances) were the same for all viewing distances. The noises were additive and Gaussian with mean = 0, and we obtained simulation results for many sets of parameters. The results for $\sigma_{HSR}$ = 0.012 and $\sigma_{\mu}$ = 0.012 radians, which fit the data reasonably well, are displayed in the figure. Left panel: the standard deviations of slant estimates are plotted as a function of distance. Different curves represent different absolute values of base slant. The circles represent the observed JNDs for observer JMH at the various distances. Different colors represent different absolute values of base slant. Right panel: the standard deviations of slant estimates as a function of base slant. Different curves represent different viewing distances. The circles represent the observed JNDs for observer JMH. Different colors represent different distances.
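The simulation described above can be sketched in a few lines. This is a minimal illustration rather than the authors' code: it assumes Equation 10 has the form μ = 2 tan⁻¹(i/2d) and Equation 11 the form S ≈ −tan⁻¹(ln(HSR)/μ), it uses an inter-ocular distance of 6.2 cm (an assumed value, not given in this section), and the function names are hypothetical. The noise standard deviations default to the 0.012 values quoted in the Figure 12 caption.

```python
import numpy as np

IOD_CM = 6.2  # assumed inter-ocular distance; not stated in this section


def vergence(d_cm, iod_cm=IOD_CM):
    """Vergence (radians) for forward gaze at distance d (assumed form of Eq. 10)."""
    return 2.0 * np.arctan(iod_cm / (2.0 * d_cm))


def hsr_for_slant(slant_deg, mu):
    """HSR produced by a surface of the given slant (Eq. 11 solved for HSR)."""
    return np.exp(-mu * np.tan(np.radians(slant_deg)))


def simulate_slant_sd(slant_deg, d_cm, sd_hsr=0.012, sd_mu=0.012,
                      n=200_000, seed=0):
    """Standard deviation (deg) of slant estimates when HSR and vergence
    are each corrupted by additive, independent, zero-mean Gaussian noise."""
    rng = np.random.default_rng(seed)
    mu = vergence(d_cm)
    hsr = hsr_for_slant(slant_deg, mu)
    noisy_hsr = hsr + rng.normal(0.0, sd_hsr, n)
    noisy_mu = mu + rng.normal(0.0, sd_mu, n)
    estimates = -np.arctan(np.log(noisy_hsr) / noisy_mu)  # assumed form of Eq. 11
    return np.degrees(np.std(estimates))


if __name__ == "__main__":
    for d in (19.1, 57.3, 171.9):
        sds = [simulate_slant_sd(s, d) for s in (0, 30, 60)]
        print(d, [round(x, 2) for x in sds])
```

Under these assumptions the simulated standard deviation grows with viewing distance at a fixed base slant, which is the qualitative pattern described for the left panel of Figure 12.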
The standard deviation of the slant estimate, $\sigma_{\hat{S}}$, is roughly proportional to viewing distance for all base slants (left panel). This result is expected from Equation 11 because $\mu \approx i/d$, so fixed additive noise in μ has an increasing effect with distance. We found that $\sigma_{\hat{S}}$ was proportional to distance for a wide range of $\sigma_{HSR}$ and $\sigma_{\mu}$; the key assumption is that the noise in disparity normalization is fixed and additive in vergence. The data points in the lower left panel are JNDs from observer JMH; clearly, his discrimination thresholds increased monotonically with increasing distance in much the same way as the simulation. The data from ACD were similar. Thus, the distance effect we observed in the disparity-alone experiment is expected if error in disparity normalization is additive in units of vergence.
The right panel of Figure 12 shows that $\sigma_{\hat{S}}$ is inversely related to the absolute value of slant. This relationship was observed for all values except when $\sigma_{HSR} = 0$. The relationship is expected from Equation 11 because $\partial S / \partial HSR \propto \cos^2 S$, so fixed additive noise in HSR has progressively less effect on $\sigma_{\hat{S}}$ as base slant increases. The data points in the lower right panel are JNDs from observer JMH; data were similar for the other three observers. At viewing distances of 57.3 and 171.9 cm, JMH's discrimination thresholds decreased monotonically with slant magnitude much like the simulation's standard deviations.
Thus, the base-slant effect we observed in the
disparity-alone condition is expected if error in disparity measurement is
additive in HSR.
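The distance and base-slant effects can also be read off a first-order error propagation. The following is a sketch under the assumption that Equation 11 has the form $S \approx -\tan^{-1}(\ln(HSR)/\mu)$, so that $\ln HSR = -\mu \tan S$; it is an illustration, not a derivation taken from the original analysis.

$$
\frac{\partial S}{\partial HSR} = -\frac{\cos^2 S}{\mu\, HSR},
\qquad
\frac{\partial S}{\partial \mu} = -\frac{\sin 2S}{2\mu}
$$

$$
\sigma_{\hat{S}}^2 \approx \left(\frac{\cos^2 S}{\mu\, HSR}\right)^2 \sigma_{HSR}^2
+ \left(\frac{\sin 2S}{2\mu}\right)^2 \sigma_{\mu}^2
$$

Because $\mu \approx i/d$ at the distances used here, both terms grow roughly in proportion to $d$. The HSR term shrinks as $\cos^2 S$ when the slant magnitude increases, whereas the vergence term vanishes at $S = 0$ and peaks at $|S| = 45$ deg.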
Does this assumption make sense? It does when HSR is not significantly different from 1, which was true for distances of 57.3 and 171.9 cm (see Figure 4). However, when HSR is quite different from 1, points on the surface fall where stereo-acuity is low and problems arise in solving the binocular correspondence problem (Burt & Julesz, 1980). HSR and the horizontal gradient of horizontal disparity are closely related,

$$DG = \frac{2\,(HSR - 1)}{HSR + 1}, \tag{12}$$
where DG is an approximation to the disparity gradient (Howard & Rogers, 2002). From Equations 10 and 11, when d is small and S is large, HSR is quite different from 1 and thus DG will be quite different from 0. Burt and Julesz (1980) and others have shown that binocular correspondence becomes difficult when DG deviates significantly from 0 and breaks down altogether when $|DG| > 1$. Recent results indicate that this is probably a by-product of a matching process that is similar to cross-correlating the two eyes' images to estimate the disparity in a region of the visual field (Banks et al., 2004). Figure 13 plots the disparity gradient as a function of slant for the three distances we used. DG increases rapidly as a function of slant at the short distance, so we expect performance to be worse at that distance for large slants. JMH's data exhibited this effect. His discrimination thresholds at 19.1 cm increased with slant, which is inconsistent with the assumption that the sole source of error in disparity measurement is additive in HSR (Figure 12, lower-right panel, gray curve). They were higher than predicted for $|S| \geq 30$ deg, which corresponds to a higher disparity gradient ($|DG| \geq 0.19$, HSR $\geq$ 1.2) than occurs at 57.3 and 171.9 cm. Thus, the base-slant effect in the disparity-alone experiment is expected if error in disparity measurement is additive in HSR, except when HSR deviates significantly from 1, where problems arise in solving correspondence.
Figure 13. Disparity gradient as a
function of slant at different viewing distances. The disparity gradient was
calculated from Equation 12.
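The curves in Figure 13 can be approximated with a short calculation. The sketch below is illustrative only: it assumes Equation 10 is μ = 2 tan⁻¹(i/2d), Equation 11 is S ≈ −tan⁻¹(ln(HSR)/μ), and Equation 12 is the common definition DG = 2(HSR − 1)/(HSR + 1). The inter-ocular distance of 6.2 cm and the function name are assumptions, not values from the text.

```python
import numpy as np

IOD_CM = 6.2  # assumed inter-ocular distance; not stated in this section


def disparity_gradient(slant_deg, d_cm, iod_cm=IOD_CM):
    """Approximate disparity gradient for a plane of given slant and distance."""
    mu = 2.0 * np.arctan(iod_cm / (2.0 * d_cm))         # assumed form of Eq. 10
    hsr = np.exp(-mu * np.tan(np.radians(slant_deg)))   # Eq. 11 solved for HSR
    return 2.0 * (hsr - 1.0) / (hsr + 1.0)              # assumed form of Eq. 12


if __name__ == "__main__":
    for d in (19.1, 57.3, 171.9):
        row = [abs(disparity_gradient(s, d)) for s in (-15, -30, -45, -60)]
        print(d, [round(x, 3) for x in row])
```

With these assumptions, a slant magnitude of 30 deg at 19.1 cm gives a disparity-gradient magnitude of roughly 0.19, and the same slant at 57.3 or 171.9 cm gives a much smaller value, in line with the comparison made in the text.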
We also compared our observed texture-alone thresholds
with those expected from the information in the various slant cues associated
with the texture gradient. Knill (1998a, 1998b)
described ideal observers for slant from texture when presented with Voronoi stimuli
like the stimuli in our experiments. The stimulus parameters in our experiment
differed from those in his modeling and experiments in two ways.
First, the Voronoi patterns in our stimuli were more
regular than in his. From this one would expect the texture-gradient cue to be
more reliable in our experiment than in Knill's.
Second, the angular subtense of our stimuli varied with
slant (even though there was a random element to the angular width so as to make
the width an unreliable cue to slant), so the average number of texture elements
was constant across slant. To keep the angular subtense constant, Knill added
texture elements as slant increased, which adds information. This added
information probably explains why his observed and predicted discrimination
thresholds varied more with slant than ours
did.
Despite the differences in stimulus parameters, it is
informative to compare ideal thresholds with our observers' thresholds.
The curve in Figure 14 shows the standard
deviation of slant estimates from Knill's foreshortening ideal observer
(Figure 5A in Knill, 1998a). There is a
striking effect of base slant. The data points are JMH's thresholds in the
texture-alone experiment; data were similar for the other three observers. The
data exhibit a base-slant effect like the ideal observer's, but the effect
is smaller in our data for reasons described above. Therefore, the variation we
observed in texture-based slant thresholds is by and large expected from the
information content of the stimulus.
Figure 14. Ideal and measured JNDs for
slant from texture as a function of base slant. The solid line represents the
standard deviations of the slant estimates of the foreshortening ideal observer
for Voronoi stimuli (Figure 5A in Knill, 1998a). The diamonds represent discrimination
thresholds in the texture-alone condition for observer JMH. Light gray diamonds
are thresholds at 19.1 cm, medium gray at 57.3 cm, and dark at 171.9
cm.
We conclude that the effects of distance and slant on JNDs can be expected from the information present in the stimuli. These effects are summarized in Figure 5.

What other variables might affect cue weights?
Presumably, the visual system takes the disparity and texture variances into account across many viewing situations. Doing so is complex, however, because many viewing properties affect the likelihood functions associated with disparity and texture cues. Here we list the most obvious properties and suggest how the relative weights assigned to disparity and texture ought to be affected.

1. Regularity of texture.
The slant information contained in the texture gradient
can be divided into three cues: (1) scaling, the change in the projected sizes
of texture elements, (2) foreshortening, the change in projected shapes of
texture elements, and (3) density, the change in the number of elements per unit
area in the projection (Blake et al., 1993; Cutting & Millard, 1984; Knill, 1998a). The reliability of scaling as a slant
signal depends on the variation in the sizes of the texture elements on the
surface. With greater size variation, the cue's reliability decreases
(Knill, 1998b). The reliability of
foreshortening depends on the variation in the shapes of the elements on the
surface. For regular shapes, like circles, reliability is greater than for
irregular shapes, such as ellipses with variable aspect ratios (Knill, 1998b; Young et al., 1993). The reliability of the density cue
depends on the number of elements and the regularity of their positioning on the
surface. Presumably, many elements placed regularly (i.e., in a grid) yield more
reliable estimates than few elements placed randomly. All three cues are
affected by the field of view, particularly in the tilt direction, so slant
discrimination from texture is more precise with large than with small stimuli
(Blake et al., 1993; Knill, 1998b). If the visual system takes the varying
reliability of the texture gradient cue into account, all of these stimulus
properties will affect the relative weights assigned to disparity- and
texture-based
signals.
2. Tilt. The direction of slant or tilt affects the amount of
perceived slant in stereograms (Howard & Rogers, 2002). The disparity signal for surfaces
slanted about a vertical axis (tilt = 0 deg) is the horizontal gradient of horizontal disparities. We have
quantified this as the horizontal-size ratio (HSR). The disparity signal for
surfaces slanted about a horizontal axis (tilt = 90 deg) is the vertical gradient of horizontal disparities. This
disparity pattern is often referred to as horizontal-shear disparity (Banks et
al., 2001). Random-element stereograms
simulating a slanted plane with tilt = 0 deg generally produce less perceived slant than planes with tilt = 90 deg (Gillam & Ryan, 1992). Similarly, the amount of depth seen
in curved disparity-defined surfaces varies with tilt (Buckley & Frisby, 1993). These tilt-dependent variations
in perceived depth are called slant anisotropy. The phenomenon is most striking
when the texture gradient specifies a frontoparallel plane, as is usually the
case with random-element stereograms. The phenomenon is not observed when
disparity and texture signal the same depth variation, as occurs with real
surfaces (Bradshaw, Hibbard, van der Willigen, Watt, & Simpson, 2002; Buckley & Frisby, 1993). These observations strongly
suggest that slant anisotropy is caused by conflicting disparity and texture
signals in conventional random-element stereograms. They also suggest that
texture is generally given more weight for tilt 0 (as in our experiments) and
less weight for tilt 90 (as in Knill & Saunders, 2003). By the argument presented here,
this may be due to lower disparity reliability for tilt 0 than for tilt 90
because there is no obvious reason for the reliability of the monocular texture
cue to depend on tilt. There may, however, be differences in the steps required
to combine the texture and disparity signals for different tilts. The issues
involved in transforming texture-gradient signals into the same coordinates for
combination with disparity signals are taken up in Appendix D.

3. Reliability of estimated distance and azimuth.
To estimate slant from the measured disparities, the
visual system must “normalize” the disparities with a distance
estimate and “correct” the disparities with an azimuth estimate
(Gårding et al., 1995). Relaxing the assumption of forward gaze in Equation 11, slant about a vertical axis (tilt = 0) is

$$S \approx -\tan^{-1}\!\left(\frac{1}{\mu}\ln HSR - \tan\gamma\right), \tag{13}$$

where μ is vergence, and γ is azimuth (the angle between the head's median plane and the Cyclopean line of sight) (Backus et al., 1999).
μ is estimated
both from extra-retinal signals concerning the eyes' vergence and from the
horizontal gradient of vertical disparity (Rogers & Bradshaw, 1995). When vertical disparities are
large, as occurs with large stimuli at close range, they are the predominant
means for estimating distance. However, when vertical disparities are unreliable
because the stimulus is small (Rogers & Bradshaw, 1995), or because the texture contains
no horizontal contours (Helmholtz, 1910),
the eyes' vergence becomes the predominant means of estimating distance
and the accuracy of disparity normalization drops (Rogers & Bradshaw, 1995). The
azimuth γ is
used to correct disparities; it is estimated from extra-retinal, eye-position
signals and from the magnitude of vertical disparities (Backus et al., 1999). When vertical disparities are large,
as occurs with near stimuli subtending a large angle, they are the predominant
means of estimating azimuth. When the stimulus is short or when vertical
disparities are unmeasurable, eye position becomes the predominant means and the
accuracy of disparity correction suffers (Backus et al., 1999).
Similar arguments apply for slant estimation with tilt = 90 deg. In this case, slant around a surface point is

$$S \approx -\tan^{-1}\!\left(\frac{1}{\mu}\tan(HSh - \tau)\right), \tag{14}$$

where μ is again the vergence angle, HSh is horizontal shear disparity (Banks et al., 2001) and τ is the cyclovergence of the eyes (the difference in the eyes' torsion). HSh must be normalized for distance by an estimate of μ and corrected for cyclovergence by an estimate of τ (Banks et al., 2001; Howard & Kaneko, 1994). For
our present purposes, when the viewing situation reduces the reliability of the
estimates of the normalizing and/or correcting signals, the disparity estimate
will become more variable. This will occur, for example, when the stimulus
subtends a small angle, when the surface markings make the measurement of
vertical disparity unreliable, and when the stimulus is distant. If the visual
system takes such changes into account, the weight given to disparity should
decrease in those circumstances.
4. Duration. Van Ee and Erkelens (1998)
showed that the slant perceived from disparity-defined planes increases with
stimulus duration. Their random-element stereograms contained the texture
gradient associated with a frontoparallel plane, so their results are consistent
with a model in which the weight given to disparity relative to texture
increases over time. Presumably, disparity and texture
estimates both become more precise with increases in
stimulus duration, but the increase may be slower for disparity. Thus, stimulus
duration may also affect the relative weights given to disparity- and
texture-based slant estimates.

Are cue weights computed locally?
It is interesting to consider whether the visual system
determines one set of weights for each surface or whether the weights are
calculated locally. That is, can the weights vary from one patch on a surface to
another? If they are calculated locally, there are situations in which a
cue-conflict stimulus specifying a plane should appear curved. Here we explain
why this should happen and report that the predicted curvature is in fact
observed.
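The weights in question are the reliability-based weights of the MLE model. A minimal sketch of that rule follows, assuming independent Gaussian noise on the two single-cue estimates; the function name and the numerical values in the usage example are hypothetical and are not data from the experiments.

```python
def mle_combine(slant_disp, var_disp, slant_text, var_text):
    """Reliability-weighted combination of two slant estimates.

    Reliability is the inverse of variance; each cue's weight is its
    reliability divided by the summed reliability of both cues.
    """
    r_disp = 1.0 / var_disp
    r_text = 1.0 / var_text
    w_disp = r_disp / (r_disp + r_text)
    w_text = 1.0 - w_disp
    combined_slant = w_disp * slant_disp + w_text * slant_text
    combined_var = 1.0 / (r_disp + r_text)  # never larger than either input variance
    return combined_slant, combined_var, w_disp, w_text


if __name__ == "__main__":
    # Hypothetical cue-conflict patch: disparity says 30 deg, texture says 20 deg.
    print(mle_combine(30.0, 4.0, 20.0, 4.0))    # equally reliable cues
    print(mle_combine(30.0, 16.0, 20.0, 4.0))   # disparity less reliable
```

If the weights were computed locally, this rule would be applied patch by patch with the single-cue variances appropriate to each patch's local slant and distance, so the combined slant could differ across a surface even when each cue specifies a plane.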
The left panel of Figure 15 shows how slant and distance vary with azimuth when the surface is a plane. For the part of the plane that lies straight ahead, the slant is S and the distance is d; for the part on the right, the local slant is

$$S_\gamma = S - \gamma, \tag{15}$$

where γ is the azimuth. The distance to the intersection of the line and plane is

$$d_\gamma = \frac{d\cos S}{\cos(S - \gamma)}. \tag{16}$$

The left and middle panels of Figure 16 show how $S_\gamma$ and $d_\gamma$ vary with azimuth for different base slants and d = 19.1 cm.
Because the local slant and distance vary with azimuth, the statistically optimal weights for the texture and disparity cues should vary with azimuth.

Now consider the cue-conflict stimulus in the middle panel of Figure 15. For rightward gaze (γ < 0), the texture- and disparity-specified local slants approach zero and the local distance decreases. Our data (Figure 6) show that texture weight is relatively low when the absolute value of slant is ~0 and distance is short. Thus, if the
weights used in combining slant estimates are determined locally, one would
expect the texture weight in this situation to be lower on the right than
straight ahead. (The changes in local slant and distance with changes in azimuth
are unaffected by the direction in which the eyes are looking; they are
determined only by the positions of surface points relative to the head. Thus,
when we say “on the right” or “straight ahead,” we refer
to the head-centered azimuth of a line of sight from the Cyclopean eye and not
necessarily the azimuth of fixation.) For leftward azimuth (γ > 0), the slants become increasingly negative and distance increases; the
texture weight in this situation should be higher on the left than straight
ahead.
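The geometry behind this argument can be checked numerically. The sketch below assumes a planar surface, a sign convention in which leftward azimuth is positive (as in the text), a local slant of S − γ, and a local distance of d cos S / cos(S − γ), which follows from intersecting a line of sight with the plane. The function name and the example base slant of −30 deg are illustrative choices, not values from the experiments.

```python
import numpy as np


def local_slant_and_distance(base_slant_deg, d_cm, azimuth_deg):
    """Local slant (deg) and distance (cm) of a plane along a line of sight.

    Assumes the plane has slant `base_slant_deg` at distance `d_cm`
    straight ahead, and that leftward azimuth is positive.
    """
    s = np.radians(base_slant_deg)
    g = np.radians(azimuth_deg)
    local_slant_deg = base_slant_deg - azimuth_deg
    local_dist_cm = d_cm * np.cos(s) / np.cos(s - g)
    return local_slant_deg, local_dist_cm


if __name__ == "__main__":
    for az in (-20, -10, 0, 10, 20):
        slant, dist = local_slant_and_distance(-30.0, 19.1, az)
        print(az, slant, round(float(dist), 2))
```

For a negative base slant, rightward azimuths (γ < 0) bring the local slant toward zero and shorten the local distance, and leftward azimuths do the opposite, matching the pattern described above.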
Figure 15. Change in slant with azimuth.
Left panel: Definitions of local slant and d |