Volume 4, Number 12, Article 1, Pages 967-992 doi:10.1167/4.12.1 http://journalofvision.org/4/12/1/ ISSN 1534-7362
Slant from texture and disparity cues: Optimal cue combination
James M. Hillis
Department of Psychology, University of Pennsylvania, Philadelphia, PA, USA
Simon J. Watt
Department of Psychology, University of Wales, Bangor, Wales, UK
Michael S. Landy
Department of Psychology & Center for Neural Science, New York University, New York, NY, USA
Martin S. Banks
Vision Science Program, Department of Psychology, & Wills Neuroscience Institute, University of California, Berkeley, CA, USA
Abstract

How does the visual system combine information from different depth cues to estimate three-dimensional scene parameters? We tested a maximum-likelihood estimation (MLE) model of cue combination for perspective (texture) and binocular disparity cues to surface slant. By factoring the reliability of each cue into the combination process, MLE provides more reliable estimates of slant than would be available from either cue alone. We measured the reliability of each cue in isolation across a range of slants and distances using a slant-discrimination task. The reliability of the texture cue increases as |slant| increases and does not change with distance. The reliability of the disparity cue decreases as distance increases and varies with slant in a way that also depends on viewing distance. The trends in the single-cue data can be understood in terms of the information available in the retinal images and issues related to solving the binocular correspondence problem. To test the MLE model, we measured perceived slant of two-cue stimuli when disparity and texture were in conflict and the reliability of slant estimation when both cues were available. Results from the two-cue study indicate, consistent with the MLE model, that observers weight each cue according to its relative reliability: Disparity weight decreased as distance and |slant| increased. We also observed the expected improvement in slant estimation when both cues were available. With few discrepancies, our data indicate that observers combine cues in a statistically optimal fashion and thereby reduce the variance of slant estimates below that which could be achieved from either cue alone. These results are consistent with other studies that quantitatively examined the MLE model of cue combination. Thus, there is a growing empirical consensus that MLE provides a good quantitative account of cue combination and that sensory information is used in a manner that maximizes the precision of perceptual estimates.




History
Received January 20, 2004; published December 1, 2004
Citation
Hillis, J. M., Watt, S. J., Landy, M. S., & Banks, M. S. (2004). Slant from texture and disparity cues: Optimal cue combination. Journal of Vision, 4(12):1, 967-992, http://journalofvision.org/4/12/1/, doi:10.1167/4.12.1.
Keywords
depth perception, cue combination, stereopsis, Bayesian perception, texture gradient


Introduction
The fundamental problem in depth perception arises from the geometry of perspective projection, which reduces the three-dimensional (3D) coordinates of the visual scene to the 2D coordinates of the retinal images. The third dimension of space has to be inferred from the 2D images. The visual system uses several sources of information (“depth cues” such as disparity, perspective, and motion parallax) to estimate the layout of the 3D scene. Estimates based on each individual cue are subject to error. By combining information from several depth cues, the visual system could estimate 3D layout with greater precision across a wider variety of viewing situations than it could by relying on any one cue alone. To realize this advantage, the reliability of each depth cue must be factored into the combination rule. Factoring in reliability is complicated because the reliability of individual depth cues depends on scene parameters in different ways. Are variations in depth-cue reliability with scene geometry factored into the cue-combination rule? To examine this question, we compared human slant discrimination based on disparity and texture cues to a model of statistically optimal cue combination. Slant estimation from texture and disparity is an interesting case to examine because the reliabilities of disparity and texture cues vary in different ways with slant and viewing distance. Knill and Saunders (2003) examined the combination of texture and disparity as a function of slant using an approach similar to ours. We have extended their experiments to include surfaces slanted about a vertical axis and surfaces at multiple viewing distances (see Discussion: Comparison to other studies). We measured the reliability of slant estimates from each cue in isolation across a range of slants and distances and used an optimal cue-combination rule to predict the appearance of two-cue stimuli and the precision of slant estimation with two-cue stimuli. We then compared these predictions to the results of two-cue slant discrimination experiments.
Optimal cue combination
Visual estimates of slant from any depth cue are subject to error. For example, perceived slant from a given texture gradient will vary from one instance to another due to the statistical nature of slant information from texture and errors in the measurement of the gradient (Blake, Bülthoff, & Sheinberg, 1993; Cutting & Millard, 1984; Knill, 1998a). When more than one depth cue is available and informative, one can in principle reduce the uncertainty associated with any one of the cues by combining across cues (for a review and derivation of the following results, see Oruç, Maloney, & Landy, 2003).
One approach to optimizing cue combination is statistical: What cue-combination rule results in an estimator that is unbiased and has minimum variance? Assume that the observer has unbiased estimates $\hat{S}_D$ and $\hat{S}_T$ of the slant of a surface based on disparity and texture cues, respectively. Assume further that errors in these estimates are uncorrelated and have variances $\sigma_D^2$ and $\sigma_T^2$. If we combine the two estimates linearly, the rule that yields the minimum-variance, unbiased estimate is a weighted average that satisfies (Cochran, 1937)
$$\hat{S} = w_D \hat{S}_D + w_T \hat{S}_T \tag{1}$$
where
$$w_D = \frac{r_D}{r_D + r_T}, \qquad w_T = \frac{r_T}{r_D + r_T} \tag{2}$$
and $r_D$ and $r_T$ are the reliabilities of the two cues (e.g., $r_D = 1/\sigma_D^2$). Furthermore, if errors associated with the individual estimators are Gaussian, no other (nonlinear) rule has lower variance.
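As a minimal numerical sketch of Equations 1 and 2 (not the authors' code), the Python functions below compute the cue weights from single-cue standard deviations and form the weighted average. The function names and the example values (σ_D = 2 deg, σ_T = 4 deg) are hypothetical.

```python
def mle_weights(sigma_d, sigma_t):
    """Cue weights (w_D, w_T) from single-cue SDs, following Equation 2."""
    r_d = 1.0 / sigma_d ** 2  # reliability of the disparity estimate
    r_t = 1.0 / sigma_t ** 2  # reliability of the texture estimate
    total = r_d + r_t
    return r_d / total, r_t / total


def combine(s_d, s_t, sigma_d, sigma_t):
    """Reliability-weighted average of two slant estimates (Equation 1)."""
    w_d, w_t = mle_weights(sigma_d, sigma_t)
    return w_d * s_d + w_t * s_t


# Hypothetical example: disparity twice as precise as texture (SDs in deg).
print(mle_weights(2.0, 4.0))          # weights 0.8 and 0.2 (up to float rounding)
print(combine(30.0, 40.0, 2.0, 4.0))  # about 32 deg, pulled toward the disparity estimate
```

Because reliability is the inverse variance, halving a cue's SD quadruples its reliability; that is why the more precise cue here receives four times the weight of the other.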
An alternative approach is to apply Bayesian methods (for reviews, see Kersten, Mamassian, & Yuille, 2004; Mamassian, Landy, & Maloney, 2002). In the absence of any immediate consequences to an observer's actions (payoffs and penalties), the maximum a posteriori (MAP) estimate is typically employed. That is, the observer chooses a slant estimate $\hat{S}$ that is most probable given the image data. We assume the image data can be segregated into those data $d_D$ used to estimate slant from disparity and $d_T$ used to estimate slant from texture. Thus, we choose the value of $\hat{S}$ that maximizes $P(S \mid d_D, d_T)$. Applying Bayes' rule, and assuming that the two cues are conditionally independent, we derive
$$P(S \mid d_D, d_T) \propto P(d_D \mid S)\, P(d_T \mid S)\, P(S) \tag{3}$$
The first two terms on the right side of the equation are the likelihood functions for each cue, characterizing the probability of observing the image data if $S$ is the actual slant. The last term is the prior distribution, which is the probability of observing $S$ in the scene, independent of the image data. If the likelihoods and prior are Gaussian, the MAP estimate has the same form as the minimum-variance, linear-combination estimate
$$\hat{S} = w_D \hat{S}_D + w_T \hat{S}_T + w_P \mu_P \tag{4}$$
where
$$w_i = \frac{r_i}{r_D + r_T + r_P} \tag{5}$$
Here, $\hat{S}_D$ and $\hat{S}_T$ are the maximum-likelihood estimates the observer would have made from each cue in isolation (the means of the respective Gaussian distributions), and $\mu_P$ is the mean of the prior. The $r_i$ are the reliabilities of the respective distributions (likelihoods and prior). If the prior has large variance relative to the individual cue likelihoods, $r_P$ is negligible and Equations 4 and 5 reduce to Equations 1 and 2, which then also yield the most likely slant to have caused the current sensory data (i.e., the maximum-likelihood estimate, or MLE). For our conditions, the variance of the individual cues is much smaller than the prior's variance (see Ideal observer models in Discussion), so we will use Equations 1 and 2 throughout.
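The reduction of Equations 4 and 5 to Equations 1 and 2 can be illustrated with hypothetical numbers. In the sketch below (function name and all values are illustrative assumptions), a Gaussian prior centred on 0 deg pulls the estimate toward frontoparallel when it is narrow, but has almost no effect when its standard deviation is large.

```python
def map_estimate(means, sigmas):
    """Gaussian MAP estimate as a reliability-weighted average (Equations 4 and 5).

    `means` lists the disparity estimate, texture estimate, and prior mean;
    `sigmas` lists the corresponding standard deviations.
    """
    reliabilities = [1.0 / s ** 2 for s in sigmas]
    total = sum(reliabilities)
    weights = [r / total for r in reliabilities]
    estimate = sum(w * m for w, m in zip(weights, means))
    return estimate, weights


# Hypothetical values (deg): disparity says 30, texture says 40, prior centred on 0.
narrow_est, _ = map_estimate([30.0, 40.0, 0.0], [2.0, 4.0, 5.0])
broad_est, broad_w = map_estimate([30.0, 40.0, 0.0], [2.0, 4.0, 500.0])
print(round(narrow_est, 2))  # 28.37: a narrow prior pulls the estimate toward 0
print(round(broad_est, 2))   # 32.0: a broad prior leaves the MLE of Equations 1-2
```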
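By following the strategy described by Equation 2, the variance of the weighted average $\hat{S}$ is
$$\sigma^2 = \frac{\sigma_D^2\, \sigma_T^2}{\sigma_D^2 + \sigma_T^2} \tag{6}$$
Equivalently, the reliability of $\hat{S}$ is the sum $r_D + r_T$, so the variance of $\hat{S}$ is lower than the variance of either single-cue estimate.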
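Equation 6 can be checked by simulation. The sketch below assumes exactly the conditions stated above (independent, unbiased, Gaussian single-cue estimates); the sample size, seed, true slant, and SD values are arbitrary illustrative choices.

```python
import math
import random


def combined_sd(sigma_d, sigma_t):
    """Predicted SD of the combined estimate (Equation 6)."""
    return math.sqrt(sigma_d ** 2 * sigma_t ** 2 / (sigma_d ** 2 + sigma_t ** 2))


def simulate_sd(sigma_d, sigma_t, true_slant=30.0, n=100_000, seed=1):
    """Monte Carlo check: draw independent Gaussian single-cue estimates,
    combine them with Equations 1 and 2, and return the sample SD."""
    rng = random.Random(seed)
    r_d, r_t = 1 / sigma_d ** 2, 1 / sigma_t ** 2
    w_d = r_d / (r_d + r_t)
    samples = [w_d * rng.gauss(true_slant, sigma_d)
               + (1 - w_d) * rng.gauss(true_slant, sigma_t)
               for _ in range(n)]
    mean = sum(samples) / n
    return math.sqrt(sum((x - mean) ** 2 for x in samples) / (n - 1))


print(round(combined_sd(2.0, 4.0), 3))  # 1.789, below both 2.0 and 4.0
print(round(simulate_sd(2.0, 4.0), 2))  # close to the prediction, up to sampling error
```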
Many investigations of sensory cue combination have shown that cue reliability is taken into account in the estimation process (Backus & Banks, 1999; Banks, Hooge, & Backus, 2001; Battaglia, Jacobs, & Aslin, 2003; Buckley & Frisby, 1993; Frisby, Buckley, & Horsman, 1995; Jacobs, 1999; Körding & Wolpert, 2004; Rogers & Bradshaw, 1995; van Beers, Sittig, & Denier van der Gon, 1998; van Beers, Wolpert, & Haggard, 2002; Young, Landy, & Maloney, 1993). Only five studies, however, have tested the quantitative predictions of the MLE model expressed by Equations 1, 2, and 6 to determine if sensory cue combination is statistically optimal (Alais & Burr, 2004; Ernst & Banks, 2002; Gepshtein & Banks, 2003; Knill & Saunders, 2003; Landy & Kojima, 2001). These five studies measured reliabilities of individual cues ($r_i$ in Equations 2 and 6) and empirically tested predictions for both the appearance and discrimination thresholds for stimuli when both cues were present (provided by Equations 1, 2, and 6). All five reported that the combination is quite close to the one predicted by those equations. In these five experiments, the variances of estimates derived from single cues were measured by conducting two-interval, forced-choice (2IFC) discrimination experiments when only one cue was informative. For example, Ernst and Banks (2002) conducted size-discrimination experiments for vision alone and haptics alone and then fit cumulative Gaussians to the two psychometric functions. The variance parameter of the Gaussians provided estimates of the variances of the underlying visual and haptic estimators. Equations 1, 2, and 6 were then successfully used to predict the results of two-cue (visual-haptic) experiments.
Preview
The experiments presented here used the strategy of Ernst and Banks (2002) to ask whether texture and disparity cues to slant are combined in a statistically optimal fashion. The reliabilities of texture and disparity cues to slant vary with viewing geometry in different ways. First, the reliability of texture should increase with increasing slant because the image changes associated with a given change in slant increase (Blake et al., 1993; Knill, 1998a). This relationship between reliability and slant is reflected in human performance (Knill, 1998b). Theoretical and empirical analyses of the reliability of disparity as a function of slant have not been conducted, but that reliability is unlikely to change significantly with slant (Banks et al., 2001; Knill & Saunders, 2003). Second, because the magnitude of binocular disparities for a given depth difference decreases as viewing distance increases, the reliability of slant and curvature estimated from binocular disparity should decrease as viewing distance is increased. Experiments confirm that it does (Howard & Rogers, 2002; Ogle, 1950). On theoretical grounds, the reliability of texture-specified slant, for a fixed retinal-image density, should not change with distance: If a given textured surface is doubled in size and viewed from twice the distance, the retinal image is unchanged. Thus, optimal combination of disparity and texture cues to slant should involve complex changes in the weights given to the two cues depending on base slant and viewing distance.
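To make the distance argument concrete, the sketch below computes the relative disparity produced by a fixed 1-cm depth step straight ahead at the three viewing distances used here, and checks the scale invariance of the retinal image for texture. The interocular distance of 6.2 cm is an assumed, typical value rather than one reported in this section, and the function names are hypothetical.

```python
import math

IPD_CM = 6.2  # assumed interocular distance (cm)


def disparity_arcmin(depth_step_cm, distance_cm, ipd_cm=IPD_CM):
    """Relative disparity (arcmin) between a point straight ahead at distance_cm
    and a second point depth_step_cm farther away, from exact vergence angles."""
    near = 2 * math.atan(ipd_cm / (2 * distance_cm))
    far = 2 * math.atan(ipd_cm / (2 * (distance_cm + depth_step_cm)))
    return math.degrees(near - far) * 60


def visual_angle_deg(size_cm, distance_cm):
    """Angle subtended by a frontoparallel extent centred on the line of sight."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))


for d in (19.1, 57.3, 171.9):  # viewing distances used in the experiments
    print(d, round(disparity_arcmin(1.0, d), 2))

# Doubling both texture size and distance leaves the retinal image unchanged.
print(math.isclose(visual_angle_deg(5, 57.3), visual_angle_deg(10, 114.6)))  # True
```

Relative disparity falls roughly with the square of distance (close to a ninefold drop for each tripling of distance here), whereas angular size depends only on the ratio of size to distance.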
We looked for evidence that the visual system weights the two cues appropriately across a range of slants and distances. Because the reliability of the texture cue increases with slant, we expect the texture weight to increase as slant increases. Because the reliability of disparity decreases as viewing distance increases, we expect the texture weight to increase as distance increases. As in the previous studies, we determined the reliability of the individual cues with 2IFC discrimination experiments. Then, we measured the apparent slant and slant discrimination performance for two-cue stimuli. As we shall see, the MLE cue-combination predictions based on the single-cue experiments (Equations 1, 2, and 6) were largely in accord with the data from the two-cue experiment.
Methods
Subjects
Four observers participated. Two were not aware of the experimental hypotheses (ACD and RM). All had normal stereopsis and did not manifest eye misalignment in normal viewing situations.
Apparatus
All stimuli were displayed on a custom-designed stereoscope with two mirrors and two CRTs (one for each eye; see Backus, Banks, van Ee, & Crowell, 1999). Each mirror and CRT was attached to an arm that rotated about a vertical axis passing through the eye's center of rotation. With this arrangement, the eye and stereoscope arm rotate on a common axis, so when we change the vergence distance, the mapping between the stimulus array and the retina is unaltered (for fixed accommodation).
We used anti-aliasing to specify dot position to subpixel accuracy. To ensure accurate reproduction of visual direction, we spatially calibrated each CRT to eliminate distortions in the images (for details, see Backus et al., 1999).
The observer's head position was stabilized using a bite bar fastened to an adjustable mount. Each observer had a personal mount so that the vertical axes of rotation of left and right eyes were collinear with the rotation axes of the two stereoscope arms (for details, see Hillis & Banks, 2001). The optical distance between the center of rotation of each eye and the face of the CRT was 40 cm.
Stimuli
Stimuli were virtual planes slanted about a vertical axis (i.e., tilt = 0 deg). We independently manipulated two cues to slant: disparity and texture. In single-cue measurements, we isolated one or the other of the two cues. In two-cue measurements, both cues were informative, but could have different slant values. Viewing distance was 19.1, 57.3, or 171.9 cm. Example stimuli are shown in Figure 1.
Figure 1. Examples of the stimuli. Cross fuse or divergently fuse to see the appropriate slants. The upper stimulus is an example of the disparity-alone stimulus. It has a negative slant (right side near). The lower row provides examples of the texture-alone stimulus when viewed monocularly and the disparity-texture stimulus when viewed binocularly. The disparity- and texture-specified slants are positive (right side far).
The texture cue was the perspective projection of planar patches textured with Voronoi patterns with an average of 64 Voronoi cells per patch (de Berg, van Kreveld, Overmars, & Schwarzkopf, 2000; Figure 1, bottom panel). The actual number of cells varied from trial to trial depending on the randomly selected width of the patch (i.e., cells with a constant average area filled the area of the elliptical patch). Voronoi patterns were generated from a jittered grid of dots. On a frontoparallel plane, a regular grid of points was defined. Then, each point on the dot grid was perturbed horizontally and vertically (uniform distribution from −0.3 to 0.3 deg). The Voronoi pattern defined by these points was then computed. Finally, the resulting textured plane was rotated by an amount equal to the texture-defined slant. To isolate the texture cue, the stimuli were viewed monocularly. The visible portion of the plane was elliptical with a height of 15 deg. The width on each presentation was randomly chosen from a uniform distribution from 15 to 20 deg when the stimulus was frontoparallel. The stimulus was then rotated to the appropriate slant. Thus, the retinal shape of the stimulus outline was an unreliable cue to slant.
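A minimal sketch of the jittered-grid step follows. It assumes a square grid whose spacing is not given in the text; the 1.8-deg value used here is hypothetical, chosen so that roughly 64 cells fall inside a 15-by-17.5-deg ellipse. Only the seed points are generated; the tessellation itself could then be computed with a computational-geometry routine such as `scipy.spatial.Voronoi`, and the textured plane rotated to the texture-defined slant.

```python
import random


def jittered_grid(n_cols, n_rows, spacing_deg, jitter_deg=0.3, seed=None):
    """Seed points for a Voronoi texture on a frontoparallel plane.

    A regular grid is laid out, and each point is perturbed horizontally and
    vertically by independent uniform offsets in [-jitter_deg, +jitter_deg].
    """
    rng = random.Random(seed)
    points = []
    for row in range(n_rows):
        for col in range(n_cols):
            x = col * spacing_deg + rng.uniform(-jitter_deg, jitter_deg)
            y = row * spacing_deg + rng.uniform(-jitter_deg, jitter_deg)
            points.append((x, y))
    return points


# Hypothetical 12 x 12 grid with 1.8-deg spacing, large enough to cover the aperture.
seeds = jittered_grid(12, 12, 1.8, seed=0)
print(len(seeds), seeds[0])
```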
The disparity cue to slant was the difference between left- and right-eye projections (calculated for each observer's interpupillary distance). To isolate the disparity cue, the stimulus was defined by sparse random dots (Figure 1, top panel). Each stimulus consisted of 64 dots, with positions randomly drawn from a uniform distribution (note that the texture gradient specified by the dots was therefore consistent with a frontoparallel plane). Dot density was ~0.3 dots/deg².
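The reported density can be checked with simple arithmetic, assuming (the text does not say so explicitly) that the random-dot stimulus used the same elliptical aperture as the texture stimulus: 15 deg high and 15 to 20 deg wide. The sketch also draws uniformly distributed dot positions by rejection sampling; the function names are hypothetical.

```python
import math
import random


def random_dots_in_ellipse(n_dots, width_deg, height_deg, seed=None):
    """Uniformly distributed dot positions inside an elliptical aperture,
    drawn by rejection sampling from the bounding rectangle."""
    rng = random.Random(seed)
    a, b = width_deg / 2, height_deg / 2
    dots = []
    while len(dots) < n_dots:
        x, y = rng.uniform(-a, a), rng.uniform(-b, b)
        if (x / a) ** 2 + (y / b) ** 2 <= 1.0:
            dots.append((x, y))
    return dots


def dot_density(n_dots, width_deg, height_deg):
    """Dots per square degree for an ellipse of the given full width and height."""
    return n_dots / (math.pi * (width_deg / 2) * (height_deg / 2))


dots = random_dots_in_ellipse(64, 17.5, 15.0, seed=1)
for width in (15.0, 20.0):
    print(width, round(dot_density(64, width, 15.0), 2))  # 0.36, then 0.27
```

Under that assumption, 64 dots give between about 0.27 and 0.36 dots/deg², bracketing the reported ~0.3.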
When both cues were present, disparity and texture could be consistent ($S_D = S_T$) or they could be in conflict. In the no-conflict case, homogeneous Voronoi-textured surfaces were projected directly to the two eyes. In cue-conflict cases, we first calculated a perspective projection of the texture with slant $S_T$ at the Cyclopean eye (Figure 2, left panel). We then found the intersections of rays through this Cyclopean projection with a surface patch at the disparity-specified slant $S_D$ (Figure 2, middle panel). The markings on this latter surface were then projected to the left and right eyes to form the two monocular images (Figure 2, right panel).
Figure 2. Creation of the cue-conflict stimuli. Left: Perspective projection of a homogeneously textured surface with the Cyclopean eye as the center of projection. This projection creates the texture-specified slant, $S_T$. The rays from the surface toward the eye are used in the next step. Middle: A virtual surface with the disparity-specified slant, $S_D$, is created. The rays from the first step are back-projected from the Cyclopean eye to find their intersections with the disparity-defined surface. They are marked in the diagram with black points. Right: Viewing the black points binocularly yields the cue-conflict stimulus containing the texture-specified slant in the left panel and the disparity-specified slant in the middle panel.
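The three steps of Figure 2 can be sketched as follows. This is a simplified illustration, not the authors' rendering code. It assumes an idealized geometry: the Cyclopean eye at the origin, the eyes at (±IPD/2, 0, 0), the fixation point straight ahead, and projection onto a unit-distance image plane in front of each eye rather than the calibrated CRT geometry of the real apparatus. Function names and example values are hypothetical.

```python
import math


def texture_point(u, v, slant_t_deg, distance):
    """3D point (x, y, z) for texture coordinates (u, v) on a plane through
    (0, 0, distance), slanted about the vertical axis (positive = right side far)."""
    s = math.radians(slant_t_deg)
    return (u * math.cos(s), v, distance + u * math.sin(s))


def back_project(p, slant_d_deg, distance):
    """Intersect the ray from the Cyclopean eye (origin) through p with the
    disparity-specified plane z = distance + x * tan(slant_d)."""
    s = math.radians(slant_d_deg)
    px, py, pz = p
    t = distance * math.cos(s) / (pz * math.cos(s) - px * math.sin(s))
    return (t * px, t * py, t * pz)


def project(q, eye_x):
    """Perspective projection onto an image plane 1 unit in front of an eye at (eye_x, 0, 0)."""
    qx, qy, qz = q
    return ((qx - eye_x) / qz, qy / qz)


def conflict_images(texture_uv, slant_t, slant_d, distance, ipd):
    """Left- and right-eye image coordinates of a cue-conflict stimulus."""
    left, right = [], []
    for u, v in texture_uv:
        q = back_project(texture_point(u, v, slant_t, distance), slant_d, distance)
        left.append(project(q, -ipd / 2))
        right.append(project(q, ipd / 2))
    return left, right


# Hypothetical texture vertices (cm on the texture plane), texture 30 deg, disparity 20 deg.
uv = [(u, v) for u in (-4.0, 0.0, 4.0) for v in (-2.0, 2.0)]
left_img, right_img = conflict_images(uv, 30.0, 20.0, distance=57.3, ipd=6.2)
print(left_img[0], right_img[0])
```

Because each point is moved only along its Cyclopean ray, the Cyclopean image (and hence the texture-specified slant) is unchanged, while the binocular disparities now specify $S_D$. When $S_D = S_T$, the back-projected points coincide with the original texture points.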
Control experiments and procedures to validate single-cue measurements
We went to some lengths to ensure that the single-cue experiments measured the variances of the disparity and texture estimators in a fashion appropriate for making two-cue predictions. In this section, we describe control experiments and methodological procedures used to achieve that goal.
1. Are disparity-alone measurements affected by monocular slant signals?
To make sure that only binocular information determined slant discrimination in the disparity-alone case, we conducted two control experiments.
First, to make sure that the stimulus did not provide a monocular cue to slant, we measured monocular slant-discrimination thresholds at various slants for the 64-dot stimulus. Observers could not reliably discriminate anything but large slant changes, and those changes were at least a factor of 10 larger than the thresholds in the disparity-alone experiment. We conclude that there is no useful monocular slant information in the 64-dot random-dot stimulus.
Second, we wanted to make sure that we presented enough dots in the display for disparity-based thresholds to be as low as possible while still isolating the disparity estimator. The details of this control experiment and the results are provided in Appendix A. We found that threshold decreased as dot number increased from 2 to 32 and then leveled off beyond 32 dots. With 64 dots, disparity-based thresholds were as low as they could be. The results were simpler for observer JMH than for ACD: ACD may have given some weight to the texture signal at base slants different from 0 deg. We will return to this point when we discuss her two-cue data (in Discussion: Summary of results).
2. Are disparity-alone measurements based on perceived slant or on only the disparity gradient?
To combine two cues for slant, the cues must be promoted to the same units. Disparity signals alone do not provide a slant estimate because they must be scaled or normalized for distance (Gårding, Porrill, Mayhew, & Frisby, 1995). We were concerned that observers might perform the slant-discrimination task in the single-cue, disparity-alone case by comparing only the disparity gradients in the two stimulus intervals. Said another way, they might perform the task without normalizing the disparity signals into slant estimates. To test the MLE model, we must acquire valid measures of the reliabilities of single-cue slant estimates. In the disparity-alone condition, this means our measure must reflect the process of scaling the disparity signal into units of slant. If the task in the disparity-alone condition were done without normalizing the disparity signal, the psychometric data would not reflect errors introduced by the scaling process (which, within the framework of weighted-linear cue combination, is essential for combining disparity and texture signals), and we would underestimate the variance of disparity-based slant estimates. We therefore looked for evidence that observers scale the disparity signal for distance in a discrimination task with our disparity-alone stimuli. Observers performed the slant-discrimination task with the disparity-alone stimulus, but with the comparison stimuli appearing at different distances relative to the standard stimulus. We found that most observers (importantly, JMH and ACD) take distance into account when performing the slant-discrimination task; that is, they do not perform the task by only comparing the disparity gradients in the two stimulus intervals. The details of the experiment and results are described in Appendix B. The results of this control experiment support the assumption that the disparity-alone measurements provide valid estimates of the reliability of the disparity-based slant estimator.
3. Are the estimates of the single-cue reliabilities valid for the two-cue experiment?
The single-cue data were used to specify the model's parameters (single-cue variances) and the model was then used to predict the two-cue data. An important assumption is that the appropriate variances are being measured in the single-cue experiments. One might question this assumption for the measurements of the disparity estimator's reliability because a different type of stimulus was used in the single-cue, disparity-alone condition (random-dot stimuli) than in the two-cue condition (Voronoi stimuli). This concern cannot be addressed by using Voronoi stimuli in the disparity-alone condition because such stimuli provide salient texture cues to slant. We can, however, check the validity of using random-dot stimuli by comparing single-cue thresholds with those stimuli to two-cue thresholds when the texture weight is expected to be approximately zero. This check, described in Results: Just-noticeable differences, confirmed the validity of our assumption for observer JMH (the only observer for whom the required data were available).
A similar concern can be raised about using monocular stimuli in the texture-alone condition to measure texture reliability in the two-cue experiment. The stimulus in the two-cue experiment is binocular, so the visual system receives two samples while there is only one sample in the texture-alone experiment. The two samples will not be the same because the texture-specified slant at the left eye necessarily differs from the texture slant at the right eye (see Appendix D). Thus, the visual system must integrate two texture-gradient signals into one binocular estimate before combining with the slant estimated from disparity. The presence of two samples in the two-cue experiment might reduce the uncertainty associated with the texture cue in a fashion similar to the reduction in contrast threshold with binocular viewing (Legge, 1984). We can check this by comparing the monocular single-cue thresholds to binocular two-cue thresholds when the texture weight is expected to be approximately one. This check, described in Results: Just-noticeable differences, confirmed the validity of our assumption that the monocular measurements were a valid estimate of the variance of the texture estimator in the two-cue experiment.
4. Do unmodeled slant cues affect responses?
We wanted to isolate the cues of disparity and texture, so we had to consider whether other slant cues might be present in the display. Three cues—the blur gradient, accommodation, and the phosphor grid of the CRTs—always signaled a slant of 0 deg. If the observer failed to ignore those conflicting cues, the variances we measured would be higher than the true variances associated with disparity and texture. To reduce the salience of all three cues, we placed diffusers on the faces of the CRTs to blur the stimuli slightly. Blurring the stimuli decreases the probability that the observers used the blur gradient because the blur gradient is a less reliable depth cue with blurred as opposed to sharply focused stimuli (Mather & Smith, 2002). Blurring should also decrease the probability that accommodation was used as a depth cue because humans accommodate inaccurately if at all to blurred stimuli (Heath, 1956). The diffusers also made the phosphor grid invisible.
Procedure: Single-cue conditions
To estimate the reliabilities of the texture- and disparity-based slant estimates, we obtained psychometric functions for texture and disparity presented in isolation at several base slants (±70, ±60, ±45, ±30, ±15, and 0 deg) and distances (19.1, 57.3, and 171.9 cm) for two observers (ACD and JMH). The other two observers participated in a subset of these conditions. We used a 2IFC task with no feedback. On each trial, the observer indicated which of two stimuli (one at the base slant and the other at the base slant plus $\Delta S$) had the greater apparent slant. The stimuli were displayed for 1.5 s with a 0.3-s interstimulus interval. We used staircases to control the value of $\Delta S$ and four reversal rules (3-down/1-up, 1-down/3-up, 2-down/1-up, and 1-down/2-up) to sample points along the entire psychometric function. At least eight staircases were employed for each psychometric function for ACD and JMH, which corresponds to approximately 350-450 trials per function (each staircase was terminated after 12 reversals). At least two, but typically more, staircases were employed for RM and MSB. In each session, at least four interleaved staircases were run: two base slants (one positive and one negative to avoid adaptation) with two staircases each. Viewing distance was fixed in each session.
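The sketch below simulates one way to implement the n-down/m-up staircases described above. The simulated observer is hypothetical: its probability of judging the comparison as more slanted is a cumulative Gaussian of ΔS. The start value, step size, and observer SD are assumptions; only the four rules and the 12-reversal stopping criterion come from the text.

```python
import math
import random


def p_greater(delta, sigma):
    """Hypothetical observer: P(comparison judged more slanted) for increment delta."""
    return 0.5 * (1.0 + math.erf(delta / (sigma * math.sqrt(2.0))))


def run_staircase(n_down, n_up, start, step, sigma, n_reversals=12, seed=None):
    """Simulate an n_down/n_up staircase on the slant increment delta (deg).

    delta is lowered after n_down consecutive 'comparison greater' responses and
    raised after n_up consecutive 'comparison smaller' responses; the run stops
    after n_reversals changes of direction. Returns a list of (delta, response).
    """
    rng = random.Random(seed)
    delta, trials = start, []
    run_greater = run_smaller = 0
    last_direction, reversals = 0, 0
    while reversals < n_reversals:
        greater = rng.random() < p_greater(delta, sigma)
        trials.append((delta, greater))
        if greater:
            run_greater, run_smaller = run_greater + 1, 0
        else:
            run_greater, run_smaller = 0, run_smaller + 1
        direction = 0
        if run_greater == n_down:
            direction, run_greater = -1, 0
        elif run_smaller == n_up:
            direction, run_smaller = 1, 0
        if direction:
            if last_direction and direction != last_direction:
                reversals += 1
            last_direction = direction
            delta += direction * step
    return trials


# The four rules from the text, pooled into one data set for a single base slant.
rules = [(3, 1), (1, 3), (2, 1), (1, 2)]
pooled = []
for i, (n_down, n_up) in enumerate(rules):
    pooled += run_staircase(n_down, n_up, start=0.0, step=1.0, sigma=3.0, seed=i)
print(len(pooled), "trials")
```

The 3-down/1-up and 1-down/3-up rules converge near 79% and 21% "comparison greater" responses, and the 2-down/1-up and 1-down/2-up rules near 71% and 29%, so together they place trials on both flanks of the psychometric function.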
Procedure: Two-cue conditions
The procedure in the two-cue conditions was the same as in the single-cue conditions except that a no-conflict stimulus (disparity- and texture-specified slants equal to one another) and a conflict stimulus (disparity and texture slants not necessarily equal) were presented on each trial. Figure 3 depicts the disparity- and texture-defined slants of the no-conflict and conflict stimuli. In both panels, the slant specified by the disparity cue is plotted on the abscissa and the slant specified by the texture cue on the ordinate. The conflict stimulus had one cue set to a base slant ($S_0$ = ±60, ±30, or 0 deg) and the other cue was perturbed. The left panel depicts the conflict stimuli when disparity was perturbed and the right panel shows the stimuli when texture was perturbed. The perturbed cue had incremental slants of ±10, ±5, or 0 deg relative to the unperturbed cue, so the conflict was always small. In previous work with quite similar stimuli, a difference of 10 deg between the disparity- and texture-specified slants was generally not detectable (Hillis, Ernst, Banks, & Landy, 2002). The five possible perturbed cue values are represented along the abscissa and ordinate in the left and right panels, respectively. On each trial, a conflict stimulus and a no-conflict stimulus were presented and the observer indicated the one containing the apparently greater slant. No feedback was given. The value of the no-conflict stimulus was varied according to staircase procedures to map out the psychometric function. At least four staircases were run per experimental session: two conflict conditions for each of two base slants.
Figure 3. Depiction of the stimulus values and their manipulation in the two-cue experiment. Both graphs plot the disparity-specified slant ($S_D$) on the abscissa and the texture-specified slant ($S_T$) on the ordinate for the conflict and no-conflict stimuli. The base slant is represented by the origins ($S_D = S_T = S_0$). The conflict stimulus either had the disparity-specified slant perturbed from the base slant by Δ (depicted in the left panel) or the texture-specified slant perturbed by Δ (right panel). Five different conflicts were presented for each base slant and those are represented by the blue circles in the left panel and gray diamonds in the right panel. In the no-conflict stimulus, the disparity- and texture-specified slants were equal to one another. The staircase procedure varied the increments added to the base slant: $S_D = S_T = S_0 + \delta$.
Figure 3 also shows how we determined the point of subjective equality (PSE), the value of the no-conflict stimulus with the same average perceived slant as the conflict stimulus. The enlarged bold symbols (the dark-blue circle in the left panel and black diamond in the right) represent two particular conflict stimuli. Δ represents the incremental slant of the perturbed cue in the conflict stimulus, and δ represents the increment given to the no-conflict stimulus as the staircase procedure varies its slant. As δ is increased and thereby the slant of the no-conflict stimulus is increased (represented in the figure by displacement up along the main diagonal where $S_D = S_T$), the observer will be increasingly likely to report that it had greater slant than the conflict stimulus. At some value of δ, the no-conflict stimulus will on average have the same apparent slant as the conflict stimulus; this is the PSE. If the cue weights are constant across small variations in slant, we can determine the weights from this value of δ.
Consider first the conflict stimulus. From Equations 1 and 2 and the fact that the disparity-defined stimulus slant is $S_D = S_0 + \Delta$ while $S_T = S_0$ (where $S_0$ is the base slant), the expected value of the estimated slant of the conflict stimulus is
$$E[\hat{S}_C] = w_D (S_0 + \Delta) + w_T S_0 = S_0 + w_D \Delta \tag{7}$$
Now consider the no-conflict stimulus. From Equations 1 and 2 and the fact that $S_D = S_T = S_0 + \delta$, the expected value of the estimated slant of the no-conflict stimulus is
$$E[\hat{S}_{NC}] = w_D (S_0 + \delta) + w_T (S_0 + \delta) = S_0 + \delta \tag{8}$$
The conflict and no-conflict stimuli will have the same perceived slants when $E[\hat{S}_C] = E[\hat{S}_{NC}]$, and from Equations 7 and 8, we have
$$\delta = w_D \Delta \tag{9}$$
Thus, the two-cue experiment yields an estimate of the PSE from which we can determine the weights given to disparity and texture; when texture rather than disparity is perturbed, the same argument gives $\delta = w_T \Delta$. The assumption that the weights are constant over even small variations in slant is inconsistent with statistically optimal slant estimation, in which the weights vary as a function of slant. However, given the precision of our measurements and the rate of change of cue reliability, the fixed local weight assumption provides a reasonable approximation.
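Equation 9 turns each measured PSE into a weight estimate. The sketch below uses hypothetical PSE values: a single conflict gives w = δ/Δ, and one simple way to pool the five conflicts (an assumption, not necessarily the authors' procedure) is a least-squares line through the origin, which is the form Equation 9 predicts.

```python
def weight_from_pse(conflict_delta, pse_delta):
    """Equation 9 for one conflict: delta = w * Delta, so w = delta / Delta.
    The weight refers to whichever cue was perturbed in the conflict stimulus."""
    if conflict_delta == 0:
        raise ValueError("a zero conflict carries no information about the weight")
    return pse_delta / conflict_delta


def pooled_weight(conflict_deltas, pse_deltas):
    """Least-squares slope through the origin of PSE increment versus conflict size."""
    num = sum(c * p for c, p in zip(conflict_deltas, pse_deltas))
    den = sum(c * c for c in conflict_deltas)
    return num / den


# Hypothetical PSEs (deg) for the five disparity perturbations of one base slant.
conflicts = [-10.0, -5.0, 0.0, 5.0, 10.0]
pses = [-6.8, -3.7, 0.2, 3.4, 7.1]
w_d = pooled_weight(conflicts, pses)
w_t = 1.0 - w_d  # Equation 2 makes the two weights sum to one
print(round(w_d, 3), round(w_t, 3))
```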
We can also plot the percentage of judgments for which the no-conflict stimulus appeared to have greater slant as a function of its slant. The slopes of such psychometric functions index the discriminability of the stimuli (discussed below in Results: Just-noticeable differences).
Results
Specifying the predictions
To quantify the predictions of the MLE model, we need estimates of the variances of the single-cue estimators ($\sigma_D^2$ and $\sigma_T^2$, or equivalently, the reliabilities $r_D$ and $r_T$ in Equations 2 and 6). To estimate these variances, we fit the psychometric data with a cumulative Gaussian using a maximum-likelihood criterion. The standard deviations of the resulting functions were divided by $\sqrt{2}$ (because the psychophysical procedure was 2IFC) to yield estimates of the standard deviations of the underlying slant estimators (Green & Swets, 1974). We call these just-noticeable differences (JNDs) because they represent the slant difference that is correctly discriminated ~76% of the time.
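The fitting step can be sketched with a dependency-free maximum-likelihood grid search. This is a crude stand-in for a proper optimizer; the grid ranges, resolution, and synthetic data are assumptions. The last line checks the ~76% figure: the 2IFC decision variable has an SD √2 times that of the single-stimulus estimator, so at a slant difference of one JND the proportion correct is Φ(1/√2) ≈ 0.76.

```python
import math


def cum_gauss(x, mu, sigma):
    """Cumulative Gaussian psychometric function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def fit_cum_gauss(xs, n_greater, n_total):
    """Maximum-likelihood (mu, sigma) by brute-force grid search.

    Grid: mu in [-10, 10] deg and sigma in [0.1, 20] deg, both in 0.1-deg steps.
    """
    best = None
    for i in range(-100, 101):
        mu = i / 10
        for j in range(1, 201):
            sigma = j / 10
            log_lik = 0.0
            for x, k, n in zip(xs, n_greater, n_total):
                p = min(max(cum_gauss(x, mu, sigma), 1e-9), 1 - 1e-9)  # avoid log(0)
                log_lik += k * math.log(p) + (n - k) * math.log(1 - p)
            if best is None or log_lik > best[0]:
                best = (log_lik, mu, sigma)
    return best[1], best[2]


def jnd_from_sigma(sigma_psychometric):
    """2IFC: the decision variable differences two estimates, so divide by sqrt(2)."""
    return sigma_psychometric / math.sqrt(2.0)


# Synthetic, hypothetical data: 40 trials at each of seven slant increments (deg).
xs = [-6, -4, -2, 0, 2, 4, 6]
n_total = [40] * len(xs)
n_greater = [round(40 * cum_gauss(x, 0.5, 3.0)) for x in xs]
mu_hat, sigma_hat = fit_cum_gauss(xs, n_greater, n_total)
print(mu_hat, sigma_hat, jnd_from_sigma(sigma_hat))
print(round(cum_gauss(1 / math.sqrt(2.0), 0.0, 1.0), 2))  # 0.76: proportion correct at 1 JND
```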
Figure 4 shows the JND estimates for JMH and ACD (JNDs for RM and MSB, whose performance was tested at only one distance, were similar to those shown here and are plotted in Figure 1S). Each row of panels represents data from one observer. The left column shows the texture-alone data: JNDs in units of slant are plotted as a function of the absolute value of base slant (there was no apparent difference in the results for positive and negative slants). Different symbols represent data from different viewing distances. As expected, texture JNDs did not vary systematically with distance. Also as expected (Knill, 1998a), texture JNDs decreased as the absolute value of slant increased.
Figure 4. Just-noticeable slant differences (JNDs) for the single-cue experiment. JNDs are plotted as a function of slant or horizontal disparity. Different symbol colors represent data for different viewing distances: 19.1, 57.3, and 171.9 cm. Left and middle columns: JNDs for texture-alone and disparity-alone, respectively, as a function of the absolute value of the slant. Error bars are 95% confidence intervals. Right column: JNDs for disparity alone plotted in terms of HSR. The ordinate is the difference between the absolute values of the natural log of HSR for the base slant and the just-noticeably different slant. The abscissa is the absolute value of the natural log of HSR of the base slant. Error bars are 95% confidence intervals. The lines in the left column (texture-alone) and right column (disparity-alone) are maximum-likelihood fits of a smooth two-parameter curve to the data (see text). The break in the curve and the upward-pointing arrow in JMH's disparity fit indicate that JNDs go to infinity somewhere between |ln(HSR)| of 0.58 and 0.97 (slants of 60 and 70 deg at 19.1 cm).
The middle column of Figure 4 shows the data for the disparity-alone condition: Slant JNDs are plotted as a function of the absolute value of base slant (there were no systematic differences for positive and negative slants). As expected from the viewing geometry (Equations 1 and 2 in Backus et al., 1999), disparity slant JNDs increased systematically with viewing distance. JNDs also tended to decrease with base slant at the medium and far viewing distances (see also Knill & Saunders, 2003). At the near viewing distance, JNDs tended to increase with base slant. In fact, as indicated by the symbol with the yellow star, JMH's thresholds were infinite at base slants of ±70° and a viewing distance of 19.1 cm because the binocular images could not be fused in this condition. This difference between the trends at near and far distances can be understood in terms of how the retinal disparity signal changes with slant at each distance.
The right column of Figure 4 plots the same data as the middle column but in units of relative disparity: specifically, the horizontal-size ratio (HSR) (Backus et al., 1999). $\mathrm{HSR} = \alpha_L/\alpha_R$, where $\alpha_L$ and $\alpha_R$ are the horizontal angles subtended by a surface patch in the left and right eyes. Plotted in these units, JNDs do not vary systematically as a function of viewing distance. This implies that the increase in slant-discrimination threshold with distance is caused only by the geometric relationship between distance and disparity, and not by greater error in the calculation of disparity or by greater error in the estimates used to scale for distance (such as vergence; Equation 2 in Backus et al., 1999). JNDs plotted in these units increase with increasing |ln(HSR)|. This increase may reflect difficulties in solving the binocular-matching problem as the disparity gradient (which is linearly related to HSR) increases (Banks, Gepshtein, & Landy, 2004; Burt & Julesz, 1980). The increase may also reflect the fact that surfaces with large |ln(HSR)| contain fewer points near the Vieth-Müller Circle, where stereoacuity is highest. |ln(HSR)| increases more rapidly as a function of slant at near distances (as indicated by the fact that JMH's data at high base HSRs all come from the near viewing distance). For example, a change in slant from 60 to 70 deg results in a change in |ln(HSR)| from 0.58 to 0.97 at 19.1 cm but only from 0.06 to 0.10 at 171.9 cm. (We did not plot the point at 70 deg, |ln(HSR)| = 0.97, because thresholds were infinite.) We will return to a discussion of the effects of distance and base slant in Discussion: Comparison of observed and expected effects of slant and distance on disparity- and texture-based JNDs.
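The |ln(HSR)| values quoted in the preceding paragraph can be approximately reproduced with simple geometry. The sketch below assumes a small surface patch straight ahead in forward gaze, slanted about a vertical axis, and a hypothetical interocular distance of 6.2 cm (no value is given here). Under these assumptions the horizontal angles subtended in the two eyes are proportional to d cos S − (i/2) sin S and d cos S + (i/2) sin S, so HSR = (1 − k tan S)/(1 + k tan S) with k = i/(2d).

```python
import math


def abs_ln_hsr(slant_deg, distance_cm, iod_cm=6.2):
    """|ln(HSR)| for a small patch straight ahead, slanted about a vertical axis.

    Small-patch approximation (an assumption of this sketch):
    HSR = (1 - k tan S) / (1 + k tan S), with k = iod / (2 * distance).
    iod_cm = 6.2 is a hypothetical interocular distance.
    """
    k = iod_cm / (2.0 * distance_cm)
    t = k * math.tan(math.radians(slant_deg))
    return abs(math.log((1.0 - t) / (1.0 + t)))


for d in (19.1, 171.9):
    at_60, at_70 = abs_ln_hsr(60, d), abs_ln_hsr(70, d)
    print(f"{d:6.1f} cm: 60 deg -> {at_60:.3f}, 70 deg -> {at_70:.3f}")
```

With these assumptions the sketch gives about 0.58 and 0.96 at 19.1 cm and about 0.06 and 0.10 at 171.9 cm, close to the values in the text; the small difference at 70 deg may reflect the assumed interocular distance or the small-patch approximation.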
To make predictions for the two-cue conditions, we needed estimates of the variances of the disparity and texture estimators at slants between the ones for which we have measurements. For the interpolation, we fit smooth curves to the data: a function of x with two free parameters, α and β, where x is slant and JND is in deg for texture, or x is |ln(HSR)| of the base slant and JND is the corresponding difference in |ln(HSR)| for disparity. This was done by performing a maximum-likelihood fit to all of the raw psychometric data for a given condition (texture or disparity), varying α and β. The curves and the data are shown in the left and right columns of Figure 4. Each curve was fit to the data from all three viewing distances at once. Thus, the curves give us a way to estimate disparity and texture reliability between slants where we have measurements, and they also allow us to interpolate across distance. Although the reliability of the disparity signal itself (HSR) does not vary systematically with distance, the relationship between HSR and slant varies substantially with viewing distance. Figure 5 shows how the reliability of disparity-based slant estimates varies with slant and distance, based on the curve fits to JMH's data. The reliability of the disparity cue to slant decreases as distance increases, and it varies with base slant in different ways at different distances. At near distances, the disparity cue is more reliable than the texture cue (and hence should be given more weight according to the MLE model).
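Why a JND that is constant in HSR units becomes a distance-dependent JND in slant can be sketched by dividing the HSR-unit JND by the local rate at which ln(HSR) changes with slant. This sketch assumes a small patch straight ahead in forward gaze, HSR = (1 − k tan S)/(1 + k tan S) with k = i/(2d), a hypothetical interocular distance of 6.2 cm, and a hypothetical constant JND of 0.01 in ln(HSR) units; it does not reproduce the fitted curves behind Figure 5.

```python
import math


def ln_hsr(slant_deg, distance_cm, iod_cm=6.2):
    """ln(HSR) under a small-patch approximation (assumed, not from the article)."""
    k = iod_cm / (2.0 * distance_cm)
    t = k * math.tan(math.radians(slant_deg))
    return math.log((1.0 - t) / (1.0 + t))


def slant_jnd(base_slant_deg, distance_cm, jnd_ln_hsr=0.01, step_deg=0.01):
    """Slant JND (deg) implied by a fixed JND in ln(HSR) units.

    Uses a central finite difference for d ln(HSR) / d slant.
    jnd_ln_hsr = 0.01 is a hypothetical value chosen for illustration.
    """
    slope = (ln_hsr(base_slant_deg + step_deg, distance_cm)
             - ln_hsr(base_slant_deg - step_deg, distance_cm)) / (2.0 * step_deg)
    return jnd_ln_hsr / abs(slope)


for d in (19.1, 57.3, 171.9):
    row = ", ".join(f"{s:>2} deg: {slant_jnd(s, d):5.2f}" for s in (0, 30, 60))
    print(f"{d:6.1f} cm -> {row}")
```

At 0-deg slant the implied slant JND grows in proportion to distance (a ratio of about 9 between 171.9 and 19.1 cm). Because the HSR-unit JND is held constant here, the sketch captures only the geometric part of the effect; it omits the growth of HSR-unit JNDs with |ln(HSR)| that makes near-distance slant JNDs rise at large slants.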
Figure 5. JNDs of texture (orange) and disparity (blue) cues across distance and slant estimated from curve fits to JMH's single-cue data (Figure 4).
Given a pair of JND values for texture and disparity, we can use Equation 2 to calculate optimal weights. Predicted weights determined solely from the standard deviations of cumulative Gaussians fitted to single-cue psychometric data are shown in Figure 6 as data points (based on the raw JND data) and curves (based on the fitted curves in Figure 4). (Similar plots for the other two observers are shown in Figure 2S.) The filled circles and blue curves are the predicted disparity weights, and the unfilled diamonds and gray curves are the predicted texture weights. Because JMH's thresholds were infinite at 70 deg in the disparity-alone condition at 19.1 cm, the MLE weight given to disparity in this condition is 0. The curve used to fit JMH's disparity data in Figure 4 does not capture this fact. To incorporate it, we smoothly extrapolated the predicted-weight curve so that the disparity weight reached 0 at 70 deg.
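Equation 2 applied to a pair of JNDs can be sketched as below. Reliability is taken as 1/JND², the weights are the normalized reliabilities, and an infinite JND (as for JMH at 70 deg and 19.1 cm) gives a disparity weight of 0. The function name is hypothetical, and the example JNDs are illustrative values in the range reported in the text (about 2.6 deg for disparity alone near 0-deg slant at 19.1 cm; about 8 deg for texture alone at 0 deg, averaged across observers), not a fit to any one observer.

```python
import math


def mle_weights(jnd_disparity, jnd_texture):
    """Optimal cue weights (Equation 2): each weight is proportional to reliability 1/JND**2.

    An infinite JND gives zero reliability and hence zero weight.
    """
    r_d = 1.0 / jnd_disparity ** 2  # 1/inf evaluates to 0.0 for Python floats
    r_t = 1.0 / jnd_texture ** 2
    total = r_d + r_t
    return r_d / total, r_t / total


print(mle_weights(2.6, 8.0))       # disparity much more reliable: w_D near 0.9
print(mle_weights(math.inf, 1.5))  # unfusible disparity: (0.0, 1.0)
```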
Figure 6. Predicted weights for disparity and texture cues. From left to right, the panels show data from viewing distances of 19.1, 57.3, and 171.9 cm. The weights were calculated using Equation 2 and the single-cue discrimination data and curve fits shown in Figure 4. Unfilled diamonds are predicted weights for the texture cue and filled circles are predicted weights for the disparity cue. The solid lines are predictions calculated from the curve fits in Figure 4. Error bars are 95% confidence intervals.
The predicted weights exhibit two trends. First, with increasing slant, the texture weight increases and the disparity weight decreases (data from RM and MSB showed the same trend; Figure 2S). The reciprocal relationship between the texture and disparity weights occurs because the weights are constrained to sum to 1. The texture weight becomes relatively greater than the disparity weight with increasing slant because it becomes a relatively more reliable estimate (Figure 4). Knill and Saunders (2003) observed a similar effect. Second, with increasing distance, disparity weight decreases (and texture weight increases). Although the reliability of the texture estimator does not change with distance, its relative reliability increases because the reliability of the disparity estimator decreases. Individual differences in disparity and texture estimators (i.e., the single-cue data, Figure 4) are manifest in their predicted weights, a point we will discuss later.
Points of subjective equality (PSEs)
From the two-cue data, we can derive the weights the observers actually gave the disparity- and texture-specified slants. Figures 3 and 7 illustrate how this was done. The left panel of Figure 7 shows one observer's psychometric data for a base slant of 0 deg and viewing distance of 57.3 cm. It plots the proportion of trials on which the observer indicated that the no-conflict stimulus appeared to have greater slant (right side farther away) than the conflict stimulus. Psychometric data from four cue-conflict conditions are shown. Unfilled diamonds represent data for which the disparity-specified slant was 0 deg and the texture-specified slant was -10 (gray) or +10 deg (black). Filled circles represent data when the texture slant was 0 deg and the disparity slant was -10 (light blue) or +10 deg (dark blue). It is readily apparent that the texture and disparity cues both affected perceived slant because perturbing the texture-specified slant affected judgments (shown by the separation between the gray and black diamonds) and perturbing the disparity-specified slant also affected judgments (the separation between the light and dark blue circles). The effect of disparity perturbation was greater than the effect of texture perturbation, so the weight given to disparity was larger in this condition. PSEs, the no-conflict stimulus values that appeared on average to have the same slant as the conflict stimuli, are indicated by the arrows. The right panel of Figure 7 illustrates how those PSEs were used to determine the empirical weights. If the perturbed cue (texture for the diamonds and disparity for the circles) were the sole determinant of perceived slant (meaning that its weight equaled 1; Equation 1), the PSEs would lie along the diagonal line. If the non-perturbed cue were the sole determinant, the PSEs would fall on the horizontal line. The relative location of the PSE data between these two extremes reflects the weight given to the perturbed cue (Equation 9). 
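Reading a weight off the right panel of Figure 7 amounts to asking how far the PSEs lie from the horizontal line (weight 0) toward the diagonal (weight 1). A minimal sketch, assuming the linear relation PSE = base slant + w·Δ implied by that description and fitting w by least squares through the base-slant point; the function name and the PSE values are invented for illustration.

```python
def empirical_weight(base_slant, perturbations, pses):
    """Weight of the perturbed cue estimated from PSEs.

    Assumes PSE = base + w * delta, i.e., the PSE lies a fraction w of the way
    from the horizontal line (w = 0) to the diagonal (w = 1).
    Returns the least-squares slope of (PSE - base) on delta through the origin.
    """
    num = sum(d * (p - base_slant) for d, p in zip(perturbations, pses))
    den = sum(d * d for d in perturbations)
    return num / den


# Hypothetical PSEs (deg) for a 0-deg base slant and perturbations of -10 and +10 deg.
w_disparity = empirical_weight(0.0, [-10.0, 10.0], [-7.1, 7.5])
w_texture = empirical_weight(0.0, [-10.0, 10.0], [-2.4, 2.8])
print(f"w_D = {w_disparity:.2f}, w_T = {w_texture:.2f}, sum = {w_disparity + w_texture:.2f}")
```

Estimating the two weights independently, as here, leaves their sum free, which is what allows the check that the sum does not differ from 1.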
In the same format as Figure 7, Figures 8-10 compare PSE data from the two-cue conditions (reflecting the weights observers actually gave to the two cues) with MLE predictions based on the single-cue data (Equation 1).
Figure 7. Determination of points of subjective equality (PSEs) from two-cue data. Left panel: one observer's results for four cue-conflict stimuli with base slant = 0 deg, Δ = ±10 deg, and distance = 57.3 cm. The conflict stimuli are: disparity slant = 0 and texture slant = –10 deg (unfilled gray diamonds); disparity slant = 0 and texture slant = 10 deg (unfilled black diamonds); disparity slant = –10 and texture slant = 0 deg (filled light-blue circles); and disparity slant = 10 and texture slant = 0 deg (filled dark-blue circles). Data represent the proportion of times the observer indicated the no-conflict stimulus was more slanted than the conflict stimulus. Staircase data with fewer than four observations at a given value of the no-conflict stimulus have been removed for clarity. Curves are maximum-likelihood fits of cumulative Gaussians (which used all the points, including the ones removed for clarity). The means of the fits are PSEs: the value of the no-conflict stimulus that on average had the same apparent slant as the conflict stimulus. The PSEs for each of the four conflict stimuli are indicated by the arrows. Right panel: PSEs for the four psychometric functions. Values of the no-conflict stimulus (indicated by arrows in left panel) are plotted as a function of the conflict Δ. If perceived slant were determined by one cue only (meaning its weight = 1), the data would lie on the diagonal line labeled “Perturbed cue dominant” when that cue was perturbed and on the horizontal line labeled “Non-perturbed cue dominant” when the other cue was perturbed. Error bars are 95% confidence intervals.
Figure 8 shows the data from observer JMH and Figure 9 the data from ACD. The columns of panels show data, from left to right, for viewing distances of 19.1, 57.3, and 171.9 cm. The rows of panels show data, from top to bottom, for base slants of +60, +30, 0, -30, and -60 deg (indicated by orange numbers). The abscissa in each panel is the value of the perturbed cue's slant in the conflict stimulus and the ordinate is the PSE. Figure 10 shows data for RM and MSB at the 57.3-cm viewing distance. Here the columns of panels correspond to different observers and the rows are the same as in Figures 8 and 9.
Figure 8. PSE data and predictions for observer JMH. PSE (the slant of the no-conflict stimulus perceived on average as the same as the conflict stimulus) is plotted as a function of the value of the perturbed cue's slant in the conflict stimulus. The left, middle, and right columns are data from viewing distances of 19.1, 57.3, and 171.9 cm. The rows are for base slants of –60, –30, 0, 30, and 60 deg. Those base slants are the middle abscissa value in each panel. Blue filled circles are PSEs when the disparity cue was perturbed and black unfilled diamonds are PSEs when the texture cue was perturbed. Blue and gray lines are the predictions based on Equations 1-2 and the curve fits in Figure 4. Error bars are 95% confidence intervals.
Figure 9. PSE data and predictions for observer ACD. Conventions the same as Figure 8.
Figure 10. PSE data and predictions for RM and MSB at the 57.3-cm viewing distance. Symbol conventions the same as in Figures 8 and 9.
The blue and gray lines are MLE predictions for the disparity-perturbed and texture-perturbed conditions, respectively. For each conflict stimulus, the reliability for each cue was computed based on the fitted curves in Figure 4. The optimal weights were then computed using Equation 2. These weights, together with the displayed slants for each cue, were combined using Equation 1 to predict the PSE (i.e., the perceived slant for the conflict stimulus). The predictions are curved because the relative reliabilities (and hence the cue weights) change as the perturbation is changed (Hillis et al., 2002). We used a shortcut to generate the prediction curves. Specifically, we used the reliability based on the displayed slant to calculate the weight, rather than the reliability based on the observer's estimate of slant from each cue (which varies from trial to trial). Predictions based on a full Monte Carlo simulation in which weights were calculated separately for each simulated trial were, however, indistinguishable from these.
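The prediction procedure, including the displayed-slant shortcut, can be sketched as follows. The two JND functions are hypothetical stand-ins for the fitted curves of Figure 4 (texture JND falling linearly from 8 deg at 0 deg to 1.5 deg at 70 deg; disparity JND fixed at 3 deg), chosen only so that the curvature of the predictions is visible.

```python
def jnd_texture(slant_deg):
    """Hypothetical texture JND (deg): falls linearly from 8 deg at 0 to 1.5 deg at 70 deg."""
    return 8.0 - (8.0 - 1.5) * min(abs(slant_deg), 70.0) / 70.0


def jnd_disparity(slant_deg):
    """Hypothetical disparity JND (deg); slant_deg is unused because it is constant here."""
    return 3.0


def predicted_pse(base_slant, delta, perturbed):
    """MLE-predicted PSE for a conflict stimulus (Equations 1 and 2).

    Shortcut described in the text: reliabilities are evaluated at the
    displayed slant of each cue, not at trial-by-trial estimates.
    """
    s_d = base_slant + delta if perturbed == "disparity" else base_slant
    s_t = base_slant + delta if perturbed == "texture" else base_slant
    r_d = 1.0 / jnd_disparity(s_d) ** 2
    r_t = 1.0 / jnd_texture(s_t) ** 2
    w_d = r_d / (r_d + r_t)
    return w_d * s_d + (1.0 - w_d) * s_t


for delta in (-10, -5, 0, 5, 10):
    print(delta, round(predicted_pse(30.0, delta, "texture"), 2))
```

With a 30-deg base slant and the texture cue perturbed, the predicted PSE moves farther for a +10-deg perturbation than for a –10-deg one, because the texture cue becomes more reliable (and so gains weight) as its slant increases. That asymmetry is what bends the prediction lines.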
The agreement between the PSE data and predictions is generally excellent. The two main expected trends are observed in the data: The influence of disparity decreases with increasing distance and with increasing slant. We will discuss exceptions to the close agreement in Discussion: Summary of the results. We also plotted the MLE-predicted and actual weights in a similar format to Figures 8-10. These plots are shown in Figures 3S-5S. These plots show that the weights are generally close to the MLE-predicted weights and that the sums of the weights given to texture and disparity do not differ from one.
Just-noticeable differences (JNDs)
The estimation model for cues with uncorrelated noises (Equations 1-2) produces the least-variable estimate of slant given the available cues. If observers employ this cue-combination scheme, we should see improvements in JNDs when both cues are available compared to when only one cue is available. Equation 6 specifies the variance of the optimal cue-combined estimator, which is lower than either of the single-cue estimators. We used the estimates of JNDs from the single-cue conditions (Figures 4 and 1S) and Equation 6 to calculate the predicted JNDs when both cues were available. Figure 11 shows measured and predicted JNDs for JMH and ACD as a function of base slant for the three distances. The pale symbols represent the single-cue JNDs: diamonds for texture alone and circles for disparity alone. The filled red squares are the observed two-cue JNDs and the shaded red areas contain the 95% confidence intervals for the predictions. With few exceptions (discussed in Summary of results), the two-cue data follow the predictions very closely. Importantly, two-cue JNDs are consistently lower than single-cue JNDs, which shows that the visual system does benefit from having both cues available. Similar JND plots for RM and MSB are shown in Figure 6S.
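Equation 6 can be evaluated directly from two single-cue JNDs. The sketch below uses the reliability form of the combined variance (reliabilities add), which is equivalent to σ_D²σ_T²/(σ_D² + σ_T²) and also handles an infinite single-cue JND. The function name and example JNDs are illustrative.

```python
import math


def predicted_two_cue_jnd(jnd_disparity, jnd_texture):
    """Optimal combined JND: 1/sigma_DT**2 = 1/sigma_D**2 + 1/sigma_T**2.

    Writing it in reliability form also handles an infinite single-cue JND
    (zero reliability), where the product/sum form would give nan.
    """
    reliability = 1.0 / jnd_disparity ** 2 + 1.0 / jnd_texture ** 2
    return 1.0 / math.sqrt(reliability)


print(round(predicted_two_cue_jnd(2.6, 8.0), 2))       # below both single-cue JNDs
print(round(predicted_two_cue_jnd(math.inf, 1.5), 2))  # reduces to the texture JND
```

When the two single-cue JNDs are equal, the predicted two-cue JND is smaller by a factor of √2; when one is much larger, the prediction approaches the smaller one.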
Earlier we mentioned a test of the assumption that the reliability of the disparity estimator measured in the single-cue experiment with random-dot stimuli is a valid estimate of the estimator's reliability in the two-cue experiment with Voronoi stimuli. We tested the assumption by examining situations in the two-cue experiment in which the texture weight was nearly zero. The texture weight was less than 0.15 in three situations, all with observer JMH: distance = 19.1 cm and base slants of –15, 0, and +15 deg. His two-cue thresholds in those situations were 2.9, 2.4, and 2.9 deg, respectively (Figure 11). His single-cue, disparity-alone thresholds in the same situations were 3.2, 2.6, and 2.1 deg, respectively (Figure 4). The close correspondence supports our assumption that the disparity-alone thresholds provided an estimate of the appropriate reliability for the two-cue experiment.
Figure 11. Predicted and observed JNDs. The just-noticeable difference in slant (JND) is plotted as a function of base slant. JNDs are the sigma parameters for the cumulative normal fits to the psychometric data divided by √2 and represent our estimates of the standard deviation of the slant estimators. Filled red squares are observed JNDs when texture and disparity were both present. Faint gray diamonds are observed JNDs for texture alone (Figure 4, left) and faint blue circles are observed JNDs for disparity alone (Figure 4, middle). Disparity JNDs for ±70 deg base slant at 19.1 cm for JMH were infinite (indicated by pale blue symbols with yellow stars). Error bars represent 95% confidence intervals. Red curves represent 95% confidence intervals for the predicted JNDs (Equation 6). Left, middle, and right panels represent the data from viewing distances of 19.1, 57.3, and 171.9 cm.
By similar reasoning, we can test the assumption that the reliability of the texture estimator measured in the single-cue experiment with monocular stimuli is a valid estimate of the estimator's reliability in the two-cue experiment with binocular stimuli. To generate one slant estimate from the texture-specified slants at the two eyes, the visual system should combine the monocular signals in some fashion. The combination could occur in two ways. (1) The visual system might combine the two eyes' images before computing slant. This could be done in principle by averaging the visual directions for each corresponding point in the two images. Then slant would be computed from the combined Cyclopean image. (2) The visual system might estimate eye-centered slants before combining. Specifically, it could estimate the slants from the texture signals received by each eye and then average the two estimates. These two means of combining the monocular images are geometrically equivalent and yield the same slant as would be observed at the Cyclopean eye as long as the coordinate origin is on the Vieth-Müller Circle. At any rate, averaging the two eyes' inputs is a reasonable way to form a texture-based slant estimate. If we assume that the two monocular inputs are equally informative and that their noises are uncorrelated (perhaps an implausible assumption), the variance of the combined estimate would be half the variance of either monocular estimate. In other words, discrimination thresholds based on the texture information alone would be lower in the binocular than in the monocular case by a factor of √2 (Legge, 1984). We tested this possibility by examining situations in the two-cue experiment in which the disparity weight was nearly zero. This occurred for JMH and ACD across all slants at 171.9 cm. It also occurred for observer JMH at 19.1 cm and base slant = ±70 deg.
JMH's texture-alone JNDs at 171.9 cm for base slants of –45 to +45 deg (the range of tested slants) were 2.3–8.0 deg (the lowest values occurring at the greatest slants; Figure 4). His two-cue JNDs at 171.9 cm for base slants of –45 to +45 deg ranged from 3.4–5.9 deg (again the lowest values occurring at the greatest slants; Figure 11). JMH's texture-alone JNDs at 19.1 cm for base slants of –70 and +70 deg were 1.5 and 1.0 deg, respectively, and his corresponding two-cue JNDs were 1.4 and 1.4 deg. ACD's texture-alone JNDs at 171.9 cm ranged from 2.4–5.3 deg and her two-cue JNDs ranged from 3.3–4.1 deg. Thus, when the disparity weight was low, the texture-alone thresholds were generally similar to the corresponding two-cue thresholds. The good correspondence supports our assumption that the texture-alone thresholds provided an estimate of the appropriate reliability for the two-cue experiment. It also implies that the slant specified by texture is not made more reliable by averaging the two eyes' images, perhaps because the noises are highly correlated.
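The √2 argument in the last two paragraphs generalizes to correlated monocular noise. For two equally variable monocular estimates with correlation ρ, the variance of their average is σ²(1 + ρ)/2, so the binocular threshold relative to the monocular one is √((1 + ρ)/2). This standard result is included only to illustrate the interpretation offered above; ρ was not measured in these experiments, and the function name is hypothetical.

```python
import math


def averaged_sd_ratio(rho):
    """SD of the mean of two equal-variance monocular estimates, relative to one estimate.

    Var((a + b) / 2) = sigma**2 * (1 + rho) / 2 for correlation rho between a and b.
    """
    return math.sqrt((1.0 + rho) / 2.0)


for rho in (0.0, 0.5, 0.9, 1.0):
    print(f"rho = {rho:.1f}: binocular/monocular threshold ratio = {averaged_sd_ratio(rho):.3f}")
```

A ratio near 1, as observed, is what highly correlated noise (ρ near 1) would produce; uncorrelated noise (ρ = 0) gives the 1/√2 improvement that was not observed.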
Discussion
Summary of the results
The generally excellent agreement between observed and predicted PSEs and JNDs indicates that humans use a statistically optimal strategy for combining slant information from disparity and texture. There are, however, three cases in which the data deviated from the predictions.
(1) JMH's PSEs in the two-cue condition at 19.1 cm and base slants of –60 and +60 deg (Figure 8). The weight given disparity was lower than predicted when the absolute value of the perturbed-cue slant was greater than 60 deg. In the disparity-alone ±70-deg, 19.1-cm conditions, JMH could not fuse the random-dot stimulus (thus, thresholds were infinite). The same was true in the two-cue condition: Slant judgments were made on diplopic images, making the task more complicated. Our model does not consider how depth judgments are made in diplopic conditions. Given this, the discrepancy between the observed and predicted two-cue data is understandable.
(2) ACD's JNDs in the two-cue condition for all base slants at 19.1 cm and for the larger base slants at 57.3 cm (Figure 11). Her two-cue thresholds were consistently lower than predicted. Moreover, ACD gave slightly more weight to disparity than predicted for base slants of ±30 and ±60 deg at 57.3 cm (Figure 9). The most obvious explanation for these discrepancies is that the disparity-alone JNDs (Figure 4) overestimated the variance of ACD's disparity estimator in the two-cue experiment. As described in Methods (Figure 2), ACD may have given some weight to the uninformative texture signal in the disparity-alone experiment for nonzero base slants. This would have caused an overestimate of the variance of the disparity estimator whenever the disparity weight was relatively high in the two-cue experiment (which occurs when the viewing distance is 19.1 or 57.3 cm) and whenever the base slant differed significantly from zero.
(3) JMH's and ACD's disparity weights were higher than predicted at 171.9 cm when the base slant was 0 deg. We think this small discrepancy is caused by variation in binocular fusion at long distances. Both observers reported difficulty fusing the random-dot stimulus in the single-cue experiment when the viewing distance was 171.9 cm (perhaps because of the conflict between vergence and accommodation). Thus, their thresholds at 171.9 cm may have slightly overestimated the variance of the disparity estimator at that distance. (ACD also had difficulty fusing the random-dot stimulus at 19.1 cm, which may have contributed to the apparent overestimate of the variance of the disparity estimator as discussed under #2 above.) Both observers found it easier to fuse the Voronoi stimulus at 171.9 cm, presumably because that stimulus provides contours to guide vergence eye movements. The discrepancy is most likely to show up when the base slant is 0 deg because the disparity weight is highest in that case. Thus, this discrepancy between predicted and observed behavior is probably caused by fusion difficulties in the single-cue experiment at the long distance.
The great majority of the data is consistent with the MLE predictions and strongly supports the hypothesis that observers combine the slant cues of disparity and texture in a statistically optimal fashion.
Comparison to other studies
Five studies have examined quantitatively whether cue combination is statistically optimal (Alais & Burr, 2004; Ernst & Banks, 2002; Gepshtein & Banks, 2003; Knill & Saunders, 2003; Landy & Kojima, 2001). In agreement with our results, all five found that combination of cues from different sensory modalities (haptics and vision: Ernst & Banks, 2002; Gepshtein & Banks, 2003; audition and vision: Alais & Burr, 2004) or different visual cues (Knill & Saunders, 2003; Landy & Kojima, 2001) was quite close to MLE predictions.
Knill and Saunders (2003) tested the MLE model for combining texture and disparity cues to surface slant. Their stimuli were slanted about a horizontal axis (tilt = 90 deg). Like us, they took advantage of the fact that the relative reliabilities of texture and disparity vary naturally with viewing geometry. They reported reasonable agreement between observed and predicted behavior. We extended their investigation by examining texture and disparity combination for surfaces slanted about a vertical axis (tilt = 0 deg) at various distances. Our data are similar to Knill and Saunders' in some respects and differ in others. Our texture-alone data exhibited a smaller effect of base slant on JNDs (compare our Figure 4 to their Figure 6). Average texture-alone JNDs in our study were ~8 and ~1.5 deg at 0 and 70 deg, respectively (a ratio of 5.3). The corresponding JNDs in Knill and Saunders were ~40 and ~2 deg (a ratio of 20). The fact that our JNDs were generally lower is undoubtedly because our Voronoi patterns were more regular than Knill and Saunders'. The differing effect of base slant is most likely due to differences in how the angular subtense of the stimuli varied with slant: Ours varied with base slant and theirs was constant. To hold angular size constant, Knill and Saunders added texture elements as slant increased, and this adds progressively more information as slant increases.
We also examined how viewing distance affects the weights assigned to texture and disparity and found that the weight assignment is essentially optimal. This result seems to contradict numerous reports of failures to scale veridically for distance in stereoscopic tasks. For example, Johnston (1991), Johnston, Cumming, and Parker (1993), and Bradshaw, Glennerster, and Rogers (1996) had observers judge the amount of depth in disparity-defined cylinders, spheres, and ridges when presented at different distances. Responses were far from veridical, indicating that depth was overestimated at near and underestimated at far distances. How could we observe optimal weight changes as a function of distance, while previous work showed apparent failures to take distance into account? We think the answer lies in the influence of unmodeled cues. In all but one of the previous experiments (Experiment 1 in Johnston et al., 1993), the texture gradient specified a frontoparallel plane. From our analysis, one would expect observers to report seeing less depth at long distances, not because they failed to take distance into account, but rather because they gave increasing weight to a signal specifying that the stimulus is flat. This claim is supported by the observation that making the texture gradient consistent with the disparity-specified shape generally makes judgments more veridical (Buckley & Frisby, 1993; Johnston et al., 1993). Furthermore, when the task is to adjust the shape of a surface until it appears planar and thereby consistent with the texture-specified shape, observers seem to take distance into account veridically (Rogers & Bradshaw, 1995).
If distance were not taken into account in scaling disparities, the resulting mis-scaling could be mistaken for a change in disparity weight with change in distance. This no-scaling hypothesis is considered and rejected in Appendix C.
Dynamic determination of cue weights
MLE cue combination has the advantage that it produces the least-variable estimate of slant given the available cues. But it requires the observer to choose weights based on the reliability of the cues. In the case of texture, the reliability clearly depends on the slant, which is what the observer is trying to estimate. Thus, the choice of weights must be made dynamically, with the possibility of varying weights from trial to trial (or from location to location within a stimulus, discussed shortly). The model suggests that on each trial the observer makes an estimate of slant from each cue and uses that estimate, along with other relevant information (“ancillary cues” such as a distance estimate; Landy, Maloney, Johnston, & Young, 1995), to determine that cue's current reliability. The relative reliabilities are then used to determine the cue weights (Equation 2), followed by weighted cue combination (Equation 1). In our experiments, the slant shown to the observer was selected randomly before each trial from the set of possible slants within each block. For performance to approach optimality, the weights must have been determined in a trial-by-trial dynamic fashion. In a previous study we also had clear-cut evidence of weights changing from trial to trial (Hillis et al., 2002). The reader may wonder how such dynamic computation could be accomplished in a biological system without prior knowledge of the likelihood functions associated with each slant cue. Ernst and Banks (2002) outlined a plausible neural model that could carry out the computation automatically.
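The trial-by-trial scheme described in the preceding paragraph can be sketched as a small simulation. Every function name and JND function here is hypothetical (texture JND improving with |slant| and independent of distance; disparity JND proportional to distance), and reliabilities are looked up at the simulated observer's own noisy estimates rather than at the displayed slants. It illustrates the dynamic computation only and is not a model of any neural implementation.

```python
import random
import statistics


def texture_jnd(slant_deg):
    """Hypothetical texture JND (deg): improves with |slant|, independent of distance."""
    return max(1.5, 8.0 - 6.5 * abs(slant_deg) / 70.0)


def disparity_jnd(slant_deg, distance_cm):
    """Hypothetical disparity JND (deg): proportional to distance, flat in slant."""
    return 2.5 * distance_cm / 19.1


def one_trial(disparity_slant, texture_slant, distance_cm, rng):
    """One simulated trial of dynamic MLE weighting; returns (estimate, disparity weight)."""
    # 1. Noisy single-cue estimates.
    s_d = rng.gauss(disparity_slant, disparity_jnd(disparity_slant, distance_cm))
    s_t = rng.gauss(texture_slant, texture_jnd(texture_slant))
    # 2. Reliabilities evaluated at the estimates; distance acts as the ancillary cue.
    r_d = 1.0 / disparity_jnd(s_d, distance_cm) ** 2
    r_t = 1.0 / texture_jnd(s_t) ** 2
    # 3. Weights (Equation 2) and weighted combination (Equation 1).
    w_d = r_d / (r_d + r_t)
    return w_d * s_d + (1.0 - w_d) * s_t, w_d


rng = random.Random(1)
for d in (19.1, 57.3, 171.9):
    trials = [one_trial(0.0, 0.0, d, rng) for _ in range(4000)]
    estimates = [est for est, _ in trials]
    mean_w = statistics.fmean(w for _, w in trials)
    print(f"{d:6.1f} cm: mean w_D = {mean_w:.2f}, "
          f"SD of estimates = {statistics.stdev(estimates):.2f} deg")
```

Even though the weights are recomputed on every trial from noisy estimates, the simulated disparity weight stays high at the near distance and low at the far one.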
Comparison of observed and expected effects of slant and distance on disparity- and texture-based JNDs
We observed three effects in the single-cue experiments—a large improvement in discrimination threshold with decreasing distance with disparity alone, a small improvement in threshold with increasing slant with disparity alone (see also Knill & Saunders, 2003), and a large improvement in discrimination threshold with increasing slant with texture alone (Knill, 1998b; Knill & Saunders, 2003). Here, we ask whether the three observed effects are expected from the slant information in the stimulus.
When the eyes are in forward gaze, as they were in these experiments, the vergence is
$$\mu = 2\tan^{-1}\left(\frac{i}{2d}\right), \qquad (10)$$
where d is viewing distance and i is the inter-ocular distance. Slant from disparity (for tilt = 0) is given to close approximation by
$$S = -\tan^{-1}\left(\frac{1}{\mu}\ln(\mathrm{HSR})\right). \qquad (11)$$
Thus, errors in the disparity and distance estimates will both yield errors in the estimated slant. We calculated the distribution of slant estimates for different viewing conditions under the assumption that the errors in HSR and μ can be represented by additive, independent noises. Specifically, we conducted a Monte Carlo simulation to determine the standard deviation of slant estimates, $\sigma_{\hat{S}}$, from Equation 11. The noises were Gaussian with mean = 0. We adjusted the noise standard deviations for HSR and μ, $\sigma_{HSR}$ and $\sigma_\mu$, to obtain simulation JNDs similar to the observed JNDs. The simulation results are displayed in Figure 12. The left panel shows $\sigma_{\hat{S}}$ as a function of distance (the curves representing different base slants) and the right panel shows $\sigma_{\hat{S}}$ as a function of base slant (the curves representing different distances).
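A simulation of this kind can be sketched as below. It is a reconstruction under stated assumptions, not the simulation code behind Figure 12: it assumes a small patch straight ahead in forward gaze, HSR = (1 − k tan S)/(1 + k tan S) with k = i/(2d), vergence μ = 2 tan⁻¹(i/2d), slant recovered as S = −tan⁻¹(ln(HSR)/μ), a hypothetical interocular distance of 6.2 cm, and independent zero-mean Gaussian noise (SD 0.012, the value quoted in the Figure 12 caption) added to HSR and to μ. The function name and its parameters are hypothetical.

```python
import math
import random
import statistics


def simulate_slant_sd(base_slant_deg, distance_cm, sd_hsr=0.012, sd_vergence=0.012,
                      iod_cm=6.2, n_trials=20000, seed=0):
    """Monte Carlo SD (deg) of disparity-based slant estimates.

    Assumptions of this sketch: forward gaze, a small patch straight ahead,
    HSR = (1 - k tan S) / (1 + k tan S) with k = iod / (2 d), vergence
    mu = 2 atan(iod / (2 d)), slant recovered as S = -atan(ln(HSR) / mu),
    and independent zero-mean Gaussian noise added to HSR and to mu.
    iod_cm = 6.2 is a hypothetical interocular distance.
    """
    rng = random.Random(seed)
    k = iod_cm / (2.0 * distance_cm)
    t = k * math.tan(math.radians(base_slant_deg))
    hsr_true = (1.0 - t) / (1.0 + t)
    mu_true = 2.0 * math.atan(k)
    estimates = []
    for _ in range(n_trials):
        hsr = hsr_true + rng.gauss(0.0, sd_hsr)
        mu = mu_true + rng.gauss(0.0, sd_vergence)
        estimates.append(math.degrees(-math.atan(math.log(hsr) / mu)))
    return statistics.stdev(estimates)


for d in (19.1, 57.3, 171.9):
    print(f"{d:6.1f} cm: SD of slant estimates = {simulate_slant_sd(0.0, d):.2f} deg")
```

At a base slant of 0 deg the sketch gives a standard deviation of roughly 2 deg at 19.1 cm and a much larger value at 171.9 cm. At the far distance the assumed vergence noise is a sizable fraction of μ itself, so the estimates there have heavy tails.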
Figure 12. Results of a simulation of slant estimation from disparity. We used a Monte Carlo simulation to calculate the standard deviation of disparity-based slant estimates (Equation 11) for different viewing conditions. We assumed that error in the slant estimates stemmed from noise in HSR and μ (vergence angle) and that these errors (the variances) were the same for all viewing distances. The noises were additive and Gaussian with mean = 0, and we obtained simulation results for many sets of parameters. The results for $\sigma_{\mathrm{HSR}} = 0.012$ and $\sigma_\mu = 0.012$ radians, which fit the data reasonably well, are displayed in the figure. Left panel: the standard deviations of slant estimates are plotted as a function of distance. Different curves represent different absolute values of base slant. The circles represent the observed JNDs for observer JMH at the various distances. Different colors represent different absolute values of base slant. Right panel: the standard deviations of slant estimates as a function of base slant. Different curves represent different viewing distances. The circles represent the observed JNDs for observer JMH. Different colors represent different distances.
The standard deviation of the slant estimate, $\sigma_S$, is roughly proportional to viewing distance for all base slants (left panel). This result is expected from Equation 11 because $\mu \approx i/d$ (Equation 10), so a fixed additive noise in μ becomes a progressively larger fraction of μ, and hence has an increasing effect on the slant estimate, as distance increases. We found that $\sigma_S$ was proportional to distance for a wide range of noise standard deviations ($\sigma_\mu$ and $\sigma_{\mathrm{HSR}}$); the key assumption is that the noise in disparity normalization is fixed and additive in vergence. The data points in the lower left panel are JNDs from observer JMH; clearly, his discrimination thresholds increased monotonically with increasing distance in much the same way as the simulation. The data from ACD were similar. Thus, the distance effect we observed in the disparity-alone experiment is expected if error in disparity normalization is additive in units of vergence.
The right panel of Figure 12 shows that $\sigma_S$ is inversely related to the absolute value of slant. This relationship was observed for all values except when eq117.gif. The relationship is expected from Equation 11 because $\partial S/\partial\,\mathrm{HSR} = -\cos^2 S/(\mu\,\mathrm{HSR})$, so fixed additive noise in HSR has progressively less effect on $\sigma_S$ as the magnitude of base slant increases. The data points in the lower right panel are JNDs from observer JMH; data were similar for the other three observers. At viewing distances of 57.3 and 171.9 cm, JMH's discrimination thresholds decreased monotonically with slant magnitude, much like the simulation's standard deviations.
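A first-order error-propagation sketch makes both trends explicit. It linearizes Equation 11, assumed here to be $S = -\tan^{-1}\left(\frac{1}{\mu}\ln \mathrm{HSR}\right)$, in its two noisy inputs, assuming (as in the simulation) independent additive noises with standard deviations $\sigma_{\mathrm{HSR}}$ and $\sigma_\mu$. It is our own approximation, valid only when the noises are small relative to HSR and μ, and not a result computed in the original study.

$$\frac{\partial S}{\partial\,\mathrm{HSR}} = -\frac{\cos^2 S}{\mu\,\mathrm{HSR}}, \qquad \frac{\partial S}{\partial \mu} = -\frac{\sin S\,\cos S}{\mu},$$

$$\sigma_S^2 \approx \left(\frac{\cos^2 S}{\mu\,\mathrm{HSR}}\right)^2 \sigma_{\mathrm{HSR}}^2 + \left(\frac{\sin S\,\cos S}{\mu}\right)^2 \sigma_\mu^2.$$

Both terms scale with $1/\mu \approx d/i$, which gives the proportionality to distance. The HSR term shrinks with $|S|$ through $\cos^2 S$, whereas the vergence term is zero at $S = 0$ and grows up to $|S| = 45$ deg; under this approximation the decline of $\sigma_S$ with $|S|$ can therefore reverse when $\sigma_\mu$ is large compared with $\sigma_{\mathrm{HSR}}$. With the equal values in the Figure 12 caption, the HSR term dominates and $\sigma_S$ falls with $|S|$.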
Thus, the base-slant effect we observed in the disparity-alone condition is expected if error in disparity measurement is additive in HSR. Does this assumption make sense? It does when HSR is not significantly different from 1, which was true for distances of 57.3 and 171.9 cm (see Figure 4). However, when HSR is quite different from 1, points on the surface fall where stereo-acuity is low and problems arise in solving the binocular correspondence problem (Burt & Julesz, 1980). HSR and the horizontal gradient of horizontal disparity are closely related,
$$\mathrm{DG} = \frac{2(\mathrm{HSR} - 1)}{\mathrm{HSR} + 1}, \tag{12}$$
where DG is an approximation to the disparity gradient (Howard & Rogers, 2002). From Equations 10 and 11, when d is small and $|S|$ is large, HSR is quite different from 1, and thus DG will be quite different from 0. Burt and Julesz (1980) and others have shown that binocular correspondence becomes difficult when $|\mathrm{DG}|$ deviates significantly from 0 and breaks down altogether when $|\mathrm{DG}| > 1$. Recent results indicate that this is probably a by-product of a matching process that is similar to cross-correlating the two eyes' images to estimate the disparity in a region of the visual field (Banks et al., 2004).
Figure 13 plots the disparity gradient as a function of slant for the three distances we used. $|\mathrm{DG}|$ increases rapidly with $|S|$ at the short distance, so we expect performance to be worse at that distance for large slants. JMH's data exhibited this effect. His discrimination thresholds at 19.1 cm increased with slant, which is inconsistent with the assumption that the sole source of error in disparity measurement is additive in HSR (Figure 12, lower-right panel, gray curve). They were higher than predicted for $|S| \geq 30$ deg, which corresponds to a higher disparity gradient ($|\mathrm{DG}| \approx 0.19$, $\mathrm{HSR} \approx 1.2$) than occurs at 57.3 and 171.9 cm. Thus, the base-slant effect in the disparity-alone experiment is expected if error in disparity measurement is additive in HSR, except when HSR deviates significantly from 1, where problems arise in solving correspondence.
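The disparity-gradient values quoted above can be checked with a short Python sketch. It assumes that Equation 12 is the usual definition of the disparity gradient (disparity difference divided by cyclopean separation), which for a horizontal-size ratio gives DG = 2(HSR − 1)/(HSR + 1); that HSR follows from Equations 10 and 11 as HSR = exp(−μ tan S); and that i = 6.2 cm. All three are our assumptions, and the helper names are illustrative.

```python
import math

IOD_CM = 6.2  # ASSUMED inter-ocular distance i

def vergence(d_cm, iod_cm=IOD_CM):
    """Vergence (radians) for forward gaze, mu = 2 atan(i / 2d)."""
    return 2.0 * math.atan(iod_cm / (2.0 * d_cm))

def hsr_for_slant(slant_deg, d_cm):
    """Equation 11 solved for HSR (tilt = 0): HSR = exp(-mu * tan S)."""
    return math.exp(-vergence(d_cm) * math.tan(math.radians(slant_deg)))

def disparity_gradient(hsr):
    """ASSUMED form of Equation 12: disparity difference over cyclopean
    separation, DG = 2 (HSR - 1) / (HSR + 1)."""
    return 2.0 * (hsr - 1.0) / (hsr + 1.0)

def abs_dg(slant_deg, d_cm):
    """|DG| for a plane of the given slant (deg) at distance d_cm."""
    return abs(disparity_gradient(hsr_for_slant(slant_deg, d_cm)))
```

Under these assumptions $|\mathrm{DG}|$ at 19.1 cm and $|S| = 30$ deg comes out close to 0.19 (the exact value depends on the assumed i), is several times smaller at 57.3 and 171.9 cm, and exceeds the Burt and Julesz limit of about 1 only for slants beyond roughly 74 deg.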
Figure 13. Disparity gradient as a function of slant at different viewing distances. The disparity gradient was calculated from Equation 12.
We also compared our observed texture-alone thresholds with those expected from the information in the various slant cues associated with the texture gradient. Knill (1998a, 1998b) described ideal observers for slant from texture with Voronoi stimuli like those in our experiments. The stimulus parameters in our experiment differed from those in his modeling and experiments in two ways.
First, the Voronoi patterns in our stimuli were more regular than in his. From this one would expect the texture-gradient cue to be more reliable in our experiment than in Knill's.
Second, the angular subtense of our stimuli varied with slant (with a random element added to the angular width so that width was an unreliable cue to slant), so the average number of texture elements was constant across slant. To keep the angular subtense constant, Knill added texture elements as slant increased, which adds information. This added information probably explains why his observed and predicted discrimination thresholds varied more with slant than ours did.
Despite the differences in stimulus parameters, it is informative to compare ideal thresholds with our observers' thresholds. The curve in Figure 14 shows the standard deviation of slant estimates from Knill's foreshortening ideal observer (Figure 5A in Knill, 1998a). There is a striking effect of base slant. The data points are JMH's thresholds in the texture-alone experiment; data were similar for the other three observers. The data exhibit a base-slant effect like the ideal observer's, but the effect is smaller in our data for reasons described above. Therefore, the variation we observed in texture-based slant thresholds is by and large expected from the information content of the stimulus.
Figure 14. Ideal and measured JNDs for slant from texture as a function of base slant. The solid line represents the standard deviations of the slant estimates of the foreshortening ideal observer for Voronoi stimuli (Figure 5A in Knill, 1998a). The diamonds represent discrimination thresholds in the texture-alone condition for observer JMH. Light gray diamonds are thresholds at 19.1 cm, medium gray at 57.3 cm, and dark at 171.9 cm.
We conclude that the effects of distance and slant on JNDs can be expected from the information present in the stimuli. These effects are summarized in Figure 5.
What other variables might affect cue weights?
Presumably, the visual system takes the disparity and texture variances into account across many viewing situations. To do so, however, is complex because many viewing properties will affect the likelihood functions associated with disparity and texture cues. Here we list the most obvious properties and suggest how the relative weights assigned to disparity and texture ought to be affected.
1. Regularity of texture.
The slant information contained in the texture gradient can be divided into three cues: (1) scaling, the change in the projected sizes of texture elements, (2) foreshortening, the change in projected shapes of texture elements, and (3) density, the change in the number of elements per unit area in the projection (Blake et al., 1993; Cutting & Millard, 1984; Knill, 1998a). The reliability of scaling as a slant signal depends on the variation in the sizes of the texture elements on the surface. With greater size variation, the cue's reliability decreases (Knill, 1998b). The reliability of foreshortening depends on the variation in the shapes of the elements on the surface. For regular shapes, like circles, reliability is greater than for irregular shapes, such as ellipses with variable aspect ratios (Knill, 1998b; Young et al., 1993). The reliability of the density cue depends on the number of elements and the regularity of their positioning on the surface. Presumably, many elements placed regularly (i.e., in a grid) yield more reliable estimates than few elements placed randomly. All three cues are affected by the field of view, particularly in the tilt direction, so slant discrimination from texture is more precise with large than with small stimuli (Blake et al., 1993; Knill, 1998b). If the visual system takes the varying reliability of the texture gradient cue into account, all of these stimulus properties will affect the relative weights assigned to disparity- and texture-based signals.
2. Surface tilt.
The direction of slant or tilt affects the amount of perceived slant in stereograms (Howard & Rogers, 2002). The disparity signal for surfaces slanted about a vertical axis (tilt = 0 deg) is the horizontal gradient of horizontal disparities. We have quantified this as the horizontal-size ratio (HSR). The disparity signal for surfaces slanted about a horizontal axis (tilt = 90 deg) is the vertical gradient of horizontal disparities. This disparity pattern is often referred to as horizontal-shear disparity (Banks et al., 2001). Random-element stereograms simulating a slanted plane with tilt = 0 deg generally produce less perceived slant than planes with tilt = 90 deg (Gillam & Ryan, 1992). Similarly, the amount of depth seen in curved disparity-defined surfaces varies with tilt (Buckley & Frisby, 1993). These tilt-dependent variations in perceived depth are called slant anisotropy. The phenomenon is most striking when the texture gradient specifies a frontoparallel plane, as is usually the case with random-element stereograms. The phenomenon is not observed when disparity and texture signal the same depth variation, as occurs with real surfaces (Bradshaw, Hibbard, van der Willigen, Watt, & Simpson, 2002; Buckley & Frisby, 1993). These observations strongly suggest that slant anisotropy is caused by conflicting disparity and texture signals in conventional random-element stereograms. They also suggest that texture is generally given more weight for tilt 0 (as in our experiments) and less weight for tilt 90 (as in Knill & Saunders, 2003). By the argument presented here, this may be due to lower disparity reliability for tilt 0 than for tilt 90, because there is no obvious reason for the reliability of the monocular texture cue to depend on tilt. There may, however, be differences in the steps required to combine the texture and disparity signals for different tilts.
The issues involved in transforming texture-gradient signals into the same coordinates for combination with disparity signals are taken up in Appendix D.
3. Reliability of estimated distance and azimuth.
To estimate slant from the measured disparities, the visual system must “normalize” the disparities with a distance estimate and “correct” the disparities with an azimuth estimate (Gårding et al., 1995). Relaxing the assumption of forward gaze in Equation 11, slant about a vertical axis (tilt = 0) is
$$S = -\tan^{-1}\left(\frac{1}{\mu}\ln \mathrm{HSR} - \tan\gamma\right) \tag{13}$$
where μ is vergence, and γ is azimuth (the angle between the head's median plane and the Cyclopean line of sight) (Backus et al., 1999). μ is estimated both from extra-retinal signals concerning the eyes' vergence and from the horizontal gradient of vertical disparity (Rogers & Bradshaw, 1995). When vertical disparities are large, as occurs with large stimuli at close range, they are the predominant means for estimating distance. However, when vertical disparities are unreliable because the stimulus is small (Rogers & Bradshaw, 1995), or because the texture contains no horizontal contours (Helmholtz, 1910), the eyes' vergence becomes the predominant means of estimating distance and the accuracy of disparity normalization drops (Rogers & Bradshaw, 1995).
The azimuth γ is used to correct disparities; it is estimated from extra-retinal, eye-position signals and from the magnitude of vertical disparities (Backus et al., 1999). When vertical disparities are large, as occurs with near stimuli subtending a large angle, they are the predominant means of estimating azimuth. When the stimulus is short or when vertical disparities are unmeasurable, eye position becomes the predominant means and the accuracy of disparity correction suffers (Backus et al., 1999).
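The role of the azimuth estimate can be illustrated with a short Python sketch. It assumes Equation 13 has the form S = −tan⁻¹((1/μ) ln HSR − tan γ) given by Backus et al. (1999), treats HSR and μ as error-free, and perturbs only the azimuth estimate; the function names are illustrative.

```python
import math

def slant_with_azimuth(hsr, mu, azimuth_rad):
    """Slant (radians) for tilt = 0, ASSUMING Equation 13 has the form
    S = -atan(ln(HSR)/mu - tan(gamma))."""
    return -math.atan(math.log(hsr) / mu - math.tan(azimuth_rad))

def hsr_for(slant_rad, mu, azimuth_rad):
    """Inverse of the same assumed relation:
    ln(HSR) = mu * (tan(gamma) - tan(S))."""
    return math.exp(mu * (math.tan(azimuth_rad) - math.tan(slant_rad)))

def slant_error_deg(true_slant_deg, true_az_deg, az_error_deg, mu):
    """Slant error (deg) caused only by misestimating the azimuth."""
    s = math.radians(true_slant_deg)
    g = math.radians(true_az_deg)
    hsr = hsr_for(s, mu, g)  # HSR treated as measured without error
    s_hat = slant_with_azimuth(hsr, mu, g + math.radians(az_error_deg))
    return math.degrees(s_hat - s)
```

Under this form an azimuth error shifts the estimate according to tan Ŝ = tan S + tan γ̂ − tan γ, so the error does not depend on μ, and for a frontoparallel surface straight ahead a 2-deg azimuth error yields a 2-deg slant error. This is why an unreliable azimuth estimate (e.g., when vertical disparities cannot be measured) degrades disparity-based slant.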
Similar arguments apply for slant estimation with tilt = 90 deg. In this case, slant around a surface point is
$$S = -\tan^{-1}\left(\frac{\mathrm{HSh} - \tau}{\mu}\right) \tag{14}$$
where μ is again the vergence angle, HSh is horizontal shear disparity (Banks et al., 2001) and τ is the cyclovergence of the eyes (the difference in the eyes' torsion). HSh must be normalized for distance by an estimate of μ and corrected for cyclovergence by an estimate of τ (Banks et al., 2001; Howard & Kaneko, 1994).
For our present purposes, when the viewing situation reduces the reliability of the estimates of the normalizing and/or correcting signals, the disparity-based slant estimate will become more variable. This will occur, for example, when the stimulus subtends a small angle, when the surface markings make the measurement of vertical disparity unreliable, and when the stimulus is distant. If the visual system takes such changes into account, the weight given to disparity should decrease in those circumstances.
4. Duration.
Van Ee and Erkelens (1998) showed that the slant perceived from disparity-defined planes increases with stimulus duration. Their random-element stereograms contained the texture gradient associated with a frontoparallel plane, so their results are consistent with a model in which the weight given to disparity relative to texture increases over time. Presumably, disparity and texture estimates both become more precise with increases in stimulus duration, but the increase may be slower for disparity. Thus, stimulus duration may also affect the relative weights given to disparity- and texture-based slant estimates.
Are cue weights computed locally?
It is interesting to consider whether the visual system determines one set of weights for each surface or whether the weights are calculated locally. That is, can the weights vary from one patch on a surface to another? If they are calculated locally, there are situations in which a cue-conflict stimulus specifying a plane should appear curved. Here we explain why this should happen and report that the predicted curvature is in fact observed.
The left panel of Figure 15 shows how slant and distance vary with azimuth when the surface is a plane. For the part of the plane that lies straight ahead, the slant is S and the distance is d; for the part at azimuth γ (e.g., on the right), the local slant is
$$S_\gamma = S - \gamma, \tag{15}$$
where γ is the azimuth. The distance from the Cyclopean eye to the intersection of that line of sight with the plane is
$$d_\gamma = \frac{d\cos S}{\cos(S - \gamma)}. \tag{16}$$
The left and middle panels of Figure 16 show how local slant and local distance vary with azimuth for different base slants at d = 19.1 cm. Because local slant and distance vary with azimuth, the statistically optimal weights for the texture and disparity cues should also vary with azimuth.
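The geometry of local slant and distance can be sketched in Python. It assumes the sign convention implied by the text (γ > 0 leftward, γ < 0 rightward, local slant S − γ) and computes local distance as the perpendicular distance from the Cyclopean eye to the plane, d cos S, divided by the cosine of the local slant; the function names are illustrative.

```python
import math

def local_slant_deg(base_slant_deg, azimuth_deg):
    """Local slant (deg) at azimuth gamma, ASSUMING S_gamma = S - gamma
    with gamma > 0 leftward and gamma < 0 rightward."""
    return base_slant_deg - azimuth_deg

def local_distance_cm(base_slant_deg, azimuth_deg, d_cm):
    """Distance along the line of sight at azimuth gamma to the plane:
    perpendicular distance d*cos(S) divided by cos(local slant)."""
    s = math.radians(base_slant_deg)
    s_local = math.radians(local_slant_deg(base_slant_deg, azimuth_deg))
    return d_cm * math.cos(s) / math.cos(s_local)
```

For a base slant of −30 deg at 19.1 cm, a line of sight 10 deg to the right meets the plane at a local slant of −20 deg and about 17.6 cm, and one 10 deg to the left at −40 deg and about 21.6 cm. This is the pattern that would make locally computed weights differ across a single planar surface.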
Now consider the cue-conflict stimulus in the middle panel of Figure 15. For rightward azimuths (γ < 0), the local slants specified by disparity and by texture both approach zero, and the local distance decreases. Our data (Figure 6) show that texture weight is relatively low when the absolute value of slant is ~0 and distance is short. Thus, if the weights used in combining slant estimates are determined locally, one would expect the texture weight in this situation to be lower on the right than straight ahead. (The changes in local slant and distance with changes in azimuth are unaffected by the direction in which the eyes are looking; they are determined only by the positions of surface points relative to the head. Thus, when we say “on the right” or “straight ahead,” we refer to the head-centered azimuth of a line of sight from the Cyclopean eye and not necessarily the azimuth of fixation.) For leftward azimuths (γ > 0), the slants become increasingly negative and distance increases, so the texture weight should be higher on the left than straight ahead.
Figure 15. Change in slant with azimuth. Left panel: Definitions of local slant and d