Volume 5, Number 11, Article 1, Pages 901-927
doi:10.1167/5.11.1
http://journalofvision.org/5/11/1/
ISSN 1534-7362
All Pulfrich-like illusions can be explained without joint encoding of motion and disparity
Jenny C. A. Read
Laboratory of Sensorimotor Research, National Eye Institute, National Institutes of Health, Bethesda, MD, USA

Bruce G. Cumming
Laboratory of Sensorimotor Research, National Eye Institute, National Institutes of Health, Bethesda, MD, USA
Abstract
In the Pulfrich effect, an interocular time delay results in the perception of depth. Two modified versions, the stroboscopic Pulfrich effect and dynamic visual noise with a delay, are generally explained by postulating an early stage of space/time-inseparable filtering, encoding motion and disparity jointly. However, most disparity sensors in monkey V1 do not show joint motion/disparity encoding, and we recently showed that depth perception in the stroboscopic Pulfrich effect is equally compatible with space/time-separable filtering. Here, we demonstrate that this filtering can be implemented with a population of physiologically plausible energy model units. Similar results are obtained whether the neurons are pure disparity sensors (like most V1 neurons) or joint motion/disparity sensors (like MT). We also demonstrate that the dynamic noise stimulus produces correlations between the activity in pure disparity sensors, and in a separate population of pure motion sensors. These correlations are sufficient to explain the percept. Thus, joint encoding of motion and disparity is not required to explain depth perception in Pulfrich-like stimuli: a brain which encoded motion and disparity in entirely separate neuronal pathways could still experience all of these illusions.
History
Received May 25, 2005; published December 19, 2005
Citation
Read, J. C. A. & Cumming, B. G. (2005). All Pulfrich-like illusions can be explained without joint encoding of motion and disparity.
Journal of Vision, 5(11):1, 901-927,
http://journalofvision.org/5/11/1/,
doi:10.1167/5.11.1.
Keywords
binocular vision, computational modeling, interocular delay, primary visual cortex, Pulfrich effect, psychophysics
The literature on stereopsis contains several cases in which an illusion of depth is caused by viewing stimuli with an interocular delay, the classic example being the Pulfrich effect (Pulfrich, 1922). When a pendulum swinging in the frontoparallel plane is viewed with an interocular delay, it appears to follow an elliptical path in depth as it swings to and fro. The effect is clinically significant (for example, patients whose optic neuritis causes a difference in conduction speed between the two optic nerves may experience a disconcerting Pulfrich effect), but it can tell us little about how the brain works. As pointed out by Fertsch (Pulfrich, 1922), in the classic Pulfrich effect the interocular delay introduces a real spatial disparity on the retina. Suppose the image reaching the right eye is delayed relative to the left eye by an amount Δt, and that the object, moving with speed v, is at position x when it is first seen by the left eye. By the time this same image reaches the right eye, the image in the left eye will have moved to a new position, x + vΔt. At this moment, the right eye's image is at x whereas the left eye's image is at x + vΔt, so there is a spatial disparity vΔt. Because any neuronal mechanism that produces depth from binocular disparity will also produce depth in the classic Pulfrich effect, this version of the stimulus teaches us nothing new about brain mechanisms. In the past thirty years, however, several modifications of this stimulus have been introduced, precisely to elucidate neuronal mechanisms for processing delay and disparity. The two most prominent are the stroboscopic Pulfrich effect (Burr & Ross, 1979; Lee, 1970a, 1970b; Morgan, 1979; Morgan & Thompson, 1975; Read & Cumming, 2005b) and dynamic visual noise (Falk & Williams, 1980; Morgan & Fahle, 2000; Morgan & Tyler, 1995; Morgan & Ward, 1980; Ross, 1974; Tyler, 1974, 1977).
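The Fertsch geometry above amounts to a one-line calculation. The following minimal Python sketch assumes a target moving at constant speed v; the function name `pulfrich_disparity` and the example speed and delay are illustrative values, not taken from any experiment.

```python
def pulfrich_disparity(x0, v, delay, t):
    """Classic Pulfrich geometry for a target moving at constant speed.

    x0    : target position (deg) at time 0
    v     : target speed (deg/s)
    delay : interocular delay (s); positive means the right eye is delayed
    t     : time (s) at which the two eyes' images are compared
    Returns the spatial disparity, left-eye position minus right-eye position.
    """
    left_pos = x0 + v * t             # left eye sees the current position
    right_pos = x0 + v * (t - delay)  # right eye sees where the target was `delay` s ago
    return left_pos - right_pos       # equals v * delay, whatever x0 and t are


# Hypothetical example: a 10 deg/s target viewed with a 10 ms delay
print(round(pulfrich_disparity(x0=0.0, v=10.0, delay=0.010, t=0.5), 6))  # prints 0.1
```

The result depends only on the product vΔt, which is why any disparity-detecting mechanism also responds to the classic effect.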
In the stroboscopic version of the Pulfrich effect, a target is presented in apparent motion, jumping from point to point across the screen instead of moving continuously, and is viewed with an interocular delay. The space/time diagram for this stimulus is shown in Figure 1. The dotted lines represent the trajectory of the moving object; the stars represent its brief appearances. At any instant, the stimulus is visible in only one eye, so in this sense the stimulus has no spatial disparity. Yet this stimulus also gives rise to a perception of depth. The second stimulus, dynamic visual noise, resembles the “snowstorm” on an untuned television. When viewed with an interocular delay, the noise appears to swirl in depth, with points in front of the screen moving towards the delayed eye, and points behind it moving in the opposite direction. In these stimuli, unlike the classic Pulfrich effect, the depth percept is not a trivial consequence of stimulus geometry. For example, in the stroboscopic Pulfrich effect, the target is only ever visible monocularly at a given instant. For binocular matches to be made between left and right images, the brain must remember the image seen in the left eye to pair it with the delayed image seen in the right. Thus, this stimulus can tell us about the temporal integration properties of the neuronal mechanisms subserving depth perception (Morgan, 1979).
Figure 1. Space/time diagram for the stroboscopic Pulfrich stimulus. The dotted lines show the trajectory of the target (red = left eye; blue = right eye) defined by apparent motion. The “stars” show the appearances of the target, when it is briefly illuminated by the stroboscope. Because the right-eye image is artificially delayed, each appearance of the target occurs a time Δt later in the right eye than in the left eye. T is the interflash interval of the stroboscope; X is the interflash distance. v = X/T is thus the apparent speed of the target. vΔt is the virtual disparity between the apparent-motion trajectories.
Work with these stimuli, and similar experiments using vernier alignment tasks, has shown that what is perceived can best be explained by considering the effects of spatiotemporal filtering in early vision, before any attempt to extract information needed for a particular task (Morgan, 1975, 1976, 1979, 1980, 1992; Morgan & Watt, 1982, 1983; Read & Cumming, 2005b). Along with the development of explanations based on spatiotemporal filters, another idea has gained wide acceptance: that this spatiotemporal filtering is performed by direction-selective filters (inseparable functions of space and time). The logic behind this conclusion is laid out with particular clarity by Anzai, Ohzawa, & Freeman (2001). In this view, the receptive fields are tilted relative to the space/time axes, so the neuron is sensitive to the stimulus direction of motion (Anzai et al., 2001; Carney, Paradiso, & Freeman, 1989; Morgan & Fahle, 2000; Pack, Born, & Livingstone, 2003; Qian, 1997). Binocular neurons with such receptive fields would jointly encode both motion and disparity. A signature property of such joint motion/disparity sensors is their distinctive tilted tuning profile when probed with stimuli containing both interocular delay and binocular disparity (Figure 2A). Their preferred disparity changes as a function of interocular delay; they “cannot distinguish an interocular time delay from a binocular disparity” (Qian & Andersen, 1997).
Figure 2. Delay/disparity tuning profile for (A) a joint motion/disparity sensor, and (B) a pure disparity sensor. Interocular delay is plotted on the vertical axis; disparity on the horizontal axis. The shaded region shows the combinations of delay and disparity that elicit a strong response from the cell. For a joint motion/disparity sensor, the preferred disparity depends on delay (e.g., with no interocular delay, the cell prefers zero disparity; when the left eye is leading it prefers near disparity; when the right eye is leading it prefers far disparity). For a pure disparity sensor, interocular delay merely weakens the disparity tuning, without altering the preferred disparity (here zero).
If stereopsis were supported exclusively by joint motion/disparity sensors, then, as first proposed by Ross (1974), interocular delay would produce a depth percept in the same way as binocular disparity does. In stimuli with no interocular delay, a zero-disparity stimulus is presumably perceived as such because it elicits the strongest response in cells tuned to zero disparity. But if the disparity sensors are also sensitive to direction of motion, then when the right eye is delayed, rightward-preferring cells shift their disparity tuning towards far disparities, and leftward-preferring cells towards near (Figure 2A; see Figure 2 of Read & Cumming, 2005a, for an explanation of why the direction of the shift depends on the cell's preferred direction of motion). Thus, the cells that respond best to the zero-disparity stimulus with interocular delay are not those that usually signal zero disparity in a conventional, nondelayed stimulus. Rather, the most responsive cells are those tuned to leftward motion and far disparities (whose peak response now occurs at zero disparity, because the interocular delay has shifted the entire disparity tuning curve towards near disparities) and those tuned to rightward motion and near disparities (whose tuning curve has been shifted towards far disparities). Because these cells usually respond only to nonzero disparities, the brain naturally interprets their activity as indicating that the stimulus contains depth: leftward-moving objects are perceived as “far”, and rightward-moving ones as “near”.
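The contrast between the two tuning profiles in Figure 2 can be illustrated with a toy model. This sketch is an illustration under stated assumptions, not a model of recorded neurons: it assumes Gaussian tuning in both disparity and delay, and makes the joint sensor's preferred disparity shift linearly by (preferred speed) × (delay). The function names, tuning widths, and the sign of the shift are all hypothetical.

```python
import numpy as np

def tuning_profile(disparities, delays, speed, sigma_d=0.1, sigma_t=0.02):
    """Toy delay/disparity tuning surface (rows = delays, columns = disparities).

    speed = 0 gives a 'pure disparity sensor': delay only weakens the response.
    speed != 0 gives a 'joint motion/disparity sensor' whose preferred disparity
    shifts by speed * delay (a tilted profile). Gaussian shapes, widths and the
    sign of the shift are illustrative assumptions.
    """
    d = disparities[np.newaxis, :]
    dt = delays[:, np.newaxis]
    return (np.exp(-(d - speed * dt) ** 2 / (2 * sigma_d**2))
            * np.exp(-dt**2 / (2 * sigma_t**2)))

def preferred_disparity(profile, disparities):
    # Disparity giving the peak response at each delay (one value per row)
    return disparities[np.argmax(profile, axis=1)]

disparities = np.linspace(-0.5, 0.5, 101)   # deg
delays = np.array([-0.02, 0.0, 0.02])        # s
pure = preferred_disparity(tuning_profile(disparities, delays, speed=0.0), disparities)
joint = preferred_disparity(tuning_profile(disparities, delays, speed=5.0), disparities)
print(pure)    # same preferred disparity at every delay
print(joint)   # preferred disparity moves with delay
```

In the pure sensor the delay term only scales each row, so the peak never moves; in the joint sensor the delay enters the disparity term itself, which is what produces the tilt.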
Thus, if one assumes that the brain is unaware of the interocular delay, and so reads out these filters as if there were no delay, the sensation of depth follows naturally. In recent years, neurons with these properties have been reported in cat areas 17/18 (Anzai et al., 2001; Carney et al., 1989) and in monkey V1 and MT (Bradley, Qian, & Andersen, 1995; DeAngelis, Cumming, & Newsome, 1998; DeAngelis & Newsome, 2004; DeAngelis & Uka, 2003; Maunsell & Van Essen, 1983; Pack et al., 2003; Roy, Komatsu, & Wurtz, 1992), and hailed as the neuronal basis for the Pulfrich effect. A mathematical analysis has confirmed that joint motion/disparity sensors can, as expected, give a depth percept in the stroboscopic Pulfrich effect and in dynamic visual noise (Qian & Andersen, 1997). A clear consensus now seems to have emerged in the scientific literature, and even in popular-science books, that joint motion/disparity encoding in early visual cortex is the neuronal basis for Pulfrich-like phenomena (Anzai et al., 2001; Carney et al., 1989; Morgan, 2003; Morgan & Castet, 1995; Morgan & Fahle, 2000; Morgan & Tyler, 1995; Pack et al., 2003; Qian, 1997; Qian & Andersen, 1997). But the physiological observations also raise important questions about this idea. Although joint motion/disparity sensors are common in cat areas 17/18, they are much less common in monkey V1 (Pack et al., 2003; Read & Cumming, 2005a). This simply reflects the much lower incidence of direction selectivity in V1: a minority of disparity-selective cells in V1 do also encode direction of motion, as envisaged in the joint-encoding theory, but the majority of disparity-selective cells are not sensitive to direction of motion. Thus, the standard view that only joint motion/disparity sensors contribute to Pulfrich illusions leads to a most surprising conclusion.
In this view, the illusory depth is signaled only by the small minority of disparity sensors that are also direction selective, whereas the vast majority of non-direction-selective disparity sensors signal zero disparity. And yet the veridical signal from the majority is somehow overridden to create the illusory percept. That is, most disparity-selective cells in V1 would not contribute to depth perception, despite encoding useful information about disparity. This could certainly occur if perception depends upon neurons in extrastriate cortex where joint encoding is more common, for example, MT (Bradley et al., 1995; DeAngelis & Newsome, 2004; DeAngelis & Uka, 2003; Maunsell & Van Essen, 1983; Pack et al., 2003; Roy et al., 1992), but it would still imply that information available in the population of V1 neurons has been lost. Before drawing this conclusion, it seems worth re-examining whether joint motion/disparity encoding is in fact the only explanation for Pulfrich-like illusions. We shall argue that the illusory depth percept in all Pulfrich-like stimuli can be explained perfectly well in terms of pure disparity sensors. We conclude that, in Pulfrich-like illusions, both the direction-selective and the non-direction-selective disparity sensors in V1 signal the illusory depth, so there is no evidence that either group has a privileged role in perception. In a recent paper (Read & Cumming, 2005b), we showed that the effective disparity produced by the stroboscopic Pulfrich effect (defined as minus the nulling disparity necessary to cancel the depth illusion) could be quantitatively explained in terms of spatial disparities present in the stimulus, if one allows for temporal integration over about 15 ms. In that model, the effective disparity is simply the weighted average of the disparities of all possible matches, with matches whose left and right members occur further apart in time given less weight.
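The disparity-averaging idea can be sketched in a few lines for the stroboscopic stimulus of Figure 1. This is a minimal illustration, not the published model: it assumes a Gaussian weight on the interocular time separation of each match, with a standard deviation of about 14 ms (τ√2 for τ = 10 ms, the width of the cross-correlation of two Gaussian temporal kernels), and the stimulus values X and T are hypothetical choices.

```python
import numpy as np

def effective_disparity(X, T, delay, sigma_w, k_max=20):
    """Weighted average of match disparities in a stroboscopic Pulfrich stimulus.

    Left-eye flash i appears at (i*X, i*T); right-eye flash j at (j*X, j*T + delay).
    A match between them has disparity (i - j)*X (left minus right) and interocular
    time separation (i - j)*T - delay. Each match is weighted by a Gaussian in that
    time separation (assumed weight function, SD sigma_w). Because the stimulus is
    periodic, only k = i - j matters.
    """
    k = np.arange(-k_max, k_max + 1)
    weights = np.exp(-((k * T - delay) ** 2) / (2 * sigma_w**2))
    return np.sum(weights * k * X) / np.sum(weights)

X, T = 0.144, 0.040             # deg, s: apparent speed 3.6 deg/s (hypothetical stimulus)
sigma_w = 0.010 * np.sqrt(2)    # assumed: cross-correlation of two 10 ms Gaussian kernels
for delay in (0.0, 0.005, 0.010):
    # effective disparity vs. the virtual disparity v * delay
    print(delay, effective_disparity(X, T, delay, sigma_w), X / T * delay)
```

Because the weight function is symmetric, a zero delay gives zero effective disparity. For a positive delay the weighted average falls short of the virtual disparity vΔt, because under these assumptions the zero-disparity matches (k = 0) still carry the largest weight.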
It seemed likely that this disparity-averaging algorithm could be implemented simply by a population of pure disparity sensors, much as Qian & Andersen (1997) did with a population of joint motion/disparity sensors. These pure disparity sensors would still implement an early stage of spatiotemporal filtering, as envisaged by Morgan and colleagues (Morgan, 1975, 1976, 1979, 1980, 1992; Morgan & Watt, 1982, 1983). However, we argued that the filters need not be spatiotemporally inseparable, and hence direction selective, as assumed in recent years (Morgan, 2003; Morgan & Castet, 1995; Morgan & Fahle, 2000; Morgan & Tyler, 1995; Qian & Andersen, 1997); separable filtering would give similar results. The depth percept could be supported by pure disparity sensors (binocular neurons with spatiotemporally separable receptive fields), whereas the motion percept could be supported by a separate population of motion sensors with spatiotemporally inseparable receptive fields. Such a model, in which motion and disparity are encoded separately rather than jointly, still incorporates an early stage of spatiotemporal filtering and seems likely to be equally compatible with the psychophysical evidence. In this paper, we test this alternative way of implementing spatiotemporal filtering. The stimulus most often adduced as compelling evidence for joint motion/disparity encoding is dynamic visual noise viewed with an interocular delay (Morgan, 2003; Morgan & Fahle, 2000; Morgan & Tyler, 1995; Morgan & Ward, 1980; Ross, 1974). Yet immediately after Ross (1974) reported this illusion, Tyler (1974, 1977) explained it as due to spatial disparities actually present in the stimulus. Tyler noted that chance pairs of dots that happened to have a far disparity would also tend to have apparent motion to the left, whereas pairs that happened to have a near disparity would move to the right (if the right eye is delayed; vice versa otherwise).
He argued that this association is sufficient to explain the swirling percept. In this view, the significance of the temporal delay is that it introduces spatial disparity into the stimulus, according to the geometrical relationship noted by Fertsch (Pulfrich, 1922); it is this disparity, rather than the delay itself, that gives rise to the perception of depth. Perhaps because this explanation was based on matching dots in the stimulus, rather than on spatiotemporal filtering, it has been neglected in recent years. Although it seems likely that the explanation could be updated to use spatiotemporal filtering performed by separate populations of disparity and motion sensors (Neill, 1981), no quantitative model demonstrating this has been produced. Consequently, the success of quantitative models based on spatiotemporally inseparable binocular filters has been taken as evidence in favor of joint motion/disparity encoding, in the absence of any clear demonstration that the alternative explanation fails. In this paper, we address both issues by developing a quantitative model of depth perception in Pulfrich-like stimuli, using disparity sensors whose receptive fields are space/time-separable filters. For the stroboscopic Pulfrich stimulus, we consider the response of pure disparity sensors (binocular neurons with space/time-separable receptive fields, built according to the energy model of Ohzawa, DeAngelis, & Freeman, 1990). These neurons are not sensitive to direction of motion, and their preferred disparity remains constant as interocular delay changes, as in Figure 2B. We show that our disparity-averaging model (Read & Cumming, 2005b) can be implemented simply by averaging the responses of these neurons, weighted by their disparity preference. This produces a value for effective disparity that is in excellent agreement with psychophysical experiments.
For the dynamic visual noise stimulus, we examine the correlation between a population of pure disparity sensors and a population of pure motion sensors (monocular neurons with space/time-inseparable receptive fields). We show that the activity of pure disparity sensors is correlated with the activity of pure motion sensors. If the right eye experiences a delay Δt, then near-preferring disparity sensors are correlated with rightward-preferring motion sensors, whereas far-preferring disparity sensors are correlated with leftward-preferring motion sensors. Motion sensors tuned to speed v are most strongly correlated with disparity sensors tuned to a disparity of approximately vΔt. We argue that this correlation is sufficient to explain why motion is perceived in opposite directions on either side of the fixation plane, why speed increases with distance from fixation, and why the percept reverses when the noise is anticorrelated (Tyler, 1977).
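Tyler's argument can be checked with a deliberately simplified simulation. The sketch below is not the energy-model simulation described in the Methods: it assumes one-dimensional binary noise with independent frames on a periodic grid, point-like receptive fields, and a Reichardt-style delay-and-correlate unit standing in for a motion sensor, with an internal delay equal to the interocular delay. Under these assumptions, the binocular product computed by a disparity sensor tuned to disparity d is the same quantity as the output of a motion sensor tuned to a displacement d per delay (speed d/Δt), so their activities are perfectly correlated, while other pairings are uncorrelated. The function name and all parameters are hypothetical.

```python
import numpy as np

def sensor_correlations(n_x=200, n_t=4000, delay=2, max_shift=5,
                        anticorrelated=False, seed=0):
    """Correlate toy disparity sensors with toy motion sensors in dynamic noise.

    Disparity sensor D_d(t) = sum_x L(x, t) * R(x - d, t)        (binocular product)
    Motion sensor    M_s(t) = sum_x L(x, t) * L(x - s, t - delay) (monocular, speed s/delay)
    The right eye sees the left eye's noise `delay` frames late (sign-flipped if
    anticorrelated). Returns the shifts and corr[i, j] = corr(D_shifts[i], M_shifts[j]).
    """
    rng = np.random.default_rng(seed)
    left = rng.choice([-1.0, 1.0], size=(n_t, n_x))   # rows = frames, columns = positions
    left_past = np.roll(left, delay, axis=0)           # left_past[t] = left[t - delay]
    sign = -1.0 if anticorrelated else 1.0
    right = sign * left_past                           # delayed (possibly inverted) right eye

    shifts = np.arange(-max_shift, max_shift + 1)
    # np.roll(a, s, axis=1)[:, x] == a[:, x - s] (periodic boundary)
    D = np.array([(left * np.roll(right, d, axis=1)).sum(axis=1) for d in shifts])
    M = np.array([(left * np.roll(left_past, s, axis=1)).sum(axis=1) for s in shifts])
    D, M = D[:, delay:], M[:, delay:]                  # drop frames where the time roll wrapped

    n = len(shifts)
    corr = np.corrcoef(np.vstack([D, M]))[:n, n:]
    return shifts, corr

shifts, corr = sensor_correlations()
print(shifts[np.argmax(corr, axis=1)])   # each disparity sensor pairs with the equal-shift motion sensor
```

Passing `anticorrelated=True` flips the sign of the right-eye image, so the matched correlations become −1, in line with the reversal of the percept for anticorrelated noise described by Tyler (1977). With realistic, spatially and temporally extended receptive fields, one would expect broader and weaker correlations than in this idealized case.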
Stroboscopic Pulfrich stimulus
The space/time receptive field function ρ(x, t) represents the response to a stimulus at retinal position x that occurred at time t relative to the present moment. We adopt the convention that negative values of t represent times before the present moment. In accordance with causality, we set ρ(x, t) = 0 for all t > 0, because the cell cannot be influenced by stimuli that have not yet occurred. Because the experimental stimulus contains only horizontal motion, we need include only one spatial dimension. For our model disparity sensors, except where otherwise specified, we use space/time-separable receptive fields, in which the function ρ(x, t) can be expressed as the product of a spatial component ρx(x) and a temporal component ρt(t). Neurons with space/time-separable receptive fields are not sensitive to direction of motion. We consider a population of binocular disparity sensors whose receptive fields are identical in all respects except their positions on the retina. Differences in the positions of the receptive fields in the left and right eyes result in a range of disparity tuning within the population (position disparity). We take the receptive field centered on the origin, ρ0(x, t), as a template; a receptive field at position xL0, for example, can be written as ρ0(x − xL0, t). Except where otherwise noted, we model the spatial component of the receptive field profile as a Gabor function:
$$\rho_x(x) = \exp\!\left(-\frac{x^2}{2\sigma^2}\right)\cos(2\pi f x), \tag{1}$$
where σ = 0.1° and f = 2 cycles per degree, corresponding to a full-width half-maximum power bandwidth of about 2.3 octaves. In fact, this choice is irrelevant, because we prove in the Appendix that, with the read-out rule of Equation 15, the same effective disparity is obtained whatever function is chosen for the spatial component. Except where otherwise stated, the temporal component is modeled as a Gaussian:
$$\rho_t(t) = \begin{cases} \exp\!\left(-\dfrac{(t + t_{\mathrm{lag}})^2}{2\tau^2}\right), & t \le 0 \\ 0, & t > 0 \end{cases} \tag{2}$$
where the standard deviation τ is 10 ms, and the time between stimulus onset and peak response, tlag, is 50 ms. This means that the cell's response gradually rises after the appearance of a stimulus, reaches a peak 50 ms after stimulus onset, and decays thereafter. This receptive field is shown in Figure 3A. A more realistic temporal kernel would be biphasic, reflecting the band-pass temporal tuning of most real V1 cells. However, this would create problems with binocular temporal integration. The energy model predicts that the response to interocular delay should be governed by the cross-correlation of the temporal kernels. Band-pass temporal kernels would therefore generate a biphasic response to interocular delay, yet this is not observed in the responses of V1 neurons (Anzai et al., 2001; Read & Cumming, 2005a). This is a known problem of the binocular energy model that has yet to be addressed. It causes particular difficulties for our model of the strobe Pulfrich effect. Here, the cross-correlation of the temporal kernels also acts as a weight function controlling the weight given to different interocular delays when disparities are averaged (see Appendix). If this weight function were biphasic, matches at some interocular delays would repel the effective disparity away from the disparity of the match, a phenomenon with no psychophysical support. For all these reasons, we restricted ourselves to monophasic temporal kernels when modeling the strobe Pulfrich effect. In the dynamic noise simulation, we also consider Gabor receptive fields with band-pass temporal frequency tuning.
Figure 3. Receptive field profiles used for the strobe Pulfrich simulation. The color plot shows the receptive field function; the marginal plots show its projections on the space and time axes. Only negative times are shown because, in our convention, positive times represent stimuli that will occur in the future, so the receptive field function is zero for all t > 0. (A) Standard space/time-separable receptive field, with spatial component given by Equation 1 and temporal component by Equation 2; results shown in Figure 8. (B) Tilted (space/time-inseparable) receptive field given by Equation 4; results shown in Figure 11. Note the different vertical axes.
In Figure 10, we show results when the temporal receptive fields are exponential impulse functions:
(3)
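Equations 1 and 2 can be written down directly in code. The minimal NumPy sketch below assumes an even-symmetric (cosine-phase) Gabor and uses the parameter values given above; the function names are hypothetical. It also checks the quoted bandwidth: the power spectrum of the Gabor is (ignoring the small negative-frequency lobe) a Gaussian centered on f, whose half-maximum points lie at f ± √(ln 2)/(2πσ).

```python
import numpy as np

SIGMA = 0.1     # deg, SD of the Gaussian envelope of the spatial Gabor (Equation 1)
FREQ = 2.0      # cycles/deg, carrier frequency
TAU = 0.010     # s, SD of the temporal Gaussian (Equation 2)
T_LAG = 0.050   # s, time from stimulus onset to peak response

def rho_x(x):
    # Spatial component: Gabor, assumed even-symmetric (cosine carrier)
    return np.exp(-x**2 / (2 * SIGMA**2)) * np.cos(2 * np.pi * FREQ * x)

def rho_t(t):
    # Temporal component: Gaussian peaking T_LAG before the present; zero for t > 0
    return np.where(t <= 0, np.exp(-(t + T_LAG) ** 2 / (2 * TAU**2)), 0.0)

def rho(x, t):
    # Space/time-separable receptive field: product of the two components
    return rho_x(x) * rho_t(t)

def power_bandwidth_octaves(sigma, freq):
    # Full-width half-maximum of the Gaussian power spectrum centered on freq,
    # expressed in octaves (negative-frequency lobe ignored)
    half_width = np.sqrt(np.log(2)) / (2 * np.pi * sigma)
    return np.log2((freq + half_width) / (freq - half_width))

t = np.linspace(-0.1, 0.0, 1001)
print(t[np.argmax(rho_t(t))])                  # peak response at t = -T_LAG
print(power_bandwidth_octaves(SIGMA, FREQ))    # about 2.3 octaves
```

With σ = 0.1° and f = 2 cycles/deg, the half-maximum points fall at roughly 0.67 and 3.33 cycles/deg, about 2.3 octaves apart.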
In Figure 11, we show results when the receptive field is an inseparable function of space and time tuned to the apparent velocity of the strobe stimulus (Figure 3B), and compare these to the results obtained with the space/time-separable receptive field of Figure 3A. For this comparison to be valid, the temporal extent of the two receptive fields must be the same. The marginal plots along the top of Figure 3A and B show that the projections of the two receptive fields onto the time axis are the same. However, the spatial extent of the tilted receptive field is smaller (note the different vertical axis scales in Figure 3A and B). This is because, given the constraint that the temporal projections should be the same, the spatial extent has to be narrow to obtain meaningful velocity tuning. To see this, consider how to increase the spatial extent of the RF in Figure 3B. If the ellipsoid were expanded along all axes equally, the temporal extent would increase along with the spatial extent. If the RF were stretched only along the vertical axis, the ellipsoid would become more circular, weakening the velocity tuning. It might be argued that, for a fair comparison, we should make the spatial extent of the space/time-separable receptive field similarly narrow. In fact, this is not necessary because, as noted above, for space/time-separable receptive fields the results are independent of the spatial component. The receptive field function in Figure 3B is
(4)
where tan θ = 3.6 deg/s, the apparent velocity of the strobe stimulus, and σ1 = 0.025 and σ2 = 0.008, with t in seconds and x in degrees. Our disparity sensors are constructed according to the stereo energy model (Ohzawa et al., 1990) with position disparity (e.g., Anzai, Ohzawa, & Freeman, 1997). The energy model was chosen because it is mathematically tractable and enabled us to prove the results in the Appendix formally; however, this choice is not critical. Qualitatively similar results are obtained with, for example, the modified version of the energy model proposed by Read et al. (2002). All these models begin with the response of each receptive field at time t:
$$v(t) = \int_{-\infty}^{+\infty}\!dx \int_{-\infty}^{+\infty}\!dt'\; I(x, t + t')\,\rho(x, t'). \tag{5}$$
The function I(x, t) represents the image: I(x, t) is the luminance at retinal position x and time t, relative to the mean luminance. Thus, values of 0 represent gray, positive values represent bright features, and negative values represent dark features. The function ρ(x, t) represents the space/time receptive field, as described in the preceding section. Each model neuron has two receptive fields, one in each eye. In the energy model, the neuron's response at time t is the square of the sum of the inputs from the two eyes:
$$C(t) = \left[v_L(t) + v_R(t)\right]^2. \tag{6}$$
The energy model response can be divided into the sum of the monocular terms, M = vL² + vR², which is insensitive to the binocular correlation of the stimulus, and a binocular component
$$B = 2\,v_L v_R, \tag{7}$$
which makes the energy model sensitive to disparity, even in random-dot stereograms. We assume that the effective disparity of the stimulus depends only on the binocular component of the population response.
Population response to the stroboscopic Pulfrich stimulus
We simulate the response of each neuron, as a function of time, to a stroboscopic Pulfrich stimulus (Figure 1), in which the left and right images are taken to be
$$I_L(x, t) = \sum_{j=-\infty}^{+\infty} \delta(x - jX)\,\delta(t - jT), \qquad I_R(x, t) = \sum_{j=-\infty}^{+\infty} \delta(x - jX)\,\delta(t - jT - \Delta t). \tag{8}$$
This equation assumes that the moving target is so small and so briefly illuminated that the stimulus may be described as a series of Dirac delta functions, δ. Without loss of generality, we have also assumed that one of the flashes occurs at t = 0, and that the image in the left eye is then at position x = 0. T is the interflash interval of the stroboscope; X is the distance the target moves in this period; Δt is the interocular delay. Positive values of Δt mean that the right eye's image is delayed relative to the left eye's; negative values mean that the left eye's image is delayed relative to the right eye's. Substituting these images (Equation 8) into Equation 5, we find that the response from the left receptive field at time t is
$$v_L(t) = \int dx \int dt' \sum_j \delta(x - jX)\,\delta(t + t' - jT)\,\rho_L(x, t').$$
After integrating over x and t′, this becomes
$$v_L(t) = \sum_j \rho_L(jX,\; jT - t). \tag{9}$$
For simplicity, we have written the summation as over all values of j, although terms with j > t/T, representing appearances of the target that have not yet occurred, make no contribution (recall that the receptive field function is zero for positive values of its time argument). Our model includes a population of neurons distinguished only by the positions of their receptive fields on the left and right retinae, xL0 and xR0. The difference between these two defines the preferred disparity Δxpref = xL0 − xR0, which controls the distance from the observer of stimuli that optimally drive the cell. Their mean gives the neuron's preferred cyclopean position xpref = (xL0 + xR0)/2, which controls the visual direction of optimal stimuli. Thus, we can write each neuron's left- and right-eye receptive fields, ρL and ρR, as shifted versions of the reference receptive field ρ0, which is centered on the origin: ρL(x, t) = ρ0(x − xL0, t), ρR(x, t) = ρ0(x − xR0, t). In terms of the neuron's preferred disparity Δxpref and cyclopean position xpref, we have ρL(x, t) = ρ0(x − xpref − Δxpref/2, t), ρR(x, t) = ρ0(x − xpref + Δxpref/2, t). Substituting into Equation 9, we find that, for the neuron tuned to disparity Δxpref and cyclopean position xpref, the response from the left eye at time t is
$$v_L(t) = \sum_j \rho_0\!\left(jX - x_{\mathrm{pref}} - \frac{\Delta x_{\mathrm{pref}}}{2},\; jT - t\right). \tag{10}$$
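For the right eye, the expression is
$$v_R(t) = \sum_j \rho_0\!\left(jX - x_{\mathrm{pref}} + \frac{\Delta x_{\mathrm{pref}}}{2},\; jT + \Delta t - t\right). \tag{11}$$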
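Equations 10 and 11, together with Equations 6 and 7, can be evaluated directly for a single model neuron. In this minimal sketch the infinite sum over flashes is truncated to a fixed range of j (an assumption; the numerical procedure described later uses an adaptive stopping rule instead), the Gabor phase is assumed to be cosine, and the stimulus values (X = 0.144°, T = 40 ms, giving an apparent speed of 3.6 deg/s, and Δt = 10 ms) are hypothetical. The function names are illustrative.

```python
import numpy as np

# Receptive field parameters as in Equations 1 and 2 (cosine-phase Gabor assumed)
SIGMA, FREQ, TAU, T_LAG = 0.1, 2.0, 0.010, 0.050

def rho0(x, t):
    # Reference receptive field centered on the origin (space/time separable)
    spatial = np.exp(-x**2 / (2 * SIGMA**2)) * np.cos(2 * np.pi * FREQ * x)
    temporal = np.where(t <= 0, np.exp(-(t + T_LAG) ** 2 / (2 * TAU**2)), 0.0)
    return spatial * temporal

def monocular_responses(t, x_pref, dx_pref, X, T, delay, j_range=range(-20, 21)):
    """Left/right receptive field outputs at times t (Equations 10 and 11).

    The sum over flash index j is truncated to j_range.
    """
    vL = sum(rho0(j * X - x_pref - dx_pref / 2, j * T - t) for j in j_range)
    vR = sum(rho0(j * X - x_pref + dx_pref / 2, j * T + delay - t) for j in j_range)
    return vL, vR

def energy_response(vL, vR):
    """Energy model response (Equation 6) and its monocular/binocular parts (Equation 7)."""
    C = (vL + vR) ** 2
    M = vL**2 + vR**2
    B = 2 * vL * vR
    return C, M, B

t = np.linspace(0.0, 0.2, 401)
vL, vR = monocular_responses(t, x_pref=0.0, dx_pref=0.0, X=0.144, T=0.040, delay=0.010)
C, M, B = energy_response(vL, vR)
print(np.allclose(C, M + B))   # True: the split into M and B is exact
```

Only B changes sign when the two eyes' inputs have opposite sign, which is why the binocular term, rather than M, carries the disparity signal used in the read-out.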
We use these expressions in Equation 6 to calculate the response, as a function of time, of a population of binocular neurons tuned to different cyclopean positions xpref and disparities Δxpref, C(t, xpref, Δxpref). We now need a read-out rule relating the activity of this population to perceptual judgments performed in psychophysical experiments.
Extracting a single disparity judgment from population activity
In the psychophysical experiments whose results we are seeking to model, the subject was asked to find a disparity that nulled the disparity introduced by the Pulfrich effect (the effective disparity of the Pulfrich effect). A single effective disparity was found for the entire stimulus duration. It is thus natural to assume that, in making this judgment, subjects averaged across time and position. We therefore sum across cyclopean position and time to obtain total activity in the population as a function of disparity only:
$$A(\Delta x_{\mathrm{pref}}) = \int_0^T\!dt \int_{-\infty}^{+\infty}\!dx_{\mathrm{pref}}\; C(t, x_{\mathrm{pref}}, \Delta x_{\mathrm{pref}}). \tag{12}$$
A(Δxpref) is the (unnormalized) time-averaged activity of the pool of neurons with preferred disparity Δxpref. In computing the time average, we need only integrate over one strobe interflash interval T, because the integral over cyclopean position is periodic with period T. As the target moves, the activity moves across the population: if at time t the most active cells are those tuned to some particular cyclopean position xpref, then at time t + T the most active cells will be those tuned to xpref + X, but the distribution of activity across sensors tuned to different disparities will be the same. As noted above, the energy model response can be divided into monocular and binocular components M and B (Equation 7). In this stimulus, when averaged over a population of cells tuned to different cyclopean positions but the same disparity, the sum of the monocular terms is independent of the cells' preferred disparity: it simply indicates the presence of a stimulus somewhere in the visual field. Equation 12 can thus be rewritten as
$$A(\Delta x_{\mathrm{pref}}) = \int_0^T\!dt \int\!dx_{\mathrm{pref}}\; M(t, x_{\mathrm{pref}}, \Delta x_{\mathrm{pref}}) + \int_0^T\!dt \int\!dx_{\mathrm{pref}}\; B(t, x_{\mathrm{pref}}, \Delta x_{\mathrm{pref}}), \tag{13}$$
where the first term is the “baseline” contribution from the monocular components M, which is independent of the preferred disparity, and the second term represents the contribution from the binocular component B, which does depend on the preferred disparity. It is the binocular component that gives the energy model its key property of disparity tuning even for stimuli that contain no monocular cues to disparity, such as random-dot patterns; the monocular terms contribute only a baseline response that is observed even with binocularly uncorrelated patterns. In the simpler stimuli considered here (bars), the distinct image features that carry the disparity are visible monocularly. However, the monocular stimulus location gives no reliable information about the disparity of the target, so the binocular component of the response is the only part that is useful for estimating disparity. We therefore examine the disparity-dependent term in Equation 13:
$$D(\Delta x_{\mathrm{pref}}) = \int_0^T\!dt \int_{-\infty}^{+\infty}\!dx_{\mathrm{pref}}\; B(t, x_{\mathrm{pref}}, \Delta x_{\mathrm{pref}}). \tag{14}$$
D(Δxpref) is the amount by which the total response of the pool of neurons tuned to disparity Δxpref, averaged over time, exceeds the baseline response of all pools. It will be larger for pools whose preferred disparity, Δxpref, corresponds to a disparity present in the stimulus. We now wish to choose a neuronal read-out rule that implements disparity averaging, because this is what appears to happen psychophysically. We treat the set of responses D(Δxpref) as if it were a probability distribution. For example, if there were two pools whose responses were above baseline, disparity averaging means that the effective disparity lies between the preferred disparities of the two pools. This can be achieved by postulating that the effective disparity is the mean of the disparity distribution implied by D(Δxpref):
$$\Delta x_{\mathrm{eff}} = \frac{\int \Delta x_{\mathrm{pref}}\, D(\Delta x_{\mathrm{pref}})\, d\Delta x_{\mathrm{pref}}}{\int D(\Delta x_{\mathrm{pref}})\, d\Delta x_{\mathrm{pref}}}. \tag{15}$$
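The full read-out (Equations 12 to 15) can be sketched on a coarse grid and compared with a weighted average of match disparities. Assumptions: a cosine-phase Gabor, hypothetical stimulus values (X = 0.144°, T = 40 ms, Δt = 10 ms), a fixed truncation of the sum over flashes, and grids coarser than the ones the paper used. Because the rectangle-rule step sizes appear in both numerator and denominator of Equation 15, they are dropped. The comparison weights each match by the cross-correlation of the two Gaussian temporal kernels, which is itself a Gaussian with standard deviation τ√2.

```python
import numpy as np

SIGMA, FREQ, TAU, T_LAG = 0.1, 2.0, 0.010, 0.050   # Equations 1-2 (cosine Gabor assumed)
X, T, DELAY = 0.144, 0.040, 0.010                   # hypothetical strobe: 3.6 deg/s, 10 ms delay

def rho0(x, t):
    # Reference space/time-separable receptive field centered on the origin
    spatial = np.exp(-x**2 / (2 * SIGMA**2)) * np.cos(2 * np.pi * FREQ * x)
    temporal = np.where(t <= 0, np.exp(-(t + T_LAG) ** 2 / (2 * TAU**2)), 0.0)
    return spatial * temporal

# Grids (broadcast axes): preferred disparity, cyclopean position, time within one period
dx_pref = np.linspace(-0.8, 0.8, 81)[:, None, None]
x_pref = np.linspace(-1.5, 1.5, 151)[None, :, None]
t = (np.arange(20) * T / 20)[None, None, :]         # rectangle rule over [0, T)

# Monocular responses (Equations 10 and 11), sum over flashes truncated to j = -8..3
vL = sum(rho0(j * X - x_pref - dx_pref / 2, j * T - t) for j in range(-8, 4))
vR = sum(rho0(j * X - x_pref + dx_pref / 2, j * T + DELAY - t) for j in range(-8, 4))

# Disparity-dependent activity (Equation 14): binocular term B = 2 vL vR summed over x_pref and t
D = (2 * vL * vR).sum(axis=(1, 2))
dx = dx_pref.ravel()

# Read-out (Equation 15): mean of D treated as a distribution over preferred disparity
eff_population = np.sum(dx * D) / np.sum(D)

# Comparison: weighted average of match disparities k*X, weighted by the
# cross-correlation of the two Gaussian temporal kernels (SD = TAU * sqrt(2))
k = np.arange(-20, 21)
w = np.exp(-((k * T - DELAY) ** 2) / (4 * TAU**2))
eff_matches = np.sum(w * k * X) / np.sum(w)

print(eff_population, eff_matches)
```

Under these assumptions the two numbers should agree closely, which is a numerical counterpart of the equivalence between this read-out rule and weighted disparity averaging for space/time-separable receptive fields.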
In the Appendix, we show that this read-out rule gives the same results as the weighted disparity averaging considered in Read & Cumming (2005b). The time-averaged disparity-dependent activity D (Equation 14) was evaluated at 151 different values of preferred disparity Δxpref between ±(4σ + X(4σt + |Δt|)/T), and the resulting distribution was used to calculate the effective disparity as in Equation 15. These limits, notionally infinite, were chosen to ensure that all neuronal pools whose time-averaged activity is above baseline were included. The integration limits on xpref were set to ±(4X + σ), centered on the most active neuronal population. Again, this was to include all members of the population that would be activated above baseline during one stimulus temporal period. All integrals were performed by the rectangle rule, using 61 steps in the integral over cyclopean position and 151 steps in the integral over time. The sums in Equations 10 and 11 were evaluated by first summing from j = −15 to j = 15 and then adding further pairs of terms on either side of zero until the fractional change was less than 2 parts in a million. To check that these accuracy parameters were fine enough, we repeated the simulation with coarser settings: 101 values of disparity, 41 steps in cyclopean position, 101 steps in time, and the sums in Equations 10 and 11 evaluated to an accuracy of 5 parts in a million. The results did not change appreciably.

Dynamic visual noise with an interocular delay

In the previous section, we were interested in explaining the depth percept, so we modeled only pure disparity sensors, assuming that the motion percept was supported by a separate population of motion sensors that we did not model. In the dynamic visual noise stimulus, we are interested in the relationship between depth and motion, in order to understand why this stimulus produces a sensation of opposite directions of motion on opposite sides of the fixation plane.
Here, therefore, we need to include both motion and disparity sensors in the simulation. In our simulation, these populations are entirely separate. We model a population of disparity sensors that are not sensitive to stimulus direction of motion, and a population of motion sensors that are not sensitive to stimulus disparity. Our disparity sensors are binocular energy model units with space/time-separable receptive fields, like those in the previous section. Again, the positions of the receptive field centers differ between the left and right eyes, giving a range of disparity tuning. However, the simulation needed for the dynamic noise stimulus differs from that needed for the stroboscopic Pulfrich stimulus in two ways. First, the dynamic noise stimulus does not contain a moving target, so there is no need to include neurons tuned to a range of cyclopean positions. For simplicity, therefore, we only consider neurons whose preferred cyclopean position is zero. That is, although neurons with different preferred disparities have receptive fields at different positions, the mean of the receptive field centers in the left and right eyes is zero for all neurons. Second, because the dynamic noise stimulus contains motion energy in all directions, we use two spatial dimensions in our simulation. We include neurons tuned to four different orientations, assuming that orientation tuning is always the same in both eyes (Bridge & Cumming, 2001). Spatially, the receptive fields are two-dimensional Gaussians with standard deviation σ1 = 0.06° along the long axis and σ2 = 0.02° along the short axis. They are centered on x = Δxpref/2 in the left eye and x = −Δxpref/2 in the right. That is, the left-eye receptive field is | (16) |
For neurons tuned to horizontal orientations, σx = σ1 and σy = σ2, whereas for vertical orientations σx = σ2 and σy = σ1. The right-eye receptive field is similar, with Δxpref replaced by −Δxpref. The standard deviation along the temporal axis, σt, is 10 ms. Our motion sensors differ from the disparity sensors in only two respects. First, their receptive fields are inseparable in space and time, making them sensitive to the direction of stimulus motion even when there is no interocular delay. Second, they are monocular, so they cannot sense disparity. In fact, essentially the same results are obtained with binocular motion sensors, which square the input from each eye before combining them. Such binocular motion sensors would be sensitive to the disparity of a stimulus such as a bar, an inevitable consequence of having receptive fields of finite extent in the two eyes. They would still not be joint motion/disparity sensors in the usual sense, because they would not be sensitive to disparity in cyclopean stimuli such as random-dot patterns. Nevertheless, we used monocular sensors to make it clear that they are not disparity tuned. We include a population of motion sensors tuned to different orientations θpref. The single receptive field, in the left eye only, is centered on the cyclopean location of the disparity sensors' receptive fields, that is, the origin. The motion sensors are tuned to a velocity along the axis orthogonal to their preferred orientation. Thus, sensors tuned to horizontal orientations are tuned to upward or downward motion. The receptive field for these sensors is | (17) |
where σ3 and σ4 depend on the preferred speed v (values given below), and the ± determines whether the sensor prefers upward or downward motion. Sensors tuned to vertical orientations are defined similarly. σ1 and σ2 are the same as for the disparity sensors: 0.06° and 0.02°, respectively. We show results for v = 5 deg/s (σ3 = 0.004, σ4 = 0.046, in units where time is in seconds and distance in degrees) and v = 10 deg/s (σ3 = 0.002, σ4 = 0.098). Example receptive fields are shown in Figure 4. The receptive fields in the dynamic noise simulation are three-dimensional, depending on x, y, and t, so only slices through them can be shown. Figure 4A shows the spatial profile of the receptive field at the moment of the cell's peak response (50 ms after the onset of a stimulus). This is the same for both disparity and motion sensors. The example shown here is tuned to horizontal orientations. Figures 4B–C show the vertical space/time profile of the receptive field at the retinal position x = 0. Figure 4B is for a disparity sensor. Its receptive field is space/time separable, as for the disparity sensor in Figure 3. Figure 4C is for a downward motion sensor. Here, the receptive field is space/time inseparable, meaning that the cell is tuned to a particular speed and direction of motion. The pixelation in this figure reflects the detail with which the receptive fields were sampled in the simulation: receptive field functions were evaluated on a grid of 117 x, 49 y, and 80 t values. Because of the horizontal position disparity, the population of disparity sensors included members whose receptive fields were centered on a range of x positions. In contrast, the receptive fields for the motion sensors were all centered on x = 0, like the example in Figure 4A. This is why the grid had to extend further in the x direction than in the y direction. The spatial sampling was the same for x and y: one pixel represented 0.45 arcmin in both directions.

Figure 4. Receptive field profile used for the dynamic visual noise simulation. This simulation uses two spatial dimensions and includes receptive fields tuned to a range of orientations. This example shows a horizontally oriented receptive field (θpref = 0°). (A) The spatial profile of the receptive field at the time of its peak response. The axes are horizontal and vertical position on the retina. This spatial profile is the same for both disparity sensors and motion sensors, except that for the disparity sensors the profile is not necessarily centered on x = 0, as here. The scatter in the horizontal position of the receptive field means that the population includes sensors tuned to a range of horizontal disparities. (B and C) The vertical space/time profile of the receptive field, showing its dependence on time and on vertical retinal position, at the horizontal retinal position x = 0. (B) Separable space/time profile, used for the disparity sensors. (C) Inseparable space/time profile, used for the motion sensors.

We also performed simulations using Gabor receptive fields, with bandpass spatial and temporal frequency tuning. Here, the disparity sensors are | (18) |
(the right eye's receptive field is similar, with Δxpref replaced by −Δxpref), and the motion sensors are | (19) |
where θpref is the preferred orientation of each neuron and also defines the preferred direction of motion for motion sensors. All neurons, both disparity and motion sensors, had the same tuning to spatial and temporal frequency: f = 2 cycles per degree, v = 10 Hz, σ = 0.1°, σt = 10 ms. The results were essentially the same as those shown in Figure 12 for the Gaussian receptive fields. Our dynamic visual noise stimulus consists of patterns of 117 × 49 pixels, in which each pixel is colored either black or white at random. As shown in Figure 4, each “time-pixel” in the simulation lasted 1.3 ms. To simulate the patterns used in experiments, a new pattern was generated every 10 time-pixels, corresponding to a simulated video frame of 13 ms. The simulated monitor was assumed to display each frame for exactly one time-pixel. Thus, a receptive field experienced each pattern for 1.3 ms and then 11.4 ms of blank screen before the next pattern appeared. This was not critical to our results. Essentially the same results were obtained when the simulated monitor was assumed to display each frame for a full 13 ms. The image presented to the right eye lagged one frame (13 ms) behind the left. A sequence of 50 random patterns was generated over 500 time-pixels, simulating a 633-ms presentation of visual noise. The responses of the disparity and motion sensors were calculated at each of the 500 time-pixels. The inputs vL and vR from each eye's receptive field in response to an image I(x, y, t) were calculated as in Equation 5, with an additional integral over all values of vertical retinal position y. The response of each disparity sensor, CD(t, Δxpref, θpref), is given by the squared sum of the inputs from the two eyes (Equation 6). The response of each motion sensor, CM(t, θpref), is given by the squared input from the left eye.
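A short sketch can make the stimulus timing concrete. It assumes, from the numbers quoted above, that one time-pixel is 633 ms/500 ≈ 1.27 ms. That value reconciles the rounded 1.3-ms time-pixel with the 11.4-ms blank interval (9 × 1.266 ms ≈ 11.4 ms). The helper name `noise_movies` and the coding of black, white, and blank screen as −1, +1, and 0 are hypothetical choices for illustration.

```python
import numpy as np

# Timing quoted in the text: 500 time-pixels simulate a 633-ms presentation,
# a new pattern appears every 10 time-pixels and is shown for one time-pixel,
# and the right eye lags the left by one frame.
N_TIME = 500
DURATION_MS = 633.0
FRAME_PIXELS = 10

pixel_ms = DURATION_MS / N_TIME              # about 1.27 ms (quoted as 1.3 ms)
frame_ms = FRAME_PIXELS * pixel_ms           # about 12.7 ms (quoted as 13 ms)
blank_ms = (FRAME_PIXELS - 1) * pixel_ms     # about 11.4 ms of blank screen per frame


def noise_movies(ny=49, nx=117, rng=None):
    """Left/right binary noise movies of shape (N_TIME, ny, nx).

    Hypothetical coding: +1 = white, -1 = black on frame onsets, 0 = blank.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_frames = N_TIME // FRAME_PIXELS        # 50 patterns
    patterns = rng.choice([-1.0, 1.0], size=(n_frames + 1, ny, nx))
    left = np.zeros((N_TIME, ny, nx))
    right = np.zeros((N_TIME, ny, nx))
    for k in range(n_frames):
        onset = k * FRAME_PIXELS             # frame shown for one time-pixel only
        left[onset] = patterns[k + 1]
        right[onset] = patterns[k]           # right eye shows the previous frame
    return left, right


print(f"time-pixel {pixel_ms:.3f} ms, frame {frame_ms:.2f} ms, blank {blank_ms:.2f} ms")
```

By construction, the right-eye movie equals the left-eye movie shifted by 10 time-pixels, which is the one-frame interocular delay.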
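We then calculated the correlation coefficient r between the 500 responses of the motion sensor tuned to an orientation θpref and the corresponding responses of disparity sensors tuned to the same orientation θpref and different disparities Δxpref:
$$r(\Delta x_{\mathrm{pref}}, \theta_{\mathrm{pref}}) = \frac{\sum_j \left[C_M(t_j, \theta_{\mathrm{pref}}) - \bar{C}_M\right]\left[C_D(t_j, \Delta x_{\mathrm{pref}}, \theta_{\mathrm{pref}}) - \bar{C}_D\right]}{\sqrt{\sum_j \left[C_M(t_j, \theta_{\mathrm{pref}}) - \bar{C}_M\right]^2 \, \sum_j \left[C_D(t_j, \Delta x_{\mathrm{pref}}, \theta_{\mathrm{pref}}) - \bar{C}_D\right]^2}} \qquad (20)$$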
where the bar indicates the average over all times j. A single 633-ms presentation yields curves with the same features as those visible in Figure 12, but noisier. To obtain the smooth curves shown in Figure 12, we repeated this process 500 times and took the average correlation coefficient.

The stroboscopic Pulfrich effect

The classic Pulfrich effect has traditionally been explained by noting the geometrical equivalence of spatial and temporal disparity. However, this stimulus equivalence does not hold for the stroboscopic version of the effect. It is often argued that in this stimulus there is no spatial disparity in the images presented to the two eyes, only interocular delay (Burr & Ross, 1979; Lee, 1970a; Morgan & Thompson, 1975; Qian, 1997; Qian & Andersen, 1997). Of course, this argument depends critically on the assumption that each appearance of the stimulus in the left eye is paired with the appearance in the right eye that occurs closest to it in time. When the interocular delay Δt is less than half the strobe period T, this match has zero spatial disparity. However, even when Δt < T/2, there are other possible matches, separated by longer periods of time, which do contain spatial disparity. We have previously developed a simple quantitative model that allows these more widely separated matches to influence perception, while granting that matches separated by the shortest time have the greatest influence (Morgan, 1979; Read & Cumming, 2005b; Tyler, 1977). We refer to this as the disparity-averaging model. This model assumes that the disparity assigned to an object is a weighted average of all possible matches between appearances of the target in the left and right eyes.
The disparity of each match is weighted according to the interocular delay between its left- and right-eye members. As a result, matches between appearances that occur at nearly the same time in the two eyes influence perception more than matches between appearances that occur at very different times. The effective disparity in the stroboscopic stimulus is
$$\Delta x_{\mathrm{eff}} = \frac{\sum_{j=-\infty}^{\infty} jX\, w(jT - \Delta t)}{\sum_{j=-\infty}^{\infty} w(jT - \Delta t)} \qquad (21)$$
where w is the weight function describing how the weight given to a potential match falls off as a function of the interocular delay between the left and right members of the pair, T is the interflash interval of the stroboscope, and X is the distance traveled by the target during this period. When the interflash interval T is short enough, the illumination is effectively continuous, so the strobe Pulfrich stimulus must produce the same depth as the classic Pulfrich effect. That is, the effective disparity becomes equal to vΔt, the “virtual disparity” between the apparent motion trajectories of the target in the two eyes (Figure 1; Burr & Ross, 1979). It is easy to verify that Equation 21 satisfies this. As T → 0, for a weight function symmetric about zero, the weights w(jT − Δt) become a densely sampled copy of w centered on j = Δt/T, so the weighted mean of jX tends to XΔt/T = vΔt. Equation 21 also correctly predicts that the effective disparity will fall below the virtual disparity as the interflash interval increases. When the weight function is a Gaussian with mean 0 and standard deviation ~15–20 ms, Equation 21 provides an excellent account of human perception (Read & Cumming, 2005b).

This disparity-averaging model (Equation 21) assumes that the appearances of the strobe stimulus in each eye have been identified and paired. It might therefore appear that it could be implemented only at a very high level, after the stereo correspondence problem has been solved. This is not the case. For any given interocular time separation, there is at most one possible match, so the spatial correspondence problem is trivial. The model of Equation 21 can be read out straightforwardly from the population activity of disparity-tuned neurons in V1. In the next section, we show how this can be done. We postulate an ensemble of disparity-sensitive units described by the stereo energy model (Ohzawa et al., 1990). Although this model does not capture all aspects of the responses of real disparity-sensitive cells (Read & Cumming, 2003), it has the advantage of mathematical simplicity.
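As a minimal sketch, Equation 21 can be evaluated directly with a Gaussian weight function w of mean 0. The 15-ms SD used here is one value from the ~15–20 ms range quoted above. The sum is truncated at |j| ≤ j_max, and the function name `disparity_average` is illustrative. With Δt held at 0.4T, the output should approach the virtual disparity 0.4X for short interflash intervals and fall below it as T grows.

```python
import numpy as np


def disparity_average(T, delay, X=1.0, sd=0.015, j_max=500):
    """Effective disparity from Equation 21 with a Gaussian weight function.

    T     : strobe interflash interval (s)
    delay : interocular delay, Delta t (s)
    X     : distance moved by the target between flashes
    sd    : SD of the Gaussian weight w (s), an assumed value
    """
    j = np.arange(-j_max, j_max + 1)                  # candidate matches with disparity j*X
    w = np.exp(-0.5 * ((j * T - delay) / sd) ** 2)    # weight set by each match's delay
    return np.sum(j * X * w) / np.sum(w)


for T in (0.005, 0.020, 0.040, 0.080):
    eff = disparity_average(T, 0.4 * T)               # virtual disparity is 0.4 X
    print(f"T = {1000 * T:3.0f} ms: effective disparity = {eff:.3f} X")
```

Because the Gaussian weight is symmetric, the short-T result reproduces vΔt to numerical precision, matching the limit argued in the text.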
Similar results were obtained with the modified energy model units of Read, Parker, & Cumming (2002). The neurons in the ensemble are identical in all respects except for the position of their receptive fields on the retina, which gives them different preferences for stimulus disparity, Δxpref, and cyclopean position, xpref. Figure 5 shows how this population responds to one example stroboscopic Pulfrich stimulus, with a negative interocular delay Δt equal to 40% of the interflash interval. The stimulus is represented by the space/time diagrams along the top row (A–D). Dots indicate appearances of the stimulus in the left (red) and right (blue) eyes, and some of these are labeled for convenience in discussing the stimulus. The four columns show the response of the population at four different times in one period of the stimulus. The current time in each column is indicated by the yellow vertical line in space/time plots A–D. A small complication is that the neurons have a temporal lag in their response. They are therefore driven not by the stimulus currently displayed, but by the stimulus as it was at previous times. The background of the space/time plot is shaded to show the temporal kernel of the neurons: the darker the shading, the less responsive the neurons are to stimuli at that time. The maximum responsiveness, indicated by the bright region, occurs 50 ms before the current time.

Figure 5. The response of a population of disparity-tuned neurons to a stroboscopic Pulfrich stimulus with a period four times the RF time constant (T = 4τ = 40 ms), shown at four different times. The delay is 0.4 times the strobe period (Δt = 0.4T = 16 ms). (A–D) Space/time diagrams for this stroboscopic Pulfrich stimulus. The yellow vertical lines show the current time; stimuli to the right of this line have not yet appeared. The background shading shows the temporal kernel of the receptive fields, which is Gaussian with a lag of 50 ms and an SD τ of 10 ms. Dark = least responsive; white = most responsive. Thus, the center of the bright vertical bar marks the stimulus that is most strongly driving the present response; this is always 50 ms before the present time. (E–H) Instantaneous response of a population of disparity-tuned energy model neurons to this stimulus. Each pixel in the plot represents a neuron; the color is the neuron's current firing rate. The preferred disparity of each neuron is plotted on the horizontal axis, and its preferred cyclopean position on the vertical axis. The color scale is the same in each plot (black = silent, white = maximal firing). The cross-hairs show the axes. (I–L) Binocular component. As in the middle row, except that the color shows only the component of the firing rate contributed by the binocular term in the energy model, Equation 7. (M–P) Binocular component averaged over RF cyclopean position. These are the plots in I–L, averaged down each column. The dashed line marks a disparity of one strobe interflash distance X. Note that all times are expressed as a fraction of the strobe interflash interval T and all distances as a fraction of the strobe interflash distance X.

Figure 5E–H shows the response of the population at the different times. Each pixel in the plot represents one neuron. The color shows its current firing rate (black = silent, white = maximal; the color scale is the same for all panels in a row). The neuron's position in the plot indicates its tuning: preferred disparity is indicated by position along the horizontal axis and preferred cyclopean position by position along the vertical axis. The features of this population response can be understood in relation to the stimulus. Column 1 (Figure 5AEIM) shows the situation at a time when the target has just made an appearance at x = 0 in the left eye and will shortly make an appearance in the right. However, these appearances have not yet begun to influence the neurons.
The neurons are responding optimally to the second-to-last appearance of the target in the left eye (L1, at x = −X, t = −T), as shown by the fact that L1 falls in the middle of the bright band indicating the temporal kernel in the space/time diagram. This appearance has activated all the neurons with a left-eye receptive field close to −X. These neurons lie along a downward diagonal stripe in the population plots. The reason is that the preferred disparity Δxpref and cyclopean position xpref compatible with a particular left-eye location xL satisfy xpref + Δxpref/2 = xL, which defines a downward diagonal stripe on axes of (Δxpref, xpref). In Column 2 (BFJN), time has moved on, and the neurons are no longer responding so strongly to L1. However, neurons with a receptive field at x = −X in the right eye are now responding to the most recent appearance of the target in the right eye, R1. R1 is at x = −X and t = −0.6T; because of the interocular delay, it occurs 0.4T later than the corresponding appearance in the left eye. This response shows in Figure 5F as an upward diagonal stripe, xpref − Δxpref/2 = xR. Naturally, the neurons firing most are those whose receptive fields are at x = −X in both eyes, because these receive excitation from both eyes. This explains the peak in the population activity at cyclopean position xpref = −X and Δxpref = 0. Figure 5J shows only the binocular component (Equation 7) of the cells' response. This removes the stripes due to monocular activation and focuses attention on the peak. Figure 5N shows the disparity distribution at this moment, that is, the binocular component averaged across cyclopean position. The distribution is symmetric and centered on Δxpref = 0. In Column 3 (CGKO), the neurons have almost entirely stopped responding to L1, and they have hardly begun to respond to L2, so there is no downward diagonal stripe of activity.
They are still responding well to R1, so the most prominent feature in the population response is an upward diagonal stripe, corresponding to the activation of neurons with a receptive field at x = −X in the right eye. Because the input is essentially monocular at this moment, the binocular component in Figure 5K is very weak. However, weak activation is visible at disparities of 0 and X, corresponding to the pairings L1↔R1 and L2↔R1, respectively, marked with green arrows in Figure 5C. The disparity distribution, Figure 5O, represents the average of these and peaks at Δxpref = X/2. In Column 4, the neurons are still responding weakly to R1 and are also beginning to respond weakly to L2. The target's other appearances are either too recent to have influenced the neurons yet, or too far in the past. The only stimulus disparity visible to the population is therefore X, corresponding to the match L2↔R1. This is visible in Figure 5LP, where the binocular component shows a peak for detectors tuned to a disparity of X, not zero. Any sensible read-out rule will therefore predict the perception of a nonzero disparity at this moment. However, this peak at disparity X is weaker than the response at zero disparity in the second column. Consequently, one would expect a single disparity judgment based on this activity over time to lie closer to zero than to X. The exact value of the disparity judgment will depend on the rule used to combine these population distributions over time, which we explore below. Figure 6 shows a similar set of results for a shorter strobe period. The strobe interflash interval T is now 20 ms, which is only twice the SD of the receptive field temporal kernel. The delay is once again 0.4T, that is, 8 ms. Because the stimulus period is now so much shorter relative to the neurons' temporal integration period, the population response varies very little with time.
In Figure 5, there were strong peaks at zero disparity at some times and weak peaks at disparity X at others. Here, the long integration time averages these peaks out. That is, the time averaging performed by the spatiotemporal filters themselves is sufficient to make the peak in the population activity nearly constant, located slightly to one side of zero disparity, at a preferred disparity of 0.4X. It drifts up the vertical axis over time (Figure 6I–L), reflecting the apparent motion of the target, which stimulates neurons with different preferred cyclopean positions each time it appears. However, the disparity distribution remains constant (Figure 6M–P).

Figure 6. The response of a population of disparity-tuned neurons to a stroboscopic Pulfrich stimulus with T = 2τ, shown at four different times. As in Figure 5, except that the strobe period T is 20 ms and the delay is 8 ms. Note that here the peak of the binocular response is at 0.4X (marked with a dashed line) at all times.

The results in Figures 5 and 6 show the activity of a population of disparity detectors at different cyclopean locations, as a function of time. We wish to compare these results with psychophysical data in which subjects provided a single judgment of disparity over a whole trial, during which the moving target appeared at many different locations. To do this, we must combine all the disparity signals over space and time to yield a single disparity value. One can think of this as implemented by a higher brain area pooling inputs over time from V1 cells tuned to many different cyclopean positions but the same disparity. This results in a measure of “total support” for each possible stimulus disparity. For simplicity, we assume that this estimate is obtained by calculating the mean of all responses. To understand the results of this averaging, it is useful to consider the results at each time instant, so that the mean is calculated only over cyclopean positions.
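The pooling just described can be checked with a minimal numerical sketch. The sketch simulates a population of energy-model units, sums the binocular term 2vLvR over cyclopean position and over one strobe period to get D(Δxpref), and reads out Equation 15. According to the text (see the Appendix and Figure 8), the result should match Equation 21 with a Gaussian weight of SD τ√2. The sketch is a hypothetical simplification of the published simulation:

- one spatial dimension;
- Gaussian rather than Gabor spatial kernels;
- delta-function flashes;
- no 50-ms response lag, since a constant lag does not change an average over a whole period.

Parameter values, grid sizes, and function names are illustrative.

```python
import numpy as np


def gauss(z, sd):
    """Unnormalized Gaussian profile, used for both spatial and temporal kernels."""
    return np.exp(-0.5 * (z / sd) ** 2)


def population_readout(T, delay, X=1.0, sigma=0.2, tau=0.010, n_t=64, step=0.05):
    """Equation 15 applied to a simulated 1-D population of energy-model units.

    Hypothetical model: left flash n at time n*T and position n*X; the right eye
    sees the same flash `delay` later. Gaussian spatial (SD sigma) and temporal
    (SD tau) kernels; sums stand in for integrals (constant factors cancel).
    """
    n_rel = int(np.ceil(6 * tau / T)) + 2
    n = np.arange(-n_rel, n_rel + 2)                      # flashes that matter in [0, T)
    t = np.arange(n_t) * (T / n_t)                        # one strobe period
    half = (n_rel + 4) * X
    x_pref = np.arange(-half, half + step / 2, step)      # preferred cyclopean positions
    lim = 4 * sigma + X * (8 * tau + abs(delay)) / T      # cf. the limits used in the text
    dx_pref = np.arange(-lim, lim + step / 2, step)       # preferred disparities

    hL = gauss(t[:, None] - n[None, :] * T, tau)            # (n_t, flashes), left eye
    hR = gauss(t[:, None] - (n[None, :] * T + delay), tau)  # right eye, delayed

    D = np.empty(dx_pref.size)
    for k, d in enumerate(dx_pref):
        # Left RF centred at x_pref + d/2, right RF at x_pref - d/2
        gL = gauss(n[:, None] * X - (x_pref[None, :] + d / 2), sigma)
        gR = gauss(n[:, None] * X - (x_pref[None, :] - d / 2), sigma)
        vL = hL @ gL                                      # left-eye input, (n_t, x_pref)
        vR = hR @ gR                                      # right-eye input
        D[k] = np.sum(2 * vL * vR)                        # binocular term, pooled (cf. Eq. 14)
    return np.sum(dx_pref * D) / np.sum(D)                # mean of D (Eq. 15)


print(population_readout(T=0.040, delay=0.016))
```

For the stimulus of Figure 5 (T = 4τ = 40 ms, Δt = 0.4T), this population read-out gives roughly 0.31X, below the 0.4X virtual disparity. Equation 21 with a Gaussian weight of SD τ√2 gives the same value.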
Figure 7 shows this average disparity response as a function of time, for the two strobe periods illustrated in Figures 5 and 6. The color of each pixel in Figure 7 represents the total activity of a pool of V1 neurons tuned to the same disparity but different cyclopean positions. Each pixel row in Figure 7 shows, in pseudocolor, the same data as one of panels M–P of Figures 5 and 6. That is, each row represents the distribution of activity across the whole population of disparity detectors at a given time. Each pixel column shows how the total activity in all neurons with a given preferred disparity varies over time. The stimulus is periodic, and so is the steady-state response of the neuronal pools; therefore, only one period is shown.

Figure 7. Total activity as a function of time for neuronal pools with different preferred disparities. The pseudocolor shows the binocular component of neuronal firing, averaged across neurons with different RF locations xpref, as a function of preferred disparity and time. (A) Strobe interflash interval is four times the neuronal integration period τ, as in Figure 5; (B) strobe interflash interval is 2τ, as in Figure 6. The blue line traces the peak of the population activity at each moment of time. The straight black line shows the disparity judgment made for the stimulus, extracted from this activity according to Equation 15. For comparison, the virtual disparity vΔt is indicated with a red arrow.

Figure 7A is for the stimulus illustrated in Figure 5, where the strobe interflash interval is four times the time constant of the V1 neurons. The blue line traces the peak, that is, the preferred disparity of the currently most active neuronal pool. For part of each stimulus period, as in Figure 5IJ, there is a strong peak of activity in the neuronal pool tuned to zero disparity.
For the rest, there is a weaker peak in the pool of neurons whose preferred disparity is the strobe interflash distance X, as in Figure 5L. To generate a single depth judgment from this population response, we simply take the mean of the whole distribution (Equation 15, disparity averaging). In Figure 7A, for example, the time constant of the V1 neurons means that they respond to two of the disparities present in the stimulus. Neurons tuned to a disparity of 0 respond strongly, and neurons tuned to a disparity of X respond weakly. (The stimulus contains other possible matches, with disparities of 2X, 3X, etc., but the two members of each such match are separated by so long a time that the neurons do not respond to them.) Under our disparity-averaging read-out rule, the effective disparity lies between the two peaks, but closer to the stronger peak. This is shown by the black line in Figure 7. The effective disparity is thus a weighted average of the disparities present in the stimulus, with the weight depending on the temporal delay between the different possible matches. In Figure 7B, the strobe interflash interval is short relative to the integration time of the neurons, so the most active pool is always the same, namely, the pool with preferred disparity 0.4X. It can be proved (see Appendix) that taking the mean of the population disparity activity in this way yields the same results as the weighted disparity-averaging equation, Equation 21. The weight function is then equal to the cross-correlation of the temporal components of the receptive fields in the two eyes. Figure 8 demonstrates this. The crosses in Figure 8 show the effective disparity obtained by reading out the activity of a simulated neuronal population, for strobe Pulfrich stimuli with different interocular delays Δt and interflash intervals T.
The solid curves show the effective disparity obtained with the original disparity-averaging equation, Equation 21, when the weight function is a Gaussian with SD equal to τ√2 (inset in Figure 8; this is the cross-correlation of the temporal receptive fields, which are Gaussians with SD τ). The results are the same. Thus, the read-out rule presented here is a simple way to implement the weighted disparity-averaging equation with a population of physiologically plausible model neurons.

Figure 8. Effective disparity as a function of interocular delay in a simulation of the stroboscopic Pulfrich effect based on pure disparity sensors. Effective disparity Δx, as a proportion of the interflash distance X, is plotted as a function of interocular delay Δt as a proportion of the interflash interval T. Crosses show the results from a population of energy model units, where the temporal component of the receptive field function is a Gaussian with standard deviation τ, and the effective disparity is calculated according to Equation 15. Solid curves are for the model of Equation 21, where the weight function is the autocorrelation of the receptive field temporal kernel, that is, a Gaussian with standard deviation τ√2 (inset). The different colors are for different stimulus interflash intervals (see legend).

Effect of different read-out rules

The read-out rule discussed so far (Equation 15) was chosen to implement disparity averaging (Equation 21), in which the disparities of the possible matches present in the stimulus are averaged after weighting by the interocular delay of each match. Figure 8 shows that this equation can be implemented simply by averaging the output of V1 disparity sensors. There are, of course, many other ways in which population activity in V1 might be processed to arrive at a subjective report of disparity. We have experimented with various read-out rules and found that they generally fall into two classes.
Either the effective disparity is generally less than the virtual disparity, as in Figure 8 (although the precise position of each curve depends on the read-out rule), or the effective disparity is always equal to the virtual disparity (that is, plots of Δx/X vs. Δt/T lie along the identity line). Figure 9 shows an example of the latter case. In this simulation, V1 activity is not first averaged over time and then used to extract a single disparity for the whole stimulus. Instead, an instantaneous disparity, Δxinst, is assigned at every moment based on the preferred disparity of the most active V1 cells (winner takes all). The resulting disparities were then averaged over time. Formally, this rule is
$$\Delta x_{\mathrm{inst}}(t) = \operatorname*{arg\,max}_{\Delta x_{\mathrm{pref}}} \int dx_{\mathrm{pref}}\; B(t, x_{\mathrm{pref}}, \Delta x_{\mathrm{pref}}), \qquad \Delta x_{\mathrm{eff}} = \frac{1}{T}\int_0^T dt\; \Delta x_{\mathrm{inst}}(t) \qquad (22)$$
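The winner-take-all rule can be illustrated with a minimal sketch. It assumes, hypothetically, a 1-D population with Gaussian spatial (SD σ) and temporal (SD τ) kernels, delta-function flashes, and no response lag. Under these assumptions, pooling the binocular term 2vLvR over cyclopean position gives, up to a constant, a sum over left/right flash pairs with flash times a and b and disparity jX. Each pair contributes exp(−(a − b)²/4τ²) × exp(−(t − (a + b)/2)²/τ²) × exp(−(Δxpref − jX)²/4σ²). The function names and parameter values are illustrative. For comparison, the code also computes a peak-height-weighted average in the spirit of the modification introduced as Equation 23.

```python
import numpy as np


def pooled_binocular(t, dx_pref, T, delay, X=1.0, sigma=0.1, tau=0.010, n_max=20):
    """Binocular term pooled over cyclopean position (up to a constant).

    Hypothetical 1-D model: left flash n at time n*T, position n*X; right flash
    m at time m*T + delay, position m*X; Gaussian kernels, no response lag.
    Returns an array of shape (len(t), len(dx_pref)).
    """
    n = np.arange(-n_max, n_max + 1)
    a = (n * T)[:, None]                               # left flash times (rows)
    b = (n * T + delay)[None, :]                       # right flash times (columns)
    pair_w = np.exp(-((a - b) ** 2) / (4 * tau ** 2))  # weight from the pair's delay
    mid = (a + b) / 2                                  # pair response peaks midway
    disp = (n[:, None] - n[None, :]) * X               # pair disparity j*X
    temporal = pair_w * np.exp(-((t[:, None, None] - mid) ** 2) / tau ** 2)
    spatial = np.exp(-((dx_pref[None, None, :] - disp[:, :, None]) ** 2) / (4 * sigma ** 2))
    n_pairs = n.size ** 2
    # Sum the contributions of all left/right pairs
    return temporal.reshape(t.size, n_pairs) @ spatial.reshape(n_pairs, dx_pref.size)


def wta_readouts(T, delay, X=1.0, n_t=400):
    """Winner-take-all read-out (Eq. 22 idea) and a peak-height-weighted variant."""
    t = np.arange(n_t) * (T / n_t)                     # one strobe period
    dx_pref = np.linspace(-3 * X, 3 * X, 601)
    P = pooled_binocular(t, dx_pref, T, delay, X=X)
    peak = P.argmax(axis=1)
    dx_inst = dx_pref[peak]                            # most active pool at each moment
    height = P[np.arange(n_t), peak]
    wta = dx_inst.mean()                               # plain time average
    weighted = np.sum(dx_inst * height) / np.sum(height)   # weight by peak height
    return wta, weighted


wta, weighted = wta_readouts(T=0.040, delay=0.016)
print(f"winner-take-all: {wta:.3f} X, peak-weighted: {weighted:.3f} X")
```

With T = 40 ms and Δt = 16 ms (the stimulus of Figure 7A), the winner-take-all average comes out at about 0.4X, the virtual disparity, because the zero-disparity pool wins for 60% of each period. Weighting by peak height instead gives a value well below 0.4X, because the zero-disparity peak is also the taller one.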
Figure 9. Effective disparity as a function of interocular delay for an alternative read-out rule, in which disparity is extracted by a winner-take-all rule prior to averaging over time (Equation 22). As in Figure 8, the receptive fields are space/time separable, with Gabor spatial kernels and Gaussian temporal kernels.

The instantaneous disparity corresponds to the blue line in Figure 7. In Figure 7A, for example, the peak of the activity is at zero disparity 60% of the time and at disparity X 40% of the time. The read-out rule of Equation 22 therefore assigns an effective disparity of 0.4X, the virtual disparity (Figure 9). With this read-out rule, the effective disparity is always the virtual disparity, even for long interflash intervals, where human subjects report disparities much closer to zero. Although this rule therefore does not match the experimental data, it is nevertheless of interest. It demonstrates that a population of pure disparity sensors can encode the virtual disparity implicit in the apparent motion of the strobe stimulus, even though the sensors do not respond to motion. As we shall see below (Figure 14), the reason for this paradoxical result is that even pure disparity sensors become sensitive to direction of motion in stimuli with an interocular delay. This follows from the geometrical equivalence of motion and disparity in such stimuli (Pulfrich, 1922). To recover the sigmoid pattern that is in fact obtained with human observers, we need to take into account that the zero-disparity peak not only lasts longer, but is also of larger amplitude. We can modify Equation 22 to achieve this by weighting the instantaneous disparity Δxinst by the height of the peak:
$$\Delta x_{\mathrm{eff}} = \frac{\int_0^T dt\; \Delta x_{\mathrm{inst}}(t)\, \max_{\Delta x_{\mathrm{pref}}} \int dx_{\mathrm{pref}}\, B(t, x_{\mathrm{pref}}, \Delta x_{\mathrm{pref}})}{\int_0^T dt\; \max_{\Delta x_{\mathrm{pref}}} \int dx_{\mathrm{pref}}\, B(t, x_{\mathrm{pref}}, \Delta x_{\mathrm{pref}})} \qquad (23)$$
This read-out rule (results not shown) gives results similar to those of Equation 15 (Figure 8). The effective disparity is in general less than the virtual disparity, although its magnitude differs slightly from that given by the read-out rule of Equation 15.

Effect of different receptive field functions

The receptive fields used in this model were the product of a Gaussian temporal kernel and a Gabor spatial kernel. What are the effects of different choices