Volume 5, Number 3, Article 10, Pages 275-286
doi:10.1167/5.3.10
http://journalofvision.org/5/3/10/
ISSN 1534-7362
The dynamics of visual pattern masking in natural scene processing: A magnetoencephalography study
Jochem W. Rieger
Department of Neurology II, Otto-von-Guericke-University, Magdeburg, Germany, & Max Planck Institute for Biological Cybernetics, Tübingen, Germany

Christoph Braun
MEG-Center, Eberhard-Karls-University, Tübingen, Germany

Heinrich H. Bülthoff
Max Planck Institute for Biological Cybernetics, Tübingen, Germany

Karl R. Gegenfurtner
Department of Psychology, Justus-Liebig-University, Giessen, Germany, & Max Planck Institute for Biological Cybernetics, Tübingen, Germany
Abstract
We investigated the dynamics of natural scene processing and mechanisms of pattern masking in a scene-recognition task. Psychophysical recognition performance and the magnetoencephalogram (MEG) were recorded simultaneously. Photographs of natural scenes were briefly displayed and, in the masked condition, immediately followed by a pattern mask. Viewing the scenes without masking elicited a transient occipital activation that started approximately 70 ms after scene onset, peaked at 110 ms, and ended after 170 ms. When a mask followed the target, an additional transient could be reliably identified in the MEG traces. We assessed psychophysical performance levels at different latencies of this transient. Recognition rates were reduced only when the additional activation produced by the pattern mask overlapped with the initial 170 ms of occipital activation from the target. Our results are commensurate with an early cortical locus of pattern masking and indicate that 90 ms of undistorted cortical processing is necessary to reliably recognize a scene. Our data also indicate that as little as 20 ms of undistorted processing is sufficient for above-chance discrimination of a scene from a distracter.
History
Received July 22, 2004; published March 31, 2005
Citation
Rieger, J. W., Braun, C., Bülthoff, H. H., & Gegenfurtner, K. R. (2005). The dynamics of visual pattern masking in natural scene processing: A magnetoencephalography study.
Journal of Vision, 5(3):10, 275-286,
http://journalofvision.org/5/3/10/,
doi:10.1167/5.3.10.
Keywords
natural scene, MEG, pattern backward masking, physiology, psychophysics
Pattern masking is widely used to study the dynamics of
information processing. The earliest reports of visual masking date back to the
19th century (Baxt, 1871). Since then, masking has been used extensively to study the dynamics of visual perception and
information processing (Gegenfurtner & Rieger, 2000; Grill-Spector, Kushnir, Hendler,
& Malach, 2000; Sperling, 1960; for reviews, see Bachmann, 1994; Breitmeyer, 1984; Breitmeyer &
Ogmen, 2000; Kahneman, 1968). However, the neuronal basis of
masking and the effect of the mask on the processing of the target are poorly
understood (Breitmeyer & Ogmen, 2000; Enns & Di Lollo, 2000). One of the first electroencephalography
(EEG) studies of visual masking that attempted to investigate
the dynamics of target processing was conducted by Donchin, Wicke, and
Lindsley (1963). The authors recorded the
EEG from an occipital electrode while they presented weak flashes of light
followed by strong flashes at the same location, and measured the perceptual
impact of the strong flash on the weak leading flash. They hypothesized that the
activation was simply the arithmetic sum of the separate surface potentials from
the two flashes, and compared the trace produced by the combined stimuli with
the sum of the traces produced when these stimuli were presented alone. For
short target-mask stimulus-onset asynchronies (SOAs), observers were unable to
distinguish the two flashes, and the measured EEG and the synthetic trace were
similar. The authors concluded that at SOAs too short to perceive the leading
flash, “the evoked potentials for the second stimulus displace those for
the first stimulus” and that the visual percept correlates with the
changes in the visually evoked potential (VEP). Schiller and Chorover (1966) challenged these results. In a
metacontrast masking paradigm, where the mask followed the target at an adjacent
location, they found that the masking VEP did not correlate with the visual
percept. Several subsequent studies have attempted to find some
neurophysiological correlate of the phenomenal experience of simple, masked
target stimuli in humans (e.g., Donchin & Lindsley, 1965; Kaitz, Monitz, & Nesher, 1985; Schwartz & Pritchard, 1981; Vaughan & Silverstein, 1968). The results of these EEG studies
provide no clear picture of the neural correlates of masking and the way the
mask changes the processing of the target. This might be because different kinds
of masking were used that may have different neuronal substrates (Breitmeyer, 1984; Enns & Di Lollo, 2000), or because assumptions about the
additivity of target and mask activity in the EEG traces were incorrect (Donchin
& Lindsley, 1965; Kaitz et al., 1985; Schwartz & Pritchard, 1981).
In addition to human EEG experiments, there are several
recent neurophysiological masking studies in monkeys. Employing metacontrast
masking, Macknik and Livingstone (1998)
found a suppression of a late after-discharge in V1 neurons. The authors
interpret this effect as a neuronal correlate of metacontrast masking. Other
groups (Kovacs, Vogels, & Orban, 1995;
Rolls & Tovee, 1994) have recorded from
shape-selective neurons in macaque monkey temporal cortex. Kovacs et al. (1995) pattern masked simple shapes (stars,
letters, and gratings) and found a reduction in the shape sensitivity of
inferior temporal (IT) neurons at short target-mask intervals due to a reduced
response duration. Rolls and Tovee (1994)
recorded from face-selective neurons in the superior temporal sulcus (STS). They
used complex stimuli such as faces or letters as pattern masks. Similar to
Kovacs et al., they found that the backward mask
reduces the firing duration of the neurons. They conclude that the mask
interrupts the processing of the STS neurons. In addition, they found that at
SOAs between 60-100 ms, there are distinct responses to two successive
stimuli.
However, in both studies the locus of the interaction
between the face or shape and the mask remains unclear. Rolls and Tovee (1994) argue that the basis of masking could be
either lateral inhibition of face-selective neurons by neurons that are
activated by non-face stimuli or reduced information transfer from earlier
visual areas (Rolls, Tovee, & Panzeri, 1999). According to these two hypotheses, the
locus of the interaction could be respectively in temporal cortex or at earlier
processing stages. Keysers and Perrett (2002) suggest a third mechanism: competition
between stimuli presented in rapid succession and close spatial proximity.
Although they do not explicitly specify the location where the competition
occurs they seem to assume that the masking of complex stimuli occurs as a
result of competition in higher visual areas such as STS. They predict that
stimuli producing a stronger activation should win the competition with a weaker
stimulus.
Single cell studies of visual masking provide
information in great temporal and anatomical detail, but they are restricted to
looking at one location at a time. Grill-Spector et al. (2000) used functional magnetic resonance imaging
(fMRI) to study the effects of the presentation
duration of natural scenes in a backward masking paradigm. They compared brain
activations in different visual areas with psychophysical recognition rates. In
this study, a reduction of the scene-mask SOA to 120 ms had only a weak effect
on the activity in primary visual cortex (V1) and in the lateral occipital
complex (LOC), but shorter presentation durations reduced the activity in LOC
more strongly than in V1. At 20 ms the activation in LOC approached the
baseline, while in V1 the activation was still 70% of the activation produced by
the unmasked presentation. These results are in accord with the single cell
studies: Masking reduces the activity in higher visual areas but the effects in
earlier visual areas are
weak.

Masking as a tool to study the dynamics of visual processing
It is known that natural scenes are processed very
quickly and efficiently in the visual system (Allison, Puce, Spencer, &
McCarthy, 1999; Bötzel &
Grüsser, 1989; Thorpe, Fize, &
Marlot, 1996). EEG studies have often
concentrated on the latency of object-category-specific brain activity. The
results of Thorpe et al. (1996) indicate
that an animal in a natural scene can be detected by the human visual system
within 150 ms. Faces elicit specific EEG components only 170–200 ms after
the onset of the face-image on a screen (Allison et al., 1999; Bötzel & Grüsser, 1989). Only 60 ms of processing is sufficient
to encode basic stimulus attributes such as color into short-term memory
(Gegenfurtner & Rieger, 2000).
Unfortunately, most of the EEG studies on scene and object processing
have been performed without masking the stimuli and with an unlimited processing
time, making it difficult to draw conclusions about the dynamics of the
processing of objects or natural scenes. The studies of Kovacs et al. (1995), Rolls and Tovee (1994), and Grill-Spector et al. (2000) show that masking can be used to study the
dynamics of the processing of complex visual stimuli because the mask imposes
constraints on the processing of the target that can be exactly controlled by
the timing of the mask.
In the study described here we investigate the pattern
backward masking of natural scenes with human subjects using integrated
electrophysiological and psychophysical measurements. We recorded brain activity
while subjects performed a scene-recognition task (Figure 1). In our paradigm, an abstract pattern
mask followed a scene at one of several SOAs. We extracted the initial transient
activation from the scene-mask transition and compared it to the brain activity
recorded during the unmasked processing of the scene. We also measured
psychophysical recognition performance at the various SOAs. By comparing these
measures, we were able to evaluate both the dynamics of the processing of the
scene and the dynamics of the target-mask interaction. Our approach allowed us
to circumvent the strong assumption that the brain activity measured on the
surface of the skull is the arithmetic sum of the activations produced by the
target and mask when they are presented separately. From our results, we derived
a simple model for pattern masking.
Figure 1. In the
psychophysical paradigm, each trial started with a fixation spot. The target
photograph was presented for one of several durations, which equal the stimulus-onset
asynchronies (SOAs) between target onset and mask onset. There was no gap
between target and mask, and the mask remained on the screen for 500 ms. In
the subsequent query phase the target photograph was presented together with a
distracter photograph. Each photograph was used only once in the experiment.
We measured recognition performance for natural scenes
as a function of presentation duration. Recognition was tested in a
two-alternative forced-choice match-to-sample task (Figure 1). Each trial started with a fixation
spot displayed at the center of a projection screen for a random duration
between 1 and 1.4 s. Then a digitized photograph of a natural scene (target) was
displayed for a short duration (between 25 and 500 ms). Except for the unmasked
condition, a pattern mask immediately followed the scene and was displayed for 500
ms. In the unmasked condition, the scene was presented for 500 ms. Following the
mask presentation, a query screen with two images was presented. One of the
images was the target and the other a distracter. The subject had to decide
which of the two images was the target. The query screen remained visible until
the subjects gave a response by moving a finger on their left or right hand,
which triggered a photoelectric switch. After the response, a blank screen was
displayed for 1 s, and the next trial started automatically with the fixation
point.
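The trial sequence just described can be summarized as an event schedule. The sketch below is only an illustration, not the authors' presentation software (which ran under MS-DOS); the function name `make_trial`, the tuple layout, and the uniform distribution for the 1 to 1.4 s fixation period are our assumptions.

```python
import random

def make_trial(soa_ms, masked=True, rng=random):
    """Hypothetical schedule for one trial as (event, onset_ms, duration_ms).
    A duration of None means the event lasts until the subject responds."""
    events = []
    t = 0.0
    fixation = rng.uniform(1000.0, 1400.0)  # assumed uniform 1-1.4 s
    events.append(("fixation", t, fixation))
    t += fixation
    if masked:
        events.append(("target", t, soa_ms))
        t += soa_ms                          # no gap: mask onset = target onset + SOA
        events.append(("mask", t, 500.0))
        t += 500.0
    else:
        events.append(("target", t, 500.0))  # unmasked scene shown for 500 ms
        t += 500.0
    events.append(("query", t, None))        # target + distracter until response
    # After the response, a 1-s blank precedes the next trial (not scheduled here).
    return events
```

For example, `make_trial(37)` returns fixation, target, mask, and query events, with the mask starting 37 ms after the target.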
To avoid possible confounding learning effects, each
image was shown only once. The images were chosen from a commercially available
database (Corel PhotoDisc). The mask images consisted of randomly placed and
sized color parallelograms. The parallelograms drawn in the foreground were
smaller than those drawn in the background, so that no single large
parallelogram could form a large uniform surface. The colors in the mask
were randomly chosen from the target and the distracter colors. A new mask was
generated on each trial. Pilot experiments indicated that the mask was very
effective. The root-mean-square (RMS) contrast of the masks was on average approximately
15% higher than that of the scenes. The recognition threshold was
determined by fitting a cumulative Gaussian function to the
data.
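The threshold estimate can be illustrated with a small least-squares fit. The text states only that a cumulative Gaussian was fitted; scaling it to run from the 50% guessing rate to 100% (so that the 75% point equals the Gaussian mean), the least-squares criterion, and the brute-force grid search are our assumptions, and the names below are hypothetical.

```python
import math

def psychometric(soa, mu, sigma):
    """Cumulative Gaussian rescaled for 2AFC: 0.5 at very short SOAs, 1.0 at
    long ones. With this scaling, p = 0.75 exactly at soa == mu."""
    phi = 0.5 * (1.0 + math.erf((soa - mu) / (sigma * math.sqrt(2.0))))
    return 0.5 + 0.5 * phi

def fit_threshold(soas, p_correct, mus=None, sigmas=None):
    """Grid-search least-squares fit; returns (mu, sigma).
    mu is the SOA (ms) that yields 75% correct under this model."""
    mus = mus or [m * 0.5 for m in range(0, 401)]        # 0-200 ms, 0.5-ms steps
    sigmas = sigmas or [s * 0.5 for s in range(1, 121)]  # 0.5-60 ms
    best = None
    for mu in mus:
        for sigma in sigmas:
            err = sum((psychometric(x, mu, sigma) - p) ** 2
                      for x, p in zip(soas, p_correct))
            if best is None or err < best[0]:
                best = (err, mu, sigma)
    return best[1], best[2]
```

As a self-check, proportions generated from the model itself (synthetic values, not the study's data) are recovered exactly by the fit.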
The pictures were projected on a screen by means of a
DLP-projector (model ddv810, Liesegang) with a resolution of 800 × 600 pixels and a
refresh rate of 72 Hz. The subject was seated 0.9 m away from the screen, and
the pictures (384 × 256 pixels) were 27.5 deg of visual angle wide and 18.8 deg
high. The background of the presentation screen was gray with an average
luminance of 32 cd/m², which was the average luminance of the images.
The pictures were presented at a 16-bit color depth, and the
voltage-to-intensity function of each of the three projector primaries was
linearized by means of a lookup table.
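The physical image size is not stated, but it follows from the reported visual angles and the 0.9-m viewing distance. The sketch assumes a flat screen with the image centered on the line of sight; the function names are hypothetical.

```python
import math

def size_from_angle(angle_deg, distance_m):
    """Physical extent (m) that subtends angle_deg at distance_m."""
    return 2.0 * distance_m * math.tan(math.radians(angle_deg) / 2.0)

def angle_from_size(size_m, distance_m):
    """Visual angle (deg) subtended by an extent size_m at distance_m."""
    return math.degrees(2.0 * math.atan(size_m / (2.0 * distance_m)))

width_m = size_from_angle(27.5, 0.9)   # about 0.44 m
height_m = size_from_angle(18.8, 0.9)  # about 0.30 m
px_per_deg = 384 / 27.5                # about 14 image pixels per degree
```

The implied width-to-height ratio (about 1.48) is close to the 384 × 256 pixel aspect ratio of 1.5.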
Presentations were controlled by a personal computer
running MS-DOS. The start of the target and mask presentation was monitored with
photodiodes attached to the upper edge of the screen. A small white square was
presented at this location while the target was on the screen. The signal from
the photodiodes was recorded in parallel with the magnetoencephalogram (MEG) to
provide a precise temporal marker for the appearance of target and mask on the
screen.
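Extracting onset markers from a photodiode channel amounts to finding rising threshold crossings. This is a minimal sketch assuming the photodiode trace is sampled at the 625-Hz MEG rate and that a suitable threshold is known; the threshold value and function name are hypothetical. At 625 Hz, the marker resolution is 1.6 ms.

```python
import numpy as np

FS_HZ = 625.0  # MEG sampling rate reported in the text

def onset_times_ms(photodiode, threshold):
    """Times (ms from the start of the record) at which the photodiode
    signal rises above threshold. Only rising edges are returned, so a
    signal that is already high at sample 0 produces no onset there."""
    above = np.asarray(photodiode) > threshold
    rising = np.flatnonzero(above[1:] & ~above[:-1]) + 1
    return rising * 1000.0 / FS_HZ
```

With one photodiode for the target and one for the mask, the measured SOA on each trial is the difference between the two onset times.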
Physiological measurements
While the subjects performed the psychophysical task,
event-related magnetic fields (EMFs) were recorded with a 151-channel whole head
MEG system (CTF, model Omega) located in a magnetically shielded chamber. The
DLP-projector was installed outside the shielded chamber, and the picture was
projected via a hole in the wall of the chamber using a mirror system on the
back of a transparent projection screen. The MEG recording started 200 ms before
the target appeared on the screen, and the data were collected during a 700-ms
period. The signal from the target photodiode served as a trigger for computing
the EMF averages. Baseline activity was defined as the average activity during
the 200 ms of pre-trigger recording. The MEG sampling rate was 625 Hz. During
the experiment, the head of the subject was stabilized with a system of posts.
Trials containing signals exceeding 1.5 × 10⁻¹² T, caused by eye movements or muscle contractions, were excluded from the
data analysis. The time series was filtered with a 45-Hz low-pass
FIR filter.
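The averaging steps described above (baseline correction over the 200-ms pre-trigger interval, rejection at 1.5 × 10⁻¹² T, and 45-Hz low-pass FIR filtering) can be sketched with NumPy. The order of the steps, the per-trial baseline subtraction before rejection, the 101-tap Hamming-windowed sinc kernel, and the function names are our assumptions; the text does not specify the filter design.

```python
import numpy as np

FS_HZ = 625.0                           # MEG sampling rate (from the text)
N_BASELINE = int(round(0.200 * FS_HZ))  # 200-ms pre-trigger baseline = 125 samples
REJECT_T = 1.5e-12                      # rejection threshold (from the text), tesla

def lowpass_fir(cutoff_hz=45.0, fs_hz=FS_HZ, numtaps=101):
    """Hamming-windowed sinc low-pass FIR kernel with unit DC gain (assumed design)."""
    n = np.arange(numtaps) - (numtaps - 1) / 2.0
    h = np.sinc(2.0 * cutoff_hz / fs_hz * n) * np.hamming(numtaps)
    return h / h.sum()

def average_epochs(epochs):
    """epochs: array (trials, channels, samples) in tesla, trigger at N_BASELINE.
    Returns the filtered average (channels, samples) and the number of trials kept."""
    epochs = np.asarray(epochs, dtype=float)
    # Subtract each trial's pre-trigger mean, channel by channel.
    epochs = epochs - epochs[:, :, :N_BASELINE].mean(axis=2, keepdims=True)
    # Reject trials in which any channel exceeds the threshold.
    keep = np.abs(epochs).max(axis=(1, 2)) <= REJECT_T
    avg = epochs[keep].mean(axis=0)
    # A symmetric kernel with mode="same" keeps the trace aligned in time
    # (assumes each epoch is longer than the kernel).
    h = lowpass_fir()
    filtered = np.apply_along_axis(np.convolve, 1, avg, h, mode="same")
    return filtered, int(keep.sum())
```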
Eight subjects participated in the experiment. All had
normal or corrected-to-normal visual acuity. Subjects were paid for
participation and gave their informed consent before the start of the
experiment.
The eight subjects were split into two groups of four
subjects. Each group was tested in four stimulus conditions. In the first
condition, only the target was presented for 500 ms without a mask (target-only
condition). In the second condition, only a mask was presented without a
preceding target (mask-only condition). In the third and fourth conditions, the
target was presented for a short duration with a mask following immediately
(target and mask conditions). For group 1, the target durations (SOAs) in the
target and mask conditions were 37 ms and 92 ms. For group 2, the target
durations were 24 ms and 60 ms. Note that every subject was tested in conditions
one (target-only) and two (mask-only). The four target-and-mask conditions
were distributed across the two groups because each condition was repeated 120
times, which would have made a testing session unacceptably long
if all six conditions had been included.
The 480 trials for each subject (four conditions × 120 repetitions) were split into
three blocks of approximately equal length. Over these three blocks, the four
trial types were randomly interleaved. Subjects had a short break between the
blocks but were required to remain in the
MEG.
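One way to produce such a session is to shuffle all 480 trials and cut the list into three blocks. This is a sketch under that assumption; the text does not say whether condition counts were balanced within blocks, and the condition labels and function name are hypothetical.

```python
import random

# Hypothetical labels for the four trial types of group 1.
CONDITIONS_GROUP_1 = ["target-only", "mask-only", "SOA 37 ms", "SOA 92 ms"]

def make_session(conditions, reps=120, n_blocks=3, seed=None):
    """Randomly interleave reps trials per condition and split them into
    n_blocks blocks of (nearly) equal length."""
    rng = random.Random(seed)
    trials = [c for c in conditions for _ in range(reps)]
    rng.shuffle(trials)
    size, extra = divmod(len(trials), n_blocks)
    blocks, start = [], 0
    for b in range(n_blocks):
        end = start + size + (1 if b < extra else 0)
        blocks.append(trials[start:end])
        start = end
    return blocks
```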
Figure 2 shows the
recognition rate as a function of presentation duration. For every presentation
duration, data were averaged across all subjects. In the mask-only condition
(labeled as 0-ms SOA), the subjects had no information about a target, and
consequently the recognition rate is close to the 50% chance level (46.8%,
averaged over all eight subjects). When the target is presented without a
subsequent mask (labeled as 500-ms SOA), the recognition rate is close to
perfect (98.1%, averaged over all eight subjects). The recognition rate is
significantly reduced when the SOA is 60 ms or shorter, t(3) = 8.8, p < .005, and an SOA of only 24 ms was
sufficient to obtain a recognition rate that is significantly above guessing
performance, t(3) = 3.15, p < .05. To achieve threshold-level
recognition performance (75% correct), an SOA of 44 ms was necessary.
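The comparison with guessing performance can be sketched as a one-sample t test of per-subject proportions correct against 0.5; with four subjects per group this gives df = 3, matching the reported t(3). The text does not state the tail of the test or whether proportions were transformed, so raw proportions are an assumption, and the input values in the check below are illustrative only.

```python
import math
from statistics import mean, stdev

def t_vs_chance(p_correct, chance=0.5):
    """One-sample t statistic and degrees of freedom for per-subject
    proportions correct tested against the guessing rate."""
    n = len(p_correct)
    se = stdev(p_correct) / math.sqrt(n)
    return (mean(p_correct) - chance) / se, n - 1
```

A p value can then be read from the t distribution with the returned degrees of freedom (for example, with `scipy.stats.t.sf`).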
Figure 2. The psychophysical recognition
performance. The performances for mask-only and scene-only presentations are
plotted at 0-ms SOA and 500-ms SOA, respectively. The first significant
reduction in recognition performance occurs when the SOA is reduced to 60 ms,
and the 75% correct response criterion threshold is reached with a 44-ms SOA.
Performance is still significantly better than guessing with an SOA of only
24 ms. Error bars indicate 95% confidence intervals.
Physiological measurements
The goal in our analysis was to track a temporal marker
that signals the transition from scene to mask in the combined presentation of
scene and mask. In the top row of Figure 3, we plotted the raw event-related magnetic fields (EMFs) evoked by the onset of the mask alone for the first subject measured in each group. The time series from all channels are overlaid. The data of both subjects show a first small maximum approximately 75 ms after mask onset and a second, higher one after approximately 100 ms (gray highlighted). The first maximum is most likely the magnetic equivalent of the C1 component, which reflects the initial activation in retinotopic striate cortex (Martinez, Di Russo, Anllo-Vento, Sereno, Buxton, et al., 2001). The maximum
of the second deflection is the magnetic equivalent of the P100 that includes
activity from retinotopic extrastriate visual areas (Martinez et al., 2001).
Figure 3. Top row: Single subject averaged
event-related magnetic field (EMF) time courses in all 151 sensors elicited by
the presentation of the mask alone. Data from the first subject measured in each
group are shown. The data show an initial small deflection that peaks after 75
ms and a second higher deflection that peaks after 100 ms (gray highlighted).
Then an intermediate minimum occurs after this second deflection. The maxima of
the deflections are restricted to a subset of sensor
traces. Second and third rows: The EMF
difference between masked and unmasked scene presentations [(scene and mask)
– scene]. This difference should reveal higher activation elicited by the
transition from the scene to the mask. The difference waves are initially quite
flat but show a deflection after a certain latency (gray highlighted). The
difference waves for the short and long SOAs are plotted in the second and third
row, respectively. The SOAs are given in the legend of each plot. Longer SOAs
prolong the latency of this deflection. This indicates that the initial
deflection in the EMF difference waves reflects activation that is elicited by
the scene-mask transition. Again the effects appear to be restricted to a subset
of sensors. The latencies of the scene-mask transition seem to be preserved in
the difference waves but the form of the difference waves shows only limited
similarity with the EMFs produced by the mask alone. The amplitudes of the
difference waves are clearly reduced. This indicates that it is unlikely that
simple subtraction approaches would correctly reveal residual scene activation
after masking or that summation approaches would produce a meaningful prediction
of the brain activity.
In the second and third row of Figure 3 we show the EMF difference waves
between the masked and the unmasked scene presentations for the same subjects at
the different SOAs. The visual stimulation in both conditions is initially the
same until the display switches from the scene to the mask. Thus, it can be
expected that the EMF difference waves fluctuate randomly until the transition
from the scene to the mask leads to a change in brain activity. As can be seen,
the difference curves are relatively flat up to a certain latency at which a
sudden increase of the differential activity occurs (gray highlighted). The
latency of this effect increases with increasing scene-mask SOA. The maxima of
the initial EMF deflections elicited by the mask alone (Figure 3, top row) and in the difference plots
(Figure 3, second and third rows) appear
only in a subset of the channel traces.
The spatial distribution of the magnetic fields (Figure 4) shows that the 100-ms EMF maximum
elicited by the mask was concentrated over the occipital sensors in each
subject, and we suspected that this maximum is a good candidate to determine the
dynamics of brain activity caused by the scene-mask transition at different
SOAs. For the subsequent analysis, we used a subset of 29 occipital sensors that
covered the location of the spatial activation peaks (Figure 4) and used the same group of sensors for
all subjects. Reducing the sensor set focuses the
analysis on the location of occipital cortex where the earliest signals from the
scene-mask transition can be expected, and it increases the signal-to-noise
ratio.
Figure 4. The
distribution of magnetic fields over the heads of all subjects 105 ms after the
onset of the mask when the occipital activation peaked. Blue indicates magnetic
field vectors emanating from the skull, and in the red areas the vectors point
toward the skull. Yellow and bright blue indicate a stronger magnetic flux. The
white dots represent the sensor locations. The marked sensors (thick white dots)
were selected for the analysis.
We rectified the data in the sensor subset and then
averaged over sensors and subjects in both groups (e.g., Fylan, Holliday, Singh,
Anderson, & Harding, 1997). This gives a
measure of occipital brain activity over time. The rectification is required
because the magnetic fields elicited by an active part of the brain are bipolar.
Simple averaging of the negative and positive portions of the bipolar fields
does not lead to meaningful results, or might even lead to a cancellation of the
magnetic fields (Figure 4).
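The rectify-then-average measure can be written in a few lines of NumPy. The array layout and the sensor indices are assumptions (the identities of the 29 selected sensors are not listed in the text); the check below shows how plain averaging of a bipolar pattern cancels while the rectified average does not.

```python
import numpy as np

def occipital_activity(emf, sensor_idx):
    """emf: array (subjects, channels, samples) of averaged fields.
    Returns the rectified field averaged over the selected sensors and
    subjects, i.e. one time course of occipital activity."""
    emf = np.asarray(emf, dtype=float)
    rectified = np.abs(emf[:, sensor_idx, :])  # bipolar fields -> magnitudes
    return rectified.mean(axis=(0, 1))
```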
The results of this analysis are plotted in Figure 5A and 5C for the mask-only and scene-only conditions for groups 1 and 2, respectively. Both scene and mask evoke a strong activation that initially rises steeply and peaks around 100–110
ms after the images appeared on the screen. In group 2 an earlier but weaker
intermediate activation peak is visible approximately 75 ms after the onset of
the stimuli. This lower activation peak reflects the magnetic equivalent of the
C1, the earliest measurable cortical activation in the EEG, which is most likely
generated in V1 (Martinez et al., 2001)
and was already visible in the raw data (Figure 3, top row). In the data from group 1, a slight slope change after
approximately 75 ms indicates the presence of the same activation. The latency of the higher 100-ms peak is
somewhat shorter for the mask (group 1: 99.2 ms; group 2: 105.6 ms; mean: 102.4
ms) than for the scene (group 1: 107.2 ms; group 2: 112 ms; mean: 109.6 ms). The
difference in latency between the two groups of subjects is quite small (mask:
6.4 ms; scene: 4.8 ms), indicating that these latencies are reproducible between
groups. The 100-ms peak elicited by the mask (2 × 10⁻¹³ T) is
higher than that elicited by the scene (1.54 × 10⁻¹³ T). This
difference is statistically significant according to the criterion of Rugg,
Doyle, and Wells (1995). This criterion seeks
to avoid spurious differences by counting only differences that include more
than five consecutive samples with difference
p values less than .01. It is clear
that the initial occipital deflection that extends between approximately 70 ms
(when the C1 occurs) and 170 ms (when an intermediate minimum occurs) and peaks
at ca. 100 ms has multiple underlying neuronal generators (Figure 3, top row). The width of the 100-ms peak
elicited by the mask is 17.6 ms at 90% of the peak amplitude in the average over
all subjects, and the width of the peak elicited by the scene is 25 ms. Our goal
in the following analysis is to track this peak of the occipital activation from
the mask as a temporal marker that signals the transition between scene and mask
in the combined presentation. To extract the activation peak caused by the
scene-mask transition in the combined presentation of scene and mask, we
subtracted the activation produced by the scene alone in our sensor set from the
activation patterns that occurred in the combined presentations. The peak caused
by the scene-mask transition in this difference was used as a temporal marker
for the beginning of the interaction between scene and mask. The latency of this
temporal marker can be seen as an upper limit for the arrival of mask activity
in early visual cortex.
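The Rugg, Doyle, and Wells (1995) criterion, as described above, counts a difference only if p < .01 holds for more than five consecutive samples. This sketch assumes the per-sample p values have already been computed (the per-sample test is not specified here); at 625 Hz, six samples span 9.6 ms. The function name is hypothetical.

```python
import numpy as np

def rugg_significant_runs(p_values, alpha=0.01, min_run=6):
    """Return (start, end) sample-index pairs (end exclusive) of runs in
    which p < alpha for at least min_run consecutive samples."""
    sig = np.asarray(p_values) < alpha
    runs, start = [], None
    for i, s in enumerate(sig):
        if s and start is None:
            start = i                      # a run of significant samples begins
        elif not s and start is not None:
            if i - start >= min_run:       # keep only sufficiently long runs
                runs.append((start, i))
            start = None
    if start is not None and len(sig) - start >= min_run:
        runs.append((start, len(sig)))     # run extends to the last sample
    return runs
```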
Figure 5. A and
C: The time course of the signal strength in the selected posterior sensors in
group 1 (A) and 2 (C) averaged over all subjects. Both the target scene and mask
produce an initial activation peak approximately 100 ms after stimulus onset.
The initial peak from the mask is higher than from the scene, indicating a
stronger neural activation by the mask. The early 75-ms peak in the raw data (Figure 3, top row) is visible as a lower peak in
C, and as a slope change in A. B and D:
The time courses of the signal strength for the scene alone and the combined
scene-mask presentations at different SOAs for group 1 (B) and 2 (D). Signal
time courses are shown for scene-only (red), scene-mask SOAs of 37 ms (B, blue),
92 ms (B, green), 24 ms (D, blue), and 60 ms (D, green). The dash-dotted
difference curves were calculated by subtracting the “scene-only”
activation from the activity elicited by the combined scene-mask presentation
(e.g., blue curve – red curve). The positive difference peaks (marked by
the arrows) are clearly recognizable (e.g., at 133 ms in the 37-ms SOA
difference curve and at 192 ms in the 92-ms SOA difference curve). However, by
looking, for example, at the 92-ms SOA time course (B, green) and the mask-only
time course (A, black), it is clear that the activation strength measured in the
combined scene-mask presentation is not simply the sum of the activations
obtained in the separate presentations.
The time courses of occipital brain activation elicited
by the combined presentation of scene and mask for subject groups 1 and 2 (see the
subjects section above) are shown in Figure 5B
and 5D, respectively. The activation caused
by the scene alone is shown by the red curve in this figure. The dash-dotted lines in
the figure represent the difference curves. In these difference curves, the
initial activation peak from the mask was selected according to two criteria:
the peak had to be positive, and it had to be the highest peak in the examined
300-ms time interval, because the 100-ms activation from the mask alone was the
highest activation in that interval. This time interval was chosen because the longest SOA between target and mask
was 92 ms, and the latency of the peak activity from the mask alone was
around 100 ms. Both criteria were defined a priori by looking at the brain
activation elicited by target and mask separately. Note that we do not assume
that the activation from target and mask combines in a strictly additive way.
The analysis assumes only that the transition from the scene to the mask
produces a higher activation in the combined presentation.
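The peak-selection rule can be expressed as: subtract the scene-only time course, then take the highest positive value within the first 300 ms after target onset. The sketch assumes both time courses are sampled at 625 Hz and start at target onset (that is, the 200-ms pre-trigger segment has already been removed); the function name is hypothetical, and the check uses synthetic Gaussian curves rather than measured data.

```python
import numpy as np

FS_HZ = 625.0  # MEG sampling rate (from the text)

def mask_transition_peak(combined, scene_only, window_ms=300.0):
    """Latency (ms after target onset) of the highest positive value of
    (combined - scene_only) within window_ms, or None if no value is positive."""
    diff = np.asarray(combined, float) - np.asarray(scene_only, float)
    n = int(round(window_ms * FS_HZ / 1000.0))
    seg = diff[:n]
    i = int(np.argmax(seg))
    if seg[i] <= 0:
        return None
    return i * 1000.0 / FS_HZ
```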
We used a correlation analysis to test the assumption
that the difference maxima we extracted at the various SOAs reflect the initial
activation peak due to the mask. The latency of the difference peak can be
predicted by adding the latency of the initial activation peak in the mask-alone
presentation and the SOA. For example, the initial activation peak caused by the
mask alone occurred at 102.4 ms. With a target-mask SOA of 37 ms, the
difference peak should occur at 139.4 ms. In Figure 6 we plot the latency of the initial
difference maximum as a function of the SOAs we used. The correlation
coefficient is r = 0.989, t(2) = 9.5, p < .01. The predicted slope of the
regression is 1; the experimentally obtained slope is 1.04. This indicates that
the difference maxima we extracted do in fact reflect the initial activation
peak caused by the scene-mask transition.
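The latency prediction and the correlation test can be checked with a little arithmetic. The prediction is the 102.4-ms mask-only peak latency plus the SOA, and the t value for a correlation is r·√(n − 2)/√(1 − r²) with df = n − 2. The function names are hypothetical, and the check below uses predicted values only, not the measured latencies.

```python
import math

MASK_PEAK_MS = 102.4  # mean mask-only peak latency (from the text)

def predicted_latency(soa_ms):
    """Predicted latency of the difference peak: mask-only latency + SOA."""
    return MASK_PEAK_MS + soa_ms

def linear_fit(x, y):
    """Ordinary least-squares slope and intercept, plus Pearson r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / math.sqrt(sxx * syy)

def t_from_r(r, n):
    """t statistic (df = n - 2) for testing a correlation against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)
```

For instance, `t_from_r(0.989, 4)` gives about 9.46, consistent with the reported t(2) = 9.5 for four SOAs.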
Figure 6. The
measured latency of the difference peaks plotted against the respective SOA
(black dots). The dash-dotted line represents the prediction for the latency of
the difference peak (see text for an explanation). The prediction was calculated
as the sum of the latency of the mask-only peak (see Figure 5A and C) and the respective SOA.
The next question we asked was what part of the brain
activity associated with the processing of the target could be affected by the
mask. The gray square in Figure 6 marks the
temporal range in which additional activation from the mask was associated with
a significant reduction in psychophysical recognition performance. It can be
seen that the reduction in recognition performance occurs when the initial
activation from the mask overlaps the processing of the target within the first
170 ms after the onset of the target on the screen.
In Figure 7 the
information from the physiological recordings and psychophysical measurements is
merged to explore the dynamics of the interaction between the pattern mask and
scene. The effects of the mask activation within the time course of target
processing become clearer when recognition performance is plotted as a function
of the latency of the mask-activation peak and overlaid on the activation in the
undistorted scene processing. Accordingly, in Figure 7 the height of the bars indicates
recognition performance at the target-mask SOAs shown on the upper abscissa, and
the latency of the activation peak produced by these masks is shown on the lower
abscissa (e.g., 160-ms latency at a 60-ms SOA; also see Figure 5). The red curve in this graph is
plotted against the same timescale and shows the average time course of the
occipital activation produced by the unmasked natural scenes. The first activity
peak produced by the processing of the scene begins at approximately 70 ms when
the activation rises steeply, crests at 110 ms, and then descends to a
local minimum at 172 ms. With mask SOAs of 24–60 ms, the mask activity
peak overlaps with this scene activity peak. This interval coincides with the
interval in which there is a significant reduction in recognition performance,
with the shorter SOAs producing both the most overlap and greatest reduction in
recognition performance. Interestingly, at a 92-ms target-mask SOA, where we
found no reduction in recognition performance, the mask-generated peak had a
latency of 190 ms and did not overlap with the initial scene-activation
deflection. Our data suggest that psychophysical recognition performance is
reduced when the first activation peak produced by the mask overlaps with the
first activation deflection generated by the scene. The greater this overlap,
the more recognition performance is reduced. Activity related to the processing
of the scene that occurs later than approximately 170 ms after the scene onset
does not seem to be vulnerable to the initial activation elicited by the pattern
mask in our recognition paradigm.
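The overlap criterion described above can be written down as a simple check. The Python sketch below assumes that the first scene-evoked deflection spans roughly 70 to 172 ms after scene onset (the onset and local minimum stated in the text). It tests whether a given mask-peak latency falls inside that window. The function name and the default window are illustrative choices. The only latency/SOA pairs used are the two quoted in the text: 160 ms at a 60-ms SOA and 190 ms at a 92-ms SOA.

```python
# Minimal sketch (hypothetical helper): does the mask-evoked peak fall inside
# the first scene-evoked deflection? Window bounds follow the text:
# onset ~70 ms, local minimum ~172 ms after scene onset.
SCENE_DEFLECTION_MS = (70.0, 172.0)


def overlaps_scene_deflection(mask_peak_latency_ms: float,
                              window: tuple[float, float] = SCENE_DEFLECTION_MS) -> bool:
    """Return True if the mask peak latency lies within the scene deflection window."""
    start, end = window
    return start <= mask_peak_latency_ms <= end


# Mask-peak latencies quoted in the text, keyed by target-mask SOA (ms).
mask_peak_by_soa = {60: 160.0, 92: 190.0}

for soa, latency in sorted(mask_peak_by_soa.items()):
    print(f"SOA {soa} ms: peak at {latency:.0f} ms, "
          f"overlap = {overlaps_scene_deflection(latency)}")
```

A fixed window is the simplest reading of "overlap". A graded measure, such as the scene activation amplitude at the mask-peak latency, would be needed to express the observation that more overlap goes with a larger reduction.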
Figure 7. The
overlap between the neural activity produced by the scene and mask. The red
curve represents the activation elicited by the processing of the scene alone
averaged over all subjects (plotted against the red axes). Note the conspicuous
deflection between approximately 70 ms and 170 ms that peaks at 110 ms. The
height of the bars indicates the recognition performance (the black right
ordinate) at the different mask SOAs (the upper black abscissa). The position of
the bars on the lower red abscissa shows the latency of the activation peak in
the difference wave for each of these SOAs. Recognition performance is reduced
only when the mask activation overlaps the scene-activation deflection. The
error bars indicate the standard error of the mean.
Scene processing dynamics and scene-pattern mask interaction
We investigated the dynamics of the processing of
natural scenes by disturbing the processing of the scene after various temporal
intervals with a pattern mask. The comparison of psychophysically determined
recognition rates and the brain activity measured in the MEG allowed us to
specify an interval in which the arrival of new information in visual cortex
(e.g., by a pattern mask) disturbs the cortical processing of a scene (Figure 6). In our scene-recognition task, this
period begins 70 ms after the onset of the scene, when the first cortical visual
activation can be observed in early visual cortex, and lasts up to 170 ms. After
170 ms, activation peaks produced by the scene-mask transition have no
detrimental effect on scene-recognition performance. The initial MEG deflections
produced by the target and mask certainly reflect neuronal activity in multiple
visual areas. The activation was located over occipital sensors and most likely
includes the CI (N75) and the P100 components. The early part of this
deflection, including the peak around 100 ms, is thought to be generated in
retinotopic visual areas early in the processing hierarchy. This has been shown
in MEG and event-related potential studies, both with intracranial recordings
(Arroyo, Lesser, Poon, Webber, & Gordon, 1997) and with source localization (Martinez et
al., 1999; Martinez et al., 2001; Rowley & Roberts, 1995; Shigeto, Tobimatsu, Yamamoto, Kobayashi,
& Kato, 1998). We assume that the
mask-induced activity peak we tracked in the difference waves reflects an upper
limit for the time lag of the interaction between target and mask in early
retinotopic visual areas. We make this assumption because the mask-induced
deflection underlying that peak is extended in time, and it is possible that
earlier components of the mask-induced activation [e.g., the CI (N75)] could
interact with the processing of the target.
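The peak-tracking step mentioned above can be illustrated with a minimal sketch. For illustration only, it assumes the mask-evoked peak is the sample with the largest absolute amplitude inside a predefined search window of the difference wave. This is not necessarily the procedure used in this study. The function name, the window, and the synthetic waveform are all hypothetical.

```python
import numpy as np


def peak_latency(times_ms: np.ndarray, wave: np.ndarray,
                 search_window_ms: tuple[float, float]) -> float:
    """Latency (ms) of the largest absolute deflection of `wave` inside the window."""
    in_window = (times_ms >= search_window_ms[0]) & (times_ms <= search_window_ms[1])
    if not in_window.any():
        raise ValueError("search window contains no samples")
    idx = np.argmax(np.abs(wave[in_window]))
    return float(times_ms[in_window][idx])


# Synthetic difference wave (not real MEG data): a Gaussian-shaped deflection
# peaking at 160 ms plus a smaller, later bump at 260 ms, sampled every 1 ms.
times = np.arange(-100.0, 400.0, 1.0)
diff_wave = (np.exp(-((times - 160.0) ** 2) / (2 * 15.0 ** 2))
             + 0.3 * np.exp(-((times - 260.0) ** 2) / (2 * 20.0 ** 2)))

print(peak_latency(times, diff_wave, (100.0, 220.0)))
```

The choice of search window matters. A window that is too wide can pick up later components rather than the first mask-evoked response.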
Other arguments also speak for a scene-pattern mask
interaction at early processing stages: The pattern mask has to be presented at
the same spatial location as the scene photograph to be effective, and the
low-level features (colors and orientation) of our mask and the scene are very
similar (i.e., they cannot be coded in different neuronal populations in
retinotopic areas).
Generally, little is known about the nature of the
information that can be extracted from a natural scene during these short
processing intervals. Like others (e.g., Grill-Spector et al., 2000; Thorpe et al., 1996), we did not put special emphasis on this
issue. Nevertheless, it appears highly likely to us that relatively high-level
information was extracted in trials with successful recognition. Studies of
object-specific processing indicate that during the first 150 ms of the
cortical processing of an object, brain activity is driven by basic stimulus
features, whereas activity after 170 ms is sensitive to object category or is task
specific (Allison et al., 1999; Bötzel
& Grüsser, 1989; Johnson &
Olshausen, 2003; VanRullen & Thorpe, 2001). These reports are commensurate with
the results from our correlation analysis, which suggests that only processes
earlier than 170 ms are vulnerable to the activation generated by a pattern
mask in our scene-recognition task. According to this interpretation, visual
processing would reach a state after approximately 170 ms where it becomes
independent of a low-level representation of the scene in early retinotopic
visual areas. After this interval the visual system may have developed abstract
representations of at least parts of the scene that are immune to interference
by a simply structured mask. The neural substrate for the integration of
information from earlier processing stages during the first 170 ms could be
object-specific areas, such as the lateral occipital complex (LOC). The BOLD-fMRI
activity in area LOC correlates with the psychophysical scene-recognition
performance (Grill-Spector et al., 2000).
It seems unlikely to us that subjects used simple color
information encoded during the scene presentation to distinguish between the
scenes in the query phase at the critical short SOAs. Gegenfurtner and Rieger
(2000) investigated the dynamics of
the contribution of color to scene recognition with the same paradigm.
Their results indicate that color in natural scenes initially
(during the first 30 ms) supports the image-coding process, probably by easing
segmentation. Longer presentation durations (longer than 60 ms) led
to an advantage in retrieval, presumably by enhancing the representation in
memory. In the present study, we tried to force subjects to use as much
information as possible by presenting a wide variety of scene types (e.g.,
landscapes, animals, flowers, indoor scenes, cars, food, people, etc.) in the
query phase. This should keep subjects from developing strategies that are
specific to discriminating among a narrow range of scene classes (e.g., relying
on the mean scene color).
Recent electrophysiological studies of the dynamics of
visual processing of natural scenes (e.g., Johnson & Olshausen, 2003; Thorpe et al., 1996; VanRullen & Thorpe, 2001) used unmasked presentations
and concentrated on the latency of brain
activity that is specific to an object class. Thorpe et al. (1996) attributed a deflection that begins to
differ from baseline 150 ms after the scene onset to object-specific processing
in natural scenes.
In these studies, subjects in principle had extended
processing times: when no mask follows a stimulus, visual persistence prolongs
its availability far beyond the physical presentation duration (Sperling, 1960). This
is indicated by the fact that the subjects could classify the scenes almost perfectly,
although they were presented for only 20–40 ms. This makes it difficult to
obtain information about the dynamics of the information-extraction process. In
the present experiment, we interfered with the processing of the scene at
different latencies with a pattern mask. A comparison of our psychophysical and
physiological data indicated that as little as 20 ms of undistorted processing is
sufficient to extract enough information to discriminate two scenes better than
chance. However, to reach maximum performance in our scene-recognition paradigm,
60 to about 90 ms of undistorted cortical scene processing was necessary. This
parallels results from single-cell recordings in macaque IT cortex and STS.
Cells in these regions also show some shape-selective responses at target-mask
SOAs as short as 20 ms (Kovacs et al., 1995;
Rolls & Tovee, 1994), but shape
discrimination by single macaque IT cells improves only during the first
80 ms of their response (Kovacs et al., 1995). Rolls et al. (1999) found a similar interval in a
face-discrimination task when they analyzed the information available in the
spike trains of macaque IT and STS cells at different target-mask SOAs. The
information available did not differ much between integration durations of 100
and 60 ms, but dropped rapidly when the target-mask SOA was reduced below 60
ms.
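One rough way to relate the target-mask SOA to the amount of undistorted cortical processing is to compare when each stimulus's first cortical response arrives. This is a reading of the figures above, not necessarily the authors' exact computation. The sketch assumes that both the scene and the mask evoke their first cortical response about 70 ms after their own onset. Under that assumption the undistorted interval is simply the SOA. The 24-ms SOA then corresponds to roughly the 20 ms quoted above, and SOAs of about 60 to 92 ms correspond to 60 to about 90 ms. The function and parameter names are hypothetical. Because a higher-contrast stimulus generally evokes a shorter-latency response, the mask latency is exposed as a parameter; a shorter mask latency would shorten the interval.

```python
def undistorted_processing_ms(soa_ms: float,
                              scene_cortical_latency_ms: float = 70.0,
                              mask_cortical_latency_ms: float = 70.0) -> float:
    """Approximate time the scene is processed before mask-driven activity arrives.

    Hypothetical assumption: each stimulus evokes its first cortical response
    after a fixed latency measured from its own onset; the mask starts
    `soa_ms` after the scene.
    """
    scene_arrival = scene_cortical_latency_ms
    mask_arrival = soa_ms + mask_cortical_latency_ms
    return max(0.0, mask_arrival - scene_arrival)


for soa in (24, 60, 92):
    print(f"SOA {soa} ms -> ~{undistorted_processing_ms(soa):.0f} ms undistorted processing")
```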
We cannot determine from our MEG data whether the
scene is analyzed in a purely feed-forward
manner (Thorpe et al., 1996) or whether feedback
loops play a role (Bullier, 2001; Enns
& Di Lollo, 2000). It seems likely that
during the first few tens of milliseconds information is extracted predominantly
via feed-forward pathways. Therefore, the above-chance recognition performance
obtained at our shortest SOA (24 ms) might be based largely on feed-forward
processing. On the other hand, the processing duration required to reach full
recognition performance is sufficiently long to allow feedback connections to
play a role (Bullier, 2001).
Earlier EEG studies of visual masking in humans have reached diverging
conclusions about its neuronal basis (Donchin & Lindsley, 1965; Enns
& Di Lollo, 2000; Kaitz et al., 1985; Schwartz & Pritchard, 1981; Vaughan & Silverstein, 1968). Some of the discrepancies may be caused
by the use of different types of masking, which may have different underlying
neuronal mechanisms (Breitmeyer, 1984;
Enns & Di Lollo, 2000). In addition, the
approach used to determine the residual target processing after masking might be
problematic. This was done by subtracting the brain activity measured during
mask-only presentations from the brain activity measured during combined
presentations of target and mask (e.g., Donchin & Lindsley, 1965; Kaitz et al., 1985; Schwartz & Pritchard, 1981). Inherent to this approach is the
strong assumption that the activity measured at the scalp in the combined
presentation of target and mask is simply the sum of the activities obtained when
both stimuli are presented separately. Figure 3 and Figures 5B and 5D show that
this assumption may not be justified. Even the initial deflection in the difference
waves has a much lower amplitude than would have been expected from the presentation
of the mask alone. This suggests that difference waves thought to reflect the
residual activation from the target after masking can contain artificial deflections
caused by overestimating the mask activation. Moreover, the shape of the difference
waves bears only limited similarity to the waveform elicited by the isolated
presentation of the mask. Interpreting the difference waves is especially problematic
if it is not known which portion of the difference wave reflects target processing
and when mask processing begins. We tried to circumvent this problem by making a
weaker, falsifiable assumption: we extracted an early response to the pattern mask
and tracked it over various target-mask onset asynchronies. Our data show that this
is possible even for very short SOAs. This early mask response defines an upper limit
for the time at which the pattern mask begins to interfere with the processing of the
target, and its latency can be compared with psychophysical performance and with
undistorted target processing.
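The additivity assumption discussed above can be illustrated with synthetic waveforms. The sketch uses Gaussian-shaped "evoked responses" that are purely illustrative and are not MEG data. The amplitudes, widths, the 40-ms SOA, and the 0.5 scaling of the mask response in the combined presentation are all arbitrary assumptions.

- **Additive case:** the combined response is exactly the sum of the target and mask responses, so subtracting the mask-only response recovers the target response.
- **Sublinear case:** the combined response contains less mask activity than the mask-only presentation. The subtraction then overestimates the mask contribution and leaves a spurious negative deflection in the difference wave.

```python
import numpy as np


def gaussian(times: np.ndarray, peak_ms: float, width_ms: float, amplitude: float) -> np.ndarray:
    """Illustrative evoked-response shape (not a model of real MEG data)."""
    return amplitude * np.exp(-((times - peak_ms) ** 2) / (2 * width_ms ** 2))


times = np.arange(0.0, 400.0, 1.0)        # 1-ms sampling
soa = 40.0                                # hypothetical target-mask SOA (ms)
target_alone = gaussian(times, 110.0, 25.0, 1.0)
mask_alone = gaussian(times, 110.0 + soa, 25.0, 2.0)   # stronger response, shifted by the SOA

# Additive case: combined response is exactly target + mask.
combined_additive = target_alone + mask_alone
# Hypothetical non-additive case: only half of the mask response survives.
combined_sublinear = target_alone + 0.5 * mask_alone

diff_additive = combined_additive - mask_alone     # recovers the target response
diff_sublinear = combined_sublinear - mask_alone   # target - 0.5 * mask: spurious negative lobe

print(np.allclose(diff_additive, target_alone))
print(f"minimum of non-additive difference wave: {diff_sublinear.min():.2f}")
```

The spurious lobe exists only because the subtraction assumes additivity. Tracking an early mask-evoked peak, as described above, does not rely on that assumption.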
It seems unlikely to us that a large amount of pattern
backward masking occurs at precortical processing stages. The integration of
target and pattern mask seems to occur predominantly at the cortical level as
shown by, for example, Turvey (1973) in
experiments using dichoptic masking. Presenting the target and the pattern mask
to the two eyes separately leads to masking. This is different from masking by
light, where both stimuli have to be presented to the same
eye.
Our results agree with a simple, physiologically
motivated model for pattern backward masking of natural scenes. This model
assumes the major processing stages found in many other models of
visual processing (Kosslyn, 1999; Lennie,
1998; Marr, 1982; Sperling, 1963). There is a representation of the scene
in a visual buffer, whether it is called the raw primal sketch (Marr, 1982) or iconic memory (Sperling, 1963), that encodes the physical properties
of the stimulus. This representation is continuously analyzed by higher level
processing stages and transformed into more and more abstract entities, such as
surfaces or objects that are transferred into memory. In this model, the pattern
mask would exert its detrimental effect by overwriting the template of the
stimulus in the sensory buffer and replacing it with a representation of the
mask. Similar physiological mechanisms were suggested by Bullier (2001) and Lamme and Roelfsema (2000). The overwriting of the sensory buffer is
possible because the mask produces a much stronger initial activation peak than
the target (Figures 5A and 5C), presumably because of the higher RMS
contrast of the mask. Mask contrast has been shown to be an important
determinant of the strength of a pattern mask (Turvey, 1973). Both
masking by integration (Enns & Di Lollo, 2000; Turvey, 1973) and masking by competition (Keysers &
Perrett, 2002) are compatible with this
view. The sooner and more severe the disruption of the analysis of the template,
the less information is extracted and the greater the reduction in recognition
rates (Turvey, 1973). The visual buffer may
be implemented in the retinotopic early visual areas (Bullier, 2001; Lamme & Roelfsema, 2000). This
view is supported by studies that examined the effects of
masking at higher level visual-processing stages. A pattern mask decreased the
neuronal responses to the target object and the information available in the
spike train in macaque IT and STS when the SOA between target and pattern mask
was reduced (Kovacs et al., 1995; Rolls
& Tovee, 1994; Rolls et al., 1999). Keysers and Perrett (2002) and Keysers, Xiao, Foldiak, and Perrett
(2001) found the same effect in STS neurons
when the change rate in a rapid-serial-visual-presentation paradigm increased.
In humans, Grill-Spector et al. (2000) found a
reduction of the BOLD-fMRI-activation in shape and object-specific brain areas,
such as LOC and the fusiform face area, when the target-to-pattern mask SOAs
were sufficiently short. The reduction in activation correlated with the
decrease in recognition rate at short
SOAs.
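The buffer-overwrite account described above can be caricatured in a few lines. This is a toy model for intuition only; it was not fitted to, and does not reproduce, the data reported here. It rests on three hypothetical assumptions:

- The target's template in the visual buffer can be read out only until the mask overwrites it, that is, for roughly the SOA.
- The extracted information saturates exponentially with an arbitrary time constant `tau_ms`.
- Two-alternative performance rises linearly from chance (0.5) toward 1.0 with the extracted fraction.

The function name and the default time constant are illustrative.

```python
import math


def proportion_correct(soa_ms: float, tau_ms: float = 30.0) -> float:
    """Toy two-alternative forced-choice performance under a buffer-overwrite account.

    Hypothetical assumptions (not fitted to any data):
    - the target template can be read out of the buffer only until the mask
      overwrites it, i.e. for roughly `soa_ms` milliseconds;
    - the fraction of the maximum extractable information saturates
      exponentially with time constant `tau_ms`;
    - performance rises linearly from chance (0.5) to 1.0 with that fraction.
    """
    if tau_ms <= 0:
        raise ValueError("tau_ms must be positive")
    readout_time = max(0.0, soa_ms)
    information = 1.0 - math.exp(-readout_time / tau_ms)
    return 0.5 + 0.5 * information


for soa in (24, 60, 92):
    print(f"SOA {soa} ms -> predicted proportion correct {proportion_correct(soa):.3f}")
```

In this sketch, an earlier overwrite shortens the readout time and lowers performance. This matches the qualitative statement that earlier disruption means less extracted information. The exponential form and `tau_ms` are arbitrary. A competition account (Keysers & Perrett, 2002) would need a different formulation.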
In summary, we have shown that it is possible to reliably identify
in MEG time-series data an initial activation peak evoked by the
pattern mask in a combined presentation of a target and mask. By integrating
psychophysical and physiological data, we found that the scene processing during
the initial 170 ms after the scene onset seems to be vulnerable to new
information from a pattern backward mask. Brain processes initiated later are
independent of new information arriving in the early visual areas, at least in a
recognition task like ours with the simple pattern mask. Only 20 ms of
undistorted cortical processing of the scene was sufficient to discriminate a
scene from a distracter at a rate that was better than chance, but between 60 ms
and 90 ms were necessary to reach the maximal recognition performance. Our
results are consistent with the assumption that the pattern mask overwrites a
visual buffer that is presumably located in the early, retinotopic visual areas,
and thereby disturbs the transfer of information to subsequent processing
stages.
We want to thank Robert Fendrich for helpful comments
on the manuscript. JWR was supported by Land Sachsen-Anhalt Grant FKC0017IF0000
to the Magdeburg Leibniz program and Bundesministerium für Bildung und
Forschung Grant 01GO0202. Commercial
relationships: none.
Corresponding author: Jochem Rieger.
Email: jochem.rieger@nat.uni-magdeburg.de.
Address: Leipzigerstr. 44, 39120 Magdeburg, Germany.
Allison, T., Puce, A., Spencer, D. D., & McCarthy, G. (1999). Electrophysiological studies of human face perception. I. Potentials generated in occipitotemporal cortex by face and non-face stimuli. Cerebral Cortex, 9(5), 415-430.
Arroyo, S., Lesser, R. P., Poon, W.-T., Webber, R. W. S., & Gordon, B. (1997). Neuronal generators of visual evoked potentials in humans: Visual processing in the human cortex. Epilepsia, 38, 600-610.
Bachmann, T. (1994). Psychophysiology of visual masking. New York: Nova Science Publishers.
Baxt, N. (1871). Über die Zeit, welche nöthig ist, damit ein Gesichtseindruck zum Bewußtsein kommt und über die Größe (Extension) der bewußten Wahrnehmung bei einem Gesichtseindrucke von gegebener Dauer. Archiv für die gesamte Physiologie, 4, 325-336.
Bötzel, K., & Grüsser, O.-J. (1989). Electric brain potentials evoked by pictures of faces and non-faces: Search for 'face-specific' EEG-potentials. Experimental Brain Research, 77, 349-360.
Breitmeyer, B. G. (1984). Visual masking: An integrative approach (Vol. 4). Oxford: Oxford University Press.
Breitmeyer, B. G., & Ogmen, H. (2000). Recent models and findings in backward visual masking: A comparison, review, and update. Perception & Psychophysics, 62, 1572-1595.
Bullier, J. (2001). Integrated model of visual processing. Brain Research Reviews, 36, 96-107.
Donchin, E., Wicke, J. D., & Lindsley, D. B. (1963). Cortical evoked potentials and perception of paired flashes. Science, 141, 1285-1286.
Donchin, E., & Lindsley, D. B. (1965). Visually evoked response correlates of perceptual masking and enhancement. Electroencephalography and Clinical Neurophysiology, 19, 325-335.
Enns, J. T., & Di Lollo, V. (2000). What's new in visual masking? Trends in Cognitive Sciences, 4(9), 345-352.
Fylan, F., Holliday, I. E., Singh, K. D., Anderson, S. J., & Harding, G. F. A. (1997). Magnetoencephalographic investigation of human cortical area V1 using colour stimuli. NeuroImage, 6, 47-57.
Gegenfurtner, K. R., & Rieger, J. (2000). Sensory and cognitive contributions of color to the recognition of natural scenes. Current Biology, 10, 805-808.
Grill-Spector, K., Kushnir, T., Hendler, T., & Malach, R. (2000). The dynamics of object-selective activation correlate with recognition performance in humans. Nature Neuroscience, 3(8), 837-843.
Johnson, J. S., & Olshausen, B. A. (2003). Timecourse of neural signatures of object recognition. Journal of Vision, 3(7), 499-512, http://journalofvision.org/3/7/4/, doi:10.1167/3.7.4.
Kahneman, D. (1968). Methods, findings and theory in studies of perceptual masking. Psychological Bulletin, 70(6), 404-425.
Kaitz, M., Monitz, J., & Nesher, R. (1985). Electrophysiological correlates of visual masking. International Journal of Neuroscience, 28(3-4), 261-268.
Keysers, C., Xiao, D. K., Foldiak, P., & Perrett, D. I. (2001). The speed of sight. Journal of Cognitive Neuroscience, 13(1), 90-101.
Keysers, C., & Perrett, D. I. (2002). Visual masking and RSVP reveal neural competition. Trends in Cognitive Sciences, 6(3), 120-125.
Kosslyn, S. M. (1999). If neuroimaging is the answer, what is the question? Philosophical Transactions of the Royal Society of London B, 354, 1283-1294.
Kovacs, G., Vogels, R., & Orban, G. A. (1995). Cortical correlate of backward masking. Proceedings of the National Academy of Sciences U.S.A., 92, 5587-5591.
Lamme, V. A. F., & Roelfsema, P. R. (2000). The distinct modes of vision offered by feedforward and recurrent processing. Trends in Neurosciences, 23, 571-579.
Lennie, P. (1998). Single units and cortical organization. Perception, 27, 889-936.
Macknik, S. L., & Livingstone, M. S. (1998). Neuronal correlates of visibility and invisibility in the primate visual system. Nature Neuroscience, 1(2), 144-149.
Marr, D. (1982). Vision. New York: W. H. Freeman.
Martinez, A., Anllo-Vento, L., Sereno, M. I., Frank, L. R., Buxton, R. B., Dubowitz, D. J., et al. (1999). Involvement of striate and extrastriate visual cortical areas in spatial attention. Nature Neuroscience, 2, 364-369.
Martinez, A., DiRusso, F., Anllo-Vento, L., Sereno, M. I., Buxton, R. B., & Hillyard, S. A. (2001). Putting spatial attention on the map: Timing and localization of stimulus selection processes in striate and extrastriate visual areas. Vision Research, 41, 1437-1457.
Rolls, E. T., & Tovee, M. J. (1994). Processing speed in the cerebral cortex and the neurophysiology of visual masking. Proceedings of the Royal Society of London B, 257(1348), 9-15.
Rolls, E. T., Tovee, M. J., & Panzeri, S. (1999). The neurophysiology of backward visual masking: Information analysis. Journal of Cognitive Neuroscience, 11(3), 300-311.
Rowley, H. A., & Roberts, T. P. (1995). Functional localization by magnetoencephalography. Neuroimaging Clinics of North America, 5(4), 695-710.
Rugg, M. D., Doyle, M. C., & Wells, T. (1995). Word and non-word repetition within and across modality: An event-related potential study. Journal of Cognitive Neuroscience, 7, 209-227.
Schiller, P. H., & Chorover, S. L. (1966). Metacontrast: Its relation to evoked potentials. Science, 153, 1398-1399.
Schwartz, M., & Pritchard, W. S. (1981). AERs and detection in tasks yielding U-shaped backward masking functions. Psychophysiology, 18, 678-685.
Shigeto, H., Tobimatsu, S., Yamamoto, T., Kobayashi, T., & Kato, M. (1998). Visual evoked cortical magnetic responses to checkerboard pattern reversal stimulation: A study on the neural generators of N75, P100 and N145. Journal of the Neurological Sciences, 156(2), 186-194.
Sperling, G. (1960). The information available in brief visual presentations. Psychological Monographs, 74, 1-29.
Sperling, G. (1963). A model for visual memory tasks. Human Factors, 5, 19-31.
Thorpe, S., Fize, D., & Marlot, S. (1996). The speed of processing in the human visual system. Nature, 381, 520-522.
Turvey, M. T. (1973). On peripheral and central processes in vision: Inferences from an information-processing analysis of masking with patterned stimuli. Psychological Review, 80(1), 1-52.
VanRullen, R., & Thorpe, S. (2001). The time course of visual processing: From early perception to decision making. Journal of Cognitive Neuroscience, 13, 454-461.
Vaughan, H. G., & Silverstein, L. (1968). Metacontrast and evoked potentials: A reappraisal. Science, 160, 207-208.