Volume 3, Number 1, Article 2, Pages 6-21
doi:10.1167/3.1.2
http://journalofvision.org/3/1/2/
ISSN 1534-7362
Feature binding in object-file representations of multiple moving items
Jun Saiki
PRESTO, JST, Kawaguchi, Japan; and
Graduate School of Informatics, Kyoto University, Kyoto, Japan
Abstract
Maintenance of episodic representations by feature-location binding is important for visual cognition. It has been proposed that we can hold and update coherent episodic representations of up to four objects. This study investigated the dynamic maintenance of feature-location bindings with multiple objects. In a series of seven experiments, participants judged whether a sequence of rotating patterns of three or four colored disks contains any color switch between two disks. Color-switch detection is in general difficult, even when tracking of objects’ motions is successful, suggesting that our ability for dynamic maintenance is limited. The performance improved when the interframe rotation angle became sufficiently small. Moreover, spatiotemporal predictability was necessary for this improvement, suggesting that the maintenance of multiple episodic representations is an interactive process between our prediction and sensory mechanisms.
History
Received February 18, 2002; published January 16, 2003
Citation
Saiki, J. (2003). Feature binding in object-file representations of multiple moving items.
Journal of Vision, 3(1):2, 6-21,
http://journalofvision.org/3/1/2/,
doi:10.1167/3.1.2.
Keywords
visual working memory, object file, episodic representation, binding
Objects have various perceptual features, such as
color, texture, shape, and size, and they move and change their perceptual
properties across time. Therefore, to perceive and understand visual scenes and
events, we need to make correspondences of feature values to multiple objects,
and keep track of these correspondences as the objects move. However, the
underlying mechanisms of this process are largely unknown.
It is obvious that we cannot store all possible
combinations of these features and their spatiotemporal locations in long-term
memory. Instead, what we appear to be doing is to form episodic representations
of visual scenes and events temporarily, and to use these episodic
representations in various cognitive tasks. There are few experimental studies
on our ability to form, maintain, and transform episodic representations of a
dynamic event with multiple objects. This study systematically investigated the
process of maintenance and transformation of episodic representations.
Episodic Representations in Visual Cognition
It has been proposed that object recognition requires
not only the long-term representation of object categories (often called types),
but also the representation of the object’s presence in a particular
episode (often called tokens; see Kanwisher,
1987). Tokens are short-term episodic representations of objects, and
spatiotemporal information is critical in their individuation (Kanwisher, 1987). Following this line of
thought, I define episodic representation as mental representation whose
featural information is bound to its spatiotemporal properties. Although the
notion of episodic representation is closely related to the issue of feature
binding, it specifically focuses on the binding of featural and spatiotemporal
information. The binding of featural information with
its spatiotemporal property is important
for various visual cognition tasks. If you need to reach a target object among
many distractors, you need to know the target location in addition to the target
identity. Although one of the important properties of visual object recognition
is locational invariance, locationally invariant recognition itself is usually
insufficient for the action toward the object. In our dynamic world with
multiple objects, formation of episodic representations is an indispensable
ability for successful interaction. There are
some theoretical proposals on episodic representation. Kahneman and Treisman (1984) proposed the
notion of object files. Object files are temporary episodic representations of
real-world objects that are separate from the representations stored in the
long-term recognition network (Kahneman,
Treisman, & Gibbs, 1992). Each object file contains information about a
particular object in a scene, and is addressed by its location at a particular
time (hereafter I call this spatiotemporal location), not by any feature or
identifying label. An object file collects sensory information, updates it as
the sensory situation changes, and may be discarded when the object disappears
from view. Kahneman et al. (1992) assume
that there is some limit to the number of object files to be stored
concurrently, and some limit in the spatial/temporal gap that can be bridged.
They examined the validity of these assumptions using a reviewing paradigm. In
the reviewing paradigm, simple objects are initially shown with letters inside,
move without letters, and stop with a letter that subjects are then asked to
identify. The object-specific priming effect is considered as evidence for the
object files. It was shown that people can store four object files concurrently,
and bridge an interstimulus interval (ISI) of 590 ms. Pylyshyn (1989) proposed the FINST theory
(also see Pylyshyn, 2001). FINST is a
reference to a particular feature or feature cluster that keeps pointing to the
same feature cluster as the cluster moves (Pylyshyn, 1989). One important property of
FINST is that it does not encode any properties of the feature in question, but
that it merely makes it possible to locate the feature in order to examine it
further if needed. Thus, FINST can be considered a spatiotemporal index of
feature clusters, which can be constructed preattentively, but there is a
capacity limit to the number of FINSTs to be activated concurrently. Pylyshyn and Storm (1988) suggested that the
capacity limit is four to five using a multiple object-tracking paradigm.
Although these theoretical notions have various
differences, they share important properties as episodic representations of the
visual world. First, they are dynamic in the sense that they are updated as the
world changes. Second, they can deal with multiple objects. Both object files
and FINSTs assume that multiple episodic representations can be maintained
concurrently. Third, they are dealing with the binding of featural and
spatiotemporal information. With regard to this property, object files and
FINSTs have some differences: object files are assumed to contain various
featural information as it is available, whereas FINST simply enables us to
refer to these features at an indexed location. However, both object files and
FINSTs agree in that episodic representation is essential for the binding of
feature and locational information.
The three properties above, dynamic nature, multiplicity,
and feature-location binding, are desirable for various visual cognition tasks.
However, there are few empirical studies investigating our ability to maintain
and transform episodic representations by manipulating these properties. I will
briefly review some of these studies, and then formulate the problems to be
addressed in this work.
Studies on Episodic Representations in Visual Cognition
In terms of the three desirable properties of episodic
representations, multiple objects, dynamic nature, and feature-location binding,
previous studies on episodic representations dealt with only some of these
properties by fixing the others. Using a static but multidimensional display, Luck and Vogel (1997) showed that visual working
memory can hold approximately four multidimensional objects. They used a
change-detection paradigm devised by Phillips
(1974), using multiple objects defined by multiple dimensions, such as
color, orientation, and size. Participants’ change-detection performances
were determined by the number of objects, not by the number of visual features.
Luck and Vogel interpreted this finding as showing that a functional unit of
visual working memory is the representation of perceptual objects where
multidimensional information is integrated (but see Wheeler & Treisman, 2002 and Xu, 2002). This study dealt with a multiplicity of
objects and feature-location binding by manipulating multidimensionality, but
the dynamic nature of representations was not addressed.
Pylyshyn and Storm
(1988) devised a paradigm called multiple object tracking, and showed that
people are able to mentally track four to five moving objects concurrently.
Participants were presented with a set of 10 items (crosses or dots) randomly placed on
the display, and asked to track some of them. Then the dots slowly moved in
random directions for several seconds, followed by a test to discriminate
whether a probed dot was the one they were tracking or not. Participant
performance was quite accurate when the number of dots to be tracked was under
four or five, suggesting that the visual system can hold spatiotemporal information
on four to five objects concurrently. This study investigated the dynamic nature
and multiplicity of episodic representations, but feature-location binding was
not directly examined because objects had identical features; thus,
feature-location binding was unnecessary to perform the task.
As mentioned above, the experimental settings used in
these studies did not explicitly manipulate multiplicity, dynamic nature, and
feature-location binding simultaneously. Therefore, it is unclear whether the
properties found in the previous studies can be applicable to more complex
situations with all three properties varying, or if they are limited to some
restricted situations. For example, Luck and
Vogel’s (1997) finding on the capacity of visual working memory may be
valid only in the static situation. Also, the number of objects concurrently
tracked may be different if the objects are multidimensional. These questions
are important to understand the nature of episodic representations. If Luck and
Vogel’s finding is applicable only to static situations, their theoretical
claim that multiple object representations are formed and stored may be
questioned. Rather, their findings may reflect the role of spatial locations to
bind multiple features.
To answer these questions, we need to set up a
situation that satisfies multiplicity, dynamic nature, and feature-location
binding simultaneously. Using such a situation, I investigated whether the
findings of Luck and Vogel (1997) and Pylyshyn and Storm (1988) could be
generalized. If the previous work reflects the maintenance and transformation of
coherent objects in general situations, then we should expect a similar
performance in a dynamic multidimensional situation. In contrast, if the
previous findings cannot be generalized, or they are not mediated by coherent
object representations, then the performance in the dynamic multidimensional
situation should be substantially impaired.
In the evaluation of participants’ performances
in a dynamic multidimensional situation, it is important to eliminate the
possibility that other factors affected the performance. To ensure this, I used
a very simple stimulus set that eliminated spatial uncertainty and stimulus
complexity. I eliminated spatial uncertainty by using a completely predictable
moving pattern: regular rotation. In the multiple object-tracking paradigm, the
stimuli have a certain amount of spatial uncertainty because of the
unpredictable movement direction of each element. Thus, the difficulty in
multiple object tracking can be attributable either to its dynamic nature or
spatial uncertainty.
To eliminate the possible effect of stimulus
complexity, the number of to-be-remembered objects is equal to the number of
presented objects, as in Luck and Vogel (1997).
In the multiple object-tracking task, the number of presented objects is much
higher than the number of to-be-tracked objects. Although this is suitable for
the purpose of showing that participants’ performances were better than
our expectations, it is a major problem if one tries to show that
participants’ performances are lower than some theories predict, because
the performance impairment can be attributed to the extra stimulus complexity.
Finally, to make the stimuli simple in terms of multidimensionality, I used
minimally multidimensional stimuli. Participants were required to store the
combination of objects’ colors and spatiotemporal locations. One should
note that this minimally multidimensional situation is sufficient to investigate
feature-location bindings.
Finally, and most importantly, we need to make sure
that participants could make motion correspondences of the stimulus sequence,
because the feature-location binding presupposes successful motion
correspondences. It is known that people can successfully track four rotating
objects up to about 360°/s even though motion correspondences are ambiguous
(Verstraten, Cavanagh, & Labianca,
2000), and the stimulus sequence I used here rotated much slower than that.
Still, it is possible that the observed difficulty is due to failure in motion
correspondences. Thus, I used various manipulations to improve motion
correspondences and investigated how much the manipulations improve the task
performance compared with ambiguous conditions.
To examine the maintenance and transformation of
episodic representations of multiple objects, I created an irregularity
detection task (see Figure 1, Movie 1, and Movie
2). Participants were shown sequences of 10 frames depicting a triangular
pattern of three colored disks rotating by a certain angle per frame. A
sequence was one of three types: regular clockwise or counterclockwise rotation
throughout (no-change); rotation containing one frame in which the locations of
two colors were switched (color-switch); or rotation containing one frame in
which one of the colors was replaced with a new one (color-replacement).
Participants were required to judge whether a
sequence was regular or irregular without identifying the type of irregularity (color-switch or
color-replacement). Notice that detection of a color-switch needs memory for the
conjunction of each disk’s color and spatiotemporal location, whereas
detection of a color-replacement does not. Thus, the performance for the
color-switch condition is the critical measure of memory for the binding of
color and spatiotemporal location in this paradigm. Note that, as shown in Figure 1, the frames after a change return to the regular sequence.
The irregularity was confined to a single frame to prevent
participants from memorizing the color order (most
likely verbally) during the first few frames and comparing it with later frames.
Figure 1. Schematic
illustration of the irregularity detection task. In this example, rotation
direction is clockwise, and irregularity occurs in the second frame.
Movie 1. Demonstration of the irregularity detection task. The movie contains a no-change, a color-switch, and a color-replacement sequence. This is an example of an ambiguous 60-motion condition serving as the baseline.
Movie 2. Demonstration of the stimulus used in Experiment 2. The movie contains a color-switch sequence. This example is a moving 360-ms condition. The actual stimulus was smoother and faster than this example.
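The three sequence types in the irregularity detection task can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's stimulus code: the function name `make_sequence`, the representation of a frame as (angle, color) pairs, and the 120° spacing of an equilateral pattern are introduced here for the example, and the irregular frame is drawn from the fourth to seventh frames as described for Experiment 1.

```python
import random

COLORS = ["red", "green", "blue", "yellow"]  # the four colors named in Experiment 1

def make_sequence(kind, n_frames=10, step_deg=60, rng=random):
    """Return a list of frames; each frame is a list of (angle_deg, color), one per disk.

    kind is "none", "switch", or "replacement" (hypothetical labels for this sketch).
    """
    shown = rng.sample(COLORS, 3)                 # colors bound to disks 0, 1, 2
    unused = next(c for c in COLORS if c not in shown)
    direction = rng.choice([1, -1])               # clockwise or counterclockwise
    odd_frame = rng.randrange(3, 7)               # 0-based index of frames 4 to 7
    frames = []
    for t in range(n_frames):
        colors = list(shown)                      # regular binding of color to disk
        if t == odd_frame and kind == "switch":
            i, j = rng.sample(range(3), 2)        # two disks exchange colors for one frame
            colors[i], colors[j] = colors[j], colors[i]
        elif t == odd_frame and kind == "replacement":
            colors[rng.randrange(3)] = unused     # one disk takes the unused fourth color
        # Assumed equilateral layout: disks 120 deg apart, rotating step_deg per frame.
        angles = [(k * 120 + direction * step_deg * t) % 360 for k in range(3)]
        frames.append(list(zip(angles, colors)))
    return frames
```

In this sketch a color-switch frame contains exactly the same set of colors as every other frame, so it can only be detected by comparing which color sits on which disk, whereas a color-replacement frame can be detected from the set of colors alone.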
At the beginning of the experiment, participants were
given instructions with a diagram similar to Figure 1. The differences among the three
sequence types and response mappings were fully explained using the diagram.
Then participants had a block of six-to-eight practice trials to familiarize
themselves with the procedure. Experimental trials were made up of blocks of
24-to-30 trials, and participants could take a rest between blocks. Throughout
this study, the experimental conditions were randomly mixed from trial to trial.
The main dependent variables were hit rates for the
color-replacement and color-switch trials. Because false alarm rates were
extremely low throughout the study, and usually not different across
experimental conditions, the results of statistical analysis involving false
alarms are not reported. As a supplementary measure, however, d’ related
to color-switch detection was estimated. Although a false alarm can be related
to either a color-switch or a color-replacement detection, d’ is estimated
by using the false alarm rates under the assumption that all false alarms are
related to color-switch detection. Thus, the estimated d’ is somewhat
underestimated, but given the extremely high hit rates for the color-replacement
conditions throughout the series of experiments, the assumption is justifiable.
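As a rough illustration of how such a d’ estimate can be computed, the sketch below assumes the standard equal-variance definition, d’ = z(hit rate) - z(false alarm rate). The function name `d_prime`, the clamping of extreme rates to 1/(2N) and 1 - 1/(2N), and the example rates are assumptions made for this example; the article does not state how rates of 0 or 1 were handled.

```python
from statistics import NormalDist

def d_prime(hit_rate, fa_rate, n_signal, n_noise):
    """Estimate d' = z(hit) - z(false alarm) under an equal-variance Gaussian model."""
    def clamp(rate, n):
        # Assumption (not from the article): keep rates away from 0 and 1
        # so that the inverse normal CDF stays finite.
        lo = 1.0 / (2 * n)
        return min(max(rate, lo), 1.0 - lo)

    z = NormalDist().inv_cdf
    return z(clamp(hit_rate, n_signal)) - z(clamp(fa_rate, n_noise))

# Hypothetical rates for illustration only, not data from the experiments.
example = d_prime(hit_rate=0.60, fa_rate=0.05, n_signal=24, n_noise=24)
print(round(example, 2))
```

Because all false alarms are attributed to color-switch detection, the false alarm rate entered here is an upper bound on the switch-specific rate, which is why the resulting d’ is a conservative estimate.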
Part I: Factors Not Contributing to the Maintenance of Multiple Object Representations
A pilot experiment with the equilateral triangle
pattern rotating 60° per frame showed that color-switch detection was
difficult, while color-replacement detection was extremely easy. However, this
particular stimulus setting has an obvious problem: motion perception in this
setting is inherently ambiguous (see Movie 1).
Thus, the difficulty in color-switch detection may not be due to
color-location binding, but simply to failure in tracking pattern rotation. Part
I used stimuli without such ambiguity in pattern motion, and investigated
whether the difficulty in color-switch detection could be overcome simply by
making pattern motion unambiguous. Experiments 1 and 2 disambiguated the motion
correspondence by using bilateral triangles, and smooth and continuous motion,
respectively. The ambiguous condition was used as a baseline to evaluate the
effect of disambiguation of motion
correspondences.
Experiment 1: Disambiguating the Direction of Motion by Bilateral Triangles
All stimuli were three colored disks with 1.6°
visual angle diameters. Each disk was placed at a 2.0° visual angle from
the central fixation. Four colors (red, green, blue, and yellow) were used, and
the combinations of displayed colors were counterbalanced across trials. A frame
with a violation of the regular rotation was inserted from the fourth to seventh
frames equally often, and the disks whose colors were switched or replaced were
unpredictable to the participants. The temporal schedule of the stimulus
sequence was 360-ms frame duration and 520-ms stimulus onset asynchrony (SOA).
In other words, each frame was presented for 360 ms followed by a 160-ms blank
period. Participants were asked to fixate the central dot and to try to attend
to the whole pattern throughout a trial. They judged whether a sequence
contained any irregularity, and no feedback on accuracy was given. There was no time pressure
to make a response. There were 24 color-switch, 24 no-change, and 12
color-replacement trials for each condition of the main independent variables
described below. The experiment included 180 trials. Participants were eight
Kyoto University graduate and undergraduate students who had normal or
corrected-to-normal vision.
Experiment 1 investigated the effect of pattern
configuration on irregularity detection. There were three pattern configuration
conditions: equilateral triangle, acute isosceles triangle, and obtuse
isosceles triangle (Figure 2a).
The stimuli in the equilateral condition were identical to those in the pilot
experiment (Figure 1), and served as a
baseline. In the acute and obtuse isosceles conditions, the vertical angles of
the triangular pattern were 30° and 90°, respectively. Because of the
pattern configuration, sequences in the acute and obtuse isosceles conditions
were much easier to make correspondences with across
frames.
Figure 2. a. Illustration of conditions in Experiment 1.
b. Mean hit and false alarm rates in Experiment 1.
An alpha level of .05 was used as the criterion for all
statistical tests in this article. The change to the pattern configuration did
not significantly improve the performance. Mean hit rates for color-replacement
and color-switch trials and the mean false alarm rate are shown in Figure 2b. A 2 (irregularity type: replacement
or switch) x 3 (pattern configuration)
analysis of variance (ANOVA) showed a significant main effect of the
irregularity type, F(1,7)=30.375, but
the main effect of pattern configuration and the interaction were not
significant, F(2,14)=3.00 and
F(2,14)=0.25, respectively. Throughout
this study, color-replacement detection was highly accurate, and not different
across experimental conditions. Thus, I will not report the statistical analyses
involving the color-replacement hit rates in greater detail. As for the effect
of pattern configuration on the color-switch hit rate, a single factor ANOVA
showed no significant main effect,
F(2,14)=1.30. Planned comparisons of
the bilateral triangle conditions (acute and obtuse) with the equilateral
condition also showed no significant difference,
F(1,7)=1.06 and
F(1,7)=0.27 for the acute and obtuse
conditions, respectively. Analyses with d’ showed the same pattern of
results. For the remainder of this work, the results with d’ will be
reported only when there is any difference from those with hit rates.
Overall, the difficulty in color-switch detection is
unlikely to be due solely to tracking failure caused by the homogeneity of the
stimulus configuration. If the use of a bilateral triangle eliminates tracking
failure, the estimated improvement by elimination of tracking failure is around
10%, and the color-switch detection performance is still quite poor.
Experiment 2: Local Motion Signals and Elimination of Abrupt Onsets and Offsets
Experiment 1 used a sequence of static images with
blank periods. Thus, the lack of a local motion signal may make the task
difficult. Clear configurational cues and spatiotemporal predictability alone
may be insufficient to transform episodic representations or to track objects
successfully. Some bottom-up sensory information consistent with this prediction
may be necessary. Local motion signals can qualify as such information. Also,
abrupt onset and offset of patterns may have disrupted the spatiotemporal
integration of episodic representations or tracking of objects. Previous
research suggests that abrupt onset and offset creates and discards an object
representation, respectively (Yantis &
Hillstrom, 1994; Yantis & Jonides,
1996). Recently, Scholl and Pylyshyn
(1999) showed that multiple object-tracking performance was not impaired
with the occlusion of objects, whereas it was impaired when the objects
disappeared for the same amount of time between the abrupt onset and offset. In
this experiment, I used a smoothly rotating pattern with an occluder, which
makes the pattern visible and occluded alternately (Figure 3a).
Figure 3. a. Illustration of conditions in Experiment 2.
b. Mean hit and false alarm rates in Experiment 2.
The triangular pattern was occluded by a gray figure.
There were two independent variables in this experiment: pattern motion (moving
and stationary) and visible duration. In the moving condition, the pattern
smoothly moved with a velocity of 125°/s, by showing each colored pattern
for 40 ms with a 5° interframe angular displacement (see Movie 2). In the attentive tracking literature
with rotational motion stimuli, it is shown that people can track four objects
up to a speed of 360°/s (Verstraten et
al., 2000). Therefore, the rotation speed used in this experiment appears to
be well within the trackable range. In the stationary condition, the pattern was
stationary at the middle position of the visible phase for the same exposure
duration as the corresponding moving condition. The stationary condition had
ambiguity in object correspondence, and served as a baseline as the equilateral
condition did in Experiment 1. There were two visible durations: the 280-ms
condition had a visible phase of 280 ms and an occluded phase of 200 ms, and the
360-ms condition had a visible phase of 360 ms and an occluded phase of 120 ms.
Visible duration was manipulated by the shape of the occluder. The 360-ms and
280-ms conditions used occluders with openings of 20° and 10°,
respectively (see Figure 3a). There were 144
experimental trials with 36 trials for each condition, composed of 12 color
switches, 12 color replacements, and 12 no-change trials. Experiment 2 was
written in MATLAB using Psychophysics Toolbox extensions (Brainard, 1997; Pelli,
1997). Participants were 10 Kyoto University undergraduate and graduate
students who had normal or corrected-to-normal vision.
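The timing figures quoted for the moving condition can be checked with a little arithmetic. The sketch below uses only the values given in the text (5° steps shown for 40 ms each, and the visible and occluded phase durations); the function name `rotation_per_cycle` is hypothetical, and the code assumes that the pattern keeps rotating at the same velocity while it is occluded.

```python
def rotation_per_cycle(step_deg, step_ms, visible_ms, occluded_ms):
    """Return (angular velocity in deg/s, rotation in deg over one visible+occluded cycle)."""
    velocity = step_deg / (step_ms / 1000.0)
    # Assumption: rotation continues at constant velocity during occlusion.
    cycle_s = (visible_ms + occluded_ms) / 1000.0
    return velocity, velocity * cycle_s

for visible_ms, occluded_ms in ((280, 200), (360, 120)):
    velocity, per_cycle = rotation_per_cycle(5, 40, visible_ms, occluded_ms)
    print(visible_ms, round(velocity, 1), round(per_cycle, 1))
```

Under these assumptions both duration conditions share a 480-ms cycle, so the pattern advances about 60° from one visible phase to the next, the same interframe angle as in the ambiguous baseline of the pilot experiment.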
In general, neither pattern motion nor visible duration
had significant effects on color-switch detection. Mean hit rates for
color-replacement and color-switch trials, and the mean false alarm rate are
shown in Figure 3b, and a 2 (pattern motion)
x 2 (visible duration) repeated
measures ANOVA with switch hit rates showed no significant main effects or
interaction, F(1,9)=0.015,
F(1,9)=2.09, and
F(1,9)=0.57, for motion, duration, and
their interaction, respectively. Again, planned comparison of the moving
condition with the baseline stationary condition for each duration condition
revealed no significant difference,
F(1,9)=0.2 and
F(1,9)=0.13 for 280-ms and 360-ms
conditions, respectively.
Neither a local motion signal nor the elimination of abrupt onset and
offset removed the difficulty in color-switch detection; these manipulations
had no reliable effect on irregularity detection. One should note that
the lack of the effect of abrupt onset is inconsistent with Scholl and Pylyshyn (1999), suggesting that
irregularity detection in this study and multiple object tracking may be tapping
different aspects of episodic representations. It may be, as shown by Scholl and Pylyshyn, that the lack of abrupt
onset and offset improves the trackability of objects, but that the trackability
itself is insufficient for successful color-switch detection.
Color-switch detection in a dynamic multidimensional
situation with multiple objects is in general difficult. Overall, the hit rates
for switch detection did not show any significant improvement in the unambiguous
motion conditions over the ambiguous motion conditions. First of all, the
difficulty in color-switch detection is not due to the problem of perceiving
colors with moving objects, because color-replacement detection is almost
perfect. Also, it was unlikely to be due to the failure in tracking
objects’ rotation, because in Experiments 1 and 2, with apparently
unambiguous rotational motion, the color-switch detection performance showed
only a small, nonsignificant improvement. Disambiguation of objects’ motion
by pattern configuration, smooth and continuous motion, and elimination of
abrupt onset and offset is insufficient for successful color-switch detection.
Although these results suggest that maintenance and transformation of
color-location bindings is difficult, even if the tracking of objects’
locations is successful, one could still argue that the problems observed in
Experiments 1 and 2 are due to tracking failure. Experiment 7 in Part III
addresses this issue and provides more direct evidence against the tracking
failure hypothesis. Before that, Part II reports factors facilitating
color-switch detection.
Part II: Factors Contributing to the Maintenance of Multiple Object Representations
Experiments 3-6 investigated the effect of rotation
angle on the irregularity detection performance. Experiment 3 examined the
effect of the interframe rotation angle. Experiment 4 examined whether the
effect obtained in Experiment 3 was due to the amount of spatial displacement or
the amount of angular displacement by enlarging the distances between the disks
of a pattern. Experiment 5 further examined whether the effect obtained in the
previous experiments was mediated by angular velocity or angular disparity by
manipulating frame duration. Finally, Experiment 6 examined whether the
spatiotemporal predictability of locations is necessary for the facilitatory
effects of reduced rotation angle.
Experiment 3: Effect of Interframe Rotation Angle
The independent variable of Experiment 3 was interframe
rotation angle: 60°, 45°, and 30° conditions (Figure 4a). Each condition had 24 color
switches, 24 no changes, and 12 color-replacement trials, and the total number
of trials was 180. Participants were seven Nagoya University graduate and
undergraduate students who had normal or corrected-to-normal vision.
Substantial improvement in detection performance was
observed when the interframe rotation angle was reduced. Mean hit rates for
color-replacement and color-switch trials and mean false alarm rate are shown in
Figure 4b, and mean d’s are shown in Table 1. The color-switch hit rates increased
monotonically as the interframe rotation angle decreased. A one-way ANOVA showed
a significant effect of rotation angle,
F(2,12)=9.93. Planned comparisons of
the 45° and 30° conditions with the 60° condition (baseline)
showed significant improvement,
F(1,6)=6.62 and
F(1,6)=16.04, respectively.
Table 1. Mean d’ values for each condition of Experiments 1-7.
Experiment | Condition | d’
Experiment 1 | Acute | 2.21
Experiment 1 | Equilateral * | 1.76
Experiment 1 | Obtuse | 1.97
Experiment 2 | Stationary/240 * | 1.29
Experiment 2 | Stationary/360 * | 1.46
Experiment 2 | Moving/240 | 1.56
Experiment 2 | Moving/360 | 1.80
Experiment 3 | 60° * | 1.74
Experiment 3 | 45° | 2.72
Experiment 3 | 30° | 3.10
Experiment 4 | 60°/small * | 1.85
Experiment 4 | 30°/small | 3.09
Experiment 4 | 30°/large | 3.27
Experiment 5 | 360-ms | 3.13
Experiment 5 | 240-ms | 2.95
Experiment 5 | 80-ms | 3.10
Experiment 6 | 30°/30° | 2.08
Experiment 6 | 30°/60° | 2.07
Experiment 6 | 60°/30° | 1.77
Experiment 6 | 60°/60° * | 1.73
Experiment 7 | On-target/90° | 2.74
Experiment 7 | On-target/60° | 2.71
Experiment 7 | On-target/30° | 3.54
Experiment 7 | Off-target/90° | 1.85
Experiment 7 | Off-target/60° | 2.32
Experiment 7 | Off-target/30° | 2.55
Conditions with * are inherently ambiguous in
motion correspondences and serve as baseline conditions.
Experiment 4: Rotation Angle or Spatial Displacement?
In Experiment 4, a new condition with a pattern whose
disks were located 4.0° from the central fixation (30°/large
condition) was introduced (Figure 5a), in
addition to the 30° and 60° conditions with smaller patterns
(30°/small and 60°/small conditions,
respectively). The rotation angle of the 30°/large condition was the same as
that of the 30°/small condition, whereas its
amount of spatial displacement was comparable
to that of the 60°/small
condition. Thus, if the rotation angle determines the irregularity detection
performance, the performance in the 30°/large condition should be similar
to that in the 30°/small condition. The number of trials was the same as
Experiment 3. Participants were seven Nagoya University graduate and
undergraduate students who had normal or corrected-to-normal
vision.
Figure 4. a. Illustration of conditions in Experiment 3.
b. Mean hit and false alarm rates in Experiment 3.
Figure 5. a. Illustration of conditions in Experiment 4.
b. Mean hit and false alarm rates in Experiment 4.
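The logic of the 30°/large condition rests on a simple geometric relation: a disk at eccentricity r that rotates by an angle θ about fixation is displaced by a chord of length 2r·sin(θ/2). The sketch below works this out for the three conditions. The function name `chord_deg` is hypothetical, the 2.0° eccentricity of the small pattern is taken from the Experiment 1 description and assumed to apply here, and visual angle is treated as a planar distance, which is a reasonable approximation at these small eccentricities.

```python
import math

def chord_deg(eccentricity_deg, rotation_deg):
    """Straight-line displacement (deg of visual angle) of a disk between two frames."""
    return 2 * eccentricity_deg * math.sin(math.radians(rotation_deg) / 2)

# (eccentricity in deg, interframe rotation in deg); small-pattern eccentricity is assumed.
conditions = {
    "30/small": (2.0, 30),
    "60/small": (2.0, 60),
    "30/large": (4.0, 30),
}
for name, (ecc, rot) in conditions.items():
    print(name, round(chord_deg(ecc, rot), 2))
```

Under these assumptions the displacement is about 1.04° in the 30°/small condition, 2.00° in the 60°/small condition, and 2.07° in the 30°/large condition, so the large pattern matches the 60°/small condition in displacement while matching the 30°/small condition in rotation angle.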
The effect of rotation angle is not due to the
reduction of spatial displacement in the smaller rotation-angle conditions
because the enlargement of the triangular pattern did not affect the detection
performance at all. Mean hit rates for color-replacement and color-switch trials
and mean false alarm rate are shown in Figure
5b, and mean d’s are shown in Table
1. The color-switch hit rates were higher both in the 30°/large and
30°/small conditions than in the 60°/small condition. A one-way ANOVA
showed a significant effect of rotation angle,
F(2,12)=16.73. Planned comparisons of
the 30°/large and 30°/small conditions with the 60°/small
condition (baseline) showed a significant improvement,
F(1,6)=22.48 and
F(1,6)=18.42, respectively.
Experiment 5: Rotation Angle or Rotation Velocity?
The interframe rotation angle and duration of blank
period were fixed to 30° and 160 ms, respectively, and the exposure
duration of each frame was varied (Figure
6a). The 360-ms condition was identical to the 30° condition in
Experiment 3. The 240-ms condition had an angular velocity of 75°/s, which
was close to that for the 45° condition in Experiment 3. The 80-ms
condition had an angular velocity of 125°/s, corresponding to the 60°
condition in Experiment 3. If the angular velocity determines the irregularity
detection performance, the temporal schedule should have a similar effect to
that found in Experiment 3, whereas no significant effect would be expected if
the angular disparity was important. The number of trials was the same as in
Experiment 3. Participants were seven Kyoto University graduate and
undergraduate students who had normal or corrected-to-normal
vision.
Figure 6. a. Illustration of conditions in Experiment 5.
b. Mean hit and false alarm rates in Experiment 5.
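The angular velocities quoted for the three exposure conditions follow from dividing the interframe rotation angle by the frame period. The sketch below assumes the period is the exposure duration plus the blank duration, which reproduces the 75°/s and 125°/s values given in the text. The comparison values for Experiment 3 additionally assume that all of its conditions used the 360-ms exposure and 160-ms blank schedule of its 30° condition; that assumption is not confirmed in this section.

```python
def angular_velocity(rotation_deg: float, exposure_ms: float, blank_ms: float) -> float:
    """Mean angular velocity in deg/s over one frame period (exposure + blank)."""
    period_s = (exposure_ms + blank_ms) / 1000.0
    return rotation_deg / period_s


BLANK_MS = 160  # blank period fixed in Experiment 5

# Experiment 5: rotation fixed at 30 deg, exposure duration varied.
exp5 = {exposure: angular_velocity(30, exposure, BLANK_MS) for exposure in (360, 240, 80)}

# Experiment 3 (assumed schedule: 360 ms exposure, 160 ms blank for every angle).
exp3 = {angle: angular_velocity(angle, 360, BLANK_MS) for angle in (30, 45, 60)}

for exposure, velocity in exp5.items():
    print(f"Exp 5, {exposure}-ms exposure: {velocity:.1f} deg/s")
for angle, velocity in exp3.items():
    print(f"Exp 3, {angle} deg rotation: {velocity:.1f} deg/s")
```

Under the assumed schedule, the Experiment 3 velocities come out near, but not exactly equal to, the matched Experiment 5 values, in line with the "close to" wording in the text.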
Experiment 5 shows that the effect cannot be attributed
to the level of angular velocity. Mean hit rates for color-replacement and
color-switch trials and the mean false alarm rate are shown in Figure 6b, and mean d’s are shown in Table 1. The color-switch hit rates for all three
conditions were high, and there was no clear difference among the conditions. A
one-way ANOVA showed no significant effect of exposure duration,
F(2,12)=0.15. Further analysis
comparing data from Experiments 3 and 5 with a 2 (experiment)
x 3 (angular velocity) ANOVA revealed a
significant main effect of angular velocity,
F(2,24)=5.78, and the interaction of
experiment and angular velocity,
F(2,12)=5.33. Inconsistent with the
hypothesis that angular velocity determines color-switch detection, the effect
of angular velocity was significantly smaller in Experiment 5 than in Experiment
3. Even with the same angular velocity, the 60° condition in Experiment 3
had a significantly lower color-switch hit rate than the 80-ms condition in
Experiment 5, F(1,12)=7.56. Within the
range that this experiment manipulated, the irregularity detection performance
did not depend on angular velocity, but rather on the angular disparity between
frames.
Experiment 6: Necessity of Spatiotemporal Predictability
Experiments 3-5 have shown that smaller angular
displacement between frames dramatically facilitates irregularity detection
performance. When the angular disparity was 45° or smaller, the
color-switch hit rates were significantly better than the baseline condition.
Experiment 6 examined whether this facilitatory effect was mediated by
predictability of future locations. Although Experiments 1 and 2 revealed that
up to an interframe rotation angle of 60° complete predictability of future
locations was insufficient to maintain the episodic representations of multiple
objects, locational predictability may be necessary to enable participants to
detect irregularity in smaller angular displacement conditions. In Experiment 6,
interframe angular displacement was varied randomly between 30° and
60° within each trial, so participants could not predict the location of
objects in the next frame. If locational predictability is necessary for
irregularity detection, this manipulation should significantly disrupt
performance. In contrast, if the irregularity detection with 30° rotation
conditions was mediated by local bottom-up processing, performance should not be
impaired.
Figure 7. a. Illustration of conditions in Experiment 6.
b. Mean hit and false alarm rates in Experiment 6.
Unlike previous experiments, interframe rotation angle
within each trial was not fixed in this experiment. Each interframe rotation
within a single trial was either 30° or 60°. The order
of rotation angles was random except for the two critical intervals before and
after the irregular event, where the combinations of rotation angles were
defined as independent variables. Thus, the independent variables were the rotation
angle in the interval before the irregular event (the “Before”
angle: 60° or 30°) and the rotation angle in the interval
after the irregular event (the “After” angle: 60° or
30°) (Figure 7a). In no-change
trials, the critical intervals were set so that the temporal positions of these
periods were matched to the color-replacement and color-switch conditions. As in
the previous experiments, irregular events occurred between the fourth and
seventh frames; thus, the earliest critical interval was set at the third
interval (between the third and fourth frames) and the latest critical interval
was set at the seventh interval (between the seventh and eighth frames). In
Experiment 6, the exposure duration of each frame was 160 ms and the blank
period was 360 ms. Each condition had 24 color switches, 24 no changes, and 12
color-replacement trials. Thus, the total number of experimental trials was 240.
Participants were seven Kyoto University undergraduate students who had normal
or corrected-to-normal vision.
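The construction of an unpredictable rotation sequence can be sketched as follows. The sketch reads the description above as: an irregular event at frame k (k = 4 to 7) has its Before angle in interval k − 1 and its After angle in interval k, where interval i lies between frames i and i + 1. The number of frames per trial (n_frames = 10) is a hypothetical value, since it is not given in this section, and the function name is illustrative.

```python
import random

ANGLES = (30, 60)  # interframe rotation angles in degrees


def make_rotation_sequence(event_frame, before_deg, after_deg, n_frames=10, rng=None):
    """Return one rotation angle per interframe interval.

    All intervals are drawn at random from ANGLES, except the two critical
    intervals around the irregular event, which are fixed by the condition.
    n_frames is a hypothetical trial length (an assumption).
    """
    if not 4 <= event_frame <= 7:
        raise ValueError("irregular events occur between the fourth and seventh frames")
    if n_frames < 8:
        raise ValueError("need at least eight frames to hold the latest critical interval")
    if before_deg not in ANGLES or after_deg not in ANGLES:
        raise ValueError("critical angles must be 30 or 60 degrees")
    rng = rng or random.Random()
    angles = [rng.choice(ANGLES) for _ in range(n_frames - 1)]
    angles[event_frame - 2] = before_deg  # interval k - 1 (zero-based index k - 2)
    angles[event_frame - 1] = after_deg   # interval k (zero-based index k - 1)
    return angles


# Trial counts from the text: 2 (Before) x 2 (After) conditions.
TRIALS_PER_CONDITION = {"color_switch": 24, "no_change": 24, "color_replacement": 12}
total_trials = 4 * sum(TRIALS_PER_CONDITION.values())

sequence = make_rotation_sequence(event_frame=4, before_deg=30, after_deg=60,
                                  rng=random.Random(0))
print(sequence, total_trials)
```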
Overall, Experiment 6 showed that the locational
predictability was necessary for the successful detection of irregularity. Mean
hit rates for color-replacement and color-switch trials and the mean false alarm
rate are shown in Figure 7b, and mean
d’s are shown in Table 1. The
color-switch hit rates showed no clear differences among the conditions. A 2
(Before angle) x 2 (After angle) ANOVA
showed no significant main effects,
F(1,6)=2.58 and
F(1,6)=0.09, for the Before and After
angles, respectively, and no interaction,
F(1,6)=0.74. Although inspection of Figure 7b suggests that the Before angle has a
weak effect such that 30° rotation tends to be easier than 60°
rotation, the effect was not significant and apparently much smaller than those
in Experiments 3 and 4. To further evaluate the effect of spatiotemporal
predictability, 30°/30° and 60°/60° conditions in this
experiment were compared with the corresponding conditions (30° and
60° conditions) in Experiment 3. A 2 (experiment)
x 2 (rotation angle) ANOVA revealed a
significant main effect of rotation angle,
F(1,12)=24.22, and a significant
interaction of experiment and rotation angle,
F(1,12)=7.93. The significant interaction shows that the effect of rotation angle was significantly reduced in Experiment 6, compared with that in Experiment 3. Planned comparisons by linear contrast tests revealed that the effect of rotation angle was significant for Experiment 3, F(1,12)=29.93, but not
significant for Experiment 6,
F(1,12)=2.22. When the future locations
of objects were uncertain, irregularity detection performance was significantly
impaired even when the angular difference was 30°, suggesting that
locational predictability is a necessary condition for successful color-switch
detection.
One may argue that the change in temporal schedule,
particularly a reduction in the exposure duration, disrupted color-switch
detection. However, it is highly unlikely that this is the sole reason for the
impairment in this experiment, because in a previous experiment (Experiment 5)
with predictable rotation, the color-switch detection performance was not
sensitive to the temporal schedule.
Unlike the manipulations in Part I, a reduction of the interframe rotation
angle substantially improved the color-switch detection performance. Within
45° interframe rotation, the hit rate for color switch was significantly
better than the ambiguous baseline condition of 60° interframe rotation.
Experiment 4 with larger patterns showed that this facilitatory effect was due
to interframe rotation angle, not due to interframe spatial displacement.
Experiment 5 with different exposure durations suggested that rotation angle,
not angular velocity, determined the performance. Experiment 6 used a
spatiotemporally unpredictable rotation sequence, and showed that spatiotemporal
predictability is a necessary condition for improvement with smaller rotation
angles.
Part III: Evidence for Color-Switch Detection Difficulty Independent of Tracking Failure
Although I have tried to establish evidence against the
tracking failure account for the difficulty in color-switch detection in
Experiments 1 and 2, one may still argue for the tracking failure account by
assuming that the use of smooth motion and nonequilateral triangular
configurations did not actually help participants correctly track the
objects’ rotation. Thus, we need more direct evidence that participants
had difficulty in color-switch detection even when the tracking of objects was
successful. In Experiment 7, smooth and continuous motion was again used to help
participants’ tracking. A dual task setting was used to obtain
color-switch detection performance conditionalized to the successful tracking of
a cued object. Participants had to judge both the location of the tracked object
and the presence of a color switch for each trial. If the difficulty in the
color-switch detection reflected tracking failure, then color-switch detection
should be much more accurate when the tracking judgment is correct than when it
is incorrect. In contrast, if the difficulty in the color-switch detection was
not due to tracking failure, then color-switch detection should still be
difficult even when the tracking is successful. Another modification in
Experiment 7 was the elimination of the color-replacement condition, because the
presence of color replacement may have led some participants to ignore
color-switch detection.
Experiment 7: A Dual Task of Color-Switch Detection and Tracking
Materials were similar to those in the moving 360-ms
condition of Experiment 2, except for the following changes. First, the number
of objects was four in this experiment, to make the tracking task more
challenging. Second, to replicate the effect of rotation angle in Experiments
3-5, three rotation angle conditions were used. Third, the rotation direction
was randomly varied across trials within each participant, to make the tracking
task more challenging. Fourth, the color-replacement detection condition was
eliminated.
Participants were asked to track a precued target
object and to judge the presence of a color switch simultaneously. A schematic
illustration of the procedure is shown in Figure
8a. At the beginning of each trial, a beep was followed by a stimulus
display that had four gray disks and an occluder. A randomly chosen tracking
target was precued by flashing three times. Then 300 ms later, four disks
changed colors from gray to four different colors and began rotating. The
direction of pattern rotation was randomized across trials. The pattern smoothly
and continuously rotated in the same way as in Experiment 2. The visible and
occluded durations were both 360 ms, and the rotation speed of the pattern was
manipulated by the relative motion of the pattern and the occluder. In the
90°/period condition, where the rotation speed was matched to the 60°
condition in Experiment 3, the pattern alone rotated, and the occluder was
stationary as in Experiment 2. However, in the 60°/period and
30°/period conditions, the rotation speeds of the patterns were two thirds
and one third of the 90°/period condition, respectively, and unlike
Experiment 2, the occluder rotated in the opposite direction with velocities to
match the visible and occlusion durations across the three conditions.
Therefore, whereas the rotation speed of the pattern was different across
conditions, the visible and occluded durations were equal across conditions. The
color switch occurred between a pair that contained the tracking target
(on-target trials), or between a pair that did not contain the target
(off-target trials). The numbers of on-target and off-target trials were equal.
At the last visible period, the four disks appeared in gray, and stopped at the
middle of the opening of the occluder. To prevent participants from predicting
the stopping location of the tracking target by timing, the last visible period
varied randomly between the 9th and 11th periods.
Participants always judged the color switch first by
pressing the 1 or 3 key as before. When they made a color-switch judgment, and
the pattern rotation stopped, an arrow cursor appeared at the middle of the
occluder, and they were asked to click the location of the tracking target.
There were six experimental conditions composed of two factors: rotation angle
(90°, 60°, and 30°) and color-switch location (on-target and
off-target). Each condition had 24 color-switch and 24 no-change trials, half
of which had clockwise rotation. The total number of experimental trials was
288. Participants were instructed to try to be accurate on both tasks, and not
to sacrifice one task for the other. No feedback on correctness was given for either
task. Participants were eight Kyoto University graduate students who had normal
or corrected-to-normal
vision.
Figure 8. a. Illustration of conditions in Experiment 7.
b. Mean hit and false alarm rates in Experiment 7.
Overall, Experiment 7 showed that the difficulty in
color-switch detection is not due to tracking failure. First, not surprisingly,
the tracking performance was highly accurate
(M=0.962), and not affected by rotation
angle. A 3 (rotation angle) x 2 (color
switch, present or absent) x 2
(color-switch location, on- or off-target) ANOVA showed no significant main
effects or interactions. This accurate target tracking is consistent with
findings of attentive tracking (Verstraten et
al., 2000) and suggests that within this range of rotation angle, attentive
tracking performance is not disrupted by frequent occlusions.
The performance of the color-switch detection task was
analyzed on the condition of the successful tracking. The conditionalized hit
rates and false alarm rates are shown in Figure
8b, and d’s are shown in Table 1.
Because of the extremely high accuracy in the tracking task, conditionalized hit
rates and false alarms were virtually identical to the unconditionalized
counterparts, and the statistical analyses showed the same pattern of results.
Thus, I report only the conditionalized data analyses. The conditionalized hit
rates were analyzed by a 3 (rotation angle)
x 2 (color-switch location: on- or
off-target) ANOVA, showing significant main effects of rotation angle,
F(2,14)=29.46, and color-switch
location, F(1,7)=6.59. The interaction
was not significant, F(1,7)=1.91.
Unlike the tracking performance, the conditionalized hit rate decreased as the
rotation angle increased, which is consistent with the results of Experiments
3-5. Moreover, the switch detection was more accurate when a switching item was
on the tracking target than when it was off. This result may indicate that
focused attention facilitates color-switch detection. More important, there was
a significant effect of rotation angle even when the switching items contained the
tracking target, F(2,14)=6.28,
suggesting that successful tracking is not sufficient for color-switch
detection.
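The conditionalized analysis described above amounts to discarding trials on which the tracking judgment was wrong and then computing hit and false alarm rates on the remainder. The following is a minimal sketch of that filtering step; the Trial record, its field names, and the example data are hypothetical and are not taken from the article.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Trial:
    switch_present: bool    # a color switch actually occurred
    said_switch: bool       # participant reported a color switch
    tracking_correct: bool  # participant clicked the cued target's location


def _rate(responses: Iterable[bool]) -> float:
    """Proportion of True responses; NaN if there are no trials."""
    values: List[bool] = list(responses)
    return sum(values) / len(values) if values else float("nan")


def conditionalized_rates(trials: Iterable[Trial]) -> Tuple[float, float]:
    """Hit and false alarm rates computed only on correctly tracked trials."""
    kept = [t for t in trials if t.tracking_correct]
    hit_rate = _rate(t.said_switch for t in kept if t.switch_present)
    fa_rate = _rate(t.said_switch for t in kept if not t.switch_present)
    return hit_rate, fa_rate


# Hypothetical data: two trials with tracking errors are excluded.
example = [
    Trial(True, True, True),
    Trial(True, True, True),
    Trial(True, False, True),
    Trial(True, False, False),
    Trial(False, False, True),
    Trial(False, True, True),
    Trial(False, True, False),
]
hit_rate, fa_rate = conditionalized_rates(example)
print(f"conditionalized hit rate = {hit_rate:.2f}, false alarm rate = {fa_rate:.2f}")
```

When tracking accuracy is as high as reported here, very few trials are excluded, which is why the conditionalized and unconditionalized rates are nearly identical.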
The results of Experiment 7 suggest that results in
Experiments 1 and 2 mainly reflect the difficulty in maintaining color-location
bindings in the 60° rotation conditions, not a failure in tracking. Indeed,
the color-switch performance in the 90° rotation condition of this
experiment was comparable to that in the corresponding condition in Experiment
2. Because Experiment 2 used even fewer objects, it is highly unlikely that the
difficulty in Experiment 2 was due to tracking failure induced by frequent
occlusions.
One might argue that extremely accurate tracking
performance is somewhat deceptive, because the task asks participants to track
only a single object. It might be the case that participants could track only
the cued object, while the motion correspondence of the other three objects was
completely lost. Although the data from this experiment could not rule out this
extreme possibility, even if this were the case, the tracking failure
hypothesis cannot explain the difficulty in color-switch detection. Under the
assumption that non-target objects are lost, the tracking failure hypothesis
predicts the on-target condition will show extremely accurate color-switch
performance, while only the off-target condition will show errors. However, this
prediction was not supported by the data. Color-switch detection suffered from
the increase in rotation angle, regardless of the tracking target location.
Somewhat surprisingly, there were a significant number of misses of color switch
even when the color switch occurred on the tracking target. A more natural
interpretation of the results of this experiment seems to be that the better detection
of color switch for the on-target condition is due to the distribution of
attention biased to the tracking target.
The properties of episodic representations of multiple
objects appear to be quite different between static and dynamic situations. In a
static situation, as in Luck and Vogel (1997),
correct feature-location binding for up to four objects can be maintained. In
contrast, in a dynamic setting as in this study, correct feature-location
binding for three objects did not seem to be maintained to the degree that
allows color-switch detection. A series of experiments revealed that even when
the motion correspondences are unambiguous by the use of pattern configurations
and continuous motion, and object tracking is successful as in Experiment 7,
color-switch detection is difficult; there was no significant
improvement compared with the situations where motion correspondences were
inherently ambiguous. At the same time, it has been revealed that color-switch
detection performance is critically dependent on the interframe rotation angle,
and that a facilitatory effect occurred only when spatiotemporal predictability
was satisfied.
One important aspect of these results is that
color-switch detection is difficult even when objects are easily trackable.
Thus, it is unlikely that the difficulty is due to tracking failure. Clearly,
spatial uncertainty and stimulus complexity alone cannot explain the results
either. Participants’ top-down knowledge of predictable pattern rotation
is not sufficient to detect an irregularity. Because the number of objects in
the sequences was just three, stimulus complexity in terms of the number of
stimuli was lower than in previous studies, such as Luck and Vogel (1997) and Pylyshyn and Storm (1988). There are some
possible reasons for the difficulty in color-switch detection. First, the items
in this study were somewhat closer to each other than in other studies. However,
the between-item distances were not so dramatically different from other
studies, so it is unlikely that this is the major reason. Second, the regular
rotation repeatedly places the objects in the same position as the other objects
had been a moment before. This factor may have played a significant role. The
repeated placement of different colored disks on the same location may create
substantial interference in color-location bindings, and a recent study by Wheeler and Treisman (2002) suggests that
interference among items is a major factor impairing binding in visual short
term memory. Wheeler and Treisman found that in a change-detection task with
static stimuli similar to Luck and Vogel, there
was a significant impairment in change detection of color switch (which they call the
binding condition) when the probe was the whole display. The impairment
disappeared when the single item probe was used, suggesting that interference by
the items in the whole display probe is a major factor in impairment.
This interference account can explain the facilitation
by smaller interframe rotation angle as well. The reduction of interframe
rotation angle makes the distance between the current and previous position of
one object smaller than the distance between the current position of one object
and previous position of another object, which not only helps motion
correspondences but also reduces the interference in the binding memory. In
contrast, smooth, continuous motion (Experiments 2 and 7) helps only motion
correspondences, and leaves the same interference in the binding memory as long as the
interframe rotation angle is large. Furthermore, the disappearance of the
facilitatory effect of smaller rotation angle with unpredictable rotation in
Experiment 6 is consistent with the interference account, because spatial
uncertainty in rotation is likely to increase interference.
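The distance comparison behind this interference account can be made concrete. The sketch below assumes n disks equally spaced on a circle (an equilateral triangle for n = 3; some experiments used nonequilateral configurations, which this does not model) and compares two chord lengths after a rotation of θ: the distance from a disk's current position to its own previous position, and the distance to the previous position of the neighboring disk it is rotating toward. The function name and the unit radius are illustrative assumptions.

```python
import math


def binding_distances(rotation_deg: float, n_items: int = 3, radius: float = 1.0):
    """Return (own, other) chord distances after one interframe rotation.

    own:   current position to the same disk's previous position.
    other: current position to the previous position of the neighboring disk
           in the direction of rotation.
    Assumes equally spaced disks and a rotation no larger than their spacing.
    """
    spacing = 360.0 / n_items
    if not 0.0 <= rotation_deg <= spacing:
        raise ValueError("rotation must lie between 0 and the angular spacing")
    own = 2.0 * radius * math.sin(math.radians(rotation_deg) / 2.0)
    other = 2.0 * radius * math.sin(math.radians(spacing - rotation_deg) / 2.0)
    return own, other


for angle in (30, 45, 60):
    own, other = binding_distances(angle)
    print(f"{angle} deg: own = {own:.3f}, other = {other:.3f}")
```

With three disks the two distances are equal at 60°, which is the ambiguous baseline, and the disk's own previous position becomes the closer one as the rotation angle shrinks.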
Although more systematic investigations are certainly
necessary, the results of this study suggest that the problems with color-switch
detection reflect the spatiotemporal interference in feature-location binding in
visual working memory, not the encoding of the objects’ color and motion
per se. Further study needs to investigate whether tightly bound object
representations are formed in dynamic situations but are disrupted by
spatiotemporal interference, or whether the formation of multiple object representations
itself is restricted to static situations. Given recent challenges to
Luck and Vogel’s (1997) findings (Wheeler & Treisman, 2002; Xu, 2002), the validity of the object-based account
of visual working memory in general also has to be critically evaluated. In the
course of these investigations, the irregularity detection task introduced in
this study can play an important role. Further improvements in the experimental
paradigm and systematic comparisons with other studies may reveal the mechanism
of formation, maintenance, and transformation of episodic representations in
visual cognition. In particular, we need to know what aspects of the current
findings are specific to the particular display used in this study, and how well
they can be generalized to other displays and experimental paradigms.
Mechanism of Transformation of Episodic Representations
In this section, I consider the following two factors
to discuss the difficulty of color-switch detection in a dynamic situation. The
first factor is whether the difficulty resides in the spatiotemporal
correspondence or in the feature-location binding. According to Kahneman et al. (1992), perceptual continuity
is achieved by these three suboperations: (1) a correspondence operation, (2) a
reviewing operation, and (3) an impletion process. The correspondence operation
determines which object in a display is an object recently perceived at a
different location. The reviewing process retrieves the characteristics of the
previous object, not currently seen. The impletion process uses current and
reviewed information to produce a percept of change or motion. Presumably, the
correspondence operation does not depend on featural information, such as color
and shape, as shown by the apparent motion literature ( Kolers, 1972). Featural properties play
essentially no role in apparent motion, unless the spatiotemporal parameters are
perfectly balanced. However, the impletion process (given successful retrieval)
involves the binding of featural information and spatiotemporal information by
definition. The results of this study and some previous studies suggest that the
difficulty in color-switch detection resides in the impletion operation. Otherwise, it is
difficult to explain why various manipulations that improved correspondence did
not substantially improve participants’ performance. In particular, the
elimination of abrupt onset and offset did not improve color-switch detection
(Experiment 2), whereas Scholl and Pylyshyn
(1999) showed that it had a substantial effect on tracking performance.
Although the visual system is successful in making correspondence, it may fail
in the impletion process. This impletion failure hypothesis can account for the
inconsistency between this study and Pylyshyn
and Storm (1988). Multiple object tracking can be performed by a
correspondence operation alone, whereas the color-switch detection in this study
required the impletion process.
The second factor is whether predictive transformation
of episodic representations has anything to do with the difficulty in
color-switch detection. Unlike Kahneman et al. (1992)
and Pylyshyn and Storm (1988), in which
there was spatial uncertainty regarding the correspondence, participants in this
study may have spontaneously transformed their episodic representations during
the blank period, because the future locations were completely predictable
except for Experiments 6 and 7. Given the substantial impairment of the
color-switch detection in Experiment 6, predictive transformation of episodic
representations during the blank period appears to be an important determinant
of color-switch detection performance. Therefore, overall, the underlying
mechanism of color-switch detection seems to be the predictive spatiotemporal
transformation of feature-location bindings. However, as mentioned above, the
effect of predictability may be related to spatiotemporal interference, not to
top-down spatiotemporal prediction per se, so the issue of the role of top-down
prediction is somewhat unclear at this point.
These arguments imply that multiple object tracking and
irregularity detection tasks reflect distinct processes. The tracking task may
reflect the correspondence operation, which is more automatic and bottom-up;
thus, spatiotemporal predictability is not important, but local characteristics
such as abrupt onset and offset have substantial effects on performance ( Scholl & Pylyshyn, 1999). However, the
irregularity detection task may reflect the predictive feature-location binding
mechanism, which is less sensitive to local information, such as local motion
and abrupt onsets, while the effect of top-down predictability may be essential.
Further research directly investigating these issues is necessary to understand
the mechanism of transformation of episodic representations.
Relation to Object Files and FINSTs
Object files and FINSTs are two major theoretical
notions previously proposed. Findings in this study revealed further
spatiotemporal constraints on the maintenance of these episodic object tokens.
Kahneman et al. (1992) argued that object files can
survive some spatiotemporal gaps as in the case of occlusion and apparent
motion, but their limits were not systematically investigated. This study
revealed that there are some severe spatiotemporal limits for the survival of
multiple object files: beyond 45° regular rotation, even three object files
are difficult to maintain. This result is not quite consistent with the findings
of Kahneman et al. that four object files can be maintained concurrently, and
that object files can survive a spatiotemporal gap of 590-ms ISI. There are some
possible reasons for this inconsistency, and an important one is the difference
in experimental paradigms. Kahneman et al. used a review paradigm that is
similar to the priming paradigm; thus, the object-specific preview benefit seems
to reflect implicit aspects of the maintenance of object files. In contrast, the
irregularity detection task in this study clearly investigated the explicit
detection of color switch or replacement. Therefore, the inconsistency between
this study and that of Kahneman et al. may reflect dissociation between the
explicit and implicit nature of visual cognition. Another possibility is that
the use of object linkers in Kahneman et al. may have facilitated the
maintenance of object files. In their Experiment 5 with four objects, the object
frames (squares) remained visible while target letters disappeared, so the
spatiotemporal continuity of perceptual objects was maintained while their
featural contents (with or without letters) underwent substantial changes during
the objects’ motion. On the other hand, all experiments in this study had
some substantial blank period during which no object information was presented
on the display. Further studies using common stimuli and paradigms are
necessary.
The findings of this study can be accounted for by the
FINST theory in that there is a distinct stage of spatial indexing which is
preattentive, but capacity-limited. According to Pylyshyn (1989), spatial indexing and
feature-location binding are separate mechanisms, and encoding visual features
and binding them to locations require additional processing stages. The findings
of this work can be considered as reflecting feature-location bindings, which
are presumably more capacity-limited than the spatial indexing investigated by
the multiple object-tracking paradigm. Kahneman et al.
(1992) claimed that a FINST might be the initial phase of an object file
before any features are attached to it, and according to this interpretation,
the findings of this study suggest that in dynamic situations people are
successful only in the initial phase of object file formation.
Relation to Studies on Visual Working Memory
A recent study on visual working memory by Luck and Vogel (1997) claimed that functional units
of visual working memory are perceptual objects, not features. The results of
this study cast some doubt on their interpretation. Clearly, in a dynamic
situation, even three perceptual objects cannot be concurrently maintained
beyond some limited spatiotemporal gap. The findings of this study, Luck and Vogel, and Pylyshyn and Storm (1988) suggest certain
characteristics of perceptual objects in visual cognition. First, the situation
of Luck and Vogel’s experiments can be interpreted as a situation with
maximum spatiotemporal predictability. According to this interpretation, the
coherence of perceptual objects can be maintained only within limited
situations, namely, those involving a limited amount of predictable spatiotemporal transformation.
Because the range where multiple episodic representations are successfully
maintained is much more limited than the range over which people can perceive
objects’ continuity (e.g., apparent motion, and motion behind an
occluder), it is unclear whether we can claim that the results of this study and
Luck and Vogel reflect representations of
perceptual objects in the usual sense. Alternatively, Luck and Vogel’s
data may reflect processes qualitatively different from this study’s.
Their data may reflect temporary aggregates of features bound to particular
locations, not bound to perceptual objects. It has been argued that location
plays a privileged role in visual cognition ( Treisman, 1988; Tsal & Lavie, 1993). Recently, some studies
have cast doubt on Luck and Vogel’s data, suggesting the role of
independent feature memories ( Wheeler &
Treisman, 2002; Xu, 2002). For example, Wheeler and Treisman (2002) failed to replicate
the critical color-color conjunction condition of Luck and Vogel. The plausibility of these
alternative interpretations depends on whether the results of this
study with smaller predictable transformations are consistent with those of Luck and
Vogel. Further studies involving various spatiotemporal and featural
transformations are necessary to resolve this issue.
The difficulty in color-switch detection may be related
to the functional dissociation between object and spatial working memory.
Research in physiology and functional brain imaging suggests that spatial and
object working memory systems reside in distinct brain regions ( Smith & Jonides, 1997; Wilson, O’Scalaidhe, & Goldman-Rakic,
1993; but see Rao, Rainer, & Miller,
1997). The irregularity detection task may require integration of
information stored in distinct brain regions. Whether the difficulty in
maintaining multidimensional features is limited to situations where object and
spatial working memory are to be integrated is beyond the scope of this study,
and further study with various types of feature conjunctions is necessary.
This work is consistent with some recent evidence that
the system of visual cognition is working with much less memory than we believe
( Ballard, Hayhoe, Pook, & Rao, 1997; Horowitz & Wolfe, 1998; Rensink, O’Regan, & Clark, 1997).
For example, a phenomenon called change blindness shows that people are
surprisingly poor at noticing large changes to objects, photographs, and motion
pictures from one instant to the next ( Simons,
2000). Contrary to some suggestions that change blindness merely reflects
inefficient visual search for temporal change in complex stimuli, this study
shows that a similar impairment in visual cognition can occur with quite simple
and regular stimuli. Although we are able to store multidimensional information
in a static display (Luck & Vogel, 1997),
and to track dynamic changes of multiple unidimensional objects (Pylyshyn & Storm, 1988), we can store
only one or two multidimensional dynamic objects. The maintenance and
transformation of episodic representations of multiple objects seem to involve a
dynamic process that is determined by featural and spatiotemporal properties and
by locational predictability.
Our ability to maintain episodic representations of
multiple objects in a completely predictable dynamic situation is limited. This
finding strongly suggests that previous findings obtained with static displays
(Luck & Vogel, 1997) and a dynamic multiple
object-tracking task (Pylyshyn & Storm,
1988) may not reflect the function of common high-level episodic
representations, such as object files, where featural and spatial information is
coherently bound together. Instead, previous findings are likely to be mediated
by lower-level representations, such as FINSTs in the case of multiple object
tracking, and location-based feature clusters in the case of Luck and Vogel (1997).
As the FINST theory claims, spatiotemporal indexing and feature-location binding
are separate mechanisms, and the latter requires substantial additional
resources. To maintain feature-location binding of multiple objects,
spatiotemporal continuity for successful object tracking is necessary but not
sufficient. Regardless of the smoothness of the objects’ motion, the
maintenance of feature-location binding of three objects is possible only when
the rotation angle across the gap is smaller than 45°, suggesting that a
reduction in spatiotemporal interference among feature-location bindings may be
critical. In addition, spatiotemporal predictability may be necessary for
successful maintenance of feature-location bindings.
I thank Toshio Inui, Tram Neill, Jane Raymond, and
three anonymous reviewers for helpful comments on earlier manuscripts. This work
was supported by the Japanese Ministry of Education, Culture, Sports, Science and
Technology Grants-in-Aid for Scientific Research (Nos. 11610075, 12551001,
13610084, and 14019053), the Research for the Future Program from the Japan
Society for the Promotion of Science (JSPS-RFTF99P01401), and the Toyota High-Tech
Research Grant Program. Commercial Relationships: None.
Ballard, D. H., Hayhoe, M. M.,
Pook, P. K., & Rao, R. P. N. (1997). Deictic codes for the embodiment of
cognition. Behavioral and Brain Sciences,
20, 723-767.
Brainard, D. H. (1997). The
Psychophysics Toolbox. Spatial Vision,
10, 443-446.
Horowitz, T. S., & Wolfe,
J. M. (1998). Visual search has no memory.
Nature, 394, 575-577.
Kahneman, D., & Treisman,
A. (1984). Changing views of attention and automaticity. In R. Parasuraman &
D. R. Davies (Eds.), Varieties of attention
(pp. 29-62). New York: Academic Press.
Kahneman, D., Treisman, A.,
& Gibbs, B. J. (1992). The reviewing of object files: Object-specific
integration of information. Cognitive
Psychology, 24, 175-219.
Kanwisher, N. (1987).
Repetition blindness: Type recognition without token individuation.
Cognition, 27, 117-143.
Kolers, P. A. (1972).
Aspects of motion perception. Elmsford,
NY: Pergamon.
Luck, S. J., & Vogel, E. K.
(1997). The capacity of visual working memory for features and conjunctions.
Nature, 390, 279-281.
Pelli, D. G. (1997). The Video
Toolbox software for visual psychophysics: Transforming numbers into movies.
Spatial Vision, 10, 437-442.
Phillips, W. A. (1974). On the
distinction between sensory storage and short-term visual memory.
Perception and Psychophysics, 16,
283-290.
Pylyshyn, Z. W. (1989). The
role of location indexes in spatial perception: A sketch of the FINST
spatial-index model. Cognition, 32,
65-97.
Pylyshyn,
Z. W. (2001). Visual indexes, preconceptual objects, and situated vision.
Cognition, 80, 127-158.
Pylyshyn, Z. W., & Storm,
R. (1988). Tracking multiple independent targets: Evidence for both serial and
parallel stages. Spatial Vision, 3,
179-197.
Rao, S. C., Rainer, G., &
Miller, E. K. (1997). Integration of what and where in the primate prefrontal
cortex. Science, 276, 821-824.
Rensink, R. A., O’Regan,
J. K., & Clark, J. J. (1997). To see or not to see: The need for attention
to perceive changes in scenes. Psychological
Science, 8, 368-373.
Scholl,
B. J., & Pylyshyn, Z. W. (1999). Tracking multiple items through occlusion:
Clues to visual objecthood. Cognitive
Psychology, 38, 259-290.
Simons, D. J. (2000). Current
approaches to change blindness. Visual
Cognition, 7, 1-15.
Smith, E.
E., & Jonides, J. (1997). Working memory: A view from neuroimaging.
Cognitive Psychology, 33, 5-42.
Treisman, A. (1988). Features and objects: The
fourteenth Bartlett memorial lecture.
Quarterly Journal of Experimental Psychology:
Human Experimental Psychology, 40A, 201-237.
Tsal, Y., & Lavie, N. (1993).
Location dominance in attending to color and
shape. Journal of Experimental Psychology:
Human Perception and Performance, 19, 131-139.
Verstraten, Y., Cavanagh, P.,
& Labianca, N. (2000). Limits of attentive tracking reveal temporal
properties of attention. Vision Research,
40, 3651-3664.
Wheeler, M. E., & Treisman,
A. (2002). Binding in short-term visual
memory. Journal of Experimental Psychology:
General, 131, 48-64.
Wilson, F. A. W.,
O’Scalaidhe, S. P., & Goldman-Rakic, P. S. (1993). Dissociation of
object and spatial processing domains in primate prefrontal cortex.
Science, 260, 1955-1958.
Xu, Y. (2002). Limitations of
object-based feature encoding in visual short-term memory.
Journal of Experimental Psychology: Human
Perception and Performance, 28, 458-468.
Yantis, S., & Hillstrom, A.
P. (1994). Stimulus-driven attentional capture: Evidence from equiluminant
visual objects. Journal of Experimental
Psychology: Human Perception and Performance, 20, 95-107.
Yantis, S., & Jonides, J.
(1996). Attentional capture by abrupt onsets: New perceptual objects or visual
masking? Journal of Experimental Psychology:
Human Perception and Performance, 22, 1505-1513.