Volume 4, Number 12, Article 3, Pages 1006-1019
doi:10.1167/4.12.3
http://journalofvision.org/4/12/3/
ISSN 1534-7362
Perceptual learning through optimization of attentional weighting: Human versus optimal Bayesian learner
Miguel P. Eckstein
Vision & Image Understanding Lab, Department of Psychology, UC Santa Barbara, Santa Barbara, CA, USA

Craig K. Abbey
Vision & Image Understanding Lab, Department of Psychology, UC Santa Barbara, Santa Barbara, CA, USA

Binh T. Pham
Vision & Image Understanding Lab, Department of Psychology, UC Santa Barbara, Santa Barbara, CA, USA

Steven S. Shimozaki
Vision & Image Understanding Lab, Department of Psychology, UC Santa Barbara, Santa Barbara, CA, USA

Abstract
Human performance in visual detection, discrimination, identification, and search tasks typically improves with practice. Psychophysical studies suggest that perceptual learning is mediated by an enhancement in the coding of the signal, and physiological studies suggest that it might be related to the plasticity in the weighting or selection of sensory units coding task relevant information (learning through attention optimization). We propose an experimental paradigm (optimal perceptual learning paradigm) to systematically study the dynamics of perceptual learning in humans by allowing comparisons to that of an optimal Bayesian algorithm and a number of suboptimal learning models. We measured improvement in human localization (eight-alternative forced-choice with feedback) performance of a target randomly sampled from four elongated Gaussian targets with different orientations and polarities and kept as a target for a block of four trials. The results suggest that the human perceptual learning can occur within a lapse of four trials (<1 min) but that human learning is slower and incomplete with respect to the optimal algorithm (23.3% reduction in human efficiency from the 1st-to-4th learning trials). The greatest improvement in human performance, occurring from the 1st-to-2nd learning trial, was also present in the optimal observer, and, thus reflects a property inherent to the visual task and not a property particular to the human perceptual learning mechanism. One notable source of human inefficiency is that, unlike the ideal observer, human learning relies more heavily on previous decisions than on the provided feedback, resulting in no human learning on trials following a previous incorrect localization decision. Finally, the proposed theory and paradigm provide a flexible framework for future studies to evaluate the optimality of human learning of other visual cues and/or sensory modalities.
History
Received November 17, 2003; published December 7, 2004
Citation
Eckstein, M. P., Abbey, C. K., Pham, B. T., & Shimozaki, S. S. (2004). Perceptual learning through optimization of attentional weighting: Human versus optimal Bayesian learner.
Journal of Vision, 4(12):3, 1006-1019,
http://journalofvision.org/4/12/3/,
doi:10.1167/4.12.3.
Keywords
perceptual learning, Bayesian, ideal observer, attention, efficiency, noise
The ability of humans to detect changes in a number of
visual attributes improves with practice. This has been shown for orientation
(Matthews, Liu, Geesaman, & Qian, 1999),
motion (Ball & Sekuler, 1987), spatial
displacement (McKee & Westheimer, 1978),
vernier acuity (Beard, Levi, & Reich, 1995; Fahle & Edelman, 1995), and texture segmentation (Karni & Sagi,
1993; for a review, see Fine & Jacobs, 2002; Goldstone, 1998). Researchers have attempted to
elucidate the mechanisms by which this learning process takes place. A number
of psychophysical studies have measured performance in perceptual tasks in
different amounts of external noise to determine whether learning is mediated by
a decrease in additive noise, multiplicative noise, and/or an increase in the
observers’ ability to integrate information across the signal more
optimally. These studies support the idea that perceptual learning improves the
observers’ ability to integrate over signal relevant information more
efficiently (Gold, Bennett, & Sekuler, 1999; Gold, 2003; Hurlbert, 2000; Dosher & Lu, 1998; Beard & Ahumada, 1999), perhaps through reweighting of basic
sensory units (Dosher & Lu, 1998).
This process might even play a greater role in the real
world where perceptual tasks often involve complex stimuli encompassing visual
cues, some that are relevant to the perceptual task at
hand and others that are irrelevant.
Tasks such as discriminating human faces and objects or searching for a target
object in a visual scene require training to distinguish relevant visual cues
from those that are irrelevant. When first encountering a complex perceptual
task, humans are typically uncertain about which are the relevant cues that will
allow them to best perform the visual task. With practice, they learn to attend
(i.e., give weight) to visual cues that contain information and ignore those
cues that are not informative. In this framework, attention allows the observer
to differentially weight and/or select sensors coding task relevant information
(Kinchla, Chen, & Evert, 1995;
Eckstein, Shimozaki, & Abbey, 2002;
Murray, Sekuler, & Bennett, 2003;
Shimozaki, Eckstein, & Abbey, 2003).
This particular learning process has been a central
concept in classical studies of infant perceptual development and adult
perceptual learning (Gibson, 1969, 2000; Goldstone, 1998). It has been referred to as learning
through reduction in uncertainty. 1 The
uncertainty refers to the observers’ initial lack of knowledge about the
visual cues that are relevant for the visual task. Gibson has also referred to
the process as learning through attention optimization (Gibson, 1969, 2000;
Goldstone, 1998). The process of learning
through attention has also been supported by recent findings showing the ability
of V1 cells in macaques to dynamically modify the processing of visual
information depending on immediate behavioral requirements (Crist, Kapadia,
Westheimer, & Gilbert, 1997). In
addition, functional imaging suggests that perceptual learning correlates with
durable neural changes at the earliest stages of the visual system. The change
in processing is suggested to be modulated by top-down attentional processes
from higher level cortical areas (Sur, Schummers, & Dragoi, 2002).
But how fast can the process of perceptual learning
through attention optimization occur? One problem in measuring fast learning is
that in typical studies, perceptual performance is calculated across groups or
blocks of at least 25-50 trials, making it hard to observe short-term changes in
performance. Here we propose to use a new experimental paradigm (optimal
perceptual learning, OPL) to systematically study rapid learning through
attention optimization.
In addition, the nature of the neural algorithm
mediating this type of perceptual learning remains a second question. One useful
starting point in elucidating the human learning algorithm is to compare it to
that of an optimal Bayesian learning algorithm. The proposed OPL paradigm is
designed so that an optimal Bayesian observer learns as trials progress, and,
therefore, allows the investigator to compare the amount of learning of the
human observer to that of an optimal learner. 2
The use of the optimal Bayesian framework brings the
same benefits it has brought to other areas of perception (e.g., detection and
discrimination, Burgess, Wagner, Jennings, & Barlow, 1981; object recognition, Liu, Knill, &
Kersten, 1995; Liu, Kersten, & Knill, 1999; Tjan, Braje, Legge, & Kersten, 1995; Tjan &
Legge, 1999; and attention, Eckstein, Shimozaki et al., 2002; Shimozaki et al., 2003; for a review, see
Kersten & Yuille, 2003; Geisler, 2003; Kersten, Mamassian, & Yuille, 2004). First, it allows researchers to compare
the amount of learning observed in humans to that of an ideal learner, and,
therefore, to establish a standard to which human learning can be compared for a
variety of tasks. Second, it provides a framework that takes into account task
complexity or stimulus information (Liu et al., 1995, 1999). This
allows the investigator to disambiguate whether human perceptual learning in
task A is larger than task B due to some property of the human perceptual system
or whether it simply reflects stimulus information inherent to the task.
Optimal perceptual learning paradigm
In the present OPL task, an image is presented and the
observers search for one of four possible targets (elongated Gaussians with
different orientations and polarities) in one of eight locations (Figure 1). Trials are blocked into groups of four,
which we will refer to as learning blocks. For each learning block, a target is
randomly selected from the four possible targets (with equal probability) and
presented throughout the block of trials. However, on each trial, the location of
the target is randomly chosen, and the task of the observer is to localize the
target. In addition, on the last (4th) trial of a learning block,
the observers have to identify the target present throughout that learning
block. At the end of each learning trial, feedback is provided to the observers
about the location of the target for that trial but not the target’s
identity. At the end of the last (4th) learning trial, following the
identification decision, feedback is provided about the identity of the target
present throughout that learning block. Figure 1 outlines the timeline of the experimental procedure.
Figure 1. Optimal perceptual learning
paradigm. At the beginning of a learning block, one of M targets (targets have
been magnified for presentation purposes) is randomly sampled and used for all
the trials on that block. On each trial of a learning block, the target is
randomly placed in one of eight locations (centered within the black boxes). On
each trial the observer localizes the target and also identifies it on the last
(4th) trial. At the end
of each learning trial, feedback about the location containing the target is
provided.
Human performance localizing the target is quantified
by calculating the proportion correct localization for each learning trial
(1st through 4th for the current study).
Why would learning be expected across learning trials?
The main concept in the OPL paradigm is that on the
first trial within a learning block, the observer is uncertain about the
identity of the target presented. Given this uncertainty, let us assume that
the observer initially monitors one sensory unit tuned to each of the four
possible targets. Note, however, that it is only one sensor, out of the four
sensory units, that is coding relevant information to the target being presented
while the others are irrelevant (or partially irrelevant). Considering that each
sensor is also transmitting noise (due to either the visual noise on the image
or internal to the neural sensor), then integrating the responses (nonlinearly
or linearly) across all sensors to make a decision about target location will
bring additional noise from irrelevant sensors into the decision. This
additional variability in the decision variable will degrade localization
performance.
However, as the trials progress from the
1st-to-4th trial, the observer can use the location
feedback to collect evidence about the presence of one or another specific
target in that learning block. The varying amounts of evidence for each of the
possible targets at the target location specified by the feedback can be used on
the subsequent trial to increase the weights of sensory units tuned to targets
associated with higher evidence and reduce the weight of sensory units tuned to
targets associated with lower evidence. Performance in the localization task
improves because of the increase in the optimality of the weighting of the
different sensors.
But what is the best way to change the weights to the
sensory units as the learning trials progress to maximize the amount of
learning? This leads us to the theory of the optimal Bayesian
learner.
The Bayesian observer allows us to establish the
optimal algorithm for the perceptual learning task, and, therefore, obtain the
performance improvement associated with this ideal decision rule. The ideal
observer learns by using the image data in the present trial to modify the
weights given to a nonlinear transformation of the responses of each sensory
unit in future trials.
On each trial, the ideal observer computes the posterior probability of the signal (i.e., target) presence at each of the possible signal locations given the data at all locations ($\mathbf{g}$) and chooses the location with the highest posterior probability. The posterior probability at the $i$th location can be related to the likelihood of the data at all locations given signal presence at the $i$th location through Bayes’ rule (Peterson, Birdsall, & Fox, 1954; Green & Swets, 1966):

$$P(i \mid \mathbf{g}) = \frac{P(i)\, P(\mathbf{g} \mid i)}{P(\mathbf{g})} \tag{1}$$

where $P(i \mid \mathbf{g})$ is the posterior probability of the signal being present at the $i$th location given the data at all locations ($\mathbf{g}$), $P(\mathbf{g} \mid i)$ is the probability of the data at all locations given target presence at the $i$th location and is typically known as the likelihood ($l$), and $P(i)$ is the prior probability of the signal being present at the $i$th location. $P(\mathbf{g})$ is the probability of the data, which is independent of location and can, therefore, be replaced by 1 without affecting the outcome of the decisions.
On the first trial ($t = 1$) of a learning block, the optimal Bayesian observer (see Figure 2) calculates the posterior probability. However, because there is uncertainty about which of the signals is present for that block of learning trials, it computes the posterior probability for each of the $J$ possible signals ($J = 4$ for the task in the present work). This is equivalent to computing a ratio of the likelihood of the data at the $i$th location given signal presence, $P(\mathbf{g}_i \mid \mathbf{s}_j)$, and the likelihood of the data at the $i$th location given signal absence, $P(\mathbf{g}_i \mid n)$ (Green & Swets, 1966). The optimal observer then sums the likelihood ratios across signal types to compute a sum of weighted likelihoods for each location. The individual likelihood ratios are weighted by the prior expectation of each of the possible signals. On the first learning trial, the prior is $1/J$ given that each signal has equal probability of being sampled. On trial $t$, the location with the highest weighted sum of likelihoods ($SLR_{i,t}$) is chosen as containing the target:

$$SLR_{i,t} = \sum_{j=1}^{J} \pi_{j,t}\, \ell_{i,t,j} \tag{2}$$

where $\ell_{i,t,j}$ is the likelihood ratio of the data at location $i$, for the $t$th learning trial and for the $j$th signal, and $\pi_{j,t}$ is the weight (known as the prior) given to the likelihood of the $j$th signal on the $t$th trial. For white Gaussian noise, the likelihood ratio for each location and signal is given by (Peterson et al., 1954):

$$\ell_{i,t,j} = \exp\!\left[\frac{\mathbf{s}_j^{T}\mathbf{g}_{i,t} - \tfrac{1}{2}E_j}{\sigma^{2}}\right] \tag{3}$$

where $\mathbf{s}_j$ is a column vector containing the $j$th signal, $\mathbf{g}_{i,t}$ is a column vector containing the data at the $i$th location for the $t$th trial, and $E_j$ is the energy of the $j$th signal ($E_j = \mathbf{s}_j^{T}\mathbf{s}_j$, where the superscript $T$ stands for transpose). Note that $\mathbf{s}_j^{T}\mathbf{g}_{i,t}$ can be thought of as the response of a linear sensor (matched to the $j$th signal) to the data ($\mathbf{g}$) at the $i$th location. Also, $\sigma^{2}$ is the variance of the noise at each pixel.
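The decision rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the standard white-Gaussian-noise likelihood ratio exp((sᵀg − E/2)/σ²), and the function names (`likelihood_ratio`, `choose_location`) and the 2-pixel toy stimuli are hypothetical stand-ins for the real images.

```python
import math

def likelihood_ratio(signal, data, sigma):
    """Likelihood ratio exp((s.g - E/2) / sigma^2) for a known signal in white Gaussian noise."""
    response = sum(s * g for s, g in zip(signal, data))  # linear sensor response s^T g
    energy = sum(s * s for s in signal)                  # signal energy E = s^T s
    return math.exp((response - 0.5 * energy) / sigma ** 2)

def choose_location(signals, data_by_location, priors, sigma):
    """Return the location with the largest prior-weighted sum of likelihood ratios."""
    weighted_sums = [
        sum(p * likelihood_ratio(s, g, sigma) for p, s in zip(priors, signals))
        for g in data_by_location
    ]
    best = max(range(len(weighted_sums)), key=weighted_sums.__getitem__)
    return best, weighted_sums

# Hypothetical toy data: two 2-pixel signals and three candidate locations.
toy_signals = [[1.0, 0.0], [0.0, 1.0]]
toy_data = [[0.1, -0.2], [0.9, 0.1], [0.0, 0.3]]
best_location, sums = choose_location(toy_signals, toy_data, priors=[0.5, 0.5], sigma=1.0)
```

With equal priors, the location whose data most resemble any one of the candidate signals receives the largest weighted sum; unequal priors shift the decision toward the signals that are considered more probable.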
Figure 2. Ideal observer decision rule for
the perceptual learning paradigm. On each trial and location, the ideal
observer computes the likelihood of the data given the presence of each of the
four possible signals (ℓ;
left panel). It then sums the weighted likelihoods
(π) and chooses the location with
the highest scalar response (left panel). After the decision, feedback is given
about the location of the target (red circle, right panel). The optimal
observer calculates the likelihood of the data given that each of the targets
was present at the location specified by the feedback (right panel). The weights
(priors) are updated for the next trial based on the signal likelihoods from the
locations, indicated by the feedback on all previous trials (right panel).
Figure 2 (left panel)
shows a schematic of the ideal observer’s decision rule for the current
task.
After the 1st trial decision, feedback is given about the target location. The ideal observer has perfect memory. Therefore, the algorithm retains the likelihood of the data at the target present location for each of the $J$ possible signals:

$$\ell_{sp,t,j} = \exp\!\left[\frac{\mathbf{s}_j^{T}\mathbf{g}_{sp,t} - \tfrac{1}{2}E_j}{\sigma^{2}}\right] \tag{4}$$

where the subscript $sp$ for the likelihood and the data vector $\mathbf{g}$ refers to the location that contains the signal ($sp$ = signal present). All other symbols are defined as in Equation 3.

For the 2nd trial, the optimal observer will calculate the individual likelihood ratios for each location and target for the new image. However, on this trial it will weight (i.e., $\pi_{j,t}$, the prior in Equation 2) the 2nd trial likelihoods for each signal by the calculated likelihoods from the signal present location of the 1st trial ($\ell_{sp,1,j}$, Equation 4). In other words, if there was more evidence for one of the signals on the 1st trial, then the optimal observer increases the weight to that signal on the 2nd trial. This process is repeated for the 3rd and 4th trials, updating the prior for each possible signal with the likelihoods from the previous trials as given by

$$\pi_{j,t} = \prod_{t'=1}^{t-1} \ell_{sp,t',j} \tag{5}$$

The likelihood ratio for the data at the signal present location in the $t'$th trial given a particular signal $j$ ($\ell_{sp,t',j}$) is calculated by Equation 4.
On each learning trial, Equation 2 with the updated prior is used to make the localization decision. Figure 2 (right panel) shows a schematic of the process of prior update (Equation 5) for the optimal observer. As the learning trials progress, the prior for the signal increases relative to that of the irrelevant elements. Figure 3 shows the development of the weights or priors for each of the possible signals as the learning trials progress for the localization of one of four elongated Gaussians (only for those trials in which signal 1 was the target). For presentation purposes, the graph shows the priors normalized on each trial so that they sum to 1 ($\sum_{j=1}^{J} \pi_{j,t} = 1$). This normalization does not affect performance of the optimal Bayesian model. Figure 3 shows that on average the progression of the prior for signal 3 toward zero is slower than for signals 2 and 4. This is explained by the fact that for our stimuli, signals 1 and 3 are partially positively correlated.
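The feedback-driven prior update can be illustrated with a short Python sketch. It assumes the white-Gaussian-noise likelihood ratio exp((sᵀg − E/2)/σ²) and renormalizes the priors after every trial (which, as noted above, does not change the decisions). The stimuli are hypothetical: four mutually orthogonal 4-pixel signals rather than the partially correlated elongated Gaussians of the experiment, and the data values are made up for illustration.

```python
import math

def likelihood_ratio(signal, data, sigma):
    """Likelihood ratio exp((s.g - E/2) / sigma^2) for white Gaussian noise."""
    response = sum(s * g for s, g in zip(signal, data))
    energy = sum(s * s for s in signal)
    return math.exp((response - 0.5 * energy) / sigma ** 2)

def update_priors(priors, signals, data_at_target, sigma):
    """Multiply each prior by the likelihood ratio at the fed-back target location, then normalize."""
    unnormalized = [p * likelihood_ratio(s, data_at_target, sigma)
                    for p, s in zip(priors, signals)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

# Hypothetical stimuli: four orthogonal signals of amplitude 2, noise SD 1.
amplitude = 2.0
signals = [[amplitude if k == j else 0.0 for k in range(4)] for j in range(4)]

# Made-up data at the target location on trials 1 to 3 (signal 0 plus noise).
feedback_data = [[2.3, -0.4, 0.5, 0.1], [1.6, 0.2, -0.3, 0.4], [2.1, -0.1, 0.2, -0.6]]

priors = [0.25] * 4          # uniform prior on the first trial
history = [priors]
for data in feedback_data:
    priors = update_priors(priors, signals, data, sigma=1.0)
    history.append(priors)   # history[t] holds the priors used on trial t + 1
```

In this toy run the normalized prior for the signal that generated the data grows on every trial, which is the qualitative pattern the text describes for Figure 3.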
Figure 3. Progression of signal priors
(weights) as a function of learning trial averaged across 10,000 trials for an
ideal Bayesian observer. Results are for trials in which the first signal was
present. Similar results can be obtained for the other three signal trial
types.
Figure 4 shows
performance for the optimal observer as a function of trial number in the block
of trials for different signal contrasts. The amount of learning of the ideal
observer varies with the signal contrast. For both low and high signal
contrasts, the optimal learner shows little learning. The reduced learning at
high signal contrast is explained by the ceiling effect on localization
performance. At low signal contrasts (signal-to-noise ratios), the reduced
learning is due to the fact that the impoverished image information (low
signal-to-noise ratio) does not allow the optimal model to calculate priors that
reliably favor the signal present in the block of trials. As a result, the
progression of the signal relevant prior (normalized) toward unity will be much
slower and will lead to reduced learning.
Figure 4. Proportion correct target
localization as a function of learning trial for the optimal Bayesian observer
for different signal contrasts (signal-to-noise ratios).
Identification decision rule
To make a decision about the identification of the signal on the 4th learning trial, the optimal observer calculates the joint likelihood of the data at the signal present location on the 1st through 4th trials given the presence of each of the possible signals and chooses the signal with the highest likelihood. The joint likelihood across all trials is calculated as

$$L_j = \prod_{t=1}^{4} \ell_{sp,t,j} \tag{6}$$
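A minimal Python sketch of this identification rule follows. It assumes the white-Gaussian-noise likelihood ratio used for localization and works with log likelihood ratios, so the product across trials becomes a sum; this is a numerical convenience chosen here, not something stated in the text. The function name `identify_signal` and the toy data are hypothetical.

```python
def identify_signal(signals, data_at_target_by_trial, sigma):
    """Pick the signal with the largest joint (log) likelihood across all trials of a block."""
    log_joint = []
    for signal in signals:
        energy = sum(s * s for s in signal)
        total = 0.0
        for data in data_at_target_by_trial:
            response = sum(s * g for s, g in zip(signal, data))
            total += (response - 0.5 * energy) / sigma ** 2  # log likelihood ratio for one trial
        log_joint.append(total)
    best = max(range(len(signals)), key=log_joint.__getitem__)
    return best, log_joint

# Hypothetical stimuli: four orthogonal signals of amplitude 2, noise SD 1.
signals = [[2.0 if k == j else 0.0 for k in range(4)] for j in range(4)]

# Made-up data at the target location on the four trials (signal 2 plus noise).
block_data = [[0.3, -0.2, 1.8, 0.5], [-0.4, 0.6, 2.4, 0.1],
              [0.2, 0.1, 1.1, -0.3], [0.0, -0.5, 2.2, 0.7]]
identified, log_joint = identify_signal(signals, block_data, sigma=1.0)
```

Because the logarithm is monotonic, choosing the largest summed log likelihood ratio selects the same signal as choosing the largest product of likelihood ratios.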
One way to compare human performance with respect to the optimal observer is through a measure known as the efficiency (Barlow, 1980), defined as the ratio of the squared contrast thresholds required for the ideal and human observers to reach a given performance level (e.g., $P_c$ = 80%):

$$\eta = \frac{C_{\mathrm{ideal}}^{2}(P_c)}{C_{\mathrm{human}}^{2}(P_c)} \tag{7}$$
If the method of constant stimuli is used in the experiment, the denominator of Equation 7 is the contrast used in the experiment that led to an observed proportion correct ($P_c$). For the numerator, the investigator calculates the signal contrast that leads the ideal observer to perform at that same level ($P_c$) measured experimentally for the human observer.
In the context of the perceptual learning paradigm, the efficiency can be calculated for each learning trial $t$:

$$\eta_t = \frac{C_{\mathrm{ideal},t}^{2}(P_{c,t})}{C_{\mathrm{human}}^{2}} \tag{8}$$

where $P_{c,t}$ is the human proportion correct measured on learning trial $t$. Note that the denominator does not change with learning trial (i.e., the contrast used in the experiment is the same for all learning trials). However, human performance does change as a function of learning trial, so for each trial the investigator needs to calculate the contrast that leads the ideal observer to the measured human performance for that learning trial. Figure 5 illustrates graphically the process of obtaining the contrast for the ideal observer to achieve an experimentally observed performance ($P_c$).
Figure 5. Proportion correct target
localization in an 8AFC as a function of signal contrast for the optimal
Bayesian observer for different learning trial numbers (blue =
1st learning trial; red =
2nd learning trial; green
= 3rd learning trial;
black = 4th learning
trial). Dotted lines with arrows graphically illustrate the procedure to obtain
the contrast needed by an ideal observer to reach a measured human performance.
Example: Blue dotted line uses an observed human proportion correct for the
1st learning trial and
the performance of the ideal observer as a function of contrast (continuous blue
line) to find the contrast needed by the ideal observer to achieve the measured
performance (x-axis intercept of the
dotted line).
Figure 5 plots
proportion correct versus contrast for different learning trials for the ideal
observer. For each learning trial, the corresponding curve can be used to find
the signal contrast required by the ideal observer to achieve the empirically
measured human performance.
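The lookup illustrated in Figure 5 can be sketched in Python as follows. The sketch assumes the ideal observer's performance has already been tabulated at a few contrasts for a given learning trial and uses linear interpolation between tabulated points, which is a simplification chosen here. All numbers are invented for illustration and are not data from the study.

```python
def contrast_for_performance(contrasts, ideal_pc, target_pc):
    """Interpolate the ideal observer's psychometric function to find the contrast giving target_pc.

    `contrasts` and `ideal_pc` must be sorted in increasing order.
    """
    points = list(zip(contrasts, ideal_pc))
    for (c0, p0), (c1, p1) in zip(points, points[1:]):
        if p0 <= target_pc <= p1:
            if p1 == p0:
                return c0
            return c0 + (target_pc - p0) * (c1 - c0) / (p1 - p0)
    raise ValueError("target_pc is outside the tabulated range")

def efficiency(contrasts, ideal_pc, human_pc, human_contrast):
    """Squared ratio of the ideal observer's matched contrast to the contrast shown to the human."""
    ideal_contrast = contrast_for_performance(contrasts, ideal_pc, human_pc)
    return (ideal_contrast / human_contrast) ** 2

# Invented example: ideal-observer table for one learning trial.
table_contrasts = [0.02, 0.04, 0.06, 0.08]
table_ideal_pc = [0.20, 0.50, 0.80, 0.95]
eta = efficiency(table_contrasts, table_ideal_pc, human_pc=0.65, human_contrast=0.10)
```

Repeating the calculation with the ideal-observer table for each learning trial, while keeping `human_contrast` fixed, gives efficiency as a function of trial number.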
The efficiency as a function of trial number can then be plotted to give rise to different learning signatures (Figure 6): (a) an observer that learns as much as the ideal observer has a constant efficiency as a function of trial number (complete learning); (b) an observer that does not learn as much as the ideal observer (partial or incomplete learning) has a decreasing efficiency with increasing trial number; (c) an observer that learns more than the ideal observer has an efficiency that increases with trial number (over-complete learning); and (d) an observer that learns more slowly than the ideal observer will have an early drop in efficiency followed by an increase in efficiency in the last learning trials.
Figure 6. Efficiency signatures. Blue
empty triangles: complete learning; black empty squares: slow learning;
red-filled squares: incomplete or partial learning; and green-filled triangles:
over-complete learning.
Figure 3 shows the evolution across learning trials of the priors for the optimal observer. It would be useful to be able to quantify the departure of the prior distribution from the initial uniform prior distribution. A measure used in information theory to quantify the similarity between two sets of probabilities is the relative entropy or Kullback-Leibler divergence (Kullback, 1959). Here we use the relative entropy to assess how a model’s distribution of priors compares to the set of uniform priors in the initial trial of a learning block, where there is maximum uncertainty about which signal is present for that block of trials. For our task, the relative entropy can then be defined as (Kullback, 1959)

$$D_t = \sum_{j=1}^{N} \pi_{j,t} \log_2\!\left(\frac{\pi_{j,t}}{1/N}\right) \tag{9}$$

where $\pi_{j,t}$ is the (normalized) weight for the $j$th signal on the $t$th trial, and $1/N$ stands for the uniform priors at the initial learning trial.
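As a small worked example, the relative entropy of a set of priors with respect to the uniform distribution can be computed as below. The function name is hypothetical; the sketch normalizes the priors first and treats terms with zero prior as contributing zero, which is the usual convention for this measure.

```python
import math

def relative_entropy_bits(priors):
    """Relative entropy (in bits) between normalized priors and a uniform distribution."""
    total = sum(priors)
    n = len(priors)
    normalized = [p / total for p in priors]
    # p * log2(p / (1/n)) == p * log2(p * n); zero-probability terms contribute nothing.
    return sum(p * math.log2(p * n) for p in normalized if p > 0.0)

uniform = relative_entropy_bits([0.25, 0.25, 0.25, 0.25])  # no departure from uniform
two_way = relative_entropy_bits([0.5, 0.5, 0.0, 0.0])      # two signals still equally likely
certain = relative_entropy_bits([1.0, 0.0, 0.0, 0.0])      # all weight on one signal
```

For four signals the value ranges from 0 bits (uniform priors) to 2 bits (all weight on one signal), with 1 bit when the weight is split evenly between two signals.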
Figure 7 shows the relative entropy for the ideal Bayesian observer for the present task as a function of trial number. Note that the relative entropy measures how the distribution of priors of a model departs from the uniform distribution, irrespective of whether the favored signal corresponds to the signal actually present in the block of trials. Thus, the relative entropy as a function of learning trial measures how the distribution of priors converges to unity for one signal (and zero for the remaining signals), irrespective of the optimality of the process. Figure 7 also shows that the relative entropy depends on the signal-to-noise ratio.
Figure 7. Relative entropy for the
optimal Bayesian observer as a function of learning trial number for five
different signal contrasts (signal-to-noise ratios) averaged across 20,000
learning blocks. For the present task, the upper bound for the relative entropy
is 2 bits. This occurs when one normalized prior is 1 and the remaining ones
are zero [D = 1.0 log2(1.0/0.25) = 2]. Note
that the departure from the uniform prior distribution depends on the
signal-to-noise ratio.
Suboptimal learning algorithms
Often, replacing the optimal decision rule with
suboptimal strategies can be useful. If the suboptimal model’s performance
is lower than human performance, it could be argued that the human neural
algorithm cannot be the suboptimal decision rule tested. In our context, if the
amount of learning of the suboptimal model is inferior to that of humans, then
the model could arguably be rejected as a model of human learning.
Here we consider three suboptimal rules to update the priors: (1) prior update based on the chosen location; (2) prior update based on the chosen location for correct previous trials and no prior update for incorrect previous trials; and (3) linear prior update rule.
Prior update based on the chosen location
Prior studies have shown that human observers have limited capacity memory (unlike the ideal observers) for complex visual patterns (e.g., Luck & Vogel, 1997). In this context, one possibility is that observers do not use the location feedback because they are unable to remember the image data presented at that location to efficiently update the priors. Instead, the observers might update their priors (weights) based on the image data at the chosen target location (for both correct and incorrect trials) and not the image data at the actual target location specified by the feedback. The prior update for this model can be described by

$$\pi_{j,t} = \prod_{t'=1}^{t-1} \ell_{ch,t',j} \tag{10}$$

where the subscript $ch$ for the likelihood ($\ell$) indicates that the likelihood of each signal is calculated for the chosen location (the location with the highest sum of likelihoods). Note that in those trials in which the localization was correct, the model updates the priors in the same way as an ideal observer. However, on the trials in which the localization decision is incorrect, the model updates its prior based on a location that contained only noise and no signal information. Thus, this model’s performance will decrease on trials following previous incorrect localization trials.
Prior update for correct trials only
In an alternative model, the observer still updates its
priors based on the information at the chosen location but makes partial use of
the feedback. In this model, when the feedback indicates that the chosen
location was not the target location (incorrect location), then the observer
leaves the priors unchanged rather than update them based on image data at a
chosen location that contained only noise. On the other hand, when the feedback
indicates that the localization decision was correct, then the observer updates
the priors based on the image data presented at the chosen location.
The prior update rule for this model is described as follows:
For correct localization trials:

$$\pi_{j,t+1} = \pi_{j,t}\, \ell_{ch,t,j} \tag{11}$$

For incorrect localization trials:

$$\pi_{j,t+1} = \pi_{j,t} \tag{12}$$

In essence, this is a mixture model. On the proportion, $p$, of the trials in which the decision was correct on the $t$th trial, the model learns optimally on the $(t+1)$th trial, whereas on the proportion, $1-p$, of the trials in which the decision was incorrect, there is no prior updating, and, therefore, no learning on the $(t+1)$th trial. The progression of the distribution of priors for this model can be compared to that of the optimal Bayesian observer using the relative entropy measure. Figure 8 shows that the relative entropy of the “prior update on correct trials only” model increases more slowly than that of the optimal observer.
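The difference between the feedback-based update and the two chosen-location rules described so far can be made concrete with a short Python sketch. It assumes the white-Gaussian-noise likelihood ratio exp((sᵀg − E/2)/σ²) and a one-trial recursive form of the update; the `rule` argument, its string values, and the 2-pixel toy data are hypothetical and chosen only for illustration.

```python
import math

def likelihood_ratio(signal, data, sigma):
    """Likelihood ratio exp((s.g - E/2) / sigma^2) for white Gaussian noise."""
    response = sum(s * g for s, g in zip(signal, data))
    energy = sum(s * s for s in signal)
    return math.exp((response - 0.5 * energy) / sigma ** 2)

def update_priors(priors, signals, data_by_location, chosen, target, sigma, rule):
    """One-trial prior update under the 'ideal', 'chosen', or 'correct_only' rule."""
    if rule == "ideal":
        source = target            # use the location given by the feedback
    elif rule == "chosen":
        source = chosen            # ignore feedback; use the chosen location
    elif rule == "correct_only":
        if chosen != target:
            return list(priors)    # incorrect decision: leave the priors unchanged
        source = chosen
    else:
        raise ValueError("unknown rule: " + rule)
    updated = [p * likelihood_ratio(s, data_by_location[source], sigma)
               for p, s in zip(priors, signals)]
    total = sum(updated)
    return [u / total for u in updated]

# Toy error trial: signal 0 is at location 1, but the observer chose noise-only location 0.
signals = [[1.0, 0.0], [0.0, 1.0]]
data = [[0.2, 0.9], [1.1, -0.1]]
start = [0.5, 0.5]
ideal = update_priors(start, signals, data, chosen=0, target=1, sigma=1.0, rule="ideal")
misled = update_priors(start, signals, data, chosen=0, target=1, sigma=1.0, rule="chosen")
frozen = update_priors(start, signals, data, chosen=0, target=1, sigma=1.0, rule="correct_only")
```

On this error trial the feedback-based rule shifts weight toward the signal actually present, the chosen-location rule shifts weight toward whichever signal the noise happened to resemble, and the correct-trials-only rule leaves the priors where they were. On a correct trial all three rules give the same update.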
Figure 8. Relative entropy as a function
of learning trial for (a) the optimal Bayesian observer (empty triangles); (b)
“prior update based on correct trial only” model (empty squares) ;
(c) “prior update based on chosen location” model
(x) ; and (d) linear prior update model
(circles).
Linear prior update model
The priors of the optimal observer are a nonlinear function (the likelihood) of the linear response of a sensor (or template) matched to the possible targets. The present suboptimal model uses the linear response of the sensor (template) to the data rather than the likelihood to compute the priors:

$$r_{j,t} = \mathbf{s}_j^{T}\mathbf{g}_{sp,t} + b \tag{13}$$

where $r_{j,t}$ is the linear response of the sensor matched to the $j$th signal, $\mathbf{s}_j$ is a vector containing the elements of the $j$th signal, $\mathbf{g}_{sp,t}$ is a vector containing the data at the signal present location for the $t$th learning trial, and $b$ is a constant added to avoid negative priors.
On each trial the priors are updated by multiplying by the responses to the signal present location:

$$\pi_{j,t} = \prod_{t'=1}^{t-1} r_{j,t'} \tag{14}$$
The algorithm is suboptimal and will, therefore, result in less learning than that of the optimal observer. Figure 8 shows that the relative entropy for the linear prior update model increases more slowly as a function of trial number than that of both the “update priors in correct trials only” model and the optimal observer.
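A minimal sketch of the linear update, for comparison with the likelihood-based rule, is given below. The value of the additive constant is not specified in the text, so `offset` is a hypothetical parameter, and the sketch raises an error if it is too small to keep every response positive. The stimuli and data are invented.

```python
def linear_update(priors, signals, data_at_target, offset):
    """Multiply each prior by the offset linear template response, then normalize."""
    responses = [sum(s * g for s, g in zip(signal, data_at_target)) + offset
                 for signal in signals]
    if min(responses) <= 0.0:
        raise ValueError("offset too small: a template response is not positive")
    updated = [p * r for p, r in zip(priors, responses)]
    total = sum(updated)
    return [u / total for u in updated]

# Hypothetical stimuli: four orthogonal signals of amplitude 2.
signals = [[2.0 if k == j else 0.0 for k in range(4)] for j in range(4)]
data = [2.3, -0.4, 0.5, 0.1]   # made-up data at the target location (signal 0 plus noise)
linear_priors = linear_update([0.25] * 4, signals, data, offset=5.0)
```

Because the template response enters linearly rather than through an exponential, a single trial of evidence moves the priors away from uniform only modestly, which is consistent with the slower growth in relative entropy reported for this model.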
Psychophysical experiments
The signals were 2D elongated Gaussians (major axis SD = 0.301°, 8 pixels, and minor axis SD = 0.075°, 2 pixels) with one of four orientations (0°, 45°, 90°, and 135°) and two polarities: (1) positive for the 0° and 90° orientations; and (2) negative for the 45° and 135° orientations (see Figure 1).
Noise was spatially uncorrelated (white) Gaussian noise with an SD of 4.9 cd/m² (25 gray levels of the linearized luminance scale).
The signals were randomly located at one of eight locations equidistant along a circle with radius 3.384°. Possible signal locations were surrounded by black boxes subtending an angle of 1.805°. The mean display luminance was 25 cd/m², and the display was calibrated to result in a linear relationship between digital gray level and luminance. Experiment images were displayed on an Image Systems M17LMAX monochrome monitor with maximum resolution of 1664 x 1280 pixels (Image Systems, Minnetonka, MN).
Three naïve observers participated in the study
(two females, one male, aged 20-23 years with normal or corrected acuity).
Viewing distance was 50 cm.
The observer initiated each trial by pressing the left
button of the computer mouse. On each trial, the test image containing the
signal plus noise was briefly presented for 200 ms. A response image followed
containing the black boxes but neither the signal nor external noise. Observers chose a
target location by placing the cursor inside a box and pressing the left button
of the computer mouse. Feedback about the location of the target was provided
using a red circle that appeared inside the box that had contained the target.
At the end of the 4th trial, the observer was asked to make an
identification decision by placing the mouse cursor on top of one of the four
high-contrast copies of the possible signals that were shown on the top of the
screen.
Each observer participated in 12 sessions of 100
learning blocks, resulting in a total of 4,800 trials. Proportion correct signal
localization was calculated for each observer and learning trial (averaged
across the 1,200 learning blocks). Proportion correct identification of the
signal after the 4th learning trial was also calculated.
Because of the nonlinear nature of the decision rules
used by the optimal Bayesian observer and the suboptimal models, all model performances
were calculated using Monte-Carlo simulations based on 20,000 trials per data
point. The same signals and external noise values were used in the
psychophysical experiment and the simulations. For each trial, the decision
about localization was made using the rules described in Equations 1 through 5. For the suboptimal models, the prior updates
were based on Equations 10 through 14.
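A Monte-Carlo estimate of this kind can be sketched as follows. The sketch is a deliberate simplification and should not be read as the authors' Equations 1 through 5: instead of images, it assumes each location yields one unit-variance Gaussian response per candidate signal, with the four signals treated as orthogonal and a hypothetical detectability `dprime` added where the target template meets the target. The function name and all parameter values are illustrative.

```python
import math
import random

def simulate_ideal_learner(n_blocks, dprime, n_locations=8, n_signals=4,
                           n_trials=4, seed=0):
    """Return proportion correct localization for each learning trial."""
    rng = random.Random(seed)
    n_correct = [0] * n_trials
    for _ in range(n_blocks):
        target = rng.randrange(n_signals)        # fixed for the whole block
        priors = [1.0 / n_signals] * n_signals   # reset at the start of a block
        for trial in range(n_trials):
            loc = rng.randrange(n_locations)     # target location on this trial
            # Template responses: unit-variance noise everywhere, plus dprime
            # for the target template at the target location (assumption).
            resp = [[rng.gauss(0.0, 1.0) for _ in range(n_signals)]
                    for _ in range(n_locations)]
            resp[loc][target] += dprime
            # Likelihood ratio for "signal j at location i" is
            # exp(dprime * r_ij - dprime**2 / 2); the second term is common
            # to every (i, j) pair and is dropped.
            like = [[math.exp(dprime * r) for r in row] for row in resp]
            evidence = [sum(p * l for p, l in zip(priors, row)) for row in like]
            choice = max(range(n_locations), key=evidence.__getitem__)
            n_correct[trial] += (choice == loc)
            # Feedback reveals the location; Bayes' rule updates signal priors.
            posterior = [p * l for p, l in zip(priors, like[loc])]
            total = sum(posterior)
            priors = [p / total for p in posterior]
    return [c / n_blocks for c in n_correct]

pc = simulate_ideal_learner(n_blocks=5000, dprime=2.0)
print([round(p, 3) for p in pc])
```

In this toy setting, proportion correct rises across the four trials because the priors concentrate on the block's target; the exact values depend on the assumed `dprime` and are not comparable to the figures reported here.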
Figure 9 shows human localization performance (proportion correct, Pc) as a
function of trial number for three naïve observers. Although absolute
performance was significantly different across observers, all three observers
showed similar improvements with learning trials. Average improvement in Pc
from the 1st-to-4th learning trial was 6.5% for KC, 6.2% for AB,
and 7.5% for LL. All improvements were statistically significant
(p < .01). For all observers, the largest improvement occurred between
the 1st and 2nd learning trials (Figure 9).
Figure 9. Proportion correct for localizing the target in one of eight
locations as a function of learning trial number for three observers (KC, AB,
and LL). The last data points (ID on the x-axis) correspond to performance in
the target identification task after the 4th learning trial.
Figure 10 shows efficiency as a function of learning trial for the three
observers. Efficiency decreased from the 1st-to-4th learning trial by
5.34 percentage points for KC, 4.36 for AB, and 4.34 for LL. Measured as a
percentage of the efficiency in the 1st learning trial, these decreases
represent reductions of 20.3% (KC), 27.9% (AB), and 21.7% (LL). Patterns of
efficiency as a function of trial number were similar across all three
observers.
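The relation between the absolute and relative efficiency drops can be checked with a short calculation. The input numbers are those reported above; the 1st- and 4th-trial efficiencies computed here are back-calculated from the rounded values and are therefore only approximate, not figures taken from the paper.

```python
# Values reported in the text: absolute drop in efficiency (percentage points)
# and the same drop expressed as a percentage of 1st-trial efficiency.
absolute_drop = {"KC": 5.34, "AB": 4.36, "LL": 4.34}
relative_drop = {"KC": 20.3, "AB": 27.9, "LL": 21.7}

# Average relative reduction across the three observers.
mean_relative_drop = sum(relative_drop.values()) / len(relative_drop)

# Implied efficiencies (percent), back-calculated from rounded values:
# first = absolute / (relative / 100), fourth = first - absolute.
first_trial = {k: absolute_drop[k] / (relative_drop[k] / 100.0)
               for k in absolute_drop}
fourth_trial = {k: first_trial[k] - absolute_drop[k] for k in absolute_drop}

print(round(mean_relative_drop, 1))  # 23.3
for name in absolute_drop:
    print(name, round(first_trial[name], 1), round(fourth_trial[name], 1))
```

The mean of the three relative reductions reproduces the 23.3% average drop in efficiency quoted in the abstract and conclusions.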
Figure 10. Efficiency for three observers (KC, AB, and LL) as a function of
learning trial number. The dotted line is the efficiency of a hypothetical
observer that performed at AB’s 1st learning trial performance and did not
learn across trials.
Comparison to suboptimal models
Figures 11a and 11b compare the overall learning in humans to
that of a number of suboptimal models (the ideal observer is shown in Figure 11a for comparison). Figure 11a compares human learning to
the “update priors on correct trials only” model. The contrasts of the
signals were adjusted for the models so that their localization performance
matched that of each human observer on the 1st trial. Learning for the “prior
update on correct trials only” model was larger than human learning. Figure 11b compares human performance to two
other suboptimal models: the linear prior update model and the “prior update based
on chosen location” model. The linear prior update
model produced learning comparable to that of the human observers; however, its learning on the
2nd trial appears to be lower than human learning, whereas its learning on the
4th trial appears to be consistently larger. On the other
hand, the “prior update based on chosen location” model resulted in
virtually no learning.
Figure 11. a. Proportion correct as a function of trial number for three
human observers compared to that of an ideal observer (dashed lines) and the
“update priors in correct trials only” model (dotted lines). b. Proportion
correct as a function of trial number for three human observers compared to
that of a linear prior update model (dashed lines) and the “update priors
based on the chosen location” model (dotted lines). Model performances were
calculated for signal contrasts that led to 1st trial performance that
matched each human.
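The difference between the prior-update rules compared in Figure 11 can be illustrated with a minimal sketch. It is not the authors' Equations 10 through 14: it assumes a simplified setting in which each location yields one unit-variance response per candidate signal and the likelihood of signal j is proportional to exp(dprime * r_j). All function names, the value of `dprime`, and the example responses are hypothetical. The linear prior update model is omitted because its rule is not described in this section.

```python
import math

def bayes_update(priors, responses, dprime):
    """Posterior signal priors given template responses at one location."""
    weighted = [p * math.exp(dprime * r) for p, r in zip(priors, responses)]
    total = sum(weighted)
    return [w / total for w in weighted]

def update_ideal(priors, resp_by_location, target_loc, chosen_loc, dprime):
    # Always uses the data at the location revealed by the feedback.
    return bayes_update(priors, resp_by_location[target_loc], dprime)

def update_correct_only(priors, resp_by_location, target_loc, chosen_loc, dprime):
    # Updates only after a correct localization; otherwise priors are unchanged.
    if chosen_loc != target_loc:
        return list(priors)
    return bayes_update(priors, resp_by_location[target_loc], dprime)

def update_chosen_location(priors, resp_by_location, target_loc, chosen_loc, dprime):
    # Ignores the feedback and uses the data at the location that was chosen.
    return bayes_update(priors, resp_by_location[chosen_loc], dprime)

# Hypothetical incorrect trial: target (signal 0) at location 0, but the
# observer chose location 1, where noise happened to resemble signal 1.
priors = [0.25, 0.25, 0.25, 0.25]
resp = [[2.1, 0.0, -0.3, 0.2],
        [0.1, 1.9, 0.0, -0.2]]
after_ideal = update_ideal(priors, resp, 0, 1, 2.0)
after_correct_only = update_correct_only(priors, resp, 0, 1, 2.0)
after_chosen = update_chosen_location(priors, resp, 0, 1, 2.0)
```

On this example trial the ideal rule shifts weight toward the true signal, the correct-only rule leaves the priors flat, and the chosen-location rule shifts weight toward the wrong signal, which is the intuition behind the ordering of the model curves.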
Learning contingent on correctness of the 1st learning trial
Figure 12a shows human
localization performance (proportion correct) as a function of trial number for
the 2nd, 3rd, and 4th learning trials, separately for blocks
in which localization on the 1st trial was correct (continuous
lines) and blocks in which localization on the 1st trial was
incorrect (dashed lines). For all three observers, performance improvement
across learning trials was significantly larger for trials in which the
observers correctly localized the signal on the 1st trial. Figures 12b-12d
show localization performance following correct and incorrect localizations on the
1st trial for the optimal Bayesian observer, the “prior update
on correct trials only” model, and the linear prior update model. Both
the optimal Bayesian and linear prior update models also showed sequential
effects, but the effects were smaller than those in humans. The sequential
effects for the “prior update on correct trials only” model were
more comparable to the lack of human learning on 2nd trials
following incorrect 1st trial localizations (Figure 12a vs. 12c).
Figure 12. a, b, c, and d. Proportion correct localization for 2nd, 3rd, and
4th learning trials following correct (continuous lines) and incorrect 1st
localization trials (dotted lines) for (a) human observers (top graph),
(b) Bayesian ideal observer, (c) a model that only updates the priors in
correct localization trials, and (d) a model that updates priors linearly.
Signal-to-noise ratios (signal contrasts) were adjusted to achieve an overall
proportion correct in the 1st learning trial that matched each individual
human observer.
Signal identification performance on the 4th learning trial
Figure 9 shows the proportion correct for identifying the signal after the
4th learning trial for all three observers (plotted above the x-axis label
ID): 0.959 (KC), 0.848 (AB), and 0.83 (LL). The identification efficiencies
were 11.23% (KC), 4.75% (AB), and 4.28% (LL).
The importance of comparing human learning performance to the optimal Bayesian learner
Our results (Figure 9)
show that humans are able to quickly improve their localization performance
within a few trials (4 trials; < 1 min). Also, human performance increases
quickly between the 1st and 2nd learning trials and more slowly
after the 2nd trial. This early fast learning followed by reduced
late learning might be interpreted to reflect two learning algorithms or
different learning-dependent neurophysiological events evolving within different
time frames (Atienza, Cantero, & Dominguez-Marin, 2002). However, comparison to the optimal
Bayesian learner suggests otherwise. The larger amount of learning from the
1st-to-2nd learning trial is also present in the optimal
observer (Figure 11), suggesting that this
effect is not particular to the human neural learning algorithm but might be a
property inherent to the task and stimuli. Furthermore, the efficiency analysis
(Figure 10) shows that for all three human
observers the largest drop in efficiency occurred between the 1st
and 2nd learning trials. This suggests that even though
humans learned the most between the 1st and 2nd trials,
they improved by only a fraction of the optimal observer’s improvement.
Sources of suboptimal human learning
One possible source of inefficiency in human
learning is imperfect visual memory (Luck & Vogel, 1997). The ideal observer perfectly remembers the
image presented at the location indicated by the feedback, and in updating the
priors is limited only by the external noise on the image. In contrast, humans
are probably updating their priors based on a lower quality memory
representation of the image. This lower quality representation can be modeled by
adding “memory noise” to the image at the feedback location. This
would result in less reliable prior updating and would, therefore, lead to
inefficient learning.
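The effect of such memory noise on prior updating can be sketched in a simplified setting. This is an illustration only, not a model fitted or tested in this paper: it assumes one unit-variance response per candidate signal at the feedback location, a hypothetical detectability `dprime`, independent Gaussian memory noise of standard deviation `memory_sd` added to each remembered response, and an observer that knows its own memory noise level.

```python
import math
import random

def update_with_memory_noise(priors, responses, dprime, memory_sd, rng):
    """Bayes update from a noisy memory of the responses at the feedback location."""
    remembered = [r + rng.gauss(0.0, memory_sd) for r in responses]
    total_var = 1.0 + memory_sd ** 2  # external (unit) plus memory noise variance
    weighted = [p * math.exp(dprime * m / total_var)
                for p, m in zip(priors, remembered)]
    norm = sum(weighted)
    return [w / norm for w in weighted]

def mean_prior_on_target(memory_sd, dprime=2.0, n_signals=4,
                         n_samples=5000, seed=1):
    """Average updated prior on the true signal after one feedback trial."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        responses = [rng.gauss(0.0, 1.0) for _ in range(n_signals)]
        responses[0] += dprime  # signal 0 is the target (assumption)
        flat = [1.0 / n_signals] * n_signals
        total += update_with_memory_noise(flat, responses, dprime,
                                          memory_sd, rng)[0]
    return total / n_samples

no_memory_noise = mean_prior_on_target(memory_sd=0.0)
with_memory_noise = mean_prior_on_target(memory_sd=2.0)
print(round(no_memory_noise, 3), round(with_memory_noise, 3))
```

Under these assumptions, the updated prior on the true signal is lower on average when memory noise is present, so the priors converge more slowly across the learning trials.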
One striking result of our analysis of localization
performance is that human observers failed to learn at all on 2nd
trials following incorrect 1st trials (Figure 12a). This result suggests that observers
were unable to use the location feedback following incorrect localization
decisions to update the signal priors. This outcome might be due to
observers’ inability to remember the image presented at a missed target
location.
On the other hand, the results also support the idea
that observers used the feedback informing them that they had chosen an
incorrect location to leave the priors unchanged for the next trial following
incorrect localization trials. If humans had ignored the feedback altogether and
updated their priors based on image data from an incorrectly chosen location
that only contained noise, then they would not show any learning at all (as
predicted by the “update priors based on chosen location” model, Figure
11b).
Sequential effects in perceptual judgments for human and optimal observers when learning about signals
It has been previously observed that in many basic
experiments, a human observer’s response on a given trial is influenced to
some extent by the stimuli and responses on immediately preceding trials (Green
& Swets, 1966; Green, 1964; Atkinson, Carterette, & Kinchla, 1962; Kinchla, 1964). This observation violates one of the
fundamental assumptions of the typical signal detection theory analysis and
optimal decision making. These dependencies have typically been related to
fluctuations in alertness (Makeig & Inlow, 1993) and observer biases (Kinchla, 1964).
One interesting result arising from the present work is
its relation to sequential dependencies in the beginning trials of a
psychophysical study: the probability of a correct trial is larger
if the previous trial was correct than if it was incorrect.
These dependencies could be explained by noting that,
when first faced with a new perceptual task, the observer is often uncertain
about the signal he/she is looking for, and by assuming that observers
update their priors as trials progress.
Our theoretical results for the present task
show that even an optimal observer with uncertainty about target parameters and
learning about the target from trial to trial via prior updating will give rise
to sequential effects (see Figure 12b). This
result might seem counterintuitive given the statistical independence of the
decision on each trial. However, the sequential effects in the optimal Bayesian
observer are explained by the fact that the prior updating of a given trial
depends on the information presented on earlier trials, breaking the statistical
independence of the decisions on each trial. For example, for the present OPL
paradigm, first trials that led to a correct localization are typically
associated with more evidence (higher likelihoods) about the relevant target
than incorrect localization trials. Therefore, the prior corresponding to the
relevant target will be larger for 2nd trials in which localization
was correct on the 1st trial. Figures
13a and 13b show the prior updating for
correct and incorrect 1st trial localization trials for an optimal
Bayesian observer for learning blocks with signal 1 as the target. Note that
for the 2nd learning trial, priors for signals 2 and 4 are close to
zero following 1st trials with correct localization (Figure 13a), whereas they are non-zero following
1st trials with incorrect localization (Figure 13b). In addition, the weighting of the
relevant signal 1 is 0.8 following 1st trials with correct
localization and is 0.6 following 1st trials with incorrect
localization. This higher weighting of the relevant target will lead to higher
performance localizing the target on 2nd trials following correct
1st trials (Figure 12b).
Figure 13. a, b, and c. Average progression of priors for each signal as a
function of learning trial in learning blocks in which signal 1 was present
for (a) correct 1st localization trials for the Bayesian ideal observer and
the “update in correct trials only” model, (b) incorrect 1st localization
trials for the optimal observer, and (c) incorrect 1st localization trials
for the “update in correct trials only” model.
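The origin of sequential effects in an optimal learner can be illustrated with a small simulation. It is a sketch in a simplified setting and does not reproduce the priors of 0.8 and 0.6 reported for the actual stimuli: it assumes orthogonal signals, one unit-variance response per candidate signal at each location, and a hypothetical detectability `dprime`. The function name is illustrative.

```python
import math
import random

def first_trial_split(n_blocks=5000, dprime=2.0, n_locations=8,
                      n_signals=4, seed=2):
    """Mean updated prior on the true signal after trial 1, by trial-1 outcome."""
    rng = random.Random(seed)
    sums = {True: 0.0, False: 0.0}
    counts = {True: 0, False: 0}
    for _ in range(n_blocks):
        target = rng.randrange(n_signals)
        loc = rng.randrange(n_locations)
        resp = [[rng.gauss(0.0, 1.0) for _ in range(n_signals)]
                for _ in range(n_locations)]
        resp[loc][target] += dprime
        like = [[math.exp(dprime * r) for r in row] for row in resp]
        # Priors are flat on trial 1, so location evidence is the plain sum.
        evidence = [sum(row) for row in like]
        correct = max(range(n_locations), key=evidence.__getitem__) == loc
        # Ideal update uses the data at the feedback (true) location.
        posterior_target = like[loc][target] / sum(like[loc])
        sums[correct] += posterior_target
        counts[correct] += 1
    return {outcome: sums[outcome] / counts[outcome] for outcome in sums}

split = first_trial_split()
print({outcome: round(value, 3) for outcome, value in split.items()})
```

In this toy setting the updated prior on the true signal is larger, on average, after correct 1st trials than after incorrect ones, even though the update rule itself never looks at the decision. The dependence arises because correct trials tend to be those with stronger evidence at the target location.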
However, note that our results show that for the three
human observers (Figure 12a), the dependency of
performance on the 2nd trial on whether the 1st learning
trial was correct is much larger than that of the optimal observer (Figure 12b) and also than
that of the linear prior update model (Figure
12d).
The human result is more comparable to that of a model
that does not update priors on incorrect trials (Figure 12c). Figure
13c shows the development of the priors on incorrect trials for such a
model. The priors on the 2nd trial following incorrect 1st
trials are unchanged, leading to no performance improvement from the
1st-to-2nd trial (Figure
13c).
Relationship to other perceptual learning paradigms using external noise
There is a growing literature assessing mechanisms of
perceptual learning across blocks of hundreds of trials using external noise
(Gold et al., 1999;
Dosher & Lu, 1998; Li, Levi, & Klein, 2004; Lu & Dosher, 2004). Possible mechanisms of improvement of
performance include (a) better tuning of the perceptual template; (b) reduction
in internal noise; and (c) change in nonlinear properties such as transducer
and/or intrinsic uncertainty.
Most of these studies have shown that template retuning
is responsible for the perceptual learning, perhaps related to the
observers’ increasing ability to use full knowledge about the visual
properties of the signal as the trials progress. In the present study, we
explicitly manipulated the uncertainty about the signal. We did not consider the
possibility that the learning across trials is due to a reduction in constant
additive internal noise. In the context of the OPL experimental paradigm,
learning due to a reduction in internal noise would require that internal noise
decrease cyclically from the 1st-to-4th learning trials
and then, implausibly, reset itself to a high level
for the 1st learning trial of the next learning block.
Instead, the present work considered models that
improved their ability to integrate information across possible signals
nonlinearly, akin to the optimal Bayesian model. However, a
suboptimal, fully linear model that integrates information across possible
signals might also be able to account for the data. Future work will
attempt to discriminate between a fully linear model and the multiple-template
nonlinear models investigated in the present work.
We have introduced a new experimental paradigm to
systematically and quantitatively study the dynamics of perceptual learning by
comparing human learning to that of an optimal Bayesian learner. The paradigm
provides a general and flexible framework that could be used to study the
process of learning in a variety of tasks and sensory modalities. Our results in
the context of localization of a target with uncertainty about orientation and
polarity show that humans can rapidly learn (within 4 trials), although less
than an optimal observer (average percentage drop in efficiency from
1st-to-4th trial = 23.3%). The largest improvement in
human performance, occurring from 1st-to-2nd trial,
reflects a property inherent to the visual task and not a property particular to
the human perceptual learning mechanism. One important difference between the
human and ideal observer is that human learning relies (suboptimally) more
heavily on previous decisions than on the feedback, resulting in no human
learning on trials following an incorrect localization
decision.
This research was supported by National Institutes of
Health Grant 53455, National Aeronautics and Space Administration Grant 1157,
and National Science Foundation Grant 0135118. We thank Katherine Chong, Lloren
Llena, and Alexander Block for participating as observers in the study. Portions
of this work were previously presented at the Vision Sciences Society Meeting
(Abbey, Eckstein, & Shimozaki, 2001;
Eckstein, Abbey, & Shimozaki, 2002). Finally, we thank anonymous
reviewer 1 for suggesting the use of the relative entropy metric.
Commercial relationships: none.
Corresponding author: Miguel P. Eckstein.
Email: eckstein@psych.ucsb.edu.
Address: Vision & Image Understanding Lab, Department of Psychology, UC Santa Barbara, Santa Barbara, CA, 93106-9660 USA.
1. Note that uncertainty
in this context is used to refer to the general idea of lack of full knowledge
about the visual properties of the signal being presented and not to a
particular nonlinear decision rule to integrate information across possible
signals, such as in previous work (Pelli, 1985;
Eckstein, Ahumada, & Watson, 1997).
2. Previous studies have
compared human performance with respect to an ideal observer in a standard task
in which an ideal observer does not learn (Gold et al., 1999). Thus, these studies allow for the
identification of the mechanism mediating the learning; they do not allow for
comparisons of the amount of learning in humans and in an optimal
learner.
Abbey, C. K., Eckstein, M. P., & Shimozaki, S. S. (2001). The efficiency of perceptual learning in a visual detection task [Abstract]. Journal of Vision, 1(3), 28a, http://journalofvision.org/1/3/28/, doi:10.1167/1.3.28.
Atienza, M., Cantero, J. L., & Dominguez-Marin, E. (2002). The time course of neural changes underlying auditory perceptual learning. Learning and Memory, 9, 138-150.
Atkinson, R. C., Carterette, E. C., & Kinchla, R. A. (1962). Sequential phenomena in psychophysical judgments: A theoretical analysis. Institute of Radio Engineers Transactions on Information Theory, IT8, S155-162.
Ball, K., & Sekuler, R. (1987). Direction-specific improvement in motion discrimination. Vision Research, 27, 953-965.
Barlow, H. B. (1980). The absolute efficiency of perceptual decisions. Philosophical Transactions of the Royal Society of London B, 290, 71-91.
Beard, B. L., & Ahumada, A. J., Jr. (1999). Detection in fixed and random noise in foveal and parafoveal vision explained by template learning. Journal of the Optical Society of America A, 16, 755-763.
Beard, B. L., Levi, D. M., & Reich, L. N. (1995). Perceptual-learning in parafoveal vision. Vision Research, 35, 1679-1690.
Burgess, A. E., Wagner, R. F., Jennings, R. J., & Barlow, H. B. (1981). Efficiency of human visual signal discrimination. Science, 2, 93-94.
Crist, R. E., Kapadia, M. K., Westheimer, G., & Gilbert, C. D. (1997). Perceptual learning of spatial localization: Specificity for orientation, position, and context. Journal of Neurophysiology, 78(6), 2889-2894.
Dosher, B. A., & Lu, Z. L. (1998). Perceptual learning reflects external noise filtering and internal noise reduction through channel reweighting. Proceedings of the National Academy of Sciences of the U.S.A., 95(23), 13988-13993.
Eckstein, M. P., Shimozaki, S. S., & Abbey, C. K. (2002). The footprints of visual attention in the Posner cueing paradigm revealed by classification images. Journal of Vision, 2(1), 25-45, http://journalofvision.org/2/1/3/, doi:10.1167/2.1.3.
Eckstein, M. P., Abbey, C. K., & Shimozaki, S. S. (2002). Short term negative learning produced by monitoring erroneous templates [Abstract]. Journal of Vision, 2(7), 560a, http://journalofvision.org/2/7/560/, doi:10.1167/2.7.560.
Eckstein, M. P., Ahumada, A., Jr., & Watson, A. B. (1997). Visual signal detection in structured backgrounds. II. Effect of contrast gain control, background variations and white noise. Journal of the Optical Society of America A, 14, 2406-2419.
Fahle, M., Edelman, S., & Poggio, T. (1995). Fast perceptual-learning in hyperacuity. Vision Research, 35(21), 3003-3013.
Fine, I., & Jacobs, R. A. (2002). Comparing perceptual learning tasks: A review. Journal of Vision, 2(2), 190-203, http://journalofvision.org/2/2/5/, doi:10.1167/2.2.5.
Geisler, W. S. (2003). Ideal Observer analysis. In L. Chalupa & J. Werner (Eds.), The visual neurosciences. Boston: MIT Press.
Gibson, E. J. (1969). Principles of perceptual learning and development. Englewood Cliffs, NJ: Prentice-Hall.
Gibson, E. J. (2000). An ecological approach to perceptual learning and development. Oxford, NY: Oxford University Press.
Gold, J., Bennett, P. J., & Sekuler, A. B. (1999). Signal but not noise changes with perceptual learning. Nature, 402(6758), 176-178.
Gold, J. M. (2003). Dynamic classification images reveal the effects of perceptual learning in a hyperacuity task [Abstract]. Journal of Vision, 3(9), 162a, http://journalofvision.org/3/9/162/, doi:10.1167/3.9.162.
Goldstone, R. L. (1998). Perceptual learning. Annual Review of Psychology, 49, 585-612.
Green, D. M. (1964). Consistency of auditory detection judgments. Psychological Review, 71(5), 392-407.
Green, D. M., & Swets, J. A. (1966). Signal detection theory and Psychophysics. New York: Wiley.
Hurlbert, A. (2000). Visual perception: Learning to see through noise. Current Biology, 10, R231-R233.
Karni, A., & Sagi, D. (1993). The time course of learning a visual skill. Nature, 16, 250-252.
Kersten, D., Mamassian, P., & Yuille, A. (2004). Object perception as Bayesian inference. Annual Reviews of Psychology, 55, 271-304.
Kersten, D., & Yuille, A. (2003). Bayesian models of object perception. Current Opinion in Neurobiology, 13(2).
Kinchla, R. A. (1964). A learning factor in visual discrimination. In R. C. Atkinson (Ed.), Studies in mathematical psychology. Palo Alto: Stanford University Press.
Kinchla, R. A., Chen, Z., & Evert, D. (1995). Precue effects in visual search: Data or resource limited? Perception & Psychophysics, 57, 441-450.
Kullback, S. (1959). Information theory and statistics. New York: Wiley.
Li, R. W., Levi, D. M., & Klein, S. A. (2004). Perceptual learning improves efficiency by re-tuning the decision 'template' for position discrimination. Nature Neuroscience, 7(2), 178-183.
Liu, Z., Kersten, D., & Knill, D. C. (1999). Dissociating stimulus information from internal representation-a case study in object recognition. Vision Research, 39, 603-612.
Liu, Z., Knill, D. C., & Kersten, D. (1995). Object classification for human and ideal observers. Vision Research, 35, 549-568.
Lu, Z. L., & Dosher, B. A. (2004). Perceptual learning retunes the perceptual template in foveal orientation identification. Journal of Vision, 4(1), 44-56, http://journalofvision.org/4/1/5/, doi:10.1167/4.1.5.
Luck, S. J., & Vogel, E. K. (1997). The capacity of visual working memory for features and conjunctions. Nature, 390, 279-281.
Makeig, S., & Inlow, M. (1993). Lapses in alertness: Coherence of fluctuation in performance and EEG spectrum. Electroencephalography and Clinical Neurophysiology, 86, 23-35.
Matthews, Liu, Z., Geesaman, B. J., & Qian, N. (1999). Perceptual learning on orientation and direction discrimination. Vision Research, 39, 3692-3701.
McKee, S. P., & Westheimer, G. (1978). Improvement in vernier acuity with practice. Perception & Psychophysics, 24, 258-262.
Murray, R. F., Sekuler, A. B., & Bennett, P. J. (2003). A linear cue combination framework for understanding selective attention. Journal of Vision, 3(2), 116-145, http://journalofvision.org/3/2/2/, doi:10.1167/3.2.2.
Pelli, D. G. (1985). Uncertainty explains many aspects of visual contrast detection and discrimination. Journal of the Optical Society of America A, 2, 1508-1532.
Peterson, W. W., Birdsall, T. G., & Fox, W. C. (1954). The theory of signal detectability. Transaction of the IRE P.G.I.T., 4, 171-212.
Shimozaki, S. S., Eckstein, M. P., & Abbey, C. K. (2003). Comparison of two weighted integration models for the cueing task: Linear and likelihood. Journal of Vision, 3(3), 209-229, http://journalofvision.org/3/3/3/, doi:10.1167/3.3.3.
Sur, M., Schummers, J., & Dragoi, V. (2002). Cortical plasticity: Time for a change. Current Biology, 12(5), R168-R170.
Tjan, B. S., Braje, W. L., Legge, G. E., & Kersten, D. (1995). Human efficiency for recognizing 3-D objects in luminance noise. Vision Research, 35, 3053-3069.
Tjan, B. S., & Legge, G. E. (1999). The viewpoint complexity of an object-recognition task. Vision Research, 38, 2335-2350.
|