Volume 4, Number 2, Article 2, Pages 82-91
doi:10.1167/4.2.2
http://journalofvision.org/4/2/2/
ISSN 1534-7362
Estimation of nonlinear psychophysical kernels
Peter Neri
Department of Zoology, University of Cambridge, England, UK
Abstract
Reverse correlation techniques have been extensively used in physiology (Marmarelis & Marmarelis, 1978; Sakai, Naka, & Korenberg, 1988), allowing characterization of both linear and nonlinear aspects of neuronal processing (e.g., Emerson, Bergen, & Adelson, 1992; Emerson & Citron, 1992). Over the past decades, Ahumada (1996) developed a psychophysical reverse correlation technique, termed noise image classification (NIC), for deriving the linear properties of sensory filters, first in the context of audition (Ahumada, 1967; Ahumada, Marken, & Sandusky, 1975) and then in vision (Ahumada, 1996, 2002; Beard & Ahumada, 1998). This work explores ways of characterizing nonlinear aspects of psychophysical filters. One approach consists of an extension of the NIC technique (ExtNIC), whereby second-order (rather than just first-order) statistics in the classified noise are used to derive sensory kernels. It is shown that, under some conditions, this procedure yields a good estimate of second-order kernels. A second, different approach is also considered. This method uses functional minimization (fMin) to generate kernels that best simulate psychophysical responses for a given set of stimuli. Advantages and disadvantages of the two approaches are discussed. A mathematical appendix shows that: (1) nonlinearities affect the linear estimate (particularly target-present averages) obtained from the NIC method, providing a rationale for some related observations made by Ahumada (1967); (2) for a linear filter followed by a static nonlinearity (LN system), the ExtNIC estimate of the second-order nonlinear kernel is correctly null, provided the criterion is unbiased; (3) for a biased criterion, such an estimate may contain predictable modulations related to the linear filter; and (4) under certain assumptions and conditions, ExtNIC does return a correct estimate for the second-order nonlinear kernel.
History
Received July 1, 2001; published February 26, 2004
Citation
Neri, P. (2004). Estimation of nonlinear psychophysical kernels.
Journal of Vision, 4(2):2, 82-91,
http://journalofvision.org/4/2/2/,
doi:10.1167/4.2.2.
Keywords
reverse correlation, triggered correlation, recursive least squares, linear-nonlinear model, system identification
For the purpose of this work, psychophysical observers are conceived of as functionals that map an input function $I$, such as a visual stimulus, into a binary number:
$$o = \mathrm{thr}_c\big(F(I)\big) \qquad (1)$$
where $o$ is 0 for “no” and 1 for “yes,” $I$ is an input function defined over an $n$-dimensional space (e.g., $x, y, t$), $F$ is a functional mapping $I$ into a real number, and $\mathrm{thr}_c(x)$ is a function that returns 0 when $x < c$ and 1 when $x \geq c$. This work considers only yes/no tasks (a brief explanation of how the considerations made here can be extended to two-alternative forced-choice [2AFC] tasks is given in the next section). To simplify notation, $I$ is defined over one dimension only ($x$). The functional $F$ will, in general, be nonlinear. One way of representing a nonlinear system such as this one is by Volterra expansion:
$$F(I) = \sum_{n} V_n, \qquad V_n = \int \cdots \int L_n(\nu_1, \ldots, \nu_n)\, I(\nu_1) \cdots I(\nu_n)\, d\nu_1 \cdots d\nu_n \qquad (2)$$
The objects $L_n$ are the system’s kernels, and $V_n$ are outcomes of filtering stimulus $I$ with these kernels.¹
Volterra kernels are intrinsically symmetric. Volterra showed that this
representation applies to nonlinear systems that are time-invariant, with finite
memory, and analytic (Volterra, 1959).
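As an illustration only, the observer of Equations 1 and 2 can be sketched in Python with NumPy for a discretized one-dimensional stimulus, with the expansion truncated at second order. The function names (`volterra_output`, `observer`, `symmetrize`) are hypothetical, the integrals become sums over stimulus samples, and a constant $V_0$ term is assumed to be absorbed into the criterion $c$. The last helper shows why kernels can be taken as symmetric: only the symmetric part of $L_2$ contributes to $V_2$.

```python
import numpy as np

def volterra_output(stim, L1, L2):
    """F(I) truncated at second order (V1 + V2) for a 1-D, discretized stimulus.

    Integrals in Equation 2 become sums over stimulus samples; a constant V0
    term is omitted (assumed absorbed into the criterion c).
    """
    v1 = L1 @ stim            # V1 = sum_j L1(j) I(j)
    v2 = stim @ L2 @ stim     # V2 = sum_j sum_k L2(j, k) I(j) I(k)
    return v1 + v2

def observer(stim, L1, L2, c):
    """Equation 1: thr_c(F(I)) is 1 ("yes") if F(I) >= c, else 0 ("no")."""
    return int(volterra_output(stim, L1, L2) >= c)

def symmetrize(L2):
    """Symmetric part of L2; the antisymmetric part cancels in V2."""
    return 0.5 * (L2 + L2.T)

rng = np.random.default_rng(0)
L1 = rng.standard_normal(7)
L2_raw = rng.standard_normal((7, 7))      # deliberately non-symmetric
stim = rng.uniform(-1.0, 1.0, 7)
print(observer(stim, L1, L2_raw, c=0.0))
```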
For a linear system (i.e., $F = V_1$), Ahumada's noise image classification (NIC) technique (Beard & Ahumada, 1998) returns the linear kernel $L_1$ up to a multiplicative factor (Ahumada, 2002, and “Appendix”).
When the system is nonlinear, one needs to determine higher-order kernels ($L_2, L_3, \ldots, L_n$) to characterize it. This work focuses only on second-order kernels, but the approach can be easily extended to higher orders.
The second-order Volterra kernel $L_2$ dictates how pairs of input pulses interact in the system, that is, whether pairs of inputs need to covary positively or negatively (or not at all) in order to drive a positive response from the system. The natural extension of the NIC technique to deriving this nonlinear kernel is to compute, instead of the first-order statistics associated with the classified noise, its second-order statistics (e.g., Marmarelis & Marmarelis, 1978), in terms of second-order moments or covariance matrices. In Section 2 and in the “Appendix,” it is shown that, under some conditions, this method does return a good estimate of psychophysical second-order kernels.
A different approach to nonlinear system characterization involves functional minimization (fMin). Section 3 describes in more detail how this method works. In short, this approach allows determination of the system's kernels by minimizing, for a given input sequence $I_i$ (where $i$ refers to the $i$th trial), the difference between the experimentally determined output sequence $o_i^{exp}$ and one ($o_i$) computed using an equation similar to Equation 1. The minimization is first carried out for a linear system, determining the linear kernel $W_1$ that accounts for most of the output. The system in Equation 1 is then upgraded to second order, and the best second-order kernel $W_2$ is similarly determined. In general (i.e., when the input is arbitrary), every time the system is upgraded to a higher order, one has to correct for the fit deriving from lower orders (this is explained in Section 3).
One advantage of the fMin approach is that it has no specific requirements on input characteristics. The ExtNIC method, by contrast, works only with specific types of input (e.g., Gaussian white noise; see “Appendix” for details). However, fMin involves a minimization step that can be prohibitive in most practical applications, whereas ExtNIC is, under certain conditions, a simple and robust estimator.
2. Derivation of second-order kernels by computing second-order statistics (ExtNIC)
For the purpose of this section, all the investigator needs from a psychophysical experiment is four sets of noise images associated with the four stimulus-response classes (hits, false alarms, misses, and correct rejections). The type of noise used has to satisfy certain requirements (e.g., orthogonality). These requirements are listed in the “Appendix”; Gaussian white noise, for example, satisfies them all. $I_i^{[s,o]}$ is the $i$th stimulus image associated with stimulus $s$ (where $s = 0$ is noise-only, $s = 1$ is target+noise) and response $o$.
Ahumada's estimate $L_1^{est}$ of the system's linear kernel $L_1$ is (Ahumada, 1996):
$$L_1^{est} = L_1^{[0,1]} + L_1^{[1,1]} - L_1^{[0,0]} - L_1^{[1,0]} \qquad (3)$$
where $L_1^{[s,o]}(\nu) = E\big(I^{[s,o]}(\nu)\big)$ ($E(x)$ being the expectation of $x$ across trials). The “Appendix” provides a formal derivation of how this estimate (target-present averages in particular) is affected by a second-order kernel $L_2$.
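Similarly, what is proposed here is that one can estimate the system's second-order Volterra kernel, $L_2$, by computing:
$$L_2^{est} = L_2^{[0,1]} + L_2^{[1,1]} - L_2^{[0,0]} - L_2^{[1,0]} \qquad (4)$$
or, similarly,
$$L_2^{est} = \mathrm{cov}^{[0,1]} + \mathrm{cov}^{[1,1]} - \mathrm{cov}^{[0,0]} - \mathrm{cov}^{[1,0]} \qquad (5)$$
where $L_2^{[s,o]}(\nu,\xi) = E\big(I^{[s,o]}(\nu)\, I^{[s,o]}(\xi)\big)$ and $\mathrm{cov}^{[s,o]}(\nu,\xi) = L_2^{[s,o]}(\nu,\xi) - L_1^{[s,o]}(\nu)\, L_1^{[s,o]}(\xi)$.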
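A minimal sketch of Equations 3 to 5 in Python with NumPy follows, assuming the signed-sum form written above and population (1/N) normalization of the per-class moments. The function and argument names are hypothetical; each of the four stimulus-response classes must contain at least one trial.

```python
import numpy as np

# Sign with which each stimulus-response class [s, o] enters Equations 3-5:
# false alarms and hits are added, correct rejections and misses subtracted.
CLASS_SIGNS = {(0, 1): +1, (1, 1): +1, (0, 0): -1, (1, 0): -1}

def extnic_estimates(noise, present, yes, use_covariance=True):
    """Return (L1_est, L2_est) from classified noise images.

    noise   : (n_trials, dim) array of noise images
    present : length-n array, 1 if the target was shown (s), else 0
    yes     : length-n array, 1 if the observer said "yes" (o), else 0
    use_covariance=True gives Equation 5, False gives Equation 4.
    """
    noise = np.asarray(noise, dtype=float)
    present = np.asarray(present, dtype=int)
    yes = np.asarray(yes, dtype=int)
    dim = noise.shape[1]
    L1_est = np.zeros(dim)
    L2_est = np.zeros((dim, dim))
    for (s, o), sign in CLASS_SIGNS.items():
        cls = noise[(present == s) & (yes == o)]
        mean = cls.mean(axis=0)                      # L1^[s,o]
        second = cls.T @ cls / len(cls)              # L2^[s,o]: second moments
        if use_covariance:
            second = second - np.outer(mean, mean)   # cov^[s,o]
        L1_est += sign * mean
        L2_est += sign * second
    return L1_est, L2_est
```

Subtracting the outer product of each class mean is what distinguishes Equation 5 from Equation 4; it removes an $L_1^{[s,o]}(\nu)\,L_1^{[s,o]}(\xi)$ term class by class.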
One of the main results in the “Appendix” (Equation 17) is that for a linear system followed by a static nonlinearity (LN system) with odd-symmetric decisional transducer at unbiased criterion, $L_2^{est}$ is correctly null.
There are a few reasons for using covariance (Equation 5) rather than second-order moments (Equation 4). One reason is that for an LN system, $L_1$ can create spurious modulations in $L_2^{est}$ when the criterion is biased, and these modulations are of the type $L_1(\nu)\cdot L_1(\xi)$ (see “Appendix”). Covariance partly compensates for these effects by subtracting a similar term (this relates to the difference between solid squares and dashed line in the top right panel of Figure 3b; see Section 4).
Another
reason is that for an LN system, filtering always reduces variance compared to
baseline noise variance; this happens because the noise distribution is
truncated at locations where the filter is applied (Ahumada, 2002).
Inspection of the $L_2^{[s,o]}$ images for the different stimulus-response classes is, therefore, informative as to whether the final modulations observed in $L_2^{est}$ are possibly due to LN filtering or not. For example, a positive modulation (with respect to baseline noise variance) along the diagonal in an individual response class (in $\mathrm{cov}^{[s,o]}$) cannot be due to LN filtering, and would require further investigation.
Other important diagnostic tools involve identifying sizeable covariance modulations in $L_2^{est}$ that are not localized at the level of $L_1(\nu)\cdot L_1(\xi)$, as those too cannot be due to repercussions from LN filtering. In general, it is important that the experimenter examines the $L_2^{[s,o]}$'s and $L_1^{[s,o]}$'s carefully and decides whether, in the specific context of the experiment being carried out, these are informative.
Figure 1 provides an example that illustrates the outcome of this procedure. A psychophysical experiment is simulated for a system with known, randomly generated Volterra kernels (shown in a and d); this system maps input noise images into binary responses according to Equation 1 for a fixed, unbiased $c$. Noise image intensities were uniformly distributed around 0 (spanning −1 to 1), images were 7 elements long, and the target was the vector [0 0 0 1 0 0 0] (added on half of the trials). A total of 5,000 trials were run.
Figure 1. Outcome of a simulation involving kernel derivation as described in Section 2. a and d are two randomly generated kernels (linear and second-order nonlinear, respectively); b and e are the corresponding estimates. c and f show correlations between real and estimated kernels: c plots real values (those in a) versus estimated values (those in b) for the linear kernel; f plots real values (those in d) versus estimated values (those in e) for the second-order nonlinear kernel.
In b, the function computed using Equation 3 is shown, and it can be seen that it returns the linear kernel quite well (it would be optimal if the system were linear). In e, the function obtained by computing covariance (Equation 5) is plotted, and it can be seen that this also provides a fairly good estimate of the second-order kernel in d. c and f plot real versus estimated values for the two kernels; in this example, correlation values were 0.93 and 0.96 for the first- and second-order kernels, respectively.
As mentioned in the “Introduction,” this work focuses on yes/no tasks. The most trivial extension of these methods to 2AFC tasks is, for example, to classify each noise image in each interval separately, taking the response from the observer as a double statement on both intervals (e.g., if the target is in interval 1 and the observer responds “interval 2,” this statement is taken as “interval 2, not interval 1,” and the noise image in interval 1 is classified as a miss, the noise image in interval 2 as a false alarm). This approach may be problematic in some experimental contexts, as it relies on the assumption that Equation 1 can be applied separately to each interval in a 2AFC task. For more details on how the NIC approach extends to AFC tasks, the reader should refer to Abbey and Eckstein (2002).
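The "double statement" rule can be sketched as a small Python function that turns one 2AFC trial into two yes/no records. The names `split_2afc_trial` and `CLASS_NAMES` are hypothetical, and intervals are numbered 1 and 2 as in the text.

```python
CLASS_NAMES = {(1, 1): "hit", (1, 0): "miss",
               (0, 1): "false alarm", (0, 0): "correct rejection"}

def split_2afc_trial(noise_1, noise_2, target_interval, response_interval):
    """Turn one 2AFC trial into two yes/no records (s, o, class name, noise).

    The response "interval k" is read as "yes" to interval k and "no" to the
    other interval; s is 1 for the interval that contained the target.
    """
    records = []
    for interval, noise in ((1, noise_1), (2, noise_2)):
        s = int(interval == target_interval)
        o = int(interval == response_interval)
        records.append((s, o, CLASS_NAMES[(s, o)], noise))
    return records

# Target in interval 1, observer answers "interval 2"
for s, o, name, _ in split_2afc_trial([0.2, -0.1], [0.4, 0.3],
                                      target_interval=1, response_interval=2):
    print(s, o, name)
```

The resulting records can be pooled with yes/no data and passed to the ExtNIC estimator, subject to the caveat above that Equation 1 must apply to each interval separately.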
3. Derivation of linear and nonlinear kernels using functional minimization (fMin)
For this
section, what we need from a psychophysical experiment is a full description of
input and output, that is, a sequence of input images (including the target) and
a sequence of binary responses.
There are no special requirements on the input, as long as it can be adequately described for use in Equation 1. $I_i$ is the $i$th input image, and $o_i^{exp}$ the $i$th response from the observer (0 for “no,” 1 for “yes”).
The fMin method works by computing the best linear kernel first, using a Volterra representation of the system that extends only up to the linear order:
$$o_i = \mathrm{thr}_c\Big(\int L_1(\nu)\, I_i(\nu)\, d\nu\Big) \qquad (6)$$
The estimated first-order kernel for the system, $W_1$, is the $L_1$ that minimizes the difference between $o_i^{exp}$ and $o_i$:
$$W_1 = \arg\min_{L_1}\Big[\min_{NoTrialsPerBlock} D\big(o_i^{exp}, o_i\big)\Big] \qquad (7)$$
where $\min_{NoTrialsPerBlock}$ is the best minimization obtained by allowing criterion $c$ in Equation 6 to vary every $NoTrialsPerBlock$ trials, and $D(x_i, y_i)$ for $i$ ranging from 1 to $n$ (with $x$ and $y$ binary) is $\sum_{i=1}^{n} (x_i \neq y_i)$, where $(x_i \neq y_i)$ returns 1 if $x_i \neq y_i$ and 0 if $x_i = y_i$. (The choice of criterion variation and distance measure $D$ offered here is not the only possible one; e.g., one may allow $c$ to vary according to a prespecified probability distribution without optimizing its value every $NoTrialsPerBlock$ trials, and $D$ could be chosen to be the sum of squared differences.)
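The quantity inside Equation 7 can be sketched in Python with NumPy as below, for the linear model of Equation 6 with a discretized stimulus. The function names are hypothetical. Within a block, $D$ only changes when $c$ crosses an observed model output, so trying $c$ at each observed output plus $c = +\infty$ covers every possible threshold. This sketch only evaluates the objective; it is piecewise constant in $L_1$, so minimizing it over $L_1$ would need a derivative-free search, and the original minimizations used standard Matlab routines that are not reproduced here.

```python
import numpy as np

def mismatch_count(o_exp, o):
    """D(x, y): number of trials on which two binary response sequences differ."""
    return int(np.sum(np.asarray(o_exp) != np.asarray(o)))

def blockwise_best_criterion(r, o_exp, trials_per_block):
    """Sum over blocks of min_c D(o_exp, thr_c(r)), with c re-chosen in each block.

    r holds unthresholded model outputs. Trying c at each observed r value
    (everything at or above it becomes "yes") plus c = +inf (all "no")
    covers every distinct thresholding of a block.
    """
    r = np.asarray(r, dtype=float)
    o_exp = np.asarray(o_exp, dtype=int)
    total = 0
    for start in range(0, len(r), trials_per_block):
        r_block = r[start:start + trials_per_block]
        o_block = o_exp[start:start + trials_per_block]
        candidates = np.append(np.unique(r_block), np.inf)
        total += min(mismatch_count(o_block, (r_block >= c).astype(int))
                     for c in candidates)
    return total

def fmin_objective_linear(L1, stimuli, o_exp, trials_per_block=200):
    """Quantity minimized over L1 in Equation 7, using the linear model of Equation 6."""
    return blockwise_best_criterion(stimuli @ L1, o_exp, trials_per_block)

r = [0.9, 0.1, 0.5]
o_exp = [0, 1, 0]
print(blockwise_best_criterion(r, o_exp, trials_per_block=3))  # 1
print(blockwise_best_criterion(r, o_exp, trials_per_block=1))  # 0
```

With one trial per block the objective is always 0, whatever the kernel, which is the degeneracy discussed in the following paragraph.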
The extent to which $o_i$ matches $o_i^{exp}$ will be affected by the choice of $c$ in Equation 6. It seems reasonable that $c$ should be allowed to vary, as is the case in psychophysics (over a large number of trials, observers' criteria are expected to fluctuate somewhat). How often this should happen will depend on the specifics of the experiment. $c$ should not be allowed to vary too often: in the extreme case of $NoTrialsPerBlock = 1$, the minimization in Equation 7 becomes meaningless, as on each trial $i$ there will always be a value for $c$ that returns $o_i = o_i^{exp}$. A reasonable value for $NoTrialsPerBlock$ could be, for example, 200. This means that the minimization in Equation 7 would be carried out computing the threshold step in Equation 6 for an optimal choice of $c$ every 200 trials.
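The next step involves a similar procedure, applied to the second-order nonlinearity. The system is now represented as follows:
$$o_i = \mathrm{thr}_c\Big(\int \big[W_1(\nu) + L_1^{adj}(\nu)\big]\, I_i(\nu)\, d\nu + \iint L_2(\nu,\xi)\, I_i(\nu)\, I_i(\xi)\, d\nu\, d\xi\Big)$$
The best approximating linear kernel $W_1$ has already been computed; what needs to be determined are an adjusting linear kernel $W_1^{adj}$ and a second-order kernel $W_2$ (Victor, 1992): these are obtained by adopting the same minimization procedure as before. The best approximating second-order kernel $W_2$ is the second-order kernel $L_2$ that minimizes the difference between $o_i^{exp}$ and $o_i$:
$$W_2 = \arg\min_{L_2}\Big[\min_{NoTrialsPerBlock} D\big(o_i^{exp}, o_i\big)\Big]$$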
In the process of minimizing for $L_2$, one also has to minimize for $L_1^{adj}$, obtaining both $W_2$ and $W_1^{adj}$. The best approximating first-order kernel, however, is still the one computed before, $W_1$, not the correcting one ($W_1^{adj}$) introduced here (see below for an explanation).
The best way to understand this formulation is to think of it in relation to polynomial approximation. The problem analyzed in this section is similar to an attempt to approximate an unknown function $y = f(x)$ within a specified interval $[a,b]$ using a polynomial expansion of $x$, $f^*(x) = \sum_{n} a_n x^n$; say $f(x) = x^2$ and the interval $[0,1]$. If we approximate this with a zeroth-order expansion $f^* = a_0$, the best $a_0$ (in the least mean square difference sense) is $w_0 = 1/3$. At the first order, $f^*(x) = a_0 + a_1 x$, the best approximation is obtained for $a_0 = -1/6$ and $a_1 = 1$. We write this as $f^*(x) = w_0 + w_{0,1}^{adj} + w_1 x$, with $w_{0,1}^{adj} = -1/2$ being the first-order-related correction for the zeroth-order term (so that $w_0 + w_{0,1}^{adj} = -1/6$), and $w_1 = 1$. In other words, we need to adjust our best estimate for the zeroth-order term as we introduce a linear term. At the second order, we can achieve a perfect fit using $f^*(x) = x^2$, so we need to cancel out both zeroth- and first-order terms: $f^*(x) = w_0 + w_{0,1}^{adj} + w_{0,2}^{adj} + (w_1 + w_{1,2}^{adj})\,x + w_2 x^2$, with $w_{0,2}^{adj} = 1/6$, $w_{1,2}^{adj} = -w_1$, and $w_2 = 1$. From here, the reader can see that $w_n$ is the best approximating coefficient of order $n$ for the $n$th-order approximation. This is also what the best approximating kernels $W_n$ are. The reader is referred to Victor (1992), where this method is presented as an abstract formulation of the Wiener approach to nonlinear system representation.
If the interval $[a,b]$ is symmetric ($a = -b$), then $w_{1,2}^{adj} = 0$; that is, there is no linear correction term when upgrading to second order, as first- and second-order terms are orthogonal (i.e., $\int_{-b}^{b} x \cdot x^2\, dx = 0$). In the examples considered in this work, input noise is indeed symmetric, so $W_1^{adj} = 0$ as in Figure 4. However, the formulation has been kept general to include cases in which the input is not constrained (e.g., one can think of applications in which the input consists of a limited set of images from medical reports).
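The coefficients in the polynomial example can be checked numerically. The sketch below (Python with NumPy; `monomial_integral` and `best_poly` are hypothetical helper names) solves the least-squares normal equations exactly for a polynomial target. It reproduces $w_0 = 1/3$, the first-order fit $a_0 = -1/6$, $a_1 = 1$, and the adjustment terms for $f(x) = x^2$ on $[0,1]$, and then confirms on $[-1,1]$, with an arbitrary test polynomial, that the linear coefficient does not change when a quadratic term is added.

```python
import numpy as np

def monomial_integral(p, a, b):
    """Exact value of the integral of x**p over [a, b]."""
    return (b ** (p + 1) - a ** (p + 1)) / (p + 1)

def best_poly(f_coeffs, order, a, b):
    """Least-squares coefficients [a_0, ..., a_order] approximating f on [a, b].

    f is itself a polynomial, f(x) = sum_m f_coeffs[m] * x**m, so every inner
    product in the normal equations has a closed form.
    """
    n = order + 1
    gram = np.array([[monomial_integral(j + k, a, b) for k in range(n)]
                     for j in range(n)])
    rhs = np.array([sum(c * monomial_integral(j + m, a, b)
                        for m, c in enumerate(f_coeffs))
                    for j in range(n)])
    return np.linalg.solve(gram, rhs)

# f(x) = x**2 on [0, 1]: best fits of order 0, 1, 2
fits = [best_poly([0.0, 0.0, 1.0], order, 0.0, 1.0) for order in range(3)]
w0 = fits[0][0]                       # 1/3
w0_1adj = fits[1][0] - w0             # -1/2
w1 = fits[1][1]                       # 1
w0_2adj = fits[2][0] - fits[1][0]     # 1/6
w1_2adj = fits[2][1] - w1             # -1, i.e. -w1
w2 = fits[2][2]                       # 1

# Symmetric interval: the linear coefficient is unchanged at second order
f_sym = [1.0, 2.0, 0.0, 5.0]          # f(x) = 1 + 2x + 5x**3, arbitrary test function
sym1 = best_poly(f_sym, 1, -1.0, 1.0)
sym2 = best_poly(f_sym, 2, -1.0, 1.0)
print(sym2[1] - sym1[1])              # ~0, i.e. w_{1,2}^adj = 0
```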
Figure 2 provides an example of the outcome of this procedure (with $NoTrialsPerBlock = 200$). A psychophysical experiment was simulated as in the previous section, for a system of known Volterra kernels (shown in a and d). b shows $W_1$ for this system, and e plots $W_2$ (minimizations were carried out using standard Matlab routines); the thin trace in b shows $W_1^{adj}$. Both kernel estimates are quite good (as in the previous section, correlations for the two kernels [shown in c and f] are high). It must be pointed out that the validity of this procedure is independent of the particular structure that was selected for the nonlinear system used in these simulations (i.e., the system does not have to be literally implementing a Volterra expansion; this was done here only for illustrative purposes).
Figure 2. This figure is equivalent to Figure 1, except that here kernels are estimated using the method described in Section 3 (functional minimization), rather than Section 2 (extended noise image classification). The thin line in panel b shows the adjusting linear kernel $W_1^{adj}$; this is expected to be 0 for the input used in this simulation.
4. Brief quantitative comparison between these two approaches
Figure 3a plots correlations for the two kernels, such as those shown in Figures 1c and 1f and 2c and 2f (correlations between real and estimated kernel values), one (linear kernel, abscissa) against the other (second-order nonlinear kernel, ordinate). The two curves are for the two different methods described in the previous sections (squares: ExtNIC, Section 2; circles: fMin, Section 3). Moving along each curve following the arrow, the system goes from being very nonlinear (low [0.1] ratio between first- and second-order kernel amplitudes) to being very linear (high [10] ratio). Kernel estimates depend on the linearity/nonlinearity of the system: for a highly linear system (bottom right corner), estimates are best for linear kernels, but they get worse as the system becomes more nonlinear, whereas estimates of the nonlinear kernels improve (top left). This is, of course, what is expected. When the system has roughly equivalent linear and nonlinear components (top right), both kernel estimates are reasonably good. Overall, the two approaches perform very similarly (each open symbol refers to one estimate for one randomly generated system, each solid symbol to the average of 50 estimates for 50 different randomly generated systems).
Figure 3. a. Correlations (r's) as computed in Figures 1 and 2 are plotted for both kernels, $L_1$ (linear, abscissa) against $L_2$ (nonlinear, ordinate). Squares (ExtNIC) show correlations obtained using Equation 5 (dashed curve using Equation 4), circles using the fMin method (Section 3). Different points along the curve refer to different amplitude ratios for the two kernels: points at the top left refer to systems for which the amplitude of $L_2$ was larger than the amplitude of $L_1$ (very nonlinear system), whereas points at the bottom right refer to systems for which the opposite was true. b. Correlations for both kernels (open for first-order; solid for second-order) as a function of criterion bias in systems ranging from very nonlinear (left panels) to very linear (right panels), obtained using both methods (top row for ExtNIC, bottom for fMin). Dotted line is for estimates obtained using Equation 4 rather than 5; solid line is for estimates obtained using only false alarm trials. c. Same plotting conventions, as a function of number of trials.
Figure 3b plots more correlation values for both approaches (top row, ExtNIC; bottom row, fMin) and both kernels (solid for first-order, open for second-order) at three different values of linearity/nonlinearity ratio for the system (0.1 left, 1 middle, 10 right panels), as a function of criterion bias ($x$ axis). Criterion bias is in units of half the difference between the mean response to signal+noise and that to noise alone; 0 is for a criterion that is halfway between the mean responses. Values greater than 1 and smaller than −1 refer to very conservative and very lax criteria, respectively. Again, the two approaches overall perform very similarly. The main difference is that for a very nonlinear system, fMin recovers the linear filter slightly better than ExtNIC at unbiased criterion, but performs more poorly for biased criteria. The linear estimate from ExtNIC is improved by using false alarms only ($L_1^{[0,1]}$, solid line). The extreme case of a very nonlinear system shown in the left panels is very unlikely; more likely scenarios are those in the middle and right plots. Both techniques work well for these cases, and are reasonably robust to criterion bias. ExtNIC estimates obtained using second-order moments (Equation 4) were very similar to those obtained using covariance (Equation 5), except in the estimate of the nonlinear kernel for a very linear system (dotted line). Using covariance is slightly more robust to criterion bias, as anticipated in Section 2.
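One reading of the bias convention used in Figure 3b, written as a formula, is $c = (\bar r^{[0]} + \bar r^{[1]})/2 + b\,(\bar r^{[1]} - \bar r^{[0]})/2$ for bias $b$. The short Python sketch below encodes this interpretation; the function name is hypothetical, and it assumes the mean response to signal+noise exceeds that to noise alone, so that $b > 1$ places the criterion beyond the signal+noise mean (very conservative).

```python
def criterion_from_bias(mean_noise, mean_signal, bias):
    """Criterion c for a given bias, in units of half the separation of the means.

    bias = 0 puts c halfway between the mean responses; bias = 1 puts it at the
    mean response to signal+noise (conservative) and bias = -1 at the mean
    response to noise alone (lax), assuming mean_signal > mean_noise.
    """
    midpoint = 0.5 * (mean_noise + mean_signal)
    half_separation = 0.5 * (mean_signal - mean_noise)
    return midpoint + bias * half_separation

for b in (-1.0, 0.0, 1.0):
    print(b, criterion_from_bias(0.0, 2.0, b))
```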
Figure 3c (same plotting conventions as 3b) shows how correlation values improve with the number of trials for a system with a linear/nonlinear ratio of 1 and zero bias. Both techniques converge within 1,000 trials for the conditions explored here. It must be pointed out that no internal noise was used in these simulations; convergence will be slower in the presence of internal noise.
5. Application to real psychophysical data
First-order and second-order kernels were estimated for a real dataset from a stereo-surface detection experiment (this dataset is the same as that used for Figure 2 of Neri et al., 1999). In Figure 4, left panels are kernels derived using ExtNIC, and right panels are kernels obtained using fMin (with $NoTrialsPerBlock = 200$). Clearly, there is no way here to assess the goodness of derivation for these kernels as was done in Figure 3, because we do not have direct knowledge of the system's structure.
Figure 4. Linear (a and b) and second-order nonlinear (c and d) kernels as computed using ExtNIC (Section 2) and fMin (Section 3) for a dataset collected during a real psychophysical experiment involving surface detection and disparity noise (Neri, Parker, & Blakemore, 1999). The thin line in plot b shows $W_1^{adj}$, the correcting linear kernel (see Section 3); 12,500 trials were used. Errors and Z scores were computed using the bootstrap for ExtNIC, and the Hessian matrix for fMin.
First-order
kernels, as derived using the two different techniques, look different (compare
panels a and b). A possible reason for this difference is that in this
particular experiment, the target-absent stimulus was not just noise. For
reasons of careful psychophysical design, it was necessary to have
“signal” dots at zero disparity in the no-target condition (they
were at 6 arc min [target disparity] in the target-present condition). This
would require modification of some of the maths in the “Appendix,”
and could contribute to the difference between the linear estimates in Figure
4.
Second-order kernels look different too, although they share some features (such as the location of positive and negative peaks). The fMin estimate, however, does not reach statistical significance. The author has verified that fMin is rather unstable for this dataset: when, for example, 6,000 rather than 12,500 trials are used, the minimization procedure returns a flat estimate for $W_2$.
It should be noted that adding nonlinear filtering only slightly improves the simulation of observers' responses. For example, the linear kernel derived using the ExtNIC method, when run through Equation 1, predicts psychophysical responses on individual trials with an accuracy of 72% (this is when optimizing the criterion value every 200 trials). Adding the nonlinear kernel in c increases this value by only about 1%, and the situation is similar for fMin. The exact figures depend on how often Equation 1 is allowed to adjust for criterion changes, but the improvement is small (this is the case even when avoiding overfitting by predicting responses on trials that were not used to derive kernels).
The fact that adding nonlinear kernels brings about only a minor improvement in simulating observers' responses requires some discussion. First, this only applies to the specific example considered here; nonlinear kernels may play a greater role in other experimental contexts. The author's experience is, however, that although the role may be greater, it is never particularly sizeable. This leads to the following considerations (notice that these considerations relate to a small but nonetheless statistically significant contribution of second-order nonlinearity). The measure taken above (what percentage of psychophysical responses is correctly predicted by Equation 1) applies only to the threshold condition explored in the experiment. That is, one cannot predict, from the estimate at threshold, what the importance of nonlinear behavior in the system would be in suprathreshold conditions. It is possible that a nonlinear mechanism that is being exposed at threshold may play a more important role at suprathreshold. The fact that it is being studied at threshold has to do only with the technical requirements that are necessary to expose the mechanism; it does not mean that the mechanism is being studied in its “ecological” regime for signal-to-noise ratio. In light of these considerations, it becomes very important to devote efforts to the characterization of nonlinear mechanisms, even if their impact on performance may be small (but nonetheless significant) within the limited threshold range explored in the experiment.
Even if
second-order nonlinearities are relatively small, it should be interesting to
compare them across conditions. For example, one could compare second-order
kernels before and after perceptual learning; it may be that no difference is
observed in the linear kernels, but that one is found for the nonlinear ones.
This would be informative regardless of the absolute impact of these
nonlinearities.
As hinted in the previous paragraph, whichever view is taken on this, it is necessary to assess the statistical reliability of the estimates for both linear and nonlinear kernels. For the extended NIC method, a suitable approach is the bootstrap (Efron & Tibshirani, 1993).
For the fMin method, the experimenter needs to adopt standard techniques for estimating the spread of the minimum in search space. Because this is not an experimental work, these topics are not dealt with any further. The reader may refer to Neri et al. (1999) and Neri and Heeger (2002) for examples of applications of the bootstrap to linear and nonlinear kernel estimation. In Figure 4, errors and Z scores were computed using this technique for panels on the left (ExtNIC); for fMin kernels, they were estimated using the Hessian matrix at the minimum.
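A minimal bootstrap sketch for the linear ExtNIC estimate (Equation 3) is shown below in Python with NumPy: trials are resampled with replacement, the classification image is recomputed, and the standard deviation across resamples gives an error bar and a Z score per element. It is not the procedure of Neri et al. (1999); resampling whole trials (rather than within each stimulus-response class), the number of resamples, and the simulated observer in the usage lines are assumptions made here. The same loop could be wrapped around the covariance estimate of Equation 5.

```python
import numpy as np

CLASS_SIGNS = [(0, 1, 1), (1, 1, 1), (0, 0, -1), (1, 0, -1)]  # (s, o, sign) as in Eq. 3

def classification_image(noise, present, yes):
    """Equation 3: (false alarms + hits) - (correct rejections + misses)."""
    est = np.zeros(noise.shape[1])
    for s, o, sign in CLASS_SIGNS:
        est += sign * noise[(present == s) & (yes == o)].mean(axis=0)
    return est

def bootstrap_z(noise, present, yes, n_boot=1000, seed=0):
    """Return the estimate, its bootstrap standard deviation, and Z = estimate / SD."""
    rng = np.random.default_rng(seed)
    n_trials = len(noise)
    estimate = classification_image(noise, present, yes)
    boot = np.empty((n_boot, noise.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n_trials, size=n_trials)   # resample trials with replacement
        boot[b] = classification_image(noise[idx], present[idx], yes[idx])
    sd = boot.std(axis=0, ddof=1)
    return estimate, sd, estimate / sd

# Hypothetical observer that looks only at element 2 of a 5-element noise image
rng = np.random.default_rng(2)
noise = rng.standard_normal((2000, 5))
present = rng.random(2000) < 0.5
yes = noise[:, 2] + present >= 0.5
est, sd, z = bootstrap_z(noise, present, yes, n_boot=200)
print(np.round(z, 1))
```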
This work presents two different approaches to the computation of psychophysical (both linear and nonlinear) kernels. One approach consists of an extension of the NIC method developed by Ahumada (Beard & Ahumada, 1998) (termed ExtNIC), which takes into account second-order as well as first-order statistics in the classified noise (Section 2); the other method (fMin) uses a minimization approach (Section 3). Under the conditions explored by the simulations in Section 4, both methods work reasonably well.
A big disadvantage of the fMin method is that it brings in all the problems associated with the solution of a minimization problem (e.g., local minima). If the input space is large, kernels are also large, and the minimization step becomes prohibitively difficult (if not impossible) or very unreliable, as well as time-consuming. On the other hand, the ExtNIC method provides (within reasonable ranges of criterion bias and nonlinearity of the system) a fast and efficient route to kernel estimation. However, when the input is not under direct control of the experimenter and does not satisfy certain basic requirements (see “Appendix”), ExtNIC cannot be used at all; in this situation, fMin or similar approaches become a necessary choice.
In general,
characterizing nonlinear kernels is an important step in understanding
psychophysical systems. This is true regardless of the impact that such
nonlinear processing may have within the context of the experiment that was
performed to characterize it. For example, it may be the case that nonlinear
behavior observed at noise threshold plays a much more important role in
suprathreshold conditions.
(1) for a linear filter ($L_2 = 0$) followed by a static nonlinearity (LN system), $L_1^{est}$ (Equation 3) provides a good estimate of the first-order kernel $L_1$ (Equation 16; this is in line with Ahumada's [2002] demonstration of the classical Bussgang result [Bussgang, 1952]);
(2) the target-present average noise images $L_1^{[1,o]}$ in Equation 3 (i.e., those for hits and misses) are affected, for a system with nonzero second-order kernel $L_2$, by “pollution” from this nonlinear term (Equation 15), providing an explanation for some observations made by Ahumada in relation to target-present averages (Ahumada, 1967);
(3) for an LN system and unbiased criterion, $L_2^{est} = 0$ (Equation 17); this result is central to this “Appendix”; however,
(4) when the criterion is biased, $L_2^{est}$ (Equation 4) may contain modulations related to $L_1$ by an $L_1(\nu)\cdot L_1(\xi)$ term (Equation 18);
(5) Equations 4 and 5 are, under some conditions, a reasonable estimate of $L_2$ (Equation 19).
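Points (1), (3), and (4) can be illustrated with a Monte Carlo sketch in Python with NumPy, under assumptions chosen here for convenience: a 5-element unit-norm linear filter, white Gaussian noise of unit variance, a target equal to the filter scaled to give a mean linear response of 2, and a noisy threshold (Gaussian internal noise) as the odd-symmetric static nonlinearity $g$. With the criterion halfway between the mean responses ($c = 1$), the Equation 4 estimate should be close to zero; with a conservative criterion ($c = 2.5$), it should be dominated by an $L_1(\nu)\cdot L_1(\xi)$ pattern. All numbers are subject to sampling noise.

```python
import numpy as np

def classified_estimates(noise, present, yes):
    """Equation 3 for L1 and Equation 4 (second moments) for L2."""
    dim = noise.shape[1]
    L1_est, L2_est = np.zeros(dim), np.zeros((dim, dim))
    for s, o, sign in [(0, 1, 1), (1, 1, 1), (0, 0, -1), (1, 0, -1)]:
        cls = noise[(present == s) & (yes == o)]
        L1_est += sign * cls.mean(axis=0)
        L2_est += sign * (cls.T @ cls) / len(cls)
    return L1_est, L2_est

def ln_experiment(L1, criterion, n_trials=200_000, target_gain=2.0,
                  internal_sd=0.5, seed=0):
    """LN observer: linear filter L1 (L2 = 0), then a noisy threshold at `criterion`."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((n_trials, len(L1)))     # white Gaussian, sigma = 1
    present = rng.random(n_trials) < 0.5
    target = target_gain * L1 / (L1 @ L1)                # linear response to target = target_gain
    r = (noise + present[:, None] * target) @ L1
    yes = r + internal_sd * rng.standard_normal(n_trials) >= criterion
    return classified_estimates(noise, present, yes)

rng = np.random.default_rng(3)
L1_true = rng.standard_normal(5)
L1_true /= np.linalg.norm(L1_true)

L1_u, L2_u = ln_experiment(L1_true, criterion=1.0)      # unbiased: halfway between 0 and 2
L1_b, L2_b = ln_experiment(L1_true, criterion=2.5)      # conservative criterion

print("corr(L1_est, L1):", np.corrcoef(L1_true, L1_u)[0, 1])
print("max |L2_est|, unbiased:", np.abs(L2_u).max())
print("max |L2_est|, biased:  ", np.abs(L2_b).max())
print("corr(L2_est, L1 L1^T), biased:",
      np.corrcoef(np.outer(L1_true, L1_true).ravel(), L2_b.ravel())[0, 1])
```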
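The approach used here is similar to Lee-Schetzen's (Schetzen, 1980). The system we will consider is nonlinear only in the second order ($L_i = 0$ for $i > 2$), and input space is one-dimensional (this is the system considered in all the simulations in this work). Its (unthresholded) output on trial $i$ is:
$$r_i^{[s]} = \sum_{j=1}^{Dim} L_1(j)\, I_i^{[s]}(j) + \sum_{j=1}^{Dim}\sum_{k=1}^{Dim} L_2(j,k)\, I_i^{[s]}(j)\, I_i^{[s]}(k) \qquad (8)$$
where $s$ is the stimulus configuration ($s = 0$ is noise only, $s = 1$ is target+noise), $Dim$ is the size of input space (equal to filter size), and
$$I_i^{[s]}(j) = n_i(j) + s\, t(j)$$
where $n_i(j)$ is the noise image on trial $i$, and $t(j)$ is the function defining the target. For Gaussian noise $n$:
$$E\big(n(j_1)\cdots n(j_{2M+1})\big) = 0, \qquad E\big(n(j_1)\cdots n(j_{2M})\big) = \sum\prod E\big(n(j_k)\, n(j_l)\big)$$
where the symbol $\sum\prod$ stands for summation over all distinct ways of partitioning the $2M$ random variables into products of averages of pairs (Schetzen, 1980). We can now compute the average response of the system to stimulus types $s = 0$ and $s = 1$ ($N_s$ is the total number of trials of type $s$; typically, $N_0 = N_1$). For white noise of variance $\sigma^2$, so that $E\big(n(j)\, n(k)\big) = \sigma^2 \delta_{jk}$:
$$\bar r^{[0]} = \frac{1}{N_0}\sum_{i=1}^{N_0} r_i^{[0]} \approx \sigma^2 \sum_{j=1}^{Dim} L_2(j,j) \qquad (9)$$
$$\bar r^{[1]} = \frac{1}{N_1}\sum_{i=1}^{N_1} r_i^{[1]} \approx \bar r^{[0]} + r^{[t]}, \qquad r^{[t]} = \sum_{j} L_1(j)\, t(j) + \sum_{j}\sum_{k} L_2(j,k)\, t(j)\, t(k) \qquad (10)$$
where $r^{[t]}$ is the response of the system to target alone. Responses on individual trials are (from Equation 8):
$$r_i^{[0]} = \sum_{j} L_1(j)\, n_i(j) + \sum_{j}\sum_{k} L_2(j,k)\, n_i(j)\, n_i(k) \qquad (11)$$
$$r_i^{[1]} = r_i^{[0]} + r^{[t]} + 2\sum_{j}\sum_{k} L_2(j,k)\, n_i(j)\, t(k) \qquad (12)$$
where the factor 2 follows from the symmetry of $L_2$. For a linear system ($L_2 = 0$), $r_i^{[1]} = r_i^{[0]} + r^{[t]}$, which makes sense. However, adding a nonlinear kernel $L_2$ alters this simple relationship.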
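Equations 9 to 12 can be checked by simulation. The sketch below (Python with NumPy) assumes white Gaussian noise with $\sigma = 1$, a random symmetric $L_2$, and a single-sample target; all numerical choices are illustrative. Equation 12 holds exactly, trial by trial, when the same noise image is used with and without the target, whereas Equations 9 and 10 hold only on average.

```python
import numpy as np

def system_output(noise, target, L1, L2, s):
    """Equation 8, with I_i^[s](j) = n_i(j) + s t(j)."""
    stim = noise + s * target
    return stim @ L1 + np.einsum("ij,jk,ik->i", stim, L2, stim)

rng = np.random.default_rng(0)
dim, sigma, n_trials = 5, 1.0, 200_000
L1 = rng.standard_normal(dim)
A = rng.standard_normal((dim, dim))
L2 = 0.5 * (A + A.T)                          # symmetric second-order kernel
t = np.zeros(dim)
t[dim // 2] = 1.0
noise = sigma * rng.standard_normal((n_trials, dim))

r0 = system_output(noise, t, L1, L2, s=0)     # Equation 11
r1 = system_output(noise, t, L1, L2, s=1)
r_t = L1 @ t + t @ L2 @ t                     # response to target alone
r1_eq12 = r0 + r_t + 2 * (noise @ L2 @ t)     # Equation 12

mean0_pred = sigma**2 * np.trace(L2)          # Equation 9
mean1_pred = mean0_pred + r_t                 # Equation 10
print(np.max(np.abs(r1 - r1_eq12)))           # floating-point error only
print(r0.mean(), mean0_pred)
print(r1.mean(), mean1_pred)
```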
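Let us derive $L_1^{est}$, the estimate for the linear kernel $L_1$. Average noise images for the four stimulus-response classes are indicated by $L_1^{[s,o]}$, where $s$ is, as above, the target (0, absent; 1, present) and $o$ is the response (e.g., the average noise image for the false alarms is $L_1^{[0,1]}$). $L_1^{[s,1]}$, as computed using Ahumada’s method, can be written as:
$$L_1^{[s,1]}(\nu) = \frac{\sum_i p_i^{[s,1]}\, n_i(\nu)}{\sum_i p_i^{[s,1]}}, \qquad p_i^{[s,1]} = g\big(r_i^{[s]}\big) \qquad (13)$$
where $p_i^{[s,1]}$ is the probability that trial $i$ will be of type $[s,1]$, and where $g$ is a static nonlinear function that maps output response from the system onto probability of psychophysical response “yes.” We now expand $g$ up to its second-order Taylor term around mean response $\bar r^{[s]}$ (implying that we are assuming $g$ to have continuous derivatives up to order 3). Equation 13 can then be written as:
$$L_1^{[s,1]}(\nu) \approx \frac{\sum_i \Big[g(\bar r^{[s]}) + g'(\bar r^{[s]})\big(r_i^{[s]} - \bar r^{[s]}\big) + \tfrac{1}{2}\, g''(\bar r^{[s]})\big(r_i^{[s]} - \bar r^{[s]}\big)^2\Big]\, n_i(\nu)}{\sum_i p_i^{[s,1]}} \qquad (14)$$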
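Substituting Equations 9 to 12 into Equation 14, this becomes
(15)
and, from Equation 13, it is easy to show that $L_1^{[s,0]}(\nu) \approx -\Big(\sum_i p_i^{[s,1]} \big/ \sum_i p_i^{[s,0]}\Big)\, L_1^{[s,1]}(\nu)$, with $p_i^{[s,0]} = 1 - p_i^{[s,1]}$, because the noise averages to zero across trials. Equation 3 is then
Let us verify something very familiar: for $L_2 = 0$, this reduces to
$$L_1^{est} \propto L_1 \qquad (16)$$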
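Figure 5 is a useful tool for thinking about $g$. This nonlinearity is assumed to be approximately odd-symmetric around its midpoint. As a matter of fact, most decisional transducers enjoy this symmetry (e.g., a noisy threshold belongs to this category; see Nykamp & Ringach, 2002, for more examples). When the criterion is unbiased (i.e., $g^{-1}(0.5) = (\bar r^{[0]} + \bar r^{[1]})/2$), the two regions of $g$ that map $\bar r^{[0]}$ and $\bar r^{[1]}$ onto $g(\bar r^{[0]})$ and $g(\bar r^{[1]})$, respectively, are mirror-symmetric, so that $g(\bar r^{[0]}) = 1 - g(\bar r^{[1]})$. This means that, for an unbiased criterion, odd derivatives of $g$ are equal and even derivatives are of opposite sign at the two mean responses:
$$g'(\bar r^{[0]}) = g'(\bar r^{[1]}), \qquad g''(\bar r^{[0]}) = -g''(\bar r^{[1]})$$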
Figure 5. The function $g$ maps the system output $r$ (abscissa) onto the probability of responding "yes" (ordinate). In the text, $g(r)$ is approximated around $\bar{r}^{[0]}$ and $\bar{r}^{[1]}$, the average responses when the target is absent and present. $g$ is assumed to be an odd function with respect to the shifted origin $[g^{-1}(0.5), 0.5]$ (indicated by arrows). This is a reasonable assumption, as most decisional nonlinearities satisfy this requirement. For an unbiased criterion, $g$ is necessarily placed with respect to the $\bar{r}$'s so that $g^{-1}(0.5) = (\bar{r}^{[0]} + \bar{r}^{[1]})/2$, and $\bar{r}^{[0]}$ and $\bar{r}^{[1]}$ map to symmetric regions of $g$. This is the condition depicted here. In this case, the symmetry of $g$ means that odd derivatives of $g$ are the same (e.g., $g'(\bar{r}^{[0]}) = g'(\bar{r}^{[1]})$) and even derivatives are of opposite sign (e.g., $g''(\bar{r}^{[0]}) = -g''(\bar{r}^{[1]})$) at $\bar{r}^{[0]}$ and $\bar{r}^{[1]}$. This is not true when the criterion is biased, in which case $\bar{r}^{[0]}$ and $\bar{r}^{[1]}$ map to nonsymmetric regions of $g$.
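The derivative symmetry described in Figure 5 can be checked for one concrete odd-symmetric $g$. Purely for illustration, this sketch assumes that $g$ is a noisy threshold, $g(r) = \Phi((r - c)/\sigma)$, where $c$ is the criterion and $\sigma$ is the s.d. of Gaussian internal noise. The helper `g_derivs` and all numbers are hypothetical.

```python
import math

def g_derivs(r, c, sigma):
    """g(r) = Phi((r - c) / sigma), a noisy threshold at criterion c.
    Returns (g, g', g'') evaluated at r."""
    z = (r - c) / sigma
    phi = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)   # standard normal density
    g = 0.5 * (1 + math.erf(z / math.sqrt(2)))              # standard normal CDF
    return g, phi / sigma, -z * phi / sigma**2

r0, r1, sigma = 0.0, 1.5, 1.0          # hypothetical mean responses and internal-noise s.d.
unbiased = [g_derivs(r, (r0 + r1) / 2, sigma) for r in (r0, r1)]       # criterion at midpoint
biased = [g_derivs(r, (r0 + r1) / 2 + 0.6, sigma) for r in (r0, r1)]   # criterion shifted up
```

With the criterion at the midpoint, the first derivatives at $\bar{r}^{[0]}$ and $\bar{r}^{[1]}$ coincide and the second derivatives cancel. With the shifted criterion, neither relation holds.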
Despite the lack of bias in the criterion, $L_1^{\mathrm{est}}$ is still affected by terms that depend on the interaction between $L_2$ and the target. These terms come from the target-present averages. This result may be related to an observation made by Ahumada (1967): for a system pooling nonlinearly (e.g., by a max rule) from a bank of linear filters, averages from noise-only trials would return the average of the bank (i.e., the linear part of the process), whereas averages from target+noise trials would return something heavily affected by the target shape. Indeed, Equation 15 shows that when the target is present ($s = 1$), there are extra terms that involve $L_2 \cdot t$. For a nonlinear system ($L_2 \neq 0$), these terms affect the target-present averages $L_1^{[1,1]}$ and $L_1^{[1,0]}$, making $L_1^{\mathrm{est}}$ (equation above) depart from Equation 16.
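A toy simulation reproduces this asymmetry between noise-only and target-present averages. The sketch does not implement Ahumada's max-rule pooling. Instead it uses a simple quadratic system whose hypothetical $L_2$ is chosen so that $2 L_2 t$ points along a dimension where $L_1$ is zero. If the target-present averages were uncontaminated, that dimension would stay empty. All kernels, the target and the noise levels are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 200_000                                   # trials per stimulus type
L1 = np.array([1.0, 0.0, 0.0])                # hypothetical linear kernel
L2 = np.zeros((3, 3))
L2[1, 2] = L2[2, 1] = 0.5                     # hypothetical symmetric second-order kernel
t = np.array([1.0, 1.0, 0.0])                 # hypothetical target: 2 * L2 @ t = [0, 0, 1]

def says_yes(stim):
    """Second-order system followed by a noisy threshold g.
    Mean responses are 0 (absent) and 1 (present), so a criterion of 0.5 is unbiased."""
    r = stim @ L1 + np.einsum("ij,jk,ik->i", stim, L2, stim)
    return r + rng.normal(scale=0.5, size=len(r)) > 0.5

def yes_minus_no(noise, yes):
    """Difference between the average noise images on "yes" and "no" trials."""
    return noise[yes].mean(axis=0) - noise[~yes].mean(axis=0)

noise0 = rng.normal(size=(N, 3))
noise1 = rng.normal(size=(N, 3))
absent = yes_minus_no(noise0, says_yes(noise0))        # noise-only trials
present = yes_minus_no(noise1, says_yes(noise1 + t))   # target + noise trials
```

On noise-only trials the third component averages to zero, so the estimate reflects $L_1$ alone. On target-present trials the same component is clearly positive, because the cross term $2\, n^T L_2 t$ acts as an extra linear filter.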
Let us now turn to the nonlinear kernel $L_2$. By adopting a procedure similar to that used for deriving $L_1^{[s,1]}$, one can show that:

where and $\delta(x)$ is the Dirac delta function ($\delta(x) = 0$ for $x \neq 0$, $\int \delta(x)\,dx = 1$). It is easy to show that . We can derive the estimate for $L_2$ according to Equation 4:
Let us focus on an LN system ($L_2 = 0$). If the criterion is unbiased, we obtain

$$L_2^{\mathrm{est}}(\nu, \xi) = 0 \qquad (17)$$
Notice that this result applies even for expansions of $g$ to orders higher than the second. This means that if an experiment is carried out with an unbiased criterion and $L_2^{\mathrm{est}} \neq 0$, then the system cannot be modeled as LN (which is the most widely used model in vision). In other words, the effect of the nonlinear decisional step $g$ on $L_2^{\mathrm{est}}$ can be neutralized by having a balanced criterion (provided $g$ has approximate odd symmetry).
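The null result for an LN system can be illustrated by simulation. Equation 4 is not reproduced in this section. The sketch therefore assumes a second-order analogue of the first-order combination, built from the average outer products $n n^T$ within each stimulus-response class. The helpers `second_order_estimate` and `simulate_ln` and all parameters are hypothetical.

```python
import numpy as np

def second_order_estimate(noise, s, yes):
    """(C[1,1] + C[0,1]) - (C[1,0] + C[0,0]), with C[s,o] the mean of n n^T over class [s,o].
    This exact combination is an assumed stand-in for Equation 4."""
    def C(si, o):
        x = noise[(s == si) & (yes == o)]
        return x.T @ x / len(x)
    return (C(1, 1) + C(0, 1)) - (C(1, 0) + C(0, 0))

def simulate_ln(criterion_shift, N=200_000, seed=3):
    """LN observer: linear kernel L1, Gaussian internal noise, hard threshold.
    criterion_shift = 0 puts the criterion midway between the mean responses (unbiased)."""
    rng = np.random.default_rng(seed)
    L1 = np.array([1.0, 0.5, -0.5])        # hypothetical linear kernel
    t = 0.9 * L1                           # hypothetical target
    noise = rng.normal(size=(N, 3))
    s = rng.integers(0, 2, size=N)         # 0 = target absent, 1 = target present
    r = (noise + s[:, None] * t) @ L1 + rng.normal(scale=0.5, size=N)
    criterion = 0.5 * (t @ L1) + criterion_shift
    return noise, s, (r > criterion).astype(int)

L2_unbiased = second_order_estimate(*simulate_ln(0.0))
L2_biased = second_order_estimate(*simulate_ln(1.0))
u = np.array([1.0, 0.5, -0.5]) / np.linalg.norm([1.0, 0.5, -0.5])   # unit vector along L1
```

With the unbiased criterion, every entry of the estimate is zero up to sampling noise. For Gaussian noise, the second moment of the false alarms matches that of the misses, and that of the hits matches the correct rejections, so the four terms cancel. Shifting the criterion leaves a large component along $L_1 L_1^T$, which is the bias-dependent term discussed next.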
If the criterion is biased, an LN system returns

(18)

where the $K$'s are constants (for a given overall criterion bias). The last term is the reason for using Equation 5 rather than Equation 4: computing the covariance partly compensates for the $L_1(\nu)L_1(\xi)$ term in the estimate above.
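The partial compensation provided by the covariance can be seen in a simulation of a biased LN observer. As before, the exact forms of Equations 4 and 5 are not reproduced here. The sketch assumes that Equation 4 combines class-wise average outer products and that Equation 5 combines class-wise covariances in the same way. The names, kernel and criterion shift are hypothetical.

```python
import numpy as np

def second_order_estimate(noise, s, yes, covariance):
    """(C[1,1] + C[0,1]) - (C[1,0] + C[0,0]) with C[s,o] either the mean outer product n n^T
    (covariance=False, standing in for Equation 4) or the covariance of the noise images
    in class [s,o] (covariance=True, standing in for Equation 5). Both forms are assumptions."""
    def C(si, o):
        x = noise[(s == si) & (yes == o)]
        return np.cov(x, rowvar=False) if covariance else x.T @ x / len(x)
    return (C(1, 1) + C(0, 1)) - (C(1, 0) + C(0, 0))

rng = np.random.default_rng(3)
N = 200_000
L1 = np.array([1.0, 0.5, -0.5])                 # hypothetical linear kernel of an LN observer
t = 0.9 * L1                                    # hypothetical target
noise = rng.normal(size=(N, 3))
s = rng.integers(0, 2, size=N)
r = (noise + s[:, None] * t) @ L1 + rng.normal(scale=0.5, size=N)
yes = (r > 0.5 * (t @ L1) + 1.0).astype(int)    # criterion shifted upward: biased observer

u = L1 / np.linalg.norm(L1)                     # unit vector along L1
raw_proj = u @ second_order_estimate(noise, s, yes, covariance=False) @ u
cov_proj = u @ second_order_estimate(noise, s, yes, covariance=True) @ u
```

Subtracting each class mean removes most of the spurious component along $L_1 L_1^T$, but not all of it. The residual is why the text says the covariance only partly compensates.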
If $L_2 \neq 0$, then $L_2^{\mathrm{est}}$ is an appropriate estimator for $L_2$ if the criterion is unbiased and $g$ can be sufficiently well approximated as linear over the range spanned by the responses (i.e., the second-order term in its expansion is negligible). Under these conditions, one obtains:

(19)

Criterion bias only slightly modifies this equation, adding a term $\delta(\nu - \xi) \cdot \mathrm{constant}$.
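The recovery of a genuine second-order kernel under these conditions can be sketched with a system that has a nonzero $L_2$, an unbiased criterion, and large internal noise. The internal noise makes $g$ (here a probit) nearly linear over the range of responses. As in the previous sketches, the class-wise outer-product combination is an assumed stand-in for Equation 4, and all kernels and parameters are hypothetical. Only off-diagonal entries are compared, since the diagonal can pick up $\delta(\nu - \xi)$ terms.

```python
import numpy as np

def second_order_estimate(noise, s, yes):
    """(C[1,1] + C[0,1]) - (C[1,0] + C[0,0]), with C[s,o] the mean outer product n n^T
    over class [s,o]; an assumed stand-in for Equation 4."""
    def C(si, o):
        x = noise[(s == si) & (yes == o)]
        return x.T @ x / len(x)
    return (C(1, 1) + C(0, 1)) - (C(1, 0) + C(0, 0))

rng = np.random.default_rng(4)
M, N = 4, 400_000
L1 = np.array([0.5, -0.5, 0.5, -0.5])        # hypothetical linear kernel
L2 = np.zeros((M, M))                        # hypothetical second-order kernel
L2[0, 1] = L2[1, 0] = 0.3
L2[2, 3] = L2[3, 2] = -0.3
t = L1.copy()                                # hypothetical target

noise = rng.normal(size=(N, M))
s = rng.integers(0, 2, size=N)
stim = noise + s[:, None] * t
r = stim @ L1 + np.einsum("ij,jk,ik->i", stim, L2, stim)
# Mean responses: trace(L2) when absent, L1.t + t^T L2 t + trace(L2) when present.
r0_mean = np.trace(L2)
r1_mean = t @ L1 + t @ L2 @ t + np.trace(L2)
criterion = 0.5 * (r0_mean + r1_mean)        # unbiased criterion
# Large internal noise keeps g close to linear over the range of r.
yes = (r + rng.normal(scale=2.0, size=N) > criterion).astype(int)

est = second_order_estimate(noise, s, yes)
iu = np.triu_indices(M, k=1)                 # off-diagonal entries only
corr = np.corrcoef(est[iu], L2[iu])[0, 1]
```

The off-diagonal entries of the estimate follow the sign and pattern of $L_2$. Small residuals remain from the second-order term of $g$ interacting with the target-dependent linear part.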
If the second-order term in the expansion of $g$ is sizeable, then the term $B$ appears in $L_2^{\mathrm{est}}$, whether the criterion is biased or not. However, the simulations in Section 4 show that this term does not substantially disrupt $L_2^{\mathrm{est}}$.
This research was supported by the Wellcome Trust (GR063322MA to P.N.).

Commercial relationships: none.

Corresponding author: Peter Neri.

E-mail: pn232@hermes.cam.ac.uk.

Address: Department of Zoology, University of Cambridge, Downing Street, Cambridge CB2 3EJ, England, UK.
¹ This is a simplified description of the Volterra expansion. The original formulation involves convolution between $L_i$ and $I$ to obtain $V_i$, so that the dimensionality of $V_i$ is equal to that of the input, $I$:

In Equation 2, the $V_i$ are scalars, because only the cross-correlation between $L_i$ and $I$ is computed. Whether this simplified representation is applicable depends on the specific context being studied. It incorporates the assumption that the nervous system bases its behavioral decisions on outputs from the most sensitive mechanism(s). A convolution operation can be thought of as extended sampling by a bank of linear filters. Of all these filters, only those sampling the region in the vicinity of the target will be used to drive behavior, as those provide useful information for the task at hand (i.e., they are the most sensitive to the target). Equation 2 refers only to these mechanisms. Another (more practical) reason for adopting this simplified representation is that it conforms to Ahumada's formulation.
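The relation between the full filter-bank output and the scalar of Equation 2 can be shown in one dimension. In this sketch, the filter, the input, and the index `target_pos` where the filter is aligned with the target are all hypothetical. `np.correlate` in `"valid"` mode slides the filter across the input, one output per position. The simplified form keeps only the output at the target-aligned position.

```python
import numpy as np

rng = np.random.default_rng(5)
L = np.array([0.2, 0.5, 1.0, 0.5, 0.2])    # hypothetical 1-D filter (symmetric)
I = rng.normal(size=20)                    # hypothetical 1-D input
target_pos = 8                             # hypothetical start of the target-aligned patch

# Bank of shifted filters: one output per position (length 20 - 5 + 1 = 16).
bank_outputs = np.correlate(I, L, mode="valid")
# Convolution flips the kernel; for this symmetric filter it gives the same outputs.
conv_outputs = np.convolve(I, L, mode="valid")
# Simplified form of Equation 2: a single scalar, the cross-correlation at one alignment.
V1 = L @ I[target_pos:target_pos + len(L)]
```

`V1` equals `bank_outputs[target_pos]`. The simplification discards the outputs of all the other filters in the bank.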
Abbey, C. K., & Eckstein, M. P. (2002). Classification image analysis: Estimation and statistical inference for 2AFC experiments. Journal of Vision, 2(1), 66-78.

Ahumada, A. J. (1967). Detection of tones masked by noise: A comparison of human observers with digital-computer-simulated energy detectors of varying bandwidth. Doctoral dissertation (Technical Report No. 29, Human Communications Laboratory, Department of Psychology), UCLA, Los Angeles, CA.

Ahumada, A. J., Marken, R., & Sandusky, A. (1975). Time and frequency analyses of auditory signal detection. Journal of the Acoustical Society of America, 57, 385-390.

Ahumada, A. J. (1996). Perceptual classification images from Vernier acuity masked by noise [Abstract]. Perception, 25, 18.

Ahumada, A. J. (2002). Classification image weights and internal noise level estimation. Journal of Vision, 2(1), 121-131.

Beard, B. L., & Ahumada, A. J. (1998). A technique to extract relevant image features for visual tasks. Proceedings of SPIE, 3299, 79-85.

Bussgang, J. J. (1952). Crosscorrelation functions of amplitude distorted Gaussian signals (Tech. Rep. No. 216). Boston: MIT Research Laboratory of Electronics.
R. (1996). An introduction to t |