| Volume 2, Number 1, Article 1, Pages 1-11 |
doi:10.1167/2.1.1 |
http://journalofvision.org/2/1/1/ |
ISSN 1534-7362 |
Full identification of a linear-nonlinear system via cross-correlation analysis
Duane Q. Nykamp |
Department of Mathematics, UCLA, Los Angeles, CA, USA |
|
Dario L. Ringach |
Departments of Psychology and Neurobiology and Brain Research Institute, UCLA, Los Angeles, CA, USA |
|
Abstract
A statistical model used extensively in vision research consists of a cascade of a linear operator followed by a static (memoryless) nonlinearity. Common applications include the measurement of simple-cell receptive fields in primary visual cortex and the modeling of human performance in various psychophysical tasks. It is well known that the front-end linear filter of the model can readily be recovered, up to a multiplicative constant, using reverse-correlation techniques. However, a full identification of the model also requires an estimation of the output nonlinearity. Here, we show that for a large class of static nonlinearities, one can obtain analytical expressions for the estimates. The technique works with both Gaussian and binary noise stimuli. The applicability of the method in physiology and psychophysics is demonstrated. Finally, the proposed technique is shown to converge much faster than the currently used linear-reconstruction method.
History
Received June 21, 2001; published January 2, 2002
Citation
Nykamp, D. Q. & Ringach, D. L. (2002). Full identification of a linear-nonlinear system via cross-correlation analysis.
Journal of Vision, 2(1):1, 1-11,
http://journalofvision.org/2/1/1/,
doi:10.1167/2.1.1.
Keywords
correlation, moment method, reverse correlation, simple cell, linear kernel, classification images
for related articles by these authors
for papers that cite this paper |
A linear-nonlinear model consists of a cascade of a
linear operator acting on the input in series with a static nonlinearity. In
practice, the input (or stimulus) is a finite-dimensional vector,
x, and the output (or response),
r, can be either 0 or 1. In this case,
the model can be written simply
as  | (1) |
where
 is a nonlinear function and the
(column) vector w represents the linear filtering operation. This equation can
be considered a one-term projection pursuit regression model
( Friedman & Stuetzle, 1981).
Therefore, the optimization techniques developed for projection pursuit
regression can be used to fit w and  to the data. In
the laboratory, however, the researcher has full control of the stimulus;
therefore, as shown below, more efficient methods for estimating the parameters
of the model can be developed. Cross-correlation
methods are commonly used to estimate the front-end linear filtering (the vector
w) in this model
( Marmarelis & Marmarelis, 1978;
Jones & Palmer, 1987;
Emerson, Korenberg, & Citron, 1989;
DeAngelis, Ohzawa, & Freeman, 1993a;
DeAngelis, Ohzawa, & Freeman, 1993b;
Reid, Victor, & Shapley, 1997;
Ringach, Sapiro, & Shapley, 1997;
Anzai, Ohzawa, & Freeman, 1999). A
simple geometrical proof of the technique can be found in
Chichilnisky (2001). In addition to
estimating w, a full characterization of
the model also requires that we estimate the shape of the
nonlinearity  .
The importance of estimating
 has been somewhat neglected in the past
[however, see Anzai et al, (1999) and
Chichilnisky (2001)]. This is
surprising because it is well known that the tuning properties of a neuron (such
as its orientation or spatial frequency bandwidth) do not depend solely on its
linear kernel; they can also be influenced by a static nonlinearity in the spike
generation mechanism, represented by  . Specifically, it
has been suggested that both direction and orientation tuning can be sharpened
significantly by thresholding or by an accelerating nonlinearity
( Reid, Soodak, & Shapley, 1987;
DeAngelis et al, 1993a,b;
Anzai et al, 1999;
Carandini & Ferster, 2000). Thus,
estimating  is clearly important when comparing the
tuning properties of the model to that of the data.
Similarly, cross-correlation methods are being used in
visual psychophysics applications to recover the “classification
images” used by subjects in two-alternative forced-choice (2AFC) tasks
( Ahumada & Beard, 1998;
Ringach, 1998;
Gold, Murray, Bennett, & Sekuler, 2000).
These methods show that performance is correlated, to some extent, with some
physical attribute of the image. However, in order to quantitatively evaluate
the fraction of human performance explained by the model, one needs to identify
the static nonlinearity as well.
One strategy to identify
the nonlinearity is to perform a two-step analysis
( Anzai et al, 1999;
Chichilnisky, 2001). First, measure
the linear kernel using reverse correlation, then generate a scatter-plot of the
linear prediction
(  ) against the
actual response r and smooth it to
obtain an estimate of the static nonlinearity. Here we show that for a large
class of nonlinearities it is possible to write down a closed form solution for
the estimates of the linear kernel and the static nonlinearity in one single
step. The parameters of the nonlinearity are found by matching the input-output
moments of the model to the data. We refer to this technique as the moment
method. The calculation avoids computing the linear prediction altogether
(which is a very time-consuming calculation) and can be updated recursively, as
new data arrives. Furthermore, our results apply to both Gaussian and binary
noise inputs.
Let each component of the input be an independent
normal (Gaussian) random variable with standard deviation
 . Alternatively, let each component be a
binary random variable that takes the values
 or
 with equal probability. As shown in
Appendix A, if the linear kernel is
normalized so that
 , the average
output
is  | (2) |
The correlation between the
input and the output is proportional to the kernel
w, | ( 3) |
where
 denotes expected value and the constant
of proportionality  is given by
( Bussgang, 1952)  | (4) |
Thus, the linear filter
w can be recovered from
 because
 , which is the usual reverse correlation
result.
Through
Equations 2 and
4, the measurement of the mean output,
 , and the proportionality
constant,  , impose constraints on the nonlinearity
 . The proposed moment method uses these
measurements, which correspond to the first moment of the output
 and the second input-output moment
(cross-correlation), to specify the nonlinearity. For example, if one considers
a class of nonlinearities specified by only two parameters,
Equations 2 and
4 can be used to determine
 . If, however, one considers a class of
nonlinearities with more than two parameters, additional conditions must be
provided to determine  . We demonstrate
the procedure with some classes on nonlinearities used in both electrophysiology
and psychophysics: half-rectification with threshold, a power law, an error
function, and the Naka-Rushton
equation. Half-Rectifier with Threshold Nonlinearity
A linear-nonlinear system with a threshold
half-rectifier nonlinearity has been one of the classical models of simple cells
( Movshon, Thompson, & Tolhurst, 1978;
Tolhurst & Heeger, 1997;
Carandini & Ferster, 2000). The
nonlinearity is described by,  , where
 is a half-rectifier such that
 if
 and is zero otherwise. In this case
(see Appendix A), it can be shown
that  | ( 5) |
 | (
6) |
where  is the
complementary error function and  is the error
function. The system represented by Equations 5 and 6 has two measured
variables,  and
 , and two unknowns,
 and
 . Thus, given the measurements, one can
solve numerically the system to obtain estimates of
 and
 .
This is an important example because power laws are
used to model the output nonlinearity in simple cortical cells
( Emerson et al, 1989;
Albrecht & Geisler, 1991;
Heeger, 1992;
Anzai et al, 1999). Furthermore, they have
been shown to fit the data slightly better than half-rectification
( Tolhurst & Heeger, 1997). If we
assume  then, as shown in
Appendix A, we
obtain  | (7) |
 | (8) |
where the gamma
function is given by  . Once again, the
system represented by Equations 7 and
8 has two measured variables,
 and
 , and two unknowns,
 and
 . Thus, given the measurements one can
solve numerically the system to obtain estimates of
 and
 .
Error Function Nonlinearity
Another important case is when the nonlinearity can be
represented by an error
function:  | (9) |
This class of nonlinearities
is specified by three parameters:  ,
 , and
 . In
Appendix A we derive the following
specific forms of Equations 2 and
4 for the error function
nonlinearity:  | (10) |
 | (11) |
where
 . Because these equations provide only
two constraints and the nonlinearity has three parameters, additional
information is required to identify the system. One approach is to obtain
additional constraints by estimating higher moments between the input and output
of the system, which is discussed below. Another possibility is to begin by
estimating  . For example,
 can be taken as the reciprocal of the
minimum inter-spike interval in the data record, which is the method used in
this paper. Alternatively, the maximum output rate may be determined by the
response to optimal stimuli from other experiments. In the case of 2AFC
psychophysical experiments,  can be defined as
1. Thus, if we know  and measure
 and
 , we can solve
for
 and

analytically. Naka-Rushton Nonlinearity
A final example of a sigmoidal function commonly used
in models is the Naka-Rushton (NR)
equation,  | (12) |
with parameters
 ,  , and
 . Unfortunately, the integrals in
Equations 2 and
4 do not have nice analytic expressions,
but we can still solve numerically the system of integral
equations,  | (13) |
 | (14) |
for
 and
provided that  is estimated from
some additional information as
above.
To demonstrate the ability of the moment method to
reconstruct the nonlinearity, we tested the model with two simulated
experiments. First, we modeled the response of a human subject in a 2AFC
psychophysical task. Second, we modeled the receptive field of a simple-cell in
primary visual cortex. In each case, we calculated the mean response rate and
correlation magnitude, using the method described in detail in
Appendix B. To evaluate the performance
of the method, the parameters of the simulated nonlinear system were compared
with these estimated values. We also compared the performance of the proposed
technique with variants of the linear reconstruction method.
Anzai et al (1999) used a straightforward
implementation of the linear reconstruction (or LR) method, whereas
Chichilnisky (2001) implemented a
cross-validated version of the technique (or CLR). Details about the
implementations of these algorithms are described in
Appendix C. To anticipate the results, we
find that the moment method clearly outperforms the LR and CLR
techniques.
We simulated the study of
Ahumada & Beard (1998) who measured
classification images in a Vernier acuity task. In our simulations, we assume a
spatial kernel (the vector w) similar to
those obtained in the actual measurements
( Figure 1a) and an error function
nonlinearity with  = 1,
y0=0.5,
and e=1. In each trial of the experiment one of two possible stimuli,

or  , is presented in the presence of
additive white Gaussian noise with  . The subject is
asked to identify the stimulus. We code the subject’s response as
 for the
 choice and
 for the
 choice. The probability that the
subject will select  is modeled as
 . If we pool all the trials in which
only  was presented, we obtain
 (where
 represents the noise in the stimulus).
Similarly, for the trials in which only  was presented, we
obtain  . The “classification
images” obtained by cross-correlation between response and stimulus in
these two cases should be the same and proportional to
 . Similarly, the nonlinearities should
be identical up to a translation. In fact, these constraints could be used as a
test for the validity of the model in a 2AFC
task. Figure
1 . The kernel used for the psychophysical task. The
kernel used in the simulation (a). Red regions indicate areas where increases
in luminance would result in an increase in the probability of reporting the
 stimulus. Blue regions represent areas
where increases in luminance would result in a decrease of the probability of
reporting the  stimulus. The estimate of the kernel
after 10,000 (b) and 100,000 (c) trials.
We ran simulations that contained between 300 and
100,000 trials per stimulus. We selected one of the stimuli and calculated the
linear kernel (or “classification image”)
( Figure 1b and 1c) and the estimated
error function nonlinearity
( Figure 2).
The nonlinearity calculations show that the proposed method converges faster
than the LR/CLR methods (the result from the CLR method is not shown, but it was
no better than the LR method). Our technique gives a reasonable approximation
to the nonlinearity after only 500 trials, whereas the LR method yields a gross
underestimate ( Figure 2a). After 2400
trials, the moment method has determined the nonlinearity almost perfectly, but
the LR method still underestimates
( Figure 2b). After 75,000 trials, both
methods have converged to the correct solution
( Figure 2c).
To study the convergence of the algorithm, we simulated
60 runs and estimated the average relative error of the parameters for
increasing lengths of the data record, ranging from 300 to 100,000 trials. A
summary of the average relative error in  and
 vs. length of the data record is shown
in Figure 2d. After 2500 trials, the
error is down to 10 percent, and it converges toward zero as the number of
trials is increased further. Figure
2 : Results from the psychophysical simulations. a.
Estimate of the nonlinearity after 500 trials. The solid green line is the
estimated error function nonlinearity using the moment method, and the dashed
blue line is the error function nonlinearity actually used in the simulations.
Red diamonds show the nonlinearity calculated from the LR method. Error bars are
one standard error and reflect error only from the second half of the
reconstruction. Parameters obtained with error function estimate:
 = 1,
y0
= 0.29, and ε = 0.67. b. Estimates of the nonlinearity after 2400 trials.
Estimated parameters:  = 1,
γ 0 = 0.43, and ε =
0.97. c. Estimates of the nonlinearity
after 75,000 trials. The estimated nonlinearity nearly coincides with the
simulated nonlinearity and is barely visible. Estimated parameters:
 = 1,
γ 0 = 0.502, and ε = 1.02.
d. Rate of convergence of the moment method to the simulated parameters. The
relative errors are averages over 60 simulations. Relative errors are less than
10% after 2500 trials.
The second simulation was that of a simple cell in
visual cortex. To mimic a cell’s response, we used a linear kernel
obtained from cross-correlation analysis of actual experimental data. The kernel
was discretized in 45 time slices with  ms and by a 40 x
40 grid in space (see Figure 3a). In
each 2-ms interval, the probability of a spike was given by Equation 1, and the
nonlinearity was a power law with parameters
A = 0.01 and β = 2. The input was
binary noise taking the values 1 and –1, giving a standard deviation of
σ = 1. Figure
3 : Estimates of a time slice of the linear kernel. Time
slice of the simulated kernel at 60 ms (a). Estimates of the time slice at 60
ms calculated after 30 simulated minutes (b) and 8 simulated hours (c) of data
collection.
We simulated the response of the simple cell to noise
stimuli with experimental time ranging from 1 minute to 8 hours. For each case,
we estimated the kernel ( Figure 3b and
3c) and the power law nonlinearity
( Figure 4). These simulations show the
strength of the moment method with large kernels, such as the 40×40×45
kernel simulated here. Unlike the LR/CLR methods, our technique closely
approximates the simulated nonlinearity after only 15 minutes, or less than 5000
spikes ( Figure 4a). Even the CLR
method required 8 simulated hours (more than 140,000 spikes) to converge. The
bias in the LR method is large and is still substantial in the 8-hour
simulation.
The relatively poor performance of the CLR method can
be explained by the noise in the estimate of a large kernel. If the estimate of
w were extremely noisy, the linear
prediction  would be mostly noise, and the CLR
method would give a flat line at the mean rate.
Figure 5b illustrates that the CLR
results lie between the mean rate and the true nonlinearity after 15 minutes of
data collection, as would be expected from a noisy estimate of
y. As the simulation length increases,
the results approach the correct nonlinearity. A more detailed discussion of
the noise in the CLR methods, as well as a discussion on the bias in the LR
method, can be found in Appendix C.
Figure 4d
shows the rate of convergence of the estimated nonlinearity as a function of
simulated time between 1 minute and 8 hours. For each data point of 1 hour or
less, we computed 12 simulations and calculated the average relative error in
the nonlinearity parameters. For longer trials, the relative error estimates
were based on four simulations of 2 hours, two simulations of 4 hours, and one
simulation of 8 hours. After only 10 minutes of simulation (around 3000
spikes), the estimated parameters of the power law are within 10 percent of the
actual parameters.
To obtain the above results, we assumed we knew the
nonlinearity was a power law. Normally, the investigator does not necessarily
know in advance the shape of the nonlinearity. A natural question is what would
happen if we were to fit other nonlinearities to data generated by a power law
function. Figure 5a shows results of
the moment method if we assumed a power law, error function, or Naka-Rushton
nonlinearity. The different estimates roughly agree for small values of
y but diverge as
y increases. Thus, confidence in any
extrapolation to large y values
requires precise knowledge of the shape of the nonlinearity. Because the LR and
CLR methods do not make any assumptions about the shape of the nonlinearity,
they could be used to determine the proper class of nonlinearities to use in the
moment method.
The moment method is a parametric technique that
assumes a particular class of nonlinearities. On the other hand, the LR and CLR
techniques are nonparametric algorithms. A natural question is: would the
performance of the two techniques be similar if we added information about the
class of relevant nonlinearities to the LR/CLR methods. This could be done by
fitting a parametric nonlinear function to the scatter-plot of the linear
prediction versus the actual response in the LR/CLR algorithms. In the
simulation of the psychophysics experiment
( Figure 2), we fit an error function,
and for the simulation of the simple-cell receptive field
( Figure 4), we fit a power-law.
Figure 6 shows the maximum relative
error of the parameters obtained from these fits as well as the parameters from
the moment method. In all cases, the moment method is dramatically superior to
the LR/CLR methods. The better performance of the LR method than the CLR method
in Figure 6a is understandable because
removing bias can sometimes decrease accuracy. The erratic behavior of the LR
results in Figure 6b is due its
deviation from a power-law shape (as seen in
Figure 4a and
b). Figure
4 . Nonlinearity calculations from the simple-cell
simulation. a. Estimated nonlinearity after 15 minutes of simulated time. The
solid green line is the power-law estimate of the nonlinearity and the dashed
blue line is the nonlinearity used in the simulation. The error bars of the LR
(red) and CLR (black) represent one standard error and were calculated from the
second step of the reconstruction. Parameters from the estimated nonlinearity:
A = 0.00959 and β = 2.08. b.
Estimated nonlinearity after 1 hour of simulated time. Estimated parameters:
A = 0.00985 and β =
2.02. c. Estimated nonlinearity after 8
hours of simulated time. Estimated parameters:
A = 0.00995 and β = 2.002.
d. Rate of convergence of the moment
method to the simulated parameters. Relative errors are less than 10% after 10
minutes of simulated time. In panels a-c, the y axis is in units of spikes per
ms.
Figure
5 : a. Estimated error function (red line), power law
(green line), and Naka-Rushton (cyan line) nonlinearities to one simulated hour
of data generated by a power law function (dashed blue line). The y axis is in
units of spikes per ms. b. Consequences of a noisy estimate of
w on the shape of the nonlinearity
obtained by the CLR method. The dotted line represents the mean output rate.
As expected, the estimate lies between this line and the simulated nonlinearity.
Data are from 15 minutes of simulated time.
Figure
6 . Rates of convergence of the methods to estimate the
nonlinearity parameters. Relative errors in the moment method (green line), LR
method (red line), and CLR method (black line) are plotted versus simulation
length. a. Relative error of fitted error function parameters in the
psychophysical task simulation. The maximum of the relative error
in
y0 and the relative error in
ε is shown. Relative errors are less than 10% after 2500 trials for the
moment method and after 40,000 to 50,000 trials for the LR and CLR methods.
Results are from the same simulations used for
Figure 2d. b. Relative error of fitted
power law parameters in the simple cell simulation. The maximum of the relative
error in A and the relative error in
β is shown. Relative errors are less than 10% after 10 minutes of
simulated time for the moment method and after 8 hours for the CLR method. The
LR results were not fit well by a power law after less than 2 hours of simulated
time; thus, the relative error measurements for those simulations have little
meaning. Even after 8 hours, the relative error for the LR method was around
30%. Results are from the same simulations used for
Figure 4d.
The moment method is a fast and accurate method to
identify a linear-nonlinear system. In this method, we estimate the linear
filter using cross-correlation, and identify the output nonlinearity by matching
the mean response and cross-correlation amplitude to the data. The method is
fast because it does not require the calculation of a linear prediction to
estimate the static nonlinearity. Simulations demonstrate that the rate of
convergence of the moment technique is faster than the LR and CLR methods. The
advantage of the moment method is particularly evident when the kernels are
large, as in the simulation of the simple cell
( Figure 4). Here, the moment method
obtains a reasonable estimate of the nonlinearity after 15 minutes, whereas the
LR/CLR estimates are inaccurate for this amount of data. Another benefit of the
technique is that estimates can be updated recursively as new data arrives.
This provides a means to stop an experiment when a sufficient signal-to-noise
ratio is achieved. We note that both the moment method and the LR/CLR methods
are dependent on the assumption of a linear-nonlinear system as described by
Equation 1. Thus, these methods will give accurate results only to the extent
that a system is well represented by this lumped phenomenological model.
As mentioned above, one advantage of the LR/CLR methods
is that no assumptions are made about the shape of the nonlinearity. This
observation suggests using the LR/CLR methods initially when studying a new
system. These experiments will be time consuming, as large amounts of data are
required for accurate estimates of . However, once a
family of parametric functions is found to fit the measured nonlinearities, the
more efficient moment method can be used to conduct further experiments.
Several nonlinearities of interest are parameterized by
more than two variables. In these cases, additional conditions must be provided
to fully determine  . One possibility is to obtain
independent information about the remaining parameters (as done with
 above). In the present study, we
calculate two moments to estimate a two-parameter nonlinearity. We are
currently investigating the possibility of calculating higher order moments to
constrain the additional parameters.
Appendix A. Derivation of Equations for Nonlinearity
In this appendix, we derive
the equations for the mean response  and correlation
 for the linear-nonlinear system in
response to a discrete approximation of Gaussian white noise. In this
derivation, we ignore the distinction between space and time and assume that the
probability of a response is given
by  | (A1) |
Equation 1
where each component of x is a normal
random variable with mean zero and standard deviation σ. Let
N be the dimension of
w and
x, thus
x has the probability density
function  | (A2) |
We derive
Equation 2 for the mean
response  :  | (A3) |
In the derivation, we let
u =
Ox, where
O is any orthogonal matrix whose first
row is
wT.
(Recall that w is a unit vector.) Then
wTx
is the first component of u, which we
replaced by y.
To derive an expression for the input-output
correlation  | (A4) |
 | (A5) |
Bj
= 0 for  because
Bj
is an odd integral in this case. The first component of
B
is  | (A6) |
where we integrated by parts
to obtain the last expression.
Thus,  | (A7) |
where
e1
is the first unit vector. The input-output correlation is
then  | (A8) |
 | (A9) |
This result is known as
Bussgang’s Theorem
( Bussgang, 1952). Half-rectifier with threshold nonlinearity
If  then  | (A10) |
which is Equation 6.
Also,  | (A11) |
If  ,
then  | (A12) |
which is
Equation 7.
Also,  | (A13) |
Error function nonlinearity
Let the nonlinearity
be  | (A14) |
where
erf( x) is the error function
 . The derivative of

is
Then, the mean response
is  | (A15) |
where

and  . We integrated by parts in the second
step. To obtain the last step, we computed a change of variables with one of the
variables parallel to the line  and completed the
square in the resulting integrand. We have thus obtained Equation10.
The magnitude of the input-output correlation is
obtained by a completion of the
square:  | (A16) |
One challenge in using the reverse-correlation
technique is getting enough data for a reasonable signal-to-noise ratio.
Typically, one must run long experiments in order to obtain good results. This
problem is exacerbated with physiological experiments because cells do not
respond well to a white noise stimulus. Thus, one would like to develop ways to
increase the output rate.
If  , the output rate
increases with the standard deviation s of the input [to see this, integrate
Equation 2 by parts]. Therefore, one would
like to maximize the standard deviation of the input. However, physical devices,
like computer monitors, have limited dynamic ranges. The discrete approximation
to Gaussian white noise as described above will have a relatively small s given
restraints on the magnitude of the stimulus. We thus extended the theory to more
general forms of noise stimuli that increase the stimulus standard
deviation.
Let M denote
the maximum stimulus magnitude that could be produced by an input device, ie,
each component of the stimulus must be in the interval
[–M,
M]. In that case, a preferable input
would be one where each component of the stimulus was a binary random variable
that took on the values M and
–M with equal likelihood. This
input would have a standard deviation of
M.
Generalizations of the central limit theorem permit the
use of our reconstruction method even with this input. Because
 , the sum
 converges in distribution to a normal
random variable with standard deviation
M as the number of components increases
(for example, see the discussion on page 700 of
Port 1994). This convergence is valid as long
as the sum isn't dominated by one component of
w. Because the argument of the
nonlinearity
g( x)
would be approximately a normal random variable with s =
M, the output rate should closely match
the white noise case.
Furthermore, to compute
 one really computes an average of the
input conditioned on the presence of an output event
[ Equation B1]. The central limit theorem
states that this average converges in distribution to a normal random variable.
The results for non-Gaussian input, such as binary input, should thus closely
approximate the above results. Numerical simulations showed virtually the same
results for both types of
input. Appendix B. Computation of Correlation Magnitude
To compute the magnitude of
the input-output correlation, one first calculates
 | (B1) |
This average stimulus is
simply the input-output correlation divided by the mean response
rate:  | (B2) |
We used Bayes’ Theorem
to complete the third step and Equations 1
and A2 in the fourth step.
 is the probability density function of
x conditioned on
r = 1. The mean output rate is simply
 . The correlation magnitude is
thus  | (B3) |
There is a subtlety regarding
how one should calculate  to reduce the
bias in the estimate of C. If
 is a noisy estimate of
p, then
 overestimates the magnitude of
 . Thus, a naïve calculation of
C will overestimate the input-output
correlation.
Assume  is an unbiased
estimator of p so that
 . Let
 be the jth component of
p. If
 is the standard deviation of
 ,
then  | (B4) |
 | (B5) |
Thus, an unbiased estimate of

is  | (B6 ) |
To calculate
 , we divide the results into multiple
trials and calculate an approximation to
p for each trial. We let
 be the average of these values and
 be the sample standard deviation of the
mean. We then use B6 and
B3 to calculate the input-output
correlation.
The bias with the naïve estimate of the
input-output correlation increases with the size of the kernel. The magnitude
 is fixed for a given nonlinearity, but
 increases with the number of
components. Thus, the bias in the naïve estimate of
C was much larger with the 40x40x45
kernel of the simple cell than with the 32x32 kernel of the psychophysical
task.
Calculations with only small amounts of data can be so
noisy that  . In that case,
 is the square root of a negative
number, and we cannot compute an estimate of the nonlinearity. Thus, in
Figure 2d we could not include the
results for a few simulations with 500 or fewer trials. Similarly, we could not
include one simulation that was a minute long in
Figure 4d. Appendix C. Linear Prediction Method
For a benchmark by which to evaluate the moment method,
we also calculated the static nonlinearity with a linear reconstruction (LR)
method similar to that employed by
Anzai et al (1999) and
Chichilnisky (2001). This two-step
method involves first calculating the linear kernel
w via reverse correlation and then
calculating the average response as a function of the linear prediction
 . In the latter calculation, we
discretized y into intervals, using
narrower intervals around zero and wider intervals away from zero, because
y will have a normal distribution
centered around zero. For each interval, we calculated the mean and standard
error of the response if the interval contained three or more data points.
We present results from both the LR method and a
cross-validated implementation of the linear-prediction method (CLR). In the LR
method, we used all the data for both parts of the calculation. In the CLR
method, we divided the data into N
parts, where  . We calculated
N–1 versions of the linear
kernel, where each version used all the data
except two consecutive parts. Thus,
each version of the kernel would be based on approximately 99% of the data. We
then used all the data to calculate the average response as a function of the
linear prediction. The key of the CLR method was to switch kernels used for the
linear prediction  so that the same data was never used
simultaneously for both the linear kernel estimate
w and the corresponding stimulus
x. Thus, the CLR method removes the
bias in  created from covariance between the
stimulus x and the noisy estimate of
w.
The bias in  is similar to the
bias in  discussed above. As with
 , the bias
in  increases with the size of the kernel.
Thus, for small kernels, the CLR method may not give better results than the LR
method because sometimes removing bias can increase the error. The bias with
large kernels, on the other hand, may greatly skew the results of the LR method.
In this case, the CLR method will be significantly more accurate. For the small
32x32 kernel of the psychophysical simulations, the bias in the LR method did
give an underestimate of the nonlinearity
( Figure 2a and 2b), but the bias was
small. As shown in Figure 6a, the LR
method outperformed the CLR method in this case. However, the large 40x40x45
kernel of the simple-cell simulations produced significant bias in the LR method
even after 8 hours ( Figure 4). In this
case, the CLR method was necessary to give accurate results.
Although the CLR method removes the bias caused by
covariance between the stimulus and the kernel estimate, it is still sensitive
to noise in the estimate of w, as
discussed in connection with Figure 5b.
When w is high dimensional, small noise
in each component of w can build up to
make the unbiased estimate of  very noisy, as
observed in the simple-cell
simulations.
We thank E. J. Chichilnisky and J. D. Victor for
helpful comments and criticisms on an earlier version of this paper. This work
was supported by a NSF Mathematical Sciences Postdoctoral Research Fellowship
(D.Q.N) and by NSF-IBN-9720305 and NIH-EY-12816
(D.L.R).
Ahumada A. J. Jr., & Beard
B.L. (1998). Response classification images in vernier acuity.
Investigative Ophthalmology and Visual
Science, 39, S1109.
Albrecht, D. G., &
Geisler, W. S. (1991). Motion selectivity and the contrast-response function of
simple cells in the visual cortex. Visual
Neuroscience, 7,
531-546.
[PubMed]
Anzai A., Ohzawa I., &
Freeman R. D. (1999). Neural mechanisms for processing binocular information.
I. Simple cells. Journal of Neurophysiology,
82,
891-908.
[PubMed]
Bussgang J.J. (1952).
Crosscorrelation functions of amplitude distorted Gaussian signals. Technical
Report 216. Boston: Massachusetts Institute of Technology Research Laboratory of
Electronics.
Carandini, M., &
Ferster, D. (2000). Membrane potential and firing rate in cat primary visual
cortex . Journal of Neuroscience,
20,
470-484.
[PubMed]
Chichilnisky E. J.
(2001). A simple white noise analysis of neural light responses.
Network,
12,
199-213.
[PubMed]
DeAngelis, G. C., Ohzawa,
I., & Freeman, R. D. (1993). Spatiotemporal organization of simple-cell
receptive fields in the cat's striate cortex. I. General characteristics and
postnatal development. Journal of
Neurophysiology, 69,
1091-1117.
[PubMed]
DeAngelis, G. C., Ohzawa,
I., & Freeman, R. D. (1993). Spatiotemporal organization of simple-cell
receptive fields in the cat's striate cortex. II. Linearity of temporal and
spatial summation. Journal of
Neurophysiology, 69,
1118-1135.
[PubMed]
Emerson R. C., Korenberg M.
J., & Citron M. C. (1989). Identification of intensive nonlinearities in
cascade models of visual cortex and its relation to cell classification. In
V.Z. Marmarelis (Ed.), Advanced Methods of
Physiological System Modeling
(pp. 97-111). New York: Plenum 1989.
Friedman J. H., &
Stuetzle (1981). Projection pursuit regression.
Journal of the American Statistical
Association 76, 817-823.
Gold J. M., Murray R.F., Bennett
P. J., & Sekuler A. B. (2000). Deriving behavioural receptive fields for
visually completed contours . Current Biology,
10,
663-666.
[PubMed]
Heeger D. J. (1992).
Normalization of cell responses in cat striate
cortex . Visual Neuroscience,
9,
181-197.
[PubMed]
Jones J. P., & Palmer L. A.
(1987). An evaluation of the two-dimensional Gabor filter model of simple
receptive fields in cat striate cortex.
Journal of Neurophysiology,
58,
1233-1258.
[PubMed]
Marmarelis P. N., &
Marmarelis V. Z. (1978). Analysis of physiological systems: The white noise
approach. New York: Plenum Press.
Movshon, J. A., Thompson, I.
D., & Tolhurst, D. J. (1978). Spatial summation in the receptive fields of
simple cells in the cat's striate cortex.
Journal of Physiology (London),
283,
53-77.
[PubMed]
Port, S. C. (1994).
Theoretical Probability for
Applications. New York: John Wiley & Sons.
Reid, R. C., Soodak, R. E., &
Shapley, R. M. (1987). Linear mechanisms of directional selectivity in simple
cells of cat striate cortex. Proceedings of
the National Academy of Sciences of the United States of America,
84,
8740-8744. [PubMed]
Reid R. C., Victor J. D., &
Shapley R. M. (1997). The use of m-sequences in the analysis of visual neurons:
Linear receptive field properties. Visual
Neuroscience 14,
1015-1027.
[PubMed]
Ringach D. L., Sapiro G, &
Shapley R. (1997). A subspace reverse-correlation technique for the study of
visual neurons. Vision Research
37,
2455-2464.
[PubMed]
Ringach D. L. (1998). Tuning
of orientation detectors in human vision.
Vision Research 38,
963-972.
[PubMed]
Tolhurst, D. J., &
Heeger, D. J. (1997). Comparison of contrast-normalization and threshold models
of the responses of simple cells in cat striate cortex.
Visual Neuroscience,
14,
293-309.
[PubMed]
|