| Volume 6, Number 4, Article 11, Pages 441-474 |
doi:10.1167/6.4.11 |
http://journalofvision.org/6/4/11/ |
ISSN 1534-7362 |
Estimating nonlinear receptive fields from natural images
JoaquĆn Rapela |
Department of Electrical Engineering and Neuroscience Graduate Program, University of Southern California, Los Angeles, CA, USA |
|
Jerry M. Mendel |
Department of Electrical Engineering, University of Southern California, Los Angeles, CA, USA |
|
Norberto M. Grzywacz |
Department of Biomedical Engineering and Neuroscience Graduate Program, University of Southern California, Los Angeles, CA, USA |
|
Abstract
The response of visual cells is a nonlinear function of their stimuli. In addition, an increasing amount of evidence shows that visual cells are optimized to process natural images. Hence, finding good nonlinear models to characterize visual cells using natural stimuli is important. The Volterra model is an appealing nonlinear model for visual cells. However, their large number of parameters and the limited size of physiological recordings have hindered its application. Recently, a substantiated hypothesis stating that the responses of each visual cell could depend on an especially low-dimensional subspace of the image space has been proposed. We use this low-dimensional subspace in the Volterra relevant-space technique to allow the estimation of high-order Volterra models. Most laboratories characterize the response of visual cells as a nonlinear function on the low-dimensional subspace. They estimate this nonlinear function using histograms and by fitting parametric functions to them. Here, we compare the Volterra model with these histogram-based techniques. We use simulated data from cortical simple cells as well as simulated and physiological data from cortical complex cells. Volterra models yield equal or superior predictive power in all conditions studied. Several methods have been proposed to estimate the low-dimensional subspace. In this article, we test projection pursuit regression (PPR), a nonlinear regression algorithm. We compare PPR with two popular models used in vision: spike-triggered average (STA) and spike-triggered covariance (STC). We observe that PPR has advantages over these alternative algorithms. Hence, we conclude that PPR is a viable algorithm to recover the relevant subspace from natural images and that the Volterra model, estimated through the Volterra relevant-space technique, is a compelling alternative to histogram-based techniques.
 |
|
History
Received September 2, 2005; published May 16, 2006
Citation
Rapela, J., Mendel, J. M., & Grzywacz, N. M. (2006). Estimating nonlinear receptive fields from natural images.
Journal of Vision, 6(4):11, 441-474,
http://journalofvision.org/6/4/11/,
doi:10.1167/6.4.11.
Keywords
natural images, nonlinear models, Volterra model, dimensionality reduction, Projection Pursuit Regression
for related articles by these authors
for papers that cite this paper |
Nonlinearities are ubiquitous in the response of visual cells. Just in the primary visual cortex, we find rectification (Albrecht & De Valois, 1981; Movshon, Thompson, & Tolhurst, 1978), saturation (Dean, 1981; Maffei & Fiorentini, 1973), expansion (Albrecht & Hamilton, 1982; Sclar & Freeman, 1982), phase advance (Carandini & Heeger, 1994; Dean & Tolhurst, 1986), and cross-orientation inhibition (Bonds, 1989; Morrone, Burr, & Maffei, 1982), to mention a few. Therefore, several investigators looked for general methods to study these nonlinearities from the retina to the cortex. In theory, white noise could be an ideal stimulus to estimate nonlinear properties. This is because if a nonlinear system is stimulated with a white noise stimulus ensemble for a long enough time, then there is a finite probability that any given stimulus will appear, probing the nonlinear system thoroughly and efficiently. However, this rationale is weak for the visual system because the possible dimensions of stimuli space are too large (i.e., the number of possible images is immense). Moreover, data are noisy and limited in quantity in physiological recordings. Hence, white noise does not provide enough signal-to-noise ratio to obtain accurate estimates of responses along all stimulus dimensions. Finally, some sensory neurons do not respond well to white noise. Thus, it would be better to use natural images to concentrate the power of the stimuli on the normal operating range of the cell. More than 40 years ago, Barlow ( 1961) introduced the hypothesis that visual cells are optimized to process natural stimuli. Since then, several investigators have provided support to this natural-adaptation hypothesis (Dan, Attick, & Reid, 1996; Simoncelli & Olshausen, 2001; Srinivasan, Laughlin, & Dubs, 1982) and have shown that natural images emphasize features of responses not prominent when using synthetic stimuli (David, Vinje, & Gallant, 2004). However, only recently, a number of studies have begun to use nonlinear models and natural stimuli to characterize the response of visual cells. These studies obtained interesting findings, but the methods had limitations for investigators interested in general receptive fields. Some groups did not attempt to map receptive fields (Aggelopoulos, Franco, & Rolls, 2005; Dan et al., 1996; Guo, Robertson, Mahmoodi, & Young, 2005; Vinje & Gallant, 2000, 2002; Weliky, Fiser, Hunt, & Wagner, 2003), whereas others limited their studies to linear (Ringach, Hawken, & Shapley, 2002; Smyth, Willmore, Baker, Thompson, & Tolhurst, 2003; Theunissen et al., 2001; Willmore & Smyth, 2003) or second-order, nonlinear (Felsen, Touryan, Han, & Dan, 2005; Touryan, Felsen, & Dan, 2005) receptive fields. More general nonlinearities were first studied by modeling cell responses as a linear filter followed by a point nonlinearity (Chichilnisky, 2001; Nykamp & Ringach, 2002) or by fitting a priori models (David et al., 2004). The former models cannot capture nonlinear interactions between subregions of the receptive field and, thus, might not be sufficiently general to characterize responses of large classes of cells. Later, two methods that make no assumptions on the response-generation mechanism were introduced (Prenger, Wu, David, & Gallant, 2004; Sharpee, Rust, & Bialek, 2004; Sharpee et al., 2006). These methods are powerful but require delicate nonlinear optimizations that could be overwhelming to many investigators. In an attempt to overcome some of the above limitations, we explored the Volterra model (Marmarelis, 2004). Because it does not rely on specific assumptions about the response-generation mechanism, this model can be applied to a broad class of cells. Moreover, the Volterra model has a long history in the study of nonlinear physiological systems. This model could be especially useful for visual cells because its first-order kernels are just like standard receptive fields. Therefore, the model can generalize them to the nonlinear domain. The large number of parameters in Volterra models of visual cells and limited physiological recordings have hindered its application to the visual system. In this article, we overcome this limitation using a recent substantiated hypothesis stating that the responses of each visual cell depend on an especially low-dimensional subspace of the image space (Bialek & de Ruyter van Steveninck, 2005; Brenner, Bialek, & de Ruyter van Steveninck, 2000; de Ruyter van Steveninck & Bialek, 1988; Rust, Schwartz, Movshon, & Simoncelli, 2005; Sharpee et al., 2004; Sharpee et al., 2006). Accordingly, we introduce the Volterra relevant-space technique to estimate Volterra models of visual cells from natural images. Most laboratories characterize the response of visual cells as a nonlinear function on a one- or two-dimensional subspace of the image space (Chichilnisky, 2001; Felsen et al., 2005; Sharpee et al., 2004; Sharpee et al., 2006; Simoncelli, Paninski, Pillow, & Schwartz, 2004; Touryan, Lau, & Dan, 2002; Touryan et al., 2005). To estimate this nonlinear function, they construct one- or two-dimensional histograms of the response as a function of the projection of the input images based on this subspace. Then, they fit parametric functions to this histogram. Using simulated data from cortical simple cells and simulated and physiological data from cortical complex cells, we compare the Volterra model with histogram-based techniques for simple (Chichilnisky, 2001) and complex cells (Touryan et al., 2005). Several methods, in different scientific disciplines, have been proposed to estimate the low-dimensional subspace (de Boer & Kuyper, 1968; de Ruyter van Steveninck & Bialek, 1988; Friedman & Stuetzle, 1981; Helland, 1988; Li, 1991; Sharpee et al., 2004; Touryan et al., 2005). In this article, we test projection pursuit regression (PPR; Friedman & Stuetzle, 1981), a nonlinear regression algorithm. We compare PPR with two popular models used in vision: spike-triggered average (STA; de Boer & Kuyper, 1968) and the modification of spike-triggered covariance (STC) for natural images (Touryan et al., 2005). We organize the rest of this article as follows: The Theory section describes the theoretical underpinnings of the modeling of cells' receptive fields. This section covers the Volterra model and the Volterra relevant-space technique. This theoretical section is followed by the Methods section, the Results section, and the Discussion section. Each theoretical material starts with an intuitive description of ideas so that readers can get the gist of the paper even if they skip its mathematical content. Then, those sections present the main mathematical results, leaving proofs and developments to appendices. A useful way to think of the Volterra model is that it is a sequence of approximations. They are given by a sequence of filters, called kernels, with which to process the input. When Volterra models are applied to visual cells, the zeroth-order kernel represents the activity when the input is absent. The first-order Volterra kernel is the linear component of the response. For example, in the spatial domain, this kernel gives the response to small pulses of input at every position. Therefore, this kernel is a good representation of what people traditionally call the receptive field of a cell. The second-order kernel represents the weight that nonlinear interactions between two inputs (e.g., at two times or in two positions) have on the response. In general, the rth-order kernel represents the weight that nonlinear interactions between r inputs have on the response. The development of Volterra models relies on the mathematical notion of the Volterra series introduced by Volterra ( 1930). In the time domain, the output y( t) of a stationary, stable, causal system can be expressed in terms of its input signal x( t) through its Volterra series expansion as  | (1) |
A Volterra model is a truncated Volterra series; that is,  | (2) |
where the subscript Q in yQ,k indicates that the Volterra series has been truncated at order Q, the subscript k reflects the dependence of the Volterra model on the particular choice of kernels { k0,k1,…,kQ}, and  stands for possible errors due to truncation and to noise in the data. The Volterra model given in Equation 2 uses one-dimensional continuous inputs. However, the input to visual cells is multidimensional (one dimension of time, two dimensions of space, three dimensions of color, etc.), and discrete data are normally used to characterize their responses. Here, as a first approximation, we only use the spatial dimensions of the stimuli to estimate the response of visual cells, disregarding temporal interactions. As discussed in the Discussion section, the Volterra relevant-space technique can be extended to consider spatiotemporal interactions. The Volterra series used henceforth is  | (3) |
where x N × N is the input image. Any Volterra model can be represented with kernels kr that are symmetric with respect to permutations in pairs of indices ( ip,jp). Hence, the number of parameters in Volterra kernels is less than their number of kernel coefficients. For example, for k2, we have the symmetry k2( i1,j1,i2,j2) = k2( i2,j2,i1,j1), which implies that the number of parameters is  , whereas the number of kernel coefficients is N4. The number of parameters of the Volterra model in Equation 3, nV, is  | (4) |
For example, a fourth-order Volterra model, Q = 4, using images of size 16 × 16 as input, contains nV = 186,043,585 parameters. From the limited data that can be recorded in physiological experiments, we cannot estimate such a large number of parameters. The Volterra relevant-space technique described in the next section addresses this problem. Volterra relevant-space technique Fortunately, one can reduce the dimensionality of Volterra models. This is because the response of each visual cell depends on a low-dimensional subspace of the image space. One reason for this low dimensionality is the high degree of correlation in natural images (Field, 1987; Ruderman & Bialek, 1994; Srinivasan et al., 1982). Another reason is that each neuron responds to particular features of the image, such as an edge at a given position with a given orientation. Here, we will show how to use this low-dimensional subspace to build a low-dimensional Volterra model and we will prove the equivalence between the original and low-dimensional Volterra models. Thus, we establish, for the first time, a mathematical link between the low-dimensional subspace and Volterra models. The construction of the low-dimensional Volterra model begins by projecting the input images onto a low-dimensional subspace  . If x is an image and  = { b1,…, bL: bl N × N} is a set of orthonormal basis functions spanning the space  , then the projected image is  | (5) |
where  | (6) |
is the projection coefficient of image x onto the basis function bl. Using these coefficients, the low-dimensional Volterra model is  | (7) |
What are the conditions on the space  for the low-dimensional Volterra model to be an equivalent representation of the original Volterra model in Equation 3? The answer is given by Proposition 1 ( Appendix A). It states that if a vector space is such that the response of the cell to any image equals the response to the projection of the image onto , then the original Volterra model can be rewritten as the low-dimensional Volterra model. A vector space  satisfying this condition is called a relevant space. That is,  is a relevant space if and only if  | (8) |
Any set of orthonormal basis function  = { b1,…,bL: bl N × N} spanning a relevant space is called a set of relevant dimensions. We are not seeking a relevant space such that the represented images look similar to the original ones. Instead, we are seeking a relevant space such that the response to the original images equals the response to the represented images. This equality is shown in Equation 8. Therefore, for our purposes, a relevant-space representation does not depend only on the statistics of natural images, as in the case of image reconstruction. This representation depends also on the properties of the cell under study. How much do we gain by using the relevant space to represent Volterra models? The number of parameters in the low-dimensional Volterra model, nLV, is  | (9) |
As an example, for L = 10 relevant dimensions, a fourth-order ( Q = 4) low-dimensional Volterra model will contain nLV = 1,001 parameters. In contrast, the number of parameters in the original Volterra model ( nV) for a 16 × 16 image is 186,043,585 ( Equation 4). Therefore, if a small relevant subspace can be identified, the Volterra relevant-space technique achieves enormous dimensionality reduction. How are the coefficients, qi, of the low-dimensional Volterra model ( Equation 7), estimated? We seek coefficients that minimize the difference between true responses and those predicted by the Volterra model. To obtain a convenient method for this minimization, we begin by expressing Equation 7 as the linear Volterra equation  | (10) |
where, for K image–response pairs and M parameters, y K is the response vector whose ith element, y( xi), is the response of the cell to the ith image, A K × M is the data matrix, q M is the vector of parameters, and  accounts for possible errors due to truncation and for noise in the data. For instance, for a second-order cell ( Q = 2), the ith row of matrix A, ai′, is  | (11) |
and the parameters vector, q, is  | (12) |
By using the mean square error ( MSE),  | (13) |
as the goodness-of-fit criterion, we can apply any linear technique to estimate the parameters vector, q. The use of linear techniques in the estimation of Volterra models removes the whiteness requirement of cross-correlation techniques. Therefore, we can use natural images to estimate Volterra models, without the need to whiten them or to modify image statistics in any way. Having estimated the coefficients of the low-dimensional Volterra model, one approximates the true kernels, ki, in Equation 3, with
 | (14) |
 | (15) |
 | (16) |
Proposition 2 ( Appendix A) provides justification for these equations. This proposition proves that, if a cell can be represented by a Volterra model using true kernels, ki, then, for any set of true relevant dimensions { bi} ( Equation 8), there exist coefficients q0,…,qQ, such that a Volterra model using the true kernels, ki, generates the same responses as a Volterra model using the projected kernels in Equations 14– 16. Thus, from a predictive perspective, these reconstructed kernels are indistinguishable from the true kernels. The use of Equations 7– 16 assumes a known order of the Volterra model, Q. Here, we selected the order Q by cross-validation, as indicated in the Volterra relevant-space technique section under the Receptive-field estimation section. In summary, the Volterra relevant-space technique uses the following procedure to estimate Volterra models from natural images: - Estimate the relevant space.
- Project the images in the estimated relevant dimensions (Equation 6), obtaining the coefficients αl.
- Use the coefficients αl to construct the data matrix (Equation 11) of the linear Volterra equation (Equation 10).
- Estimate the coefficients vector, q, of the linear Volterra equation by minimizing the MSE in Equation 13.
- Reconstruct the Volterra kernels using Equations 14–16.
Code implementing this procedure along with the simulated data used in the Results section can be downloaded from http://vpl.usc.edu/projects/nonlinearReceptiveFields/. After the relevant dimensions have been estimated, the remaining steps in the procedure (Steps 2–5) involve only linear operations and are therefore fast and easy to implement. In contrast, most implementations of histogram-based methods fit a nonlinear function to an estimated histogram. The selection of the optimal histogram bin size is an open question, and the nonlinear fit is slower and comparatively more difficult to implement than the linear operations required by the Volterra relevant-space technique. Responses of the simulated simple-cell and complex-cell models described below were generated using 10 × 10 image patches extracted from calibrated natural images (van Hateren & van der Schaaf, 1998). Sample natural images are shown in Figure 1. Figure 1. Sample natural images. The simple-cell data were generated by a model similar to those previously used to describe V1 simple cells (Jones, Stepnoski, & Palmer, 1987). The model consisted of a linear spatial filter followed by a sigmoidal rectification, rect( v) = saturation / [1 + exp(−slope × ( v − threshold))], and a Poisson spike generator. We used a sigmoidal rectification instead of a halfway rectification because we wanted to use a model that was differentiable at every point. This differentiability allowed derivation of true Volterra kernels (see Appendix B). The simple-cell model is then given by  | (17) |
where  .,.  represents the Euclidean inner-product operation and f is the spatial filter. For the spatial filter, we used a two-dimensional Gabor function. It had a preferred orientation of 45 deg from vertical, a preferred spatial frequency of 2 cycles per receptive field, a spatial bandwidth of 1.6 octaves, and an even phase. For the sigmoidal rectification of the filtered image, we used slope = 5.0 and threshold = 1.0 and varied the saturation parameter (setting it to 458, 276, 92, and 46) to obtain different mean number of spikes per image (5, 3, 1, and 0.5, respectively). Because the responses of the simulated simple cell followed a Poisson distribution, the signal-to-noise ratio in the responses increased monotonically as a function of the mean number of spikes per image. To assess the quality of the estimated relevant dimensions, we first derived analytically relevant dimensions for the simple cell. We then computed their projection onto the estimated relevant space. If the estimated relevant space embodies the true relevant space, then the analytically relevant dimensions should equal their projection onto the estimated space. In deriving the analytically relevant dimensions, we begin by observing that the first stage of the simple-cell model is a linear filter. Hence, the response to any image will equal (up to the noise level) the response to the image projected in the span of the linear filter. Consequently, the linear filter spans the relevant space. After rescaling this filter to unit norm, it is the only relevant dimension for the simulated simple cell. The derivatives of all orders of the simple-cell model in Equation 17 are nonzero. Hence, according to Equation B3, every kernel in Equation 3 is nonzero. This may cause a problem when the projections of the input images onto the filter f (the coefficients αl( x) in Equation 7) are large. In this case, truncation errors will become significant. However, in the opposite case, that is, when these projections are small, the problem is not serious. By using a model with kernels of sufficiently high order, these truncation errors will be negligible. To minimize the effect of truncation errors, we selected the amplitude of the filter f in such a way that the absolute value of the largest filtered image was 1. The complex-cell data were generated by means of a spatial energy model (Adelson & Bergen, 1985). The input image was first filtered using two Gabor filters, in quadrature phase, that is, with a 90 deg phase relationship. These filters had a vertical preferred orientation, a preferred spatial frequency of 2 cycles per receptive field, and a spatial-frequency bandwidth of 1.6 octaves. The outputs of these filters were then squared and summed, generating the mean of a Poisson spike generator. The analytical expression of the complex-cell model is then  | (18) |
where f1 and f2 are the linear filters in quadrature phase. To vary the mean number of spikes in the responses, we scaled the magnitude of the Gabor filters, generating responses with 5.0, 3.0, 1.0, and 0.5 spikes/image. As before, to assess the quality of the estimated relevant space, we derived analytically relevant dimensions for the complex-cell model. We then compared them with their projection onto the estimated relevant space. Due to the linear filters in the first stage of the complex-cell model, the response to any image will equal, up to the noise, the response to the image projected in the span of the filters. Consequently, the two linear filters span the relevant space. Because these filters are orthogonal, after rescaling them to unit norm, they constitute a set of relevant dimensions. Again, to judge the quality of the estimated kernels, we compared them with the true kernels. We derived them for the complex-cell model using the procedure described in Appendix B. Only the second spatial derivatives with respect to x of Equation 18, evaluated at zero, are different from zero. Hence, from Equation B3, only the second-order Volterra kernels of the complex cell are different from zero. Therefore, the complex cell can be represented with a second-order Volterra model. The physiological data used here were recorded in the laboratory of Dr. Yang Dan, and the method used to obtain them were described by Touryan et al. ( 2005). Animal procedures to obtain these data were approved by the Animal Care and Use Committee at the University of California, Berkeley. Briefly, single-cell recordings were made in Area 17 of adult cats (2–6.5 kg) using tungsten electrodes. Cells were classified as simple if the receptive fields had clear ON and OFF subregions and if the ratio of the first harmonic to the DC component of the responses to an optimally oriented drifting sinusoidal grating was greater than 1. Natural images used in cell stimulation were selected at random from a database consisting of a variety of digitized movies (van Hateren & Ruderman, 1998), and the center patch (12 × 12 pixels) of each image was retained. Images were scaled so that they all had the same variance. These selected images had no temporal correlations and were presented at a frame rate of 24 Hz in an area slightly larger than the classical receptive field of the cell estimated by hand mapping. For analysis, Touryan et al. binned responses at 41.8 ms, the presentation frame rate, and the bin immediately following a frame was used as the response to that frame. They shared with us two to four repetitions of responses from four complex cells to 24,000 natural images. Below, we show the analysis of the cell for which we obtained the best predictions with both the Volterra and Touryan et al. models. For this cell, both models yielded better predictions when considering the spikes in the frame presentation bin as the cell response to that frame. We used the average response across four repetitions of the stimuli as the response variable to estimate the different models. To control for possible overfitting of the different receptive-field estimation models to the data used in their estimation, we partitioned the data into disjoint training and test data sets. Only the training data set was used to estimate model parameters. The test data set was reserved to evaluate the models' predictive power. By using different training and test data sets, the prediction results reflected the generalization ability of the different models to predict responses to novel natural images. In addition, between 2 and 10 jackknifed data sets were generated by excluding different 10% segments of the training data set (Efron & Tibshirani, 1993). These jackknifed data sets were used to average out the noise from the estimated relevant dimensions and model parameters and to select model hyperparameters. We used training sets of varying size (1,000, 3,000, 5,000, 10,000, and 15,000 image–response pairs). For the simulated and physiological data, we used test sets of size 4,500 and 4,000 image–response pairs, respectively. Receptive-field estimation Volterra relevant-space technique Estimating the relevant dimensions. To develop low-dimensional Volterra models, one must estimate a set of relevant dimensions satisfying Equation 8. Methods for doing this rely on different assumptions about the statistics of the input, the properties of the response-generation function, or both. The methods of reverse correlation (de Boer & Kuyper, 1968), STC (de Ruyter van Steveninck & Bialek, 1988), and maximally informative dimensions (Sharpee et al., 2004) have been introduced in the field of neuroscience. These methods are powerful but have constraints about the dimensionality of the relevant space and the type of stimulus used or require an arduous high-dimensional nonlinear optimization that constrains the dimensionality of estimable relevant spaces. (An exception may be the modification of STC for natural images explained in the STC for natural images section.) Other interesting methods involve perception neural networks (Hertz, Palmer, & Krogh, 1991) and partial least squares (Geladi & Kowalski, 1986; Helland, 1988). The former suffers from the curse of dimensionality, and the latter relies on a linear assumption about the response-generation function. Other methods are sliced inverse regression and PPR, both of which were introduced in statistics. Sliced inverse regression (Li, 1991) makes no assumptions about the response-generation mechanism but relies on the assumption of a spherical distribution of the stimuli. In turn, PPR (Friedman & Stuetzle, 1981) makes no assumption about the distribution of the inputs and uses the following generalized linear model for the response-generation mechanism:  | (19) |
Provided M0 is sufficiently large, any smooth function can be represented by Equation 19 (universal approximator property; Diaconis & Shahshahani, 1984). Another interesting property of PPR is that its estimation algorithm explicitly addresses the curse of dimensionality by separately estimating each term in the generalized linear model. In the following experiments, we evaluated PPR for the estimation of the relevant space. In Appendix C, we outline the PPR algorithm (see Friedman, 1984, for further details). The estimated relevant dimensions for the low-dimensional Volterra model are computed by orthonormalizing the set of projection directions, { a1,…, aM0}, in Equation 19. To understand why this procedure works, let  denote the space spanned by the projection directions,  = a1,…, aM0 . Let us also denote  as the projection operator onto the space  . Because ai, x = ai,  ( x)  ,  | (20) |
That y( x) = y(  ( x)) implies that  is a relevant space for y. Consequently, a set of relevant dimensions can be computed by orthonormalizing the set of projection directions { a1, …, aM0}, as indicated above. For the results reported here, we used the implementation of the PPR algorithm in R, an environment for statistical computing (Venables & Smith, 2002). This implementation selects initial values for the projection directions, ai, in Equation 19 by cross-correlating the input images with the residuals of the responses. Consequently, it might fail to estimate the relevant dimensions of cells for which the responses are completely uncorrelated with the input images. For these cases, one can use the implementation of Automatic Smoothing Spline Projection Pursuit (Roosen & Hastie, 1994) developed by Roosen ( 1994), which allows the specification of initial projection directions. However, for all the conditions reported below, there was enough correlation between images and responses for the R implementation of PPR to estimate good relevant dimensions. The estimation of the projection directions requires two runs of the PPR algorithm. The first run estimates the number, L, of projection directions needed to characterize the cell. Then, the second run estimates the L projection directions. In the first run, for each jackknifed data set in our training set ( Data partitioning section), we ran the PPR algorithm with parameters M0 = 1 and M = 6. These parameters allowed assessing the predictive error of PPR models with one to six projection directions (see Appendix C). For each jackknifed data set, PPR generates a predictive-error-versus-number-of-projection-directions curve. The PPR predictive-error curves of different jackknifed data sets were averaged. This average generated a new curve that was used to select the number of projection directions. PPR predictive-error-versus-number-of-projection-directions curves follow a common pattern. Before a critical number of terms, the predictive error decreases substantially as we increase the number of terms. Increasing the number of terms beyond the critical value yields only small reductions in predictive error. We selected the number of projection directions as the largest number, between one and six, for which the error bars (Size 10 standard errors) did not intersect those of the previous number of projection directions. After choosing the number of PPR projection directions, L, we proceeded to estimate the projection directions. We ran the PPR algorithm, with parameters M0 = L and M = 6, for the N (2 ≤ N ≤ 10) jackknifed data sets, estimating N sets of L noisy projection directions. To attenuate the effect of noise, we averaged these sets of projection directions, as described in the Averages of sets of relevant dimensions section. Note that PPR uses raw natural images as inputs, in contrast to spike-triggered techniques, which use spike-triggered images, that is, images weighted in proportion to their corresponding response. Estimating the low-dimensional Volterra model. Having estimated the relevant dimensions, we constructed the low-dimensional Volterra model ( Equation 7). We estimated its parameters, qi, by minimizing the MSE between the responses of the cell model and the predictions of the low-dimensional Volterra model. As noted in the Volterra relevant-space technique section, the solution to Equation 13 is a linear problem. We solved it using the pseudoinverse computed from the singular value decomposition (SVD; Mendel, 1995). Selecting the order of Volterra model. For each training jackknife data set ( Data partitioning section), we estimated low-dimensional Volterra models of different orders. We then evaluated the predictive power of these models using the 10% portion of the training data not included in the jackknife data set. The predictive power was the cross-correlation between cell responses and Volterra model predictions. We then generated a curve of predictive power versus model order. Curves of different data sets were averaged. The least order maximizing the average predictive power was selected as the order of the Volterra model. Suppose the responses of a cell are generated by a static nonlinearity on the projection of input images along a single relevant dimension. In other words, suppose y( x) = N( h, x ) +  where y( x) is the response of the cell to image x, h is a linear filter,  .,.  is the inner-product operation, and N is a static nonlinearity. When the images are Gaussian white noise, the filter h can be estimated from pairs of images–responses by cross-correlating the inputs with the responses (Marmarelis & Marmarelis, 1978). However, natural images are not white noise. For them, when the cell can be represented as a liner device, the cross-correlation between the stimuli and the responses equals the product of the autocorrelation of the inputs and the filter, i.e., Cyx = Cxxh. Then, for nonwhite inputs and a linear cell, the filter h can be recovered as h = C
Cyx. The autocorrelation matrix, Cxx, for natural stimuli is nearly singular. Therefore, its true inverse tends to amplify noise. To avoid this problem, we regularized the autocorrelation matrix using the truncated SVD (Hansen, 1987) and computed the pseudoinverse (Ben-Israel & Greville, 1980) from this regularized matrix. The computation of the truncated SVD uses a threshold to decide how many singular values to include in the regularized matrix. For each condition in the studies reported below, we selected the optimal threshold by using k-fold cross-validation (Efron & Tibshirani, 1993). We used the jackknifed data sets from the training data ( Data partitioning section) to estimate different STA filters. To minimize the effect of noise, we used the average of these filters as the final STA estimate. Histogram-based model for simple cells The histogram-based model for simple cells characterizes the response to an image x as a linear filter, h, followed by a static one-dimensional nonlinearity (Chichilnisky, 2001). Precisely, y( x) = N( h, x ) +  . We estimated the linear filter using STA and the static nonlinearity by fitting the parameters of the same sigmoidal function used to generate the simulated data to a histogram of responses as a function of projections of input images in the estimated filter. We used histograms with equi-spaced bins and selected the bin size from the data range using the Sturges rule (Sturges, 1926). For Gaussian white noise stimuli, consider stimuli that elicit spikes (spike-triggering stimuli). The variance of the projections of these stimuli along the cell's relevant dimensions should be higher or lower than the variance along nonrelevant dimensions. The dimensions with high or low variance correspond to the eigenvectors of the autocovariance matrix with high or low eigenvalues, respectively. For Gaussian stimuli, STC (de Ruyter van Steveninck & Bialek, 1988; Simoncelli et al., 2004) identifies these “extreme” eigenvectors as the relevant dimensions of a cell. Touryan et al. ( 2005) have proposed a modification of STC for natural stimuli. This modification starts by whitening the natural stimuli. Denote by U the matrix of eigenvectors of the autocovariance of the stimuli (one eigenvector per column), and denote by λi its eigenvalues. The matrix of normalized eigenvectors is defined as  | (21) |
Then, the whitened natural images are Xw = XUn. STC for natural images now performs a classical STC analysis on the whitened natural images, estimating a set of relevant dimensions, Vw (one relevant dimension per column). Finally, the estimated relevant dimensions of the cell, V (one relevant dimension per column), are V = UnVw. The autocovariance matrix of natural images is nearly singular; hence, its last eigenvalues ( λi, i  1) will be very small and tend to amplify noise. To avoid this effect, we selected a threshold and set eigenvalues below this threshold to 1. In this way, only the eigenvectors corresponding to large eigenvalues are normalized in Equation 21. With physiological and simulated data, we used different threshold values for different cells and selected the value maximizing the predictive power. Best predictions were obtained using thresholds that normalized around 35% of the eigenvectors. Thus, for the results reported below, we used this criterion to select the threshold. The same criterion was used by Touryan et al. ( 2005). Multiple sets of STC relevant dimensions were estimated using the jackknifed data sets from the training data ( Data partitioning section). We then averaged these sets ( Averages of sets of relevant dimensions section) to obtain the final STC estimates. Histogram-based model for complex cells To reconstruct the nonlinear function from the relevant space by a histogram method, we use the technique proposed by Touryan et al. ( 2005). We construct two independent histograms for responses as a function of projections of images in each of the two estimated relevant dimensions. We then fit a power function, f( p) = β| p| γ + r0 to each histogram, where f( p) is the response of the cell to a given image, p is the projection of the image onto a estimated relevant dimension, and β, γ, and r0 are free parameters. The reconstructed nonlinearity is then the sum of the power functions fitted to each histogram, y( x) = f1(  1, x ) + f2(  2, x ), where y( x) is the response of the cell to image x, f1 and f2 are the power functions fitted to the histograms, 1 and 2 are the first and second relevant dimensions, and  .,.  is the inner-product operation. Averages of sets of relevant dimensions We wish to average N sets of L estimated relevant dimensions, when each set has been estimated from a different portion of the training data. All sets describe, with noise, a common relevant space. The goal is to estimate this common space. Because different sets might represent the common space in different coordinate systems, we cannot average them element by element. To perform the average, we instead collected the N sets into a matrix A. In A, columns {( i − 1) L,…, iL} correspond to the relevant dimensions of the ith set. The column space of A spans an L-dimensional relevant space (signal) and a noise space. We use the SVD to separate the column space of A into a signal and an orthogonal noise space (Scharf, 1991). Specifically, we compute the SVD of A, A = UΣUT, and use the first L left singular vectors (first L columns of matrix U) to represent the signal space. Thus, the first L columns of matrix U are the average of the N sets of L relevant dimensions. To display kernels and to measure the goodness of fit of estimated kernels, we scaled them in proportion to their mean contribution to the response of the cell. The first-order kernel was scaled by the mean absolute value of the stimuli. In turn, the second-order kernel was scaled by the mean absolute value of the autocorrelation. Precisely, by denoting the mean of z by  , the scaled first- and second-order kernels are
k ( i, j) = k1( i, j)  and k ( i1, j1, i2, j2) = k2( i1, j1, i2, j2)  , respectively. In this article, we represent first- and second-order spatial Volterra kernel slices in three-dimensional (3D) spaces. We display 3D spaces as perspective and contour plots. First-order kernels, k1( i,j), have two independent variables: the dimensions of space and one dependent variable, the amplitude. The latter is represented on the vertical axis of the 3D plots, with i and j along the X- and Y-axes, respectively. Second-order kernels, k2( i1, j1, i2, j2), have four independent variables and one dependent variable. We display second-order kernel slices with respect to different reference positions. That is, we fix the reference position ( i1, j1) and plot the value k2( i1, j1, i2, j2) on the vertical axis, with i2 and j2 on the X- and Y-axes, respectively. These plots show contributions to the response of interactions between individual pixels of the image and pixel x( i1, j1). The kernels shown in the Results section have been scaled ( Scaling kernels section). This allows the reader to appreciate visually the average contribution of a kernel to the cell response and to compare the contributions from different kernels. For predicting responses, on average, larger kernels are more relevant than smaller ones. Goodness of fit of kernels To measure the goodness of fit of estimated kernels to true kernels, we scaled them ( Scaling kernels section), and computed their MSE. The MSE is a scale-sensitive measure. This implies that if true and estimated kernels have small amplitude, then the MSE between them will be small, independent of how well they correlate. For measuring errors between scaled kernels, this scale sensitivity is a good property. This is because scaled kernels with small amplitudes contribute little to the response, and thus it does not matter how well the estimated kernels correlate with the true kernel. Moreover, the MSE allows the comparison of errors across different kernels. This happens because the amplitude of scaled kernels indicates their mean contribution to the response. Therefore, a similar MSE for different scaled kernels indicates that they contribute similar errors to the response. Volterra terms' contributions To assess the importance of individual Volterra terms to the response, we decomposed Equation 3 as a sum of terms of increasing nonlinear order,  | (22) |
where  | (23) |
 | (24) |
 | (25) |
We wanted to know how relevant was each term relative to the others for the generation of the response. For this purpose, we defined the contribution of the ith term to the response y( x) to image x as contrib i( x) = | yi( x)|  | yj( x)|. The definition of contribution is such that if for a given image the contribution of a specific term is large, then, if we set to zero the component of the response due to that term, the response is largely modified. On the other hand, if the contribution of a term is small, then the response is mostly unaffected when we set to zero the component of the response due to that term. We initially evaluated the Volterra relevant-space technique with models of cortical simple and complex cells. The use of simulated data allowed us to derive analytically true relevant dimensions and the true Volterra kernels of the cell models. We could then compare these true parameters with those estimated from input–output data. The outcome of these comparisons appears in the True parameters sections under the Simulated simple cell section and the Simulated complex cell section. Next, we compared the Volterra model with two other well-known models that also use relevant dimensions. For the simulated simple cell, the comparison was with a model using STA to recover a single relevant dimension and histograms to estimate nonlinearity (Chichilnisky, 2001). For the simulated complex cell, the comparison was with a model using STC to recover the relevant space and histograms to estimate the nonlinearity (Touryan et al., 2005). In the PPR versus STA section and the PPR versus STC section, we present the results of comparing PPR with STA and STC, respectively. In turn, the Volterra model versus histogram-based model sections under the Simulated simple cell section and the Simulated complex cell section compare the Volterra model with the histogram-based models for the simulated simple and complex cells, respectively. Finally, we use physiological data from cortical complex cells to show the feasibility of our methods with real data ( Cortical complex cell section). We used PPR to estimate the relevant space of the simulated simple cell. For this estimation, we used a training set of 5,000 patches from natural images. Figure 2a shows the curve of predictive error versus the number of estimated relevant dimensions for the simple cell. Using two estimated relevant dimensions leads to a large reduction in prediction error compared with using only one dimension. In contrast, using more than two estimated relevant dimensions yield only small reductions in predictive error. (Here, we define large as the lack of intersection between neighboring error bars—Size 10 standard errors.) Based on these data, we used the projection directions of a two-term PPR model to compute the estimated relevant dimensions for the Volterra model (Volterra relevant-space technique section under the Receptive-field estimation section). Figure 2. Goodness of fit of PPR models as a function of number of estimated relevant dimensions. (a) Simulated simple cell. (b) Simulated complex cell. (c) Cortical complex cell. The size of error bars is 10 standard errors. Based on these data, we use two, two, and three estimated relevant dimensions for the Volterra model of the simulated simple cell, simulated complex cell, and cortical complex cell, respectively. Although we estimated two significant relevant dimensions from the data, as described in the Simulated simple cell section, the relevant space of the simple-cell model is spanned by a single relevant dimension. What matters for our method is that this space is part of the larger, estimated one. If this holds, then the estimation of the Volterra model, by minimization of Equation 13, can prune nonrelevant estimated space. (This pruning could be done, e.g., by setting appropriate coefficients to zero.) We must thus ensure that the true relevant dimension is similar to its projection onto the estimated relevant space. The left column of Figure 3 shows the analytically relevant dimension of the simple-cell model. The right column shows the projection of this dimension onto the estimated relevant space. Despite the effects of noise, the projection of the true relevant dimension onto the estimated relevant space captures well the preferred orientation, spatial frequency, and phase of the analytical one. Figure 3. Simulated simple cell: relevant dimensions. (a, c) Analytical relevant dimension. (b, d) Projection of the analytical relevant dimension onto the estimated relevant space. (a, b) Perspective plot. (c, d) Contour plot. The analytical relevant dimension is accurately approximated by its projection onto the relevant space, r2 = .94. To measure the goodness of fit of the true relevant dimension and its projection onto the estimated relevant space, we plot the coefficients of the true relevant dimension against the coefficients of its projection. We then fit a linear model to this plot and use the coefficient of determination (or r2 statistic) of the fitted linear model as the goodness-of-fit measure. The r2 statistic is the ratio between the reduction in variance achieved by the regression model (the total variance of the outputs minus the variance of the residuals) and the total variance of the outputs. Without a relationship between the true relevant dimension and its projection onto the estimated relevant space, the r2 statistic would be 0. In contrast, if the true relevant dimension were identical to its projection, then the r2 statistic would be 1. For the simple cell, the r2 statistic for the analytically relevant dimension and its projection onto the estimated relevant space was .94. Having estimated the relevant dimensions, we next estimated Volterra models of different orders using the same set of 5,000 image/response pairs used to estimate the relevant dimensions. We selected the order of the Volterra model as the least order maximizing the average predictive power (Volterra relevant-space technique section under the Receptive-field estimation section). Figure 4a shows the mean predictive power of simple-cell Volterra models as a function of their order. Second- and third-order Volterra models lead to large improvements in cross-correlation (here, large is defined as the lack of intersection between neighboring error bars—Size 2 standard errors), illustrating the relevance of nonlinear contributions to characterize the simple-cell data. Volterra models of order higher than three do not lead to improvements in predictive power. Based on these data, we selected a third-order Volterra model. Figure 4. Correlation coefficients between cell responses and the Volterra model predictions versus the order of the Volterra model. (a) Simulated simple cell. (b) Simulated complex cell. (c) Cortical complex cell. The size of error bars is two standard errors. The order of the Volterra model was selected as the least order maximizing the average predictive power. We selected a third, second, and second order for the simulated simple cell, simulated complex cell, and cortical complex cell, respectively. How important are high-order nonlinearities in the description of responses? To answer this question, we defined a simple measure of the relative importance of each Volterra term to the overall response ( Volterra terms' contributions section). We expected that the relative contribution to the response of different Volterra terms should be dependent on the response range. Higher responses might be more nonlinear and thus would require higher order Volterra terms. Figure 5 shows the mean relative contributions of different Volterra terms to the response of the simple cell. We show the mean contributions for groups of small (first quartile), middle (second and third quartiles), and large (fourth quartile) responses. For small responses ( Figure 5a), the most important terms are the low-order ones. In contrast, for medium ( Figure 5b) and large ( Figure 5c) responses, the situation reverses and high-order terms become dominant. Therefore, even for a cell often considered linear as the simple cell, high-order nonlinearities may matter, especially when describing responses to stimuli causing large responses. In the natural world, most responses may be small, as contrasts tend to be low (Balboa & Grzywacz, 2000; Ruderman & Bialek, 1994; Tadmor & Tolhurst, 2000; Zhu & Mumford, 1997), but some responses will be high, as the distribution of natural intensities is kurtotic (Field, 1987). Figure 5. Simulated simple cell: mean relative contributions of Volterra terms to the response. Contributions have been averaged over different response ranges: (a) small responses (first quartile), (b) medium responses (second and third quartiles), and (c) large responses (fourth quartile). As the responses increase, high-order Volterra terms become more important. To assess the quality of the estimated kernels, we compared them with the true kernels derived from the analytical expression of the simple-cell model ( Appendix B). The left column in Figure 6 shows the first-order true kernel, whereas its estimation appears in the right column. Similarly, Figure 7 shows the second-order kernel slice with respect to Position (6, 6); this was the slice with largest error. The kernels have been scaled to reflect their mean contribution to the cell response ( Displaying kernels section). Figure 6. Simulated simple cell: first-order kernels. (a, c) True; (b, d) Estimated. (a, b) Perspective plot. (c, d) Contour plot. Despite the noise, the estimated kernel accurately approximates its true value, MSE = 2.42E−03. Figure 7. Simulated simple cell: second-order kernel slices with respect to Position (6, 6). (a, c) True; (b, d) Estimated. (a, b) Perspective plot. (c, d) Contour plot. Blue arrows in the contour plots indicate the reference position. Despite the noise, the estimated kernel accurately approximates its true value, MSE = 1.83E−03. To measure the goodness of fit of estimated kernels to true kernels, we scaled them, in proportion to their mean contribution to the response, and computed their MSE ( Goodness of fit of kernels section). For the simple cell, the MSE between scaled true and estimated kernels was 2.42E−03 for the first-order kernel ( Figure 6) and 1.83E−03 for the second-order kernel slice with respect to Position (6, 6) ( Figure 7). This slice had the largest MSE among all second-order kernel slices (the MSE averaged across all second-order kernel slices was 1.74E−04). Therefore, despite the noise, the estimated kernels accurately approximate their true values. Although, for the simple cell, the first-order kernel is similar to the second-order kernel slice with respect to Position (6, 6), their interpretation is different. The first-order kernel indicates that the natural image would elicit a response if it were bright in the yellow region and dark in the red region. In turn, the second-order kernel slice indicates that simultaneous bright stimuli anywhere in the yellow region and at Position (6, 6) would facilitate the response. However, having bright stimuli in the red regions and at Position (6, 6) at the same time would inhibit the response. In other words, first-order kernels describe how much a stimulus at a given position linearly contributes to the response. Second-order kernels describe nonlinear modulations of responses, through interactions between stimulus pairs at given positions. So far, we showed that PPR is a good technique to estimate relevant dimensions that will be of use in the construction of Volterra models of simulated simple cells. Here, we compare PPR with STA, which is a technique often used to estimate the relevant dimension of simple cells. In particular, we compared the convergence properties of both algorithms parametric on number of inputs and signal-to-noise ratio. Figure 8a plots the r2 statistic for the regression between the coefficients of the true and estimated relevant dimensions as a function of the number of inputs used in the estimation. For this plot, the average response of the simulated simple cell was 5 spikes/image. Figure 8b plots the same r2 statistic as a function of the number of spikes per image in the simulated responses, when using 5,000 inputs in the estimation. These figures show that the two PPR relevant dimensions better approximate the true relevant dimension of the simple-cell model than the relevant dimension estimated by STA (black vs. green curve). The superiority of PPR over STA is not due to the fact that PPR is using two relevant dimensions to approximate the true relevant dimension, although STA is using only one. PPR also outperforms STA when using one relevant dimension (red vs. green curves). Figure 8. Simulated simple cell: comparison between PPR and STA. The figures show the r2 statistic for the regression of the coefficients of the true and estimated relevant dimensions. This statistic appears as a function of (a) the number of inputs when the mean response of the simulated cell is 5 spikes/image and (b) the mean number of spikes in the responses when using 5,000 image–response pairs. Black curve: r2 statistic between the true relevant dimension and its projection onto a space spanned by two PPR relevant dimensions. Red curve: r2 statistic between the true relevant dimension and one PPR relevant dimension. Green curve: r2 statistic between the true and STA relevant dimensions. The size of the error bars is two standard errors. PPR performs better than STA either when using one (red vs. green curve) or two (black vs. green) relevant dimensions to approximate the true relevant dimension. Volterra model versus histogram-based model The True parameters section showed that Volterra models can represent the nonlinearities of simple cells well, starting from its estimated relevant dimension. We now compare the Volterra model with another popular technique to represent nonlinearities, namely, a histogram-based model ( Histogram-based model for simple cells section). To make our comparison independent from the method used to estimate the relevant space, we compare the predictive power of the Volterra model using the two PPR relevant dimensions, the Volterra model using the STA relevant dimension, the nonlinearity estimated from histograms using the first PPR relevant dimension, and the nonlinearity estimated from histograms using the STA relevant dimension. (We use only the first PPR relevant dimension with the histogram-based model because it is one dimensional. In other words, this model accepts only one relevant dimension as input.) The predictive power of these models as a function of the number of inputs (using a mean response of 5 spikes/image) is plotted in Figure 9a. A plot as a function of the number of spikes per image (using 5,000 images) appears in Figure 9b. Note that, as discussed in the Data partitioning section, the data used to evaluate the different models were different from those used to estimate their parameters. Thus, the predictive power measures are not inflated by overfitting effects. Figure 9. Simulated simple cell: predictions of the Volterra model and the nonlinearity estimated from histograms. This figure shows average correlation coefficients between simulated cell responses and model predictions. These coefficients appear as functions of (a) the number of inputs (using a mean response of 5 spikes/image) and (b) the number of spikes per image (using 5,000 image–response pairs). Black curve: Volterra model using the two PPR relevant dimensions. Red curve: Volterra model using the STA relevant dimension. Green curve: nonlinearity estimated from histograms using the STA relevant dimension. Blue curve: nonlinearity estimated from histograms using the first PPR relevant dimension. The size of the error bars is two standard errors. For sufficiently high number of inputs and signal-to-noise ratio, the Volterra model using the relevant dimensions estimated by PPR performs better than the nonlinearity estimated from histograms. When using the STA relevant dimension, both models perform equally. The figures show that, when using the relevant dimensions estimated by PPR, the Volterra model predicts better than the nonlinearity estimated from histograms. This superiority holds for moderate to high number of inputs, that is, more than 1,000 image–response pairs, and for moderate to high signal-to-noise ratios, that is, more than 1 spike/image. When using the STA relevant dimension, the Volterra model performs at the same level as the nonlinear function estimated from histograms. This is significant because the simulated data accurately match the assumptions of the STA histogram model. Moreover, the parametric nonlinear function fitted to the histogram in the reconstruction of the nonlinear function ( Histogram-based model for simple cells section) was the same sigmoidal function used to generate the simulated responses ( Simulated simple cell section). Therefore, it is important that the Volterra model, making no assumptions on the response generation mechanism, is at least as good as the nonlinear function estimated from histograms. As for simple cells, we also used PPR to estimate the relevant space of the simulated complex cell from 5,000 stimulus/response pairs. Figure 2b shows the mean error as a function of the estimated number of relevant dimensions. Two is the largest number of estimated relevant dimensions for which models predict significantly better than models using fewer dimensions. Consequently, we computed the relevant dimensions for the Volterra model from a two-term PPR model. The left columns of Figures 10 and 11 show the first and second analytically relevant dimensions of the complex-cell model. In turn, the right columns show their projection onto the space spanned by the two estimated relevant dimensions. The analytically relevant dimensions are accurately approximated by their projection onto the estimated relevant space (first relevant dimension, r2 = .96; second relevant dimension, r2 = .95). Figure 10. Simulated complex cell: first relevant dimension. (a, c) First analytical relevant dimension. (b, d) Projection of first analytical relevant dimension onto the estimated relevant space. (a, b) Perspective plot. (c, d) Contour plot. The analytical relevant dimension is well approximated by its projection onto the estimated relevant space, r2 = .96. Figure 11. Simulated complex cell: second relevant dimension. (a, c) Second analytical relevant dimension. (b, d) Projection of second analytical relevant dimension onto the estimated relevant space. (a, b) Perspective plot. (c, d) Contour plot. The analytical relevant dimension is well approximated by its projection onto the estimated relevant space, r2 = .95. With these relevant dimensions we estimated Volterra models of different orders, using the same set of 5,000 stimulus/response pairs as for the estimation of the relevant dimensions. We selected the order of the Volterra model as the least order maximizing the average predictive power (Volterra relevant-space technique section under the Receptive-field estimation section). Figure 4b shows the mean predictive power of complex-cell Volterra models as a function of their order. Linear models predict the responses poorly due to a model mismatch. A linear model cannot accurately predict a quadratic response. Second-order and higher order Volterra models perform equally well. Hence, we used a second-order Volterra model to characterize the complex-cell data. As for the simulated simple cell, we looked at the contributions of the various Volterra terms to the response. We again averaged the contributions across small (first quartile), medium (second and third quartiles), and large (fourth quartile) responses. These contributions are shown in Figure 12. The simulated complex cell is purely second order; thus, the second-order Volterra term should be the only one that contributes to the response. Accordingly, medium and large responses are dominated by the second-order term. However, the spontaneous, zeroth-order Volterra term has been spuriously fitted to a nonzero value. Its contribution is largest for small responses. Such spurious fits may occur because, at small responses, the Poisson distribution is highly kurtotic, admitting values that would be outliers in Gaussian distributions. Consequently, our least squares fit, which is sensitive to outliers, might not be sufficiently robust at small responses. Nevertheless, these plots indicate that one should focus on the second-order kernel to interpret the responses of the simulated complex cell. Figure 12. Simulated complex cell: mean relative contributions of Volterra terms to the response. Contributions have been averaged over different response ranges: (a) small responses (first quartile), (b) medium responses (second and third quartiles), and (c) large responses (fourth quartile). Medium and large responses of the purely quadratic complex-cell model are dominated by the second-order Volterra term. The contributions of the zeroth-order term reflect a spurious fit (see text). To assess the quality of the estimated kernels for the complex cell, we compared them with their true values. The left column in Figure 13 shows the first-order true kernel, whereas its estimation appears in the right column. Similarly, Figure 14 shows the second-order Volterra kernel slice with respect to Position (6, 6). The kernels have been rescaled to reflect their mean contributions to the cell response ( Displaying kernels section). Figure 13. Simulated complex cell: first-order kernels. (a, c) True; (b, d) Estimated. (a, b) Perspective plot. (c, d) Contour plot. The kernels have been scaled to reflect their mean contribution to the cell response. Although the estimated but not the true kernel has some structure, the amplitude is low, making the estimation not much different from zero, MSE = 5.6E−04. Figure 14. Simulated complex cell: second-order kernel slices with respect to Position (6, 6). (a, c) True; (b, d) Estimated. (a, b) Perspective plot. (c, d) Contour plot. Blue arrows in the contour plots indicate the reference position. Despite the noise, the estimated kernel accurately approximates its true values, MSE = 1.42E−03. To measure the goodness of fit of estimated kernels to true kernels, we scaled them, in proportion to their mean contribution to the response, and computed the MSE between them ( Goodness of fit of kernels section). The MSE between scaled true and estimated kernels was 5.6E−04 for the first-order kernel ( Figure 13) and 1.42E−03 for the second-order kernel slice with respect to Position (6, 6) ( Figure 14). This slice was the one with maximum MSE among all slices (the mean MSE across all slices was 3.06E−04). Thus, despite the noise, the estimated kernels accurately approximate their true values. Note that for the complex cell, the first-order kernel should be zero. The estimated kernel is not zero and has structure. This is because, in the linear Volterra equation ( Equation 10), small first-order coefficients, corresponding to a first-order Volterra term with small contribution ( Figure 12), will be dominated by large coefficients, corresponding to spontaneous and second-order Volterra terms with large contributions. Then, in the regression procedure, the small first-order coefficients will tend to fit noise. Subsequently, in the reconstruction of the estimated first-order kernel ( Equation 15), these noisy first-order coefficients willmultiply relevant dimensions with clear structure, generating first-order kernels with spurious structure. Consequently, we should not try to interpret kernels corresponding to terms whose contributions to the response are very small. The last section showed that PPR is a good technique to estimate relevant dimensions for use in the construction of Volterra models of simulated complex cells. Here, we compare PPR with the modification of STC for natural images proposed by Touryan et al. ( 2005), henceforth referred as STC for natural images. These authors mentioned that natural images are highly variable in their global contrast, that is, variance. To avoid possible contrast adaptation of the cells, they normalized the natural images to have equal variance. Our simulated complex cell did not include adaptation mechanisms; thus, we did not need to normalize the images to have equal variance. When we used STC for natural images with simulated responses to nonequal-variance images, we obtained poor results, but when we used responses to equal-variance images, we obtained satisfactory results. To alleviate this problem, we equalized the variance of the natural images as a preprocessing step before computing STC. However, in these situations, we did not modify the simulated responses. The variance-equalization step was only necessary for the estimation of STC for natural images. Having estimated the relevant dimensions, the original, nonequal-variance natural images were used to recover the nonlinearity using histograms. Figure 15a shows the average r2 statistic as a function of the number of images for several conditions involving PPR and STC. Figure 15b plots the average r2 statistic as a function of the average number of spikes in the responses. STC achieves its best performance with responses to natural images with equal variance (red curve), as in Touryan et al. ( 2005). When the responses correspond to nonequal-variance images, STC for natural images performs poorly (blue curve) and the variance-equalization preprocessing step significantly improves its performance (green curve). With nonequal-variance data, PPR outperforms STC for natural images for any number of inputs or signal-to-noise ratios (black vs. green and blue curves). However, |