Volume 5, Number 5, Article 1, Pages 376-404
doi:10.1167/5.5.1
http://journalofvision.org/5/5/1/
ISSN 1534-7362
The effect of stimulus strength on the speed and accuracy of a perceptual decision
John Palmer
Departments of Psychology, and Physiology & Biophysics, University of Washington, Seattle, WA, USA

Alexander C. Huk
Neurobiology & Center for Perceptual Systems, University of Texas, Austin, TX, USA

Michael N. Shadlen
Department of Physiology & Biophysics, Howard Hughes Medical Institute, National Primate Research Center, University of Washington, Seattle, WA, USA
Abstract
Both the speed and the accuracy of a perceptual judgment depend on the strength of the sensory stimulation. When stimulus strength is high, accuracy is high and response time is fast; when stimulus strength is low, accuracy is low and response time is slow. Although the psychometric function is well established as a tool for analyzing the relationship between accuracy and stimulus strength, the corresponding chronometric function for the relationship between response time and stimulus strength has not received as much consideration. In this article, we describe a theory of perceptual decision making based on a diffusion model. In it, a decision is based on the additive accumulation of sensory evidence over time to a bound. Combined with simple scaling assumptions, the proportional-rate and power-rate diffusion models predict simple analytic expressions for both the chronometric and psychometric functions. In a series of psychophysical experiments, we show that this theory accounts for response time and accuracy as a function of both stimulus strength and speed-accuracy instructions. In particular, the results demonstrate a close coupling between response time and accuracy. The theory is also shown to subsume the predictions of Piéron’s Law, a power function dependence of response time on stimulus strength. The theory’s analytic chronometric function allows one to extend theories of accuracy to response time.
History
Received August 11, 2004; published May 2, 2005
Citation
Palmer, J., Huk, A. C., & Shadlen, M. N. (2005). The effect of stimulus strength on the speed and accuracy of a perceptual decision.
Journal of Vision, 5(5):1, 376-404,
http://journalofvision.org/5/5/1/,
doi:10.1167/5.5.1.
Keywords
decision, response time, psychometric function, speed-accuracy tradeoff, temporal summation
Both response time and accuracy depend on the difficulty of a perceptual judgment. Increasing the stimulus strength or difference between stimuli decreases response time and increases accuracy. Measurements of accuracy as a function of the stimulus strength are known as psychometric functions and are central to the study of psychophysics (e.g., Klein, 2001). Measurements of response time as a function of stimulus strength are sometimes known as chronometric functions (e.g., Link, 1992). The goal of this study is to understand how these measurements are related to one another. We measure both functions and test the predictions of a low parameter version of the diffusion model (e.g., Ratcliff, 1978; Ratcliff & Smith, 2004). This theory predicts a close coupling between the effect of stimulus strength on response time and its effect on accuracy.
A theory of how stimulus strength affects response time and accuracy requires assumptions about the encoding of the stimulus and how this internal representation is used in decision making. In short, it needs assumptions about scaling and decision. To set the stage, consider theories of scaling and decision intended for accuracy experiments. The modern starting point is signal detection theory (Green & Swets, 1966; Macmillan & Creelman, 2005). In this theory, the stimulus is represented by a random variable and the decision is made by comparing a sample from this random variable to a criterion. The theory allows one to distinguish between sensitivity manipulations that affect the stimulus representation and bias manipulations that affect the decision criterion. Sensitivity is summarized by the d′ measure, which is the difference between noisy representations normalized by their standard deviations.
To relate a particular stimulus to sensitivity, one must assume something about the scaling of the stimulus into the internal representation. Assuming a simple proportional scale, d′ is linear with stimulus strength and the shape of the psychometric function follows from the distribution of the noisy representation (Tanner & Swets, 1954). The common assumption of Gaussian noise results in a psychometric function that is a cumulative Gaussian. This proportional scaling can be generalized by allowing d′ to be a power function of stimulus strength (Nachmias & Kocher, 1970; Pelli, 1987). For example, contrast discrimination of simple disks can be described by a proportional scale, whereas contrast detection requires a power function scale (Laming, 1986; Leshowitz, Taub, & Raab, 1968). In short, the form of the psychometric function depends on assumptions about both the scaling and decision.
Theories of response time
The starting point for modern theories of response time is sequential sampling theory (Stone 1960; Wald, 1947; for a review, see Luce, 1986). The internal representation of the relevant stimulus is assumed to be noisy and to vary over time. Each decision is based on repeated sampling of this representation and comparing some function of these samples to a criterion. For example, suppose samples of the noisy signal are taken at discrete times and are added together to represent the evidence accumulated over time. This accumulated evidence is compared to an upper and lower bound. Upon reaching one of these bounds, the appropriate response is initiated. If such a random walk model is modified by reducing the time steps and evidence increments to infinitesimals, then the model in continuous time is called a diffusion model (Ratcliff, 1978; Smith, 1990). For this model, the accumulated evidence has a Gaussian distribution, which makes it a natural generalization of the Gaussian version of signal detection theory (Ratcliff, 1980). A wider range of sequential sampling models are considered in the Discussion (e.g., Maloney & Wandell, 1984; Usher & McClelland, 2001).
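The random walk described above can be sketched in a few lines of Python. This is an illustrative simulation only, not the procedure used in this article: the function names, the time step `dt`, and the parameter values are assumptions chosen for the example. Each step adds a Gaussian evidence sample whose mean and variance both scale with the step duration, so that shrinking `dt` approaches the continuous-time diffusion model.

```python
import random


def random_walk_trial(drift, bound, noise_sd=1.0, dt=0.001, rng=random):
    """Accumulate noisy evidence in steps of dt until it reaches +bound or -bound.

    Returns (choice, decision_time); choice is +1 for the upper bound and
    -1 for the lower bound.
    """
    evidence = 0.0
    elapsed = 0.0
    step_mean = drift * dt
    step_sd = noise_sd * dt ** 0.5  # variance of the sum grows linearly with time
    while abs(evidence) < bound:
        evidence += rng.gauss(step_mean, step_sd)
        elapsed += dt
    return (1 if evidence > 0 else -1), elapsed


def simulate(drift, bound, n_trials=2000, dt=0.001, seed=0):
    """Return (proportion of upper-bound choices, mean decision time)."""
    rng = random.Random(seed)
    trials = [random_walk_trial(drift, bound, dt=dt, rng=rng) for _ in range(n_trials)]
    p_upper = sum(1 for choice, _ in trials if choice == 1) / n_trials
    mean_time = sum(t for _, t in trials) / n_trials
    return p_upper, mean_time


if __name__ == "__main__":
    for drift in (0.0, 0.5, 2.0):
        print(drift, simulate(drift, bound=1.0))
```

With a positive drift, the upper bound is reached more often and sooner as the drift grows, which is the qualitative pattern the theory is meant to capture. Because the steps are discrete, the accumulated evidence overshoots the bound slightly, so simulated values differ a little from the continuous-time predictions.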
Perhaps the most comprehensive analysis of chronometric functions was provided by Link (1992; Link & Heath, 1975; Smith, 1994). Link and colleagues focused on a very general version of sequential sampling theory called relative judgment theory coupled with a very general scaling assumption. This theory predicts a constraint on the relation between response time and accuracy (the “RT versus Z” relation). Such a constraint is also the center of the more specific models that are pursued in this article.
An alternative approach was taken by Ratcliff and colleagues, who investigated a diffusion model with parameter variability (e.g., Ratcliff, 1978; Ratcliff & Rouder, 1998). These studies focused on using this generalization of the diffusion model to account for differences between correct and error response times in a wide range of perceptual and memory tasks. Most relevant here, a few of their studies restricted how the parameters of the model depend on stimulus strength in perceptual discrimination. For example, Smith, Ratcliff, and Wolfgang (2004) used a three-parameter Naka-Rushton function to describe the internal response to contrast.
While there are only a few theoretical studies of the effect of stimulus strength on response time, there are many empirical measurements. An early example of a chronometric function was described by Kellogg (1931). Many of the early measurements were performed under conditions with few errors (e.g., less than 5%). Under such conditions, response time is well described by a power function of stimulus strength with an additive constant (Piéron’s Law; for review, see Bonnet & Dresp, 2001; Luce, 1986). Comparisons with Piéron’s Law are considered in Experiment 3.
To sum up, prior approaches have pursued general versions of the diffusion model with general scaling assumptions. There is little or no prior work on the combination of the simplest diffusion model with the simplest scaling assumption.
To make this discussion of theory more concrete, we next introduce an example discrimination task that is used through much of this article. It is a left-right direction-of-motion discrimination task previously studied in humans (e.g., Morgan & Ward, 1980; Watamaniuk & Sekuler, 1992) and nonhuman primates (e.g., Newsome, Britten, Movshon, & Shadlen, 1989; Roitman & Shadlen, 2002). In Roitman and Shadlen (2002), rhesus monkeys were trained to view a dynamic random dot kinetogram and to decide the net direction of motion, indicating their decision by making a saccadic eye movement to a corresponding choice target (Figure 1A). On each trial, some proportion of the dots moved coherently in one of two possible directions while the others were randomly repositioned. The monkey viewed the random dot display as long as required to make a decision. The display was terminated at the beginning of the eye movement response. From trial to trial, stimulus strength was varied by changing the proportion of coherently moving dots (Figure 1B). In this way, both the proportion of correct responses and mean response time were measured as a function of stimulus strength.
Figure 1. Direction-discrimination task and random dot motion stimulus. A. Left-right, direction-of-motion discrimination task. On each trial, the observer fixates a central fixation point and then targets appear to the left and right. After an exponentially distributed random foreperiod, the random dot motion stimulus is presented. Observers view the stimulus until they make a response, indicating their judgment about the direction of motion by making a saccade to one of the targets. B. Example of a random-dot motion stimulus of variable motion coherence. Stimulus strength is varied by changing the proportion of dots moving coherently in a single direction.
As shown in Figure 2, both response time and accuracy varied with motion strength. Response times decreased from a highest value at zero motion strength toward a lower asymptote at the highest motion strengths. Accuracy increased from chance at zero motion strength to perfect at the highest motion strengths. The smooth curves in Figure 2 show the joint predictions of a diffusion model with proportional scaling assumptions. This model accounts for the effect of motion strength on both response time and accuracy. It is next described in detail.
Figure 2. Motion strength affects response times and accuracy. The top panel shows the mean response time for correct responses on a log scale, and the bottom panel shows the proportion of correct responses. Both graphs are a function of motion strength on a log scale. Error bars represent 1 SE in all figures. Smooth curves depict the predicted functions from the best-fitting proportional-rate diffusion model. Data are from Roitman and Shadlen (2002).
Proportional-rate diffusion model
We now introduce what we think is the simplest version of the diffusion model coupled with the simplest scaling assumptions. It is a special case of both relative judgment theory (Link, 1992) and the diffusion model with parameter variability (Ratcliff, 1978). Formal definitions and predictions are presented in the Appendix. Consider a discrimination between stimuli Sa and Sb, where one is required to make a corresponding response Ra or Rb. In the diffusion model, evidence is accumulated over time until an upper or lower bound is reached (A or B), which triggers a response. A single trial is illustrated in Figure 3. It shows the relative evidence for stimulus Sa over stimulus Sb as a function of time. A sample path from a single trial is shown by the jagged contour. For this example trial, the accumulated evidence reaches the upper bound A and triggers the Ra response. The ray from the origin illustrates the mean drift rate μ. Ignoring the bounds, the path at time t would have a mean of μt and a variance of σ²t. The σ parameter is usually known as the diffusion coefficient. Weak stimuli have values of μ near zero and strong stimuli have large values. Bias toward one or the other response can be represented by the relative values of the bounds A and B. If there is no response bias, then B = A. In addition, the bound controls the speed-accuracy tradeoff. Large values of the bound slow the response and improve accuracy. In summary thus far, the model has parameters for the bounds (A and B), the drift rate μ, and the diffusion coefficient σ.
Figure 3. An illustration of a sample path of the accumulation of evidence underlying a perceptual decision. On each trial, evidence in favor of one alternative over another is accumulated as a function of time. For any particular stimulus strength, there is an accumulation of noisy evidence parameterized by the mean rate of accumulation. A decision is made when the process reaches one of the bounds.
Without further elaboration, the diffusion model has four parameters for each stimulus strength condition. We restrict it in several ways to develop a theory specific to the effect of stimulus strength. First, the drift rate is assumed to be the only parameter affected by stimulus strength. Second, no response bias is assumed, thus A = B. Third, the noise defined by the diffusion coefficient σ is assumed to be constant for all conditions.
To model response time, two further refinements are necessary to the diffusion process. First, the diffusion process is a model of decision time and not other sensory and motor latencies. The common approach is to consider decision time and the other residual times as additive independent contributions to the response time (Donders, 1969; Luce 1986). Thus, the mean response time tT is simply the sum of the mean decision time tD and the mean residual time tR. Second, the bound A, drift rate μ, and diffusion coefficient σ all share relative evidence units. Moreover, they combine in the predictions as ratios. This allows one to normalize the bound and drift parameters by the noise denoted by the diffusion coefficient σ (see Appendix for details). The normalization reduces the number of parameters and makes explicit the role of signal-to-noise ratio in the model. Thus, the parameters become the normalized drift rate μ′, the normalized bound A′, and the mean residual time tR.
To address stimulus scaling, one must assume something about the relation between stimulus strength and the drift rate. For the most specific model, we assume the normalized drift rate is proportional to stimulus strength x: μ′ = kx. The coefficient k is the measure of sensitivity in this proportional-rate diffusion model. We also consider a more general power function of stimulus strength in the power-rate diffusion model. In this generalization, the normalized drift rate is given by μ′ = sign(x)|kx|^β. (The sign function is defined as −1 if x < 0 and 1 if x > 0.) In summary, the parameters are the normalized bound A′, sensitivity k, mean residual time tR, and optionally a scaling exponent β.
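As a hedged illustration of the two scaling assumptions, the sketch below computes the normalized drift rate from a signed stimulus strength. It assumes the power-rate form sign(x)|kx|^β, with k placed inside the power; the function names are hypothetical. Setting β = 1 recovers the proportional-rate case μ′ = kx.

```python
def sign(x):
    """Return -1 for negative x, 1 for positive x, and 0 at exactly zero."""
    if x < 0:
        return -1.0
    if x > 0:
        return 1.0
    return 0.0


def normalized_drift(x, k, beta=1.0):
    """Normalized drift rate for signed stimulus strength x.

    Assumed form: sign(x) * |k * x| ** beta.
    beta = 1 gives the proportional-rate model (drift = k * x).
    """
    return sign(x) * abs(k * x) ** beta


if __name__ == "__main__":
    for coherence in (-0.512, -0.064, 0.0, 0.064, 0.512):
        print(coherence, normalized_drift(coherence, k=10.0),
              normalized_drift(coherence, k=10.0, beta=1.5))
```

The sign function keeps the drift antisymmetric about zero, so that leftward and rightward motion of equal strength produce drifts of equal magnitude toward opposite bounds.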
The proportional-rate diffusion model predicts that the psychometric function for accuracy PC(x) is a logistic function of stimulus strength x:
$$P_C(x) = \frac{1}{1 + e^{-2A'k|x|}} \tag{1}$$
This logistic function is fit to the monkey accuracy data in the bottom panel of Figure 2. The predicted chronometric function for the mean response time is
$$t_T(x) = \frac{A'}{kx}\tanh(A'kx) + t_R. \tag{2}$$
Stimulus strength enters the function as both a 1/x term and as an argument for the hyperbolic tangent function. This function is fit to the monkey response times in the top panel of Figure 2. In fact, the fits shown are made simultaneously to response time and accuracy as described in Methods.
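The two predicted functions can be evaluated directly. The sketch below assumes the standard closed forms for a diffusion process with symmetric bounds: a logistic in 2A′k|x| for accuracy, and (A′/(kx)) tanh(A′kx) plus the residual time for mean response time. The function names and the parameter values in the usage lines are illustrative assumptions, not fitted values from this article.

```python
import math


def proportion_correct(x, k, bound):
    """Predicted proportion correct: logistic in 2 * bound * k * |x|."""
    return 1.0 / (1.0 + math.exp(-2.0 * bound * k * abs(x)))


def mean_response_time(x, k, bound, t_residual):
    """Predicted mean response time: decision time plus residual time."""
    drift = k * abs(x)
    if drift == 0.0:
        # Limit of (bound / drift) * tanh(bound * drift) as drift -> 0.
        return bound ** 2 + t_residual
    return (bound / drift) * math.tanh(bound * drift) + t_residual


if __name__ == "__main__":
    k, bound, t_residual = 10.0, 1.0, 0.3  # illustrative values only
    for coherence in (0.0, 0.032, 0.064, 0.128, 0.256, 0.512):
        print(coherence,
              round(proportion_correct(coherence, k, bound), 3),
              round(mean_response_time(coherence, k, bound, t_residual), 3))
```

At zero stimulus strength the sketch returns chance accuracy and the longest mean response time (the squared bound plus the residual time); at high strength, accuracy approaches 1 and response time approaches the residual time.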
The coupling of response time and accuracy
One can use these functions to understand how the diffusion model predicts a coupled effect on response time and accuracy. In Figure 4, response time is a function of stimulus strength in the top panel, and proportion correct is a function of stimulus strength in the bottom panel, as in Figure 2. An example joint prediction is shown by the two curves. For accuracy, the predicted psychometric function spans the range of proportion correct from .5 to 1.0 and has only a single degree of freedom: a horizontal displacement on the logarithmic stimulus strength axis. The horizontal position can be summarized by the accuracy threshold halfway between chance and perfect. This halfway accuracy threshold depends on the bound and the sensitivity and is approximately equal to 0.55/(kA′). A derivation of this expression is provided in the Appendix. For response time, the chronometric function spans the range from a lowest value to a highest value. The lowest value is given by the mean residual time tR and the highest value is given by A′² + tR. Within that range, the function has only one degree of freedom, a horizontal displacement. As with the psychometric function, the horizontal position of the function can be specified by a time threshold that is halfway between the extreme values. The halfway time threshold is approximately equal to 1.92/(kA′) and is also derived in the Appendix. In summary, for both functions the horizontal position is controlled by the same parameter kA′. Thus, the two functions must shift in unison.
Figure 4. An illustration of the relation between the chronometric and psychometric function. On a log-scaled stimulus strength axis, both functions have a fixed shape between upper and lower asymptotes. The sensitivity parameter shifts both functions horizontally in unison.
Although the use of the halfway criterion in the threshold definitions is arbitrary, the relative horizontal position of the psychometric and chronometric functions is not. For all parameter values, the two functions have a fixed offset from one another. For example, when predicted accuracy is at 75% correct, the predicted response time is 91% of the range from the lowest to the highest value. We summarize the predicted relative position of the two functions with the threshold ratio: halfway-time-threshold/halfway-accuracy-threshold.
The threshold ratio allows a specific test of the coupling between the chronometric and psychometric functions. The proportional-rate diffusion model predicts a threshold ratio of approximately 3.5, while other models predict other ratios. In the analysis of each experiment, we estimate the threshold ratio by uncoupling the sensitivity parameters for response time and accuracy (for more details, see Appendix). If the observed threshold ratio is near 3.5, then it is consistent with the proportional-rate diffusion model; if the threshold ratio differs from 3.5, it is evidence against the proportional-rate diffusion model.
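The predicted threshold ratio can be checked numerically. The sketch below works in the combined variable y = kA′x and assumes the logistic and hyperbolic-tangent forms of the two functions: accuracy is 75% where 1/(1 + e^(−2y)) = 0.75, and decision time is half its maximum where tanh(y)/y = 0.5. The bisection helper is a generic root finder written for this example.

```python
import math


def bisect(f, lo, hi, tol=1e-12):
    """Find a root of f in [lo, hi], assuming f(lo) and f(hi) differ in sign."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Halfway accuracy threshold: 1 / (1 + exp(-2y)) = 0.75, so y = ln(3) / 2.
accuracy_y = math.log(3.0) / 2.0

# Halfway time threshold: tanh(y) / y = 0.5 (half of the decision-time range).
time_y = bisect(lambda y: math.tanh(y) / y - 0.5, 1.0, 3.0)

threshold_ratio = time_y / accuracy_y

# Fraction of the response time range remaining at the 75%-correct point.
fraction_at_75 = math.tanh(accuracy_y) / accuracy_y

if __name__ == "__main__":
    print(accuracy_y, time_y, threshold_ratio, fraction_at_75)
```

The computed values agree with the approximations quoted in the text: about 0.55 and 1.92 for the two thresholds in units of 1/(kA′), a ratio near 3.5, and a response time at about 91% of its range when accuracy is 75% correct.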
One can gain further intuitions about the predictions by examining how changing each parameter shifts the functions. Such shifts are shown separately for the three parameters in Figure 5. The left panels show the effect of the normalized bound A′. Increasing the bound increases the highest value of the chronometric function without changing the lowest value. It also shifts both functions to the left. Note this is not obvious for the chronometric function because of the simultaneous vertical scaling. Look for a common landmark on both functions: The halfway time threshold shifts in unison with the accuracy threshold. This is the effect of the speed-accuracy tradeoff: longer times yield higher accuracy. The middle panels show the effect of the sensitivity parameter k. Increasing k shifts both functions to the left. This is the effect of a pure sensitivity manipulation. The right panels show the effect of changing the mean residual time tR. Increasing the mean residual time shifts the chronometric function upward. On a linear response time graph, this is a simple displacement. The apparent shape change is due to the use of a logarithmic response time axis in this graph. The logarithmic scaling is used to simultaneously display effects at 300 and 2000 ms (e.g., Figure 7) and to make the standard errors more homogenous. The mean residual time has no effect on the psychometric function.
Figure 5. How parameters affect the chronometric and psychometric functions. A. Chronometric and psychometric functions for three values of the normalized bound A′. Increasing the bound increases the highest value of the chronometric function and decreases the halfway threshold for both functions. B. Chronometric and psychometric functions for three values of sensitivity k. Increasing sensitivity decreases the halfway threshold for both functions. C. Chronometric and psychometric functions for three values of the mean residual time tR. Increasing mean residual time displaces the chronometric function upward.
To summarize this analysis, the predicted chronometric and psychometric functions have simple analytic expressions. The expression for the chronometric function is nearly as simple as that of the more commonly studied psychometric function. Together, they depend on just three (or four) parameters and stimulus strength is assumed to affect only one parameter. This model predicts a close coupling between response time and accuracy. The relationship is summarized by the threshold ratio, which we use as our primary test of the coupling between response time and accuracy. In the following five experiments, chronometric and psychometric functions are measured under a variety of conditions.
The primary task used in this article has already been introduced with Figure 1. Human observers fixated the center of a display and were presented with a field of randomly moving dots. The motion of the dots was manipulated to have net coherent motion to either the left or the right. Observers judged the net direction of the motion and made a corresponding eye movement to a target to the left or right of fixation. Motion strength was varied and both probability correct and response time were measured as a function of motion strength.
Observers were young adults with normal or corrected-to-normal acuity. They were either volunteers from within the laboratory or were paid $15 per hour. Two of the authors (AH and JP) participated in some of the experiments. All had previous experience with psychophysical tasks.
The stimuli were displayed on a flat-screen CRT video monitor (19-in View Sonic PF790) controlled by a Macintosh G4 (533 MHz, Mac OS 9.1) with an ATI Rage 128 Pro graphics card (832 by 624 pixels, viewing distance = 60 cm, subtending 32° by 24° with 25.5 pixel/deg at screen center; refresh rate = 74.5 Hz). The monitor was adjusted to have a white with a CIE 1931 x, y chromaticity .31, .32, peak luminance of 110 cd/m², and a black level of 3.6 cd/m², of which 3.4 cd/m² was due to room illumination. In the first experiment, the first two observers (the authors) were in a dimly lit room with a resulting black-level luminance of 1.2 cd/m². For the remaining observers and for all other experiments, we switched to stronger room lights to reduce pupil size. This improved eye tracking on some observers. In all experiments, we presented white moving dots on a black background, and red or blue fixation and response targets (red: 27 cd/m², CIE 1931 x, y chromaticity .63, .34; blue: 9 cd/m², x, y chromaticity .15, .07). Stimuli were generated using the Psychophysics Toolbox Version 2.44 (Brainard, 1997; Pelli, 1997) for MATLAB (Version 5.2.1, Mathworks, MA). Observers were seated in an adjustable height chair in front of the display. Chin and forehead rests were adjusted so that each observer’s eyes were level with the middle of the monitor.
Eye movements were recorded using a noninvasive video system (EyeLink Version 2.04, SensoMotoric Instruments, Boston, MA) controlled by a separate computer (566-MHz Intel Pentium, running DOS version 7.0 installed from Windows 95). The EyeLink is a binocular, head-mounted, infrared video system with 250-Hz sampling. It was controlled by the EyeLink Toolbox extensions of MATLAB Version 1.2 (Cornelissen, Peters, & Palmer, 2002). We recorded and analyzed only the right eye position. As summarized in Table 1, the system has a resolution of 1° or better. For Experiment 1 with the largest sample of observers, the standard deviation of fixation was 0.71 ± 0.06° horizontal and 1.71 ± 0.22° vertical; after subtracting the variation in fixation, the standard deviation of the saccade endpoints to one of the targets was 0.31 ± 0.04° horizontal and 0.35 ± 0.03° vertical.
Table 1. Gaze precision and percentage rejected trials for all experiments. SD = standard deviation, Disc. = discrimination, and Det. = detection.
The motion display was a sequence of random dots that appeared within a 5° diameter circular aperture centered about fixation. Dots were 3 by 3 pixels (0.1° square), with a density of 16.7 dots/deg²/s. On each trial, the direction of motion was randomly either left or right, and the strength of motion was selected randomly from a list of possible coherence values. The coherence specifies the probability that a dot is displaced in motion or randomly repositioned. On each video frame, a coherently moving dot was shifted 0.2° from its position 40 ms earlier (3 video frames), corresponding to a speed of 5°/s. A dot that was not moving coherently was plotted in a random position. We refer to the proportion of coherently moving dots as the motion coherence. Because the coherent and non-coherent dots are selected independently on each frame, this procedure effectively yields three interlaced sequences with limited lifetime dots. Similar random dot stimuli have been used in previous psychophysical and physiological studies because they control motion strength without providing positional cues (e.g., Newsome et al., 1989; Shadlen & Newsome, 2001).
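The frame-by-frame update rule can be sketched as follows. This is a minimal illustration in Python rather than the MATLAB and Psychophysics Toolbox code used for the experiments. The function names and the number of dots per frame are hypothetical, and replotting a coherent dot at a random position when its shift would carry it outside the aperture is an assumption; wrapping around the aperture would be another reasonable choice.

```python
import random

APERTURE_RADIUS = 2.5  # deg (5 deg diameter aperture)
STEP = 0.2             # deg shift relative to the position 3 frames earlier
INTERLEAVE = 3         # frames between successive plots of the same dot set


def random_position(rng):
    """Draw a position uniformly within the circular aperture (rejection sampling)."""
    while True:
        x = rng.uniform(-APERTURE_RADIUS, APERTURE_RADIUS)
        y = rng.uniform(-APERTURE_RADIUS, APERTURE_RADIUS)
        if x * x + y * y <= APERTURE_RADIUS ** 2:
            return (x, y)


def next_frame(previous, coherence, direction, rng):
    """Update the dots shown INTERLEAVE frames ago; direction is +1 (right) or -1 (left)."""
    dots = []
    for x, y in previous:
        if rng.random() < coherence:
            x_new = x + direction * STEP
            if x_new * x_new + y * y <= APERTURE_RADIUS ** 2:
                dots.append((x_new, y))
                continue
        # Non-coherent dot, or coherent dot leaving the aperture (assumption): replot.
        dots.append(random_position(rng))
    return dots


def make_stimulus(n_frames, dots_per_frame, coherence, direction, seed=0):
    """Return a list of frames, each a list of (x, y) dot positions in degrees."""
    rng = random.Random(seed)
    frames = []
    for i in range(n_frames):
        if i < INTERLEAVE:
            frames.append([random_position(rng) for _ in range(dots_per_frame)])
        else:
            frames.append(next_frame(frames[i - INTERLEAVE], coherence, direction, rng))
    return frames
```

Because each dot is independently chosen to be coherent or not on every update, an individual dot rarely carries the motion signal for long, which is what gives the dots their limited lifetime.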
Each trial consisted of the following sequence of events (Figure 1A):
- (a) Fixation. A red fixation disk (0.4° diameter) was presented in the center of the screen. Once it appeared, observers were required to acquire and maintain fixation within a 5° radius for 0.35 s, and then within 2° of their initial fixation for another 0.15 s.
- (b) Targets. Once fixation was attained, two red target disks of 0.8° diameter were presented 10° to the left and right of fixation for a minimum of 0.2 s. This target display was maintained for an additional warning interval drawn from an exponential distribution with a mean of 0.7 s, truncated to a maximum of 4.8 s. Thus, the time from the onset of the targets to the onset of the motion display had a mean of 0.9 s and a maximum of 5.0 s. Observers had to maintain fixation within 2° of their observed fixation.
- (c) Motion display. After the targets were displayed and the waiting period was over, the fixation disk changed from red to blue and the random dot motion display was presented until the observer moved his/her gaze outside of a 5° radius window. The motion display was terminated immediately after fixation was broken. After another 0.1 s, gaze position was compared to the position of the targets to determine if the eye movement corresponded to a correct response, an error, or an anomalous response. If gaze did not fall within a 3° radius of one of the targets, the trial was classified as a no choice. If the gaze was within 3° of one of the targets, and remained near the target for another 0.2 s, the trial was classified as either correct or error, depending on the gaze position. If gaze did not remain near the targets for this 0.2-s period, the trial was also classified as a no choice. Both the fixation and targets were erased immediately after online response classification.
- (d) Feedback. Tone feedback was presented after the response or 1.0 s after the onset of the motion display, whichever came later. The tones and duration of the feedback varied depending on whether the response was correct (single tone, 0.5 s), an error (double tone, 1.0 s), or a broken fixation or other anomalous response (five tones, 2.0 s).
- (e) Intertrial interval. After the feedback period, the screen remained blank for an intertrial interval of 1.0 s before the fixation point was presented to begin the next trial.
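The timing of the target period in step (b) can be illustrated with a short sampling sketch. It assumes that "truncated" means the exponential draw is clipped at the maximum; redrawing values above the maximum is another possible reading. The function name is hypothetical.

```python
import random


def target_to_motion_delay(rng, minimum=0.2, mean=0.7, maximum=4.8):
    """Seconds from target onset to motion onset.

    A fixed minimum plus an exponentially distributed wait with the given
    mean, clipped at `maximum` (clipping is an assumption of this sketch).
    """
    wait = min(rng.expovariate(1.0 / mean), maximum)
    return minimum + wait


if __name__ == "__main__":
    rng = random.Random(0)
    delays = [target_to_motion_delay(rng) for _ in range(20000)]
    print(sum(delays) / len(delays), max(delays))
```

An exponential wait has a constant hazard rate, so the time already elapsed gives no information about when the motion display will begin. The sample mean is close to the 0.9 s stated in step (b), and no delay exceeds 5.0 s.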
Trials were presented in short blocks whose length varied with experiment. Motion strength and direction were counterbalanced within blocks. At the end of each block, observers were given three pieces of cumulative feedback: percentage correct, response time in the most difficult condition, and a summary of their performance in terms of the number of correct responses per minute. At the end of the experiment, the same feedback was given for the entire session. Several sessions of practice were conducted before beginning the reported experiments.
In our initial pilot studies, observers appeared to adopt a variety of strategies with respect to the speed of response. Some slowed down more than others to gain additional accuracy in the difficult conditions. To promote a common strategy across observers, we gave the following instructions: "Please respond as quickly as possible given a high level of accuracy. For difficult displays, you may take some time to improve your judgment. For this experiment, we will give you a target mean response time for the most difficult displays of about 800 ms." One exception was in the first experiment where JP and AH received no explicit instruction regarding speed.
The analysis of gaze position had two parts: real-time analyses performed during the experiment, and off-line analyses run on completed data sets. During each experiment, real-time analyses were used to terminate the stimulus when a response was made, to give appropriate feedback, and to abort trials with anomalous eye movements. After each experiment, a further off-line analysis was performed.
At the beginning of a session, three calibration sequences were conducted. They required the fixation of nine positions that spanned the display. Based on this calibration information, gaze position was calculated using the EyeLink’s nonlinear mapping function (Stampe, 1993). During each trial, the fixation position was estimated from the beginning of the fixation display until 0.2 s after the response. Saccades and blinks were detected using the algorithms supplied with the EyeLink. Saccades were detected when jointly satisfying position (0.1°), velocity (30°/s), and acceleration (8000°/s²) criteria. Blinks were detected when the image of the pupil was lost for more than 12 ms. Eye position was sampled every 4 ms by the EyeLink. To reduce noise, the EyeLink’s digital filter was used with a time parameter of 1 sample (Stampe, 1993).
During the fixation and target displays, a trial was aborted if gaze position varied from the fixation point. This test was based on a circular window centered on the fixation position that was observed 0.35 s (averaged over 5 samples, 0.02 s) after gaze first moved within 5° radius of the fixation point. Subsequent gaze position had to remain within a 2° radius of the initial fixation gaze position. In addition, we made a coarse 5° radius test of fixation based on the initial calibration.
After the experiment, further analysis was performed on the sampled gaze position and the saccade events identified by the real-time analysis. For each trial, the offline analysis calculated the mean of the gaze position observed during fixation for the 0.1 s prior to the motion display, and used it to translate the gaze position relative to the calculated fixation.
Using the off-line position estimates and the real-time saccade and blink events, trials were classified into one of the following:
- (a) Bad fixation. Fixation at the beginning of the trial remained outside of a 5° radius circular window.
- (b) Blink. A blink was detected sometime from 0.1 s before the motion display to the termination of the display. If a blink was detected before the motion display, the trial was immediately aborted.
- (c) Anticipation. The gaze shifted to within 3° of one of the target locations before the beginning of the motion display.
- (d) Broken fixation. Gaze position deviated by more than 3° from the initial fixation before the stimulus presentation.
- (e) No choice. Gaze shifted outside of a 3° fixation window but not within 3° of either the left or right target after the onset of the motion display.
- (f) Error. Gaze shifted to within 3° of the incorrect target after the onset of the motion display within 0.1 s of leaving the fixation window.
- (g) Correct. Gaze shifted to within 3° of the correct target after the onset of the motion display within 0.1 s of leaving the fixation window.
Classification was done in the above order (a-g). Thus, a trial was classified as correct (g) only if it did not fit any of the criteria for the other classifications (a-f). For example, on a given trial, if an observer first made an anticipatory response and then the correct response, the trial was classified as an anticipation. In the analyses in this article, only correct and error trials were considered. All other trials were treated as anomalous and excluded from further analysis. We recorded and monitored the percentage of anomalous trials to verify that they were rare. For Experiment 1, which had the largest sample of observers, there were 2.7 ± 0.5% anomalous trials (± indicates a standard error throughout this article). Across all experiments, only 2% of trials were anomalous (Table 1).
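The ordered classification can be illustrated as a first-match-wins rule. The sketch below is hypothetical: the `TrialFlags` fields stand in for the outcomes of the individual tests (a)-(g), which would be computed from the gaze data, and the `"unclassified"` fallback is an assumption of the example rather than a category used in the article.

```python
from dataclasses import dataclass


@dataclass
class TrialFlags:
    """Hypothetical per-trial test results, one flag per criterion (a)-(g)."""
    bad_fixation: bool = False       # (a)
    blink: bool = False              # (b)
    anticipation: bool = False       # (c)
    broken_fixation: bool = False    # (d)
    no_choice: bool = False          # (e)
    reached_incorrect: bool = False  # (f)
    reached_correct: bool = False    # (g)


# Criteria in the order (a)-(g) given in the text; the first match wins.
ORDERED_CRITERIA = [
    ("bad_fixation", "bad fixation"),
    ("blink", "blink"),
    ("anticipation", "anticipation"),
    ("broken_fixation", "broken fixation"),
    ("no_choice", "no choice"),
    ("reached_incorrect", "error"),
    ("reached_correct", "correct"),
]


def classify_trial(flags):
    """Return the label of the first criterion, in order, that the trial meets."""
    for attribute, label in ORDERED_CRITERIA:
        if getattr(flags, attribute):
            return label
    return "unclassified"  # fallback assumed for this sketch only


def is_analyzed(label):
    """Only correct and error trials enter the analyses."""
    return label in ("correct", "error")
```

With this ordering, a trial that meets both the anticipation and the correct criteria is labeled an anticipation, matching the example in the text.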
We fitted functions of stimulus strength to the mean response time and accuracy data using a maximum likelihood procedure. For each data set, the free parameters were iteratively adjusted to maximize the summed log likelihood of the predicted mean response time and accuracy. Likelihood is the probability of observing the data given the prediction. For response time, the relevant distribution is the sampling distribution of the mean, rather than the sample distribution for individual trials. The sampling distribution of the mean is asymptotically Gaussian for large samples. It can be described by the predicted mean response time tT(x) and the predicted standard error of the mean $\sigma_{\bar{t}} = \sqrt{\mathrm{VAR}[t_T(x)]/n}$, where VAR is the predicted variance (see Appendix) and n is the number of trials. Given this Gaussian approximation, the likelihood LT of the observed mean response time $\bar{t}_T(x)$ given the predicted mean response time tT(x) is
$$L_T = \frac{1}{\sigma_{\bar{t}}\sqrt{2\pi}} \exp\!\left(-\frac{\left[\bar{t}_T(x) - t_T(x)\right]^2}{2\sigma_{\bar{t}}^2}\right). \tag{3}$$
For accuracy, we assume the probability of observing r correct choices out of n trials obeys the binomial distribution. Thus, the likelihood LP of the observed proportion of correct responses r/n given the predicted proportion correct Pc is
$$L_P = \frac{n!}{r!\,(n-r)!}\, P_c^{\,r}\, (1 - P_c)^{\,n-r}. \tag{4}$$
The log likelihoods were summed over stimulus strength conditions to produce a combined log likelihood of
$$\ln L = \sum_{x} \left[\ln L_T(x) + \ln L_P(x)\right], \tag{5}$$
which was maximized by iteratively adjusting the model parameters.
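As an illustration of the quantity being maximized, the following sketch evaluates the Gaussian likelihood of a mean response time, the binomial likelihood of the number correct, and their sum over conditions. It is not the authors' fitting code. The dictionary keys are hypothetical, the predicted standard error is taken as an input (its formula depends on the variance given in the Appendix), and the clamping of the predicted proportion is an added numerical safeguard.

```python
import math


def log_likelihood_rt(observed_mean, predicted_mean, predicted_sem):
    """Log Gaussian likelihood of an observed mean response time."""
    z = (observed_mean - predicted_mean) / predicted_sem
    return -0.5 * z * z - math.log(predicted_sem * math.sqrt(2.0 * math.pi))


def log_likelihood_accuracy(r, n, p_correct):
    """Log binomial likelihood of r correct responses out of n trials."""
    # Clamp the prediction away from 0 and 1 to avoid log(0) (added safeguard).
    p = min(max(p_correct, 1e-12), 1.0 - 1e-12)
    log_comb = math.lgamma(n + 1) - math.lgamma(r + 1) - math.lgamma(n - r + 1)
    return log_comb + r * math.log(p) + (n - r) * math.log(1.0 - p)


def combined_log_likelihood(conditions):
    """Sum both log likelihoods over the stimulus strength conditions.

    Each condition is a dict with hypothetical keys:
    rt_obs, rt_pred, rt_sem_pred, r, n, pc_pred.
    """
    total = 0.0
    for c in conditions:
        total += log_likelihood_rt(c["rt_obs"], c["rt_pred"], c["rt_sem_pred"])
        total += log_likelihood_accuracy(c["r"], c["n"], c["pc_pred"])
    return total
```

A general-purpose optimizer would then adjust the model parameters that generate `rt_pred`, `rt_sem_pred`, and `pc_pred` to maximize the returned value.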
For most analyses, we used the likelihood ratio test to ascertain whether adding parameters to the model significantly improved the fit. Let H0 denote a restricted model and let H1 denote a more general model with additional parameters. The statistic
$$2\left[\ln L(H_1) - \ln L(H_0)\right] \tag{6}$$
is distributed as χ² with degrees of freedom equal to the difference in the number of free parameters between the two nested models (Hoel, Port, & Stone, 1971).
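For the common case in which the general model adds a single parameter, the test can be sketched with the standard library alone. The function names are hypothetical, and the p-value routine covers only one degree of freedom, using the identity that the upper tail of a χ² variable with 1 df equals erfc(√(s/2)).

```python
import math


def likelihood_ratio_statistic(loglik_restricted, loglik_general):
    """Twice the gain in maximized log likelihood from the added parameters."""
    return 2.0 * (loglik_general - loglik_restricted)


def p_value_one_df(statistic):
    """Upper-tail chi-square probability for 1 degree of freedom only."""
    if statistic <= 0.0:
        return 1.0
    return math.erfc(math.sqrt(statistic / 2.0))
```

For example, a statistic near 3.84 corresponds to p ≈ .05 for one added parameter; comparisons with more than one added parameter would need a general χ² routine.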
Experiment 1: Stimulus strength
Response time and accuracy were measured for a range of motion strengths in the direction-of-motion discrimination task. The results were fit to the predictions of the proportional-rate diffusion model and its generalizations.
Six observers, including the authors JP and AH, performed the motion discrimination task. On each trial, motion coherence was selected randomly from 0, 3.2, 6.4, 12.8, 25.6, and 51.2%.
Both response time and accuracy depended on motion strength. For six observers, Figure 6 shows the mean correct response time as a function of motion strength in the upper panels and accuracy as a function of motion strength in the lower panels. Data are shown as points with error bars representing 1 SEM for response times, and 1 SE of the proportion for accuracy. As motion strength increased, response time decreased and accuracy increased. Accuracy became nearly perfect for motion coherences greater than 25%, while response times continued to decline.
Figure 6. Experiment 1: Response time and accuracy as a function of motion strength. For six observers, each pair of panels shows mean response time for correct responses and proportion of correct responses as a function of motion strength on a log scale. Smooth curves depict the predicted functions from the best-fitting proportional-rate diffusion model.
The fit of the proportional-rate diffusion model is shown by the smooth functions of motion strength in Figure 6. The 11 data points were well described by the three-parameter model. The values of the parameters and the log likelihood of the fit are given in Table 2. Over the six observers, the mean sensitivity was 20 ± 3, the mean normalized bound was 0.71 ± 0.04, and the mean residual time was 347 ± 17 ms.
Table 2. Experiment 1: Parameter values for proportional-rate diffusion model. L = likelihood; tR in seconds.
To evaluate this model, we considered the uncoupled model, in which the response time and accuracy functions are free to have their own sensitivity parameters and thus allow the threshold ratio to take on any value. For this uncoupled model, the mean estimated threshold ratio was 3.4 ± 0.2, which is similar to the 3.5 predicted by the diffusion model. Thus, there is a close coupling of the functions at the predicted value.
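The predicted threshold ratio of about 3.5 can be checked numerically. The sketch below assumes the proportional-rate model's functions as given earlier in the article, P_C(x) = 1/(1 + exp(−2A′kx)) and t_T(x) = (A′/(kx)) tanh(A′kx) + t_R. It further assumes that the accuracy threshold is the strength giving 75% correct and that the response time threshold is the strength at which the decision time falls to half of its zero-strength maximum; the latter definition is an assumption of this example.

```python
import math


def bisect(f, lo, hi, tol=1e-12):
    """Root of f on [lo, hi] by bisection; f(lo) and f(hi) must differ in sign."""
    f_lo = f(lo)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Work in the scaled variable u = A' * k * x, so A' and k cancel in the ratio.
# Accuracy threshold: 1 / (1 + exp(-2u)) = 0.75  =>  u = ln(3) / 2.
u_accuracy = math.log(3.0) / 2.0

# Response time threshold: decision time relative to its maximum is tanh(u) / u.
u_time = bisect(lambda u: math.tanh(u) / u - 0.5, 0.1, 10.0)

threshold_ratio = u_time / u_accuracy
print(round(threshold_ratio, 2))
```

Under these assumptions the ratio does not depend on the bound or the sensitivity, which is why a single predicted value can be compared across observers and conditions.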
Next we evaluated the proportional-rate diffusion model by estimating the exponent of the more general power-rate diffusion model. The mean estimated exponent was 1.1 ± 0.1, which was not reliably different from the value of 1.0 predicted by the proportional-rate model. Thus, proportional scaling of motion strength accounts for the results of this experiment.
Experiment 2: Speed-accuracy tradeoff
In Experiment 1, the proportional-rate diffusion model accounted for both response time and accuracy. In particular, it showed a specific coupling between these dependent variables. In Experiment 2, we measure the coupling while manipulating the speed-accuracy tradeoff. Observers can trade accuracy for speed depending on instructions or other task demands (e.g., Wickelgren, 1977; Ruthruff, 1996). The proportional-rate diffusion model predicts that the close coupling between response time and accuracy is consistent for any speed-accuracy tradeoff.
Two observers (JP and AH) performed a motion task similar to that of Experiment 1. Motion coherence was varied over 7 log-spaced steps from 0.8% to 51.2%. Speed stress was manipulated by instructing observers at the beginning of a session to aim for a mean response time in the most difficult condition (lowest motion coherence) of either 0.5, 1.0, or 2.0 s. Observers received no instruction regarding mean response times for other conditions. Because observers without explicit instruction tend to produce response times of about 1 s at the lowest motion coherence, these three sets of instructions effectively introduced speed instructions for fast, intermediate, and slow response times, respectively.
At the end of each block of trials, observers received feedback about their mean response time in the hardest condition and their average accuracy for all conditions. There were five sessions at each of the three speed instruction levels; the order of the speed instruction levels was counterbalanced across sessions. Each session consisted of 6 blocks of 28 trials, for a total of 168 trials per session. Overall, this resulted in 120 trials per condition.
The results of Experiment 2 for the two observers are shown in Figure 7. For each observer, the three speed instruction conditions are shown by separate symbols, and the proportional-rate diffusion model is fit to each condition separately as shown by the solid curves. To a first approximation, the chronometric functions are vertically scaled copies of one another. The main effect of speed instruction was on the response time at low motion coherence. The observers were successful at matching their performance at low coherence to the instructed 0.5-, 1.0-, and 2.0-s response times. In addition, there was an effect on accuracy with longer instructed times yielding higher accuracy. In summary, the proportional-rate diffusion model accounts for the effect of stimulus strength across a variety of speed-accuracy tradeoffs.
Figure 7. Experiment 2: Response time and accuracy as a function of motion strength and speed instruction. Observers are shown in separate columns. Speed instructions had large effects on response time for low motion strengths and little effect for high motion strengths.
The parameters estimated from the fits to the proportional-rate diffusion model are shown in Figure 8 with a separate panel for each parameter. In the top panel, the normalized bound is shown as a function of the instructed speed. For both observers, the results show the expected large variation in A′ from near 0.5 with the fast speed instruction to 1.3 with the slow instruction. This change in the bound predicts a change in the decision time from 0.25 to 1.7 s, a seven-fold increase. The sensitivity and mean residual time parameter estimates are shown in the middle and bottom panels, respectively. There appears to be a small effect on both. The mean sensitivity decreases from 28 to 25. This may be due to less effective integration of information over the longer time periods. The mean residual time decreases from 313 to 296 ms. We discuss these small effects below.
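The quoted change in decision time follows from the bound alone. Assuming the proportional-rate model's mean decision time, (A′/(kx)) tanh(A′kx), from earlier in the article, the limit as stimulus strength goes to zero is A′². The sketch below uses the approximate bounds and sensitivities reported above; the function name is hypothetical.

```python
import math


def mean_decision_time(x, bound, sensitivity):
    """Mean decision time (s) of the proportional-rate model at strength x."""
    drift = sensitivity * x
    if drift == 0.0:
        return bound ** 2  # limit of (A'/(k x)) * tanh(A' k x) as x -> 0
    return bound / drift * math.tanh(bound * drift)


fast = mean_decision_time(0.0, bound=0.5, sensitivity=28.0)  # about 0.25 s
slow = mean_decision_time(0.0, bound=1.3, sensitivity=25.0)  # about 1.69 s
print(fast, slow, slow / fast)
```

Note that the sensitivity does not enter the zero-strength limit, so the roughly seven-fold change is attributable to the bound.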
Figure 8. Experiment 2: Effect of speed instruction on parameters of proportional-rate diffusion model. A. Speed instruction effects on bound. The bound increases with the increasing time of the speed instruction. B. Speed instruction effects on sensitivity. C. Speed instruction effects on mean residual time. The primary effect of the speed instruction is on the bound.
To evaluate the effect of the speed instruction more closely, we used the 4-parameter uncoupled model to estimate the threshold ratio for each speed-accuracy tradeoff condition. The estimated parameter values are shown in Table 3 for the three conditions and both observers. The grand average threshold ratio was 3.0 ± 0.3, which was below the value of 3.5 predicted by the diffusion model. By a likelihood ratio test, only two of the six conditions were reliably better fit by the uncoupled model at a p < .05 significance level. See below, however, for an alternative account.
Table 3. Experiment 2: Parameter values for uncoupled model. Speed instructions and tR in seconds. The χ² is based on a likelihood ratio test comparing this uncoupled model to the proportional-rate diffusion model. * p < .05, ** p < .01.
To examine the effect of the speed instruction on stimulus scaling, we fit the 4-parameter power-rate diffusion model. The estimated parameters are shown in Table 4 for all conditions and both observers. The grand average exponent was 1.2. While this was not much higher than 1.0, the 2-s speed instruction yielded exponents of 1.4, and for both observers the free exponent reliably improved the fit (p < .001). Thus, while the scaling was nearly proportional, there was a reliable deviation for the longest instructed speed.
Table 4. Experiment 2: Parameter values for power-rate diffusion model. Speed instruction and tR in seconds. The χ² is based on a likelihood ratio test comparing this power-rate diffusion model to the proportional-rate diffusion model. * p < .05, ** p < .01, *** p < .001.
Further analysis yields three additional comments. First, if one starts with the model that includes the free exponent for stimulus scaling and then allows the threshold ratio to vary, there is no reliable improvement in the fit. For this analysis, the estimated threshold ratio is 3.6 ± 0.3. Thus, if one accepts the more general power function scale, the coupling between response time and accuracy is maintained across a range of speed-accuracy conditions.
Second, there are at least two possible accounts of the larger exponents estimated for the longest speed instruction condition. The obvious possibility is a direct dependence of stimulus scaling on the speed-accuracy tradeoff, but such an effect is unexpected. Alternatively, the longest speed instruction results in much larger effects on response time, which makes the exponent easier to estimate. Thus, it may be that there is a small deviation from proportionality in all conditions, but it is easier to detect in the longest speed instruction condition.
Third, in the initial 3-parameter analysis of this experiment, the residual time parameter varied with speed stress. Specifically, increasing the target time decreased the residual time estimate. For the more general 4-parameter exponent model, this pattern was reversed. In this case, increasing the target time increased the residual time estimate. Because there are no data on the lower asymptote of the function, the estimated mean residual time is very sensitive to the shape of the function. Thus, we suggest the deviation in the mean residual time may be due to imprecision in the shape of the function rather than the residual time per se.
In summary, we found that the interaction of stimulus strength and speed instructions can be accounted for by the power-rate diffusion model. There is a reliable small deviation in the exponent beyond proportionality but no reliable deviation for the threshold ratio expected for the power-rate diffusion model.
Experiment 3: Response time for high accuracy conditions
Perhaps the majority of response time experiments have been performed under conditions in which accuracy is perfect or near-perfect. Under such conditions, response time still improves with increasing stimulus strength. This effect is described by Piéron’s Law, which posits that mean response time tT(x) varies as a power function of the stimulus strength x with an additive constant time tR (e.g., Bonnet & Dresp, 2001; Mansfield, 1973; Pins & Bonnet, 1996; Schweickert, Dahn, & McGuigan, 1988),
$$t_T(x) = \alpha x^{-\beta} + t_R, \tag{7}$$
where α is the scale factor and β is the exponent of the power function.
This function has been used to describe the effect of stimulus strength in many suprathreshold tasks, including motion (Hohnsbein & Mateeff, 1992) and contrast (Burkhardt, Gottesman, & Keenan, 1987). In this experiment, we tested whether the proportional-rate diffusion model can also account for the response times measured with near-perfect accuracy.
Two observers (JP and MK) performed a left-right direction discrimination task on dynamic random dot fields of varying motion coherence (5 log-spaced steps: 10, 18, 32, 56, and 100% coherence). Observers received the following instructions: “This experiment includes relatively easy conditions. Please respond as quickly as possible while maintaining an accuracy of at least 95% correct (5% errors). For this experiment, one can emphasize speed at the expense of a few errors on harder trials.” The emphasis on speed and the allowance of a small proportion of errors were intended to deter observers from trying to avoid making any errors by spending especially long amounts of time on the hardest trials. Similar small error rates have been allowed in previous high accuracy studies. The task was otherwise identical to Experiment 1. Each observer participated in five sessions and each session consisted of 10 blocks of 20 trials. This yielded 200 trials per condition per observer.
As shown in Figure 9, response time decreased with increasing motion strength, and accuracy was perfect or near perfect across all conditions. The response times were similar to those observed in previous experiments in corresponding conditions. For example, in Experiment 1 (Figure 6), compare the accuracy of the 10% coherence condition (middle of the range) to the accuracy of the 10% condition in Experiment 3 (the lowest coherence). Although accuracy was high across all conditions, the ~5% error rate observed in the hardest condition was similar to that observed in the corresponding condition of Experiment 1.
Figure 9. Experiment 3: The proportional-rate diffusion model can account for response times that obey Piéron’s Law. Smooth curves depict the predictions of the best-fitting proportional-rate diffusion model and dashed curves depict the predictions of Piéron’s Law.
The fit to Piéron’s Law is shown by the dashed curve. For both observers, Piéron’s Law is an excellent description of the results. The response time and accuracy data were also fit to the proportional-rate diffusion model, and the results are shown by the solid curves. For response time, this fit is virtually indistinguishable from that of Piéron’s Law. Estimated parameter values are similar to those found in Experiment 1 (JP: A′ = 0.59 ± 0.01, k = 21 ± 1, tR = 300 ± 1 ms; MK: A′ = 0.71 ± 0.01, k = 22 ± 1, tR = 307 ± 2 ms). Using the 4-parameter uncoupled model, the estimated threshold ratio was 3.2 and 3.8 for JP and MK, respectively. The mean threshold ratio was 3.5 ± 0.3. Using the 4-parameter power-rate model, the exponents were 1.05 and 0.92 for JP and MK, respectively. The mean exponent was 0.99 ± 0.07. Thus, despite the narrow range of accuracy data, both the threshold ratio and the exponent remained consistent with the proportional-rate diffusion model.
The proportional-rate diffusion model has two properties that are preferable to Piéron’s Law. First, Piéron’s Law does not make predictions about accuracy, whereas the diffusion model does. Second, the predictions of the two models diverge at lower stimulus strengths. Piéron’s Law predicts that response time approaches infinity as stimulus strength approaches zero (cf. Bonnet & Dresp, 2001). In contrast, the proportional-rate diffusion model predicts that response time approaches a ceiling as the stimulus strength approaches zero. Such a ceiling is seen in all of the experiments we have conducted and is common in the relevant prior studies (e.g., Pike, 1971). In conclusion, at least some results consistent with Piéron’s Law are also consistent with the predictions of the proportional-rate diffusion model.
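The divergence at low stimulus strength can be made concrete with a small numerical comparison. The sketch assumes the proportional-rate chronometric function from earlier in the article, t_T(x) = (A′/(kx)) tanh(A′kx) + t_R, with stimulus strength expressed as a proportion, and uses the Experiment 3 estimates for JP. The Piéron parameters are not fitted values: an exponent of 1 and a scale of A′/k are chosen here only because they make the two functions agree at high strength, where tanh approaches 1.

```python
import math


def pieron_rt(x, scale, exponent, residual):
    """Power function of stimulus strength plus an additive constant."""
    return scale * x ** (-exponent) + residual


def diffusion_rt(x, bound, sensitivity, residual):
    """Proportional-rate diffusion prediction of mean response time (s)."""
    drift = sensitivity * x
    if drift == 0.0:
        return bound ** 2 + residual
    return bound / drift * math.tanh(bound * drift) + residual


BOUND, SENSITIVITY, RESIDUAL = 0.59, 21.0, 0.300  # JP, Experiment 3
SCALE = BOUND / SENSITIVITY                        # illustrative choice, not a fit

for x in (1.0, 0.1, 0.01, 0.001):
    print(x,
          round(pieron_rt(x, SCALE, 1.0, RESIDUAL), 3),
          round(diffusion_rt(x, BOUND, SENSITIVITY, RESIDUAL), 3))
```

With these values the two predictions coincide at full strength, whereas at very low strength the power function grows without limit and the diffusion prediction levels off near A′² + t_R, about 0.65 s.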
Experiment 4: Generality of response modality
The next experiment investigated whether the proportional-rate diffusion model can account for data from other response modalities. Thus far, observers made saccadic eye movements to peripheral targets to indicate their decisions. To address the generality of the model, we performed an experiment in which observers made finger movements to press buttons.
Two observers (JP and EH) performed the left-right direction-of-motion discrimination on dynamic random dot fields of varying motion coherence (8 steps: 7 log-spaced steps from 0.8% to 51.2%, and 100%). The stimuli were otherwise identical to those used in Experiment 1. Observers indicated their decisions by either making an eye movement to a corresponding peripheral target (as in Experiments 1-3) or by pressing a button with the corresponding left or right hand. In each session, observers performed one set of trials using one response modality, took a break (10-30 min), and then performed a second set of trials using the other response modality. The order was alternated across days. Gaze position was monitored during the button press experiment to ensure accurate fixation using the same criteria applied to the eye movement experiments. After two to four practice sessions, each observer participated in five sessions. Each session consisted of two parts, one with each response modality. Each part consisted of six blocks of 32 trials. There was a total of 120 trials per condition per observer.
As shown in Figure 10, the response time and accuracy data from the two response modalities were similar. For JP, eye movement responses were faster for all motion coherences. For EH, eye movement response times were faster than button press response times at the highest coherence and reversed elsewhere. The response time and accuracy were fit in Figure 10 by the proportional-rate diffusion model with all three parameters free to vary with response modality. The estimated parameters are in Table 5. The effect of response modality is specific to the residual time for JP but affects all three parameters for EH. Using the 4-parameter uncoupled model, the grand mean of the threshold ratio estimated over both conditions and both observers was 3.3 ± 0.3. Using the 4-parameter power-rate model, the grand mean of the exponents estimated over both conditions and both observers was 1.04 ± 0.03. Both values are consistent with the proportional-rate diffusion model. Thus, we replicate our previous results for both eye and finger movements.
Figure 10. Experiment 4: Response time and accuracy as a function of motion strength for eye and finger movements. Observers are shown in separate columns. The proportional-rate diffusion model fits both kinds of responses.
Table 5. Experiment 4: Parameter values for proportional-rate diffusion model. L = likelihood; tR in seconds.
Experiment 5: Generality of the stimulus and task
In the previous four experiments, the proportional-rate diffusion model has accounted for response time and accuracy in a direction-of-motion discrimination task. To test the generality of this model to other psychophysical tasks, we collected data from two additional tasks. The first task was a contrast discrimination in the presence of dynamic luminance noise, in which observers decided which of two noisy patches had a higher contrast (Eckstein & Whiting, 1996; Legge, Kersten, & Burgess, 1987). The second task was contrast detection in noise. In this detection task, both patches contained noise, but one patch also contained a noisy luminance increment. The contrast detection task was otherwise identical to the contrast discrimination task. We investigated this second task because contrast detection often yields psychometric functions that are steeper than those observed in discrimination (Leshowitz et al., 1968; Foley & Legge, 1981).
Two observers (JP and MK) performed a two-alternative contrast-increment discrimination or detection task in which they determined which of two peripheral disks had higher contrast. Specifically, the stimuli were 0.8° diameter disks centered 10° in the periphery that had higher luminance than the surround (57 cd/m²). The contrast of one disk was fixed and the other was increased to vary stimulus strength. (Let the luminance of the disk be LD and the luminance of the surround be LS. Then contrast was defined by $(L_D - L_S)/L_S$.) Each trial began with fixation of the central red fixation point. After fixation was achieved, two patches of dynamic random pixel Gaussian noise appeared to the left and right. The size and position were the same as the targets in the motion task. The noise was updated on each refresh (75 Hz). The appearance of these noise patches indicated the beginning of a warning period whose duration was a random value drawn from an exponential distribution, as in the previous motion experiments. At the end of this warning period, two disks appeared, one in the center of each of the peripheral noise patches. One of the disks had higher luminance contrast than the other. In the contrast discrimination task, the pedestal disk contrast was 15%, and the target patch contained an additional contrast increment that varied randomly from trial to trial across 8 values: 0, 1, 2, 4, 8, 16, 32, and 85%. In the contrast detection task, the pedestal disk contrast was zero and only the target disk had any contrast. The dynamic random pixel noise from the patches was overlaid on the target disks and their immediate surround. This external noise had a standard deviation of 50% contrast in all conditions.
Observers indicated their decision by making a saccadic eye movement to the peripheral disk with higher contrast. The analysis and classification of eye positions were performed in the same manner as for the previous experiments. Each observer participated in five sessions; each session consisted of six blocks of 32 trials for a grand total of 120 trials per condition per observer.
In the contrast discrimination task, both response time and accuracy depended on the magnitude of the contrast increment (Figure 11). As the contrast increment increased, response time decreased and accuracy increased. These effects of stimulus strength were quite similar to those observed in the motion experiments. The fits of the 3-parameter proportional-rate diffusion model are shown in Figure 11 for the two observers. The model provides a good account of the observed pattern of response time and accuracy. For this discrimination task, we focused on the 4-parameter power-rate model to facilitate comparisons to the detection task where deviations from proportionality were expected. The best-fit parameters of the power-rate model are given in Table 6. The estimated exponents were 1.25 and 1.09 for JP and MK, respectively. Using a 5-parameter version of the uncoupled model, the threshold ratio was 2.7 and 2.9 for JP and MK, respectively. These were our largest deviations from the predicted 3.5.
Figure 11. Experiment 5: Discrimination: Response time and accuracy as a function of the contrast increment. The power-rate diffusion model fits this contrast discrimination task with an exponent of about 1.2.
Table 6. Experiment 5: Parameter values for power-rate diffusion model. L = likelihood; tR in seconds.
In the contrast detection task, response time and accuracy also varied systematically with the magnitude of contrast (Figure 12). For this task, both the response time and accuracy functions exhibited a stronger dependence on contrast. To fit this effect, it was necessary to apply a diffusion model with a power function relation between stimulus strength and mean drift rate (likelihood ratio test for JP and MK: χ²(1) = 213 and 45, p << .001 for both). The best-fit parameters for the 4-parameter power-rate model are shown in Table 6 along with the discrimination data. Using the 5-parameter uncoupled model, the estimated threshold ratios were 2.8 and 3.1 for JP and MK, respectively. Thus, both contrast discrimination and detection showed threshold ratios of about 3.0. One possible account for this deviation from the 3.5 predicted by the proportional-rate diffusion model is provided by adding parameter variability and is considered briefly in the General discussion.
Figure 12. Experiment 5: Detection: Response time and accuracy as a function of contrast in a detection task. For detection, the power-rate diffusion model fits the data with an exponent of about 2.0.
Our results are consistent with previous findings of a steeper psychometric function for detection than for discrimination (Leshowitz et al., 1968; Foley & Legge, 1981). Our results extend previous results by demonstrating a corresponding dependence for response time. Moreover, these results show a similar coupling (threshold ratio ~3.0) between response time and accuracy for detection and discrimination tasks, despite the different stimulus scaling.
We investigated the effect of stimulus strength on response time and accuracy using the proportional-rate diffusion model and its power function generalization. A single sensitivity parameter in the model was able to account for the effect of stimulus strength on both response time and accuracy. This success was repeated for different speed instructions, conditions with and without errors, two response modalities, and three different stimulus judgments. In addition, the effect of varying the speed-accuracy tradeoff was accounted for primarily by the bound parameter. Thus, the model accounts for both response time and accuracy and has distinct parameters for sensitivity and the speed-accuracy tradeoff.
In the first part of the Discussion, we address three phenomena of this study. The first two have been the focus of our analyses: coupling of response time and accuracy, and stimulus scaling. The third is temporal summation, which is central to the diffusion model and is addressed by Experiment 2.
Coupling of response time and accuracy
A principal goal of the sequential sampling models is to provide an account of response time and accuracy with a common mechanism. This is in contrast with a previous generation of work that was specialized for either accuracy (e.g., signal detection theory; Green & Swets, 1966) or response time (e.g., Piéron’s Law; Pins & Bonnet, 1996). The prior literature has addressed the coupling of response time and accuracy in a variety of ways (e.g., Mansfield, 1973; Santee & Egeth, 1982; Palmer, 1998). For example, Mansfield (1973) estimated photopic and scotopic sensitivity functions using thresholds based purely on response time and found results consistent with standard estimates based on accuracy thresholds. This consistency supports a common mechanism for these effects.
One way the common mechanism hypothesis may fail is when manipulations affect the residual time and not the decision time. Under such conditions, one expects effects on time and not on accuracy. Another way is that under high accuracy conditions, performance may depend on different mechanisms than under low accuracy conditions. Thus, typical response time experiments with high accuracy may be based on different mechanisms than typical accuracy experiments with lower accuracy (e.g., Mordkoff & Egeth, 1993; Santee & Egeth, 1982). However, neither of these possibilities is supported for the simple discriminations studied here.
In this study, we evaluated the coupling of response time and accuracy by measuring sensitivity with separate estimates based on response time or accuracy. The estimates were combined in a threshold ratio that remains constant if a common mechanism couples the dependent measures. The threshold ratios estimated in each of our five experiments are shown in Figure 13. For Experiment 1, the estimate is the mean and standard error over six observers. For the other experiments, it is the grand mean and standard error over two observers with multiple conditions. For all experiments, the threshold ratio ranges between 2.9 and 3.6, which is close to the 3.5 predicted by the proportional-rate and power-rate diffusion models. Only the contrast experiment shows a deviation from the prediction. Thus, the threshold ratio is reasonably consistent over variations in the speed-accuracy tradeoff, high and low accuracy conditions, and response modality.
Figure 13. Summary of threshold ratio estimates from all experiments.
Alternative models predict different threshold ratios. To take an extreme example, consider an independent sampling model with a high threshold assumption (e.g., Maloney & Wandell, 1984). This model is particularly interesting because it modifies two of the central assumptions of the diffusion model: The decision is based on momentary evidence instead of accumulating evidence, and an error response is triggered by guessing rather than by noise. This model is described in more detail in the Appendix. For such a model, the shapes of the psychometric and chronometric functions are congruent and the predicted threshold ratio is 1.0. Thus, it is easy to reject this extreme version of independent sampling based on the current experiments. More generally, this alternative model illustrates the potential of the threshold ratio to discriminate between models.
Scaling of stimulus strength
Theories of stimulus scaling in visual psychophysics have been dominated by some form of a power function of stimulus strength. For example, Nachmias and Kocher (1970) and Pelli (1987) added a power function to signal detection theory with Gaussian distributions to predict psychometric functions. Similarly, the Weibull function (Quick, 1974) contains a power function of the stimulus strength. Pelli (1987) compared the two models of the psychometric function and found they are very similar: In a two-alternative forced-choice experiment, the two exponents are approximately proportionally related with a Gaussian exponent of 1.0 equivalent to a Weibull exponent of about 1.2. In this article, power function scaling is incorporated into the power-rate diffusion model and fit to both response time and accuracy. The results were similar to previous measurements based on accuracy alone. For direction-of-motion discrimination, we find best-fitting power function exponents of 1.2, 1.2, and 1.0 for Experiments 1, 2, and 4, respectively. These values are in the range of exponents estimated using a Weibull function of accuracy data (Weibull exponent = 0.9–1.4, Gold & Shadlen, 2003; see also Britten, Shadlen, Newsome, & Movshon, 1992). For contrast discrimination, we find a mean exponent of 1.15 on the change in contrast. This value is similar to the value of the exponent of Gaussian functions fit to accuracy data (exponent = 1.05, Leshowitz et al., 1968). For contrast detection, we find a mean exponent of 2. This value is also similar to the value of exponents reported in earlier experiments (Gaussian exponent = 2.0, Leshowitz et al., 1968; Weibull exponent = 3.0, Foley & Legge, 1981). In summary, for the three cases studied, the exponents measured in response time experiments are similar to those measured in accuracy experiments.
We propose that the power-rate diffusion model can be used to scale stimuli in the same way as traditional models of the psychometric function. This diffusion model extends the previous work on accuracy psychometric functions by encompassing both response time and accuracy and thus a larger range of stimulus strength. In addition, this kind of model can be a departure point for more elaborate theories. For example, Link (1992) describes a random walk model based on Poisson differences that yields Weber’s Law.
Diffusion models assume perfect integration of noisy information over time. This is one of the main assumptions that distinguish the model from its alternatives. Experiment 2 provides a measure of temporal summation because manipulating the speed instruction varies the amount of time that information can accumulate from the stimulus. One approach to measuring temporal summation is to estimate the accuracy threshold as a function of the duration of the stimulus (e.g., Barlow, 1958). For such an experiment, given perfect integration and no noise, one expects Bloch’s law up to some critical duration. In other words, the threshold is inversely proportional to duration. In the presence of independent noise over time, the predicted decline in threshold is inversely proportional to the square root of time (e.g., Smith, 1998; Watamaniuk, 1993). One also expects some decline simply for independent sampling over time (probability summation, Watson, 1979).
To measure improvements of accuracy over time, we estimated, for each speed instruction condition, both the accuracy threshold and the mean decision time for that threshold stimulus. The mean decision time is the predicted mean response time for the motion coherence corresponding to 75% correct, minus the estimated residual time. Figure 14 shows the log scaled accuracy threshold versus the corresponding log scaled mean decision time. For both observers, all points on the log-log plot fall near the line representing a power function with an exponent of around 0.46. This decline is slightly less than the predicted 0.5. Thus the accuracy and response times in this task are consistent with an almost perfect additive accumulation of sensory evidence.
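The predicted exponent can be checked against the model's own equations. The sketch below assumes the proportional-rate diffusion model's logistic psychometric function and its mean decision time, (A′/kx)·tanh(A′kx). It varies the normalized bound A′ to mimic different speed instructions and fits a line to log threshold versus log decision time. The sensitivity k and the bound values are hypothetical, not fitted estimates.

```python
import math


def accuracy_threshold(bound, k):
    """Stimulus strength giving 75% correct: 1 / (1 + exp(-2 A' k x)) = 0.75."""
    return math.log(3.0) / (2.0 * bound * k)


def mean_decision_time(x, bound, k):
    """Mean decision time (A' / (k x)) * tanh(A' k x); tends to A'^2 as x -> 0."""
    if x == 0.0:
        return bound ** 2
    return bound / (k * x) * math.tanh(bound * k * x)


def loglog_slope(xs, ys):
    """Least-squares slope of log(ys) against log(xs)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    return sxy / sxx


def threshold_vs_time_slope(bounds, k):
    """Slope of log accuracy threshold versus log decision time at threshold."""
    times, thresholds = [], []
    for bound in bounds:
        x75 = accuracy_threshold(bound, k)
        thresholds.append(x75)
        times.append(mean_decision_time(x75, bound, k))
    return loglog_slope(times, thresholds)


if __name__ == "__main__":
    print(threshold_vs_time_slope([0.5, 0.7, 1.0, 1.4], k=20.0))
```

Under these assumptions the decision time at threshold equals A′²/ln 3 and the threshold is proportional to 1/A′, so the fitted log-log slope is −0.5: a decline with an exponent of 0.5, against which the observed value of about 0.46 can be compared.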
Figure 14. Relationship between accuracy threshold and estimated decision time in Experiment 2. Accuracy threshold decreases with the square root of the estimated decision time at accuracy threshold. The diagonal line has a log-log slope of 1, which follows Bloch’s law.
There have been several direct measurements of motion coherence accuracy thresholds as a function of duration. In an early study, Watamaniuk and Sekuler (1992) estimated critical durations around 0.5 to 1.0 s, and a more detailed follow-up study (Watamaniuk, 1993) found threshold-versus-duration slopes of about 0.5 over a range from 300 to 1500 ms. Burr and Santoro (2001) measured durations from 0.1 to 10 s for several motion tasks and found threshold-versus-duration slopes between 0.5 and 1.0 and a critical duration of 1.0 s or more. Gold and Shadlen (2003) measured durations from 100 to about 700 ms and found slopes near 0.5 and no sign of a critical duration in this range. These critical durations stand in contrast with the much shorter values measured for motion detection without noise (Simpson, 1994; for a similar comparison in contrast, see Barlow, 1958, versus Eckstein, 1994).
There are several possible modifications to the diffusion model that introduce less than perfect integration (Smith, 1998). One is to introduce a leak in the diffusion process (Busemeyer & Townsend, 1993; Usher & McClelland, 2001). Another is to add linear filters as a front end to the decision process that attenuate sustained signals (Smith, 1995). Such a linear filter can also account for steeper threshold-versus-duration slopes often found with very short durations. In principle, these options can be distinguished by measuring the extent to which the threshold-versus-duration slope and the critical duration vary with stimulus conditions. We consider these alternative models and others in the final section of the Discussion. In conclusion, the results of Experiment 2 are consistent with near-perfect integration that has some leak or other imperfection.
While the version of the diffusion model considered here works well in many respects, it has been criticized along two lines. The first is that it predicts equal response times for correct and error responses. This point is discussed in detail below. The second is the predicted shape of the response time distributions. For example, the diffusion model predicts flat hazard functions while peaked hazard functions are often observed (e.g., Ratcliff, van Zandt, & McKoon, 1999). We do not pursue this second issue in this article because of the limited data for individual conditions collected in these experiments (100 trials per condition). Because of our focus on the effect of stimulus strength, we chose to measure many conditions rather than the response time distribution for individual conditions.
The symmetric diffusion model predicts identical mean response times for correct and error responses (Laming, 1968). Early tests of the diffusion model and related random walk models focused on failures of this prediction (for a review, see Luce, 1986). In response, generalizations of the symmetric diffusion model have been developed that can predict either fast or slow errors (e.g., Link, 1975; Ratcliff & Rouder, 1998). Alternatively, Green, Smith, and von Gierke (1983) have argued that details of the experimental procedure might have inflated the difference between correct and error response times. In their experiment, observers received extensive training (performing over 20,000 trials) and performed a task in which the onset of the stimulus was not predictable due to the use of an exponentially distributed warning foreperiod. They found relatively small differences between correct and error response times.
In our experiments, mean response times for errors were slower than those for correct responses. The top panel of Figure 15 shows a scatterplot of mean error response time versus mean correct response time. Each data point corresponds to the mean correct and error response times from a single motion coherence from Experiment 1. Data are shown from the two lowest nonzero motion coherences, as there are very few errors at higher motion coherences. Most of the data points fall above the identity line demonstrating that error responses are slower than correct responses. The overall mean error response time is 847 ms compared to a mean correct response time of 769 ms. The mean difference over observers is not reliable in this particular experiment (77 ± 42 ms) but has been found to be reliable in similar experiments.
Figure 15. Comparison of error and correct response time. Mean error response time is plotted against mean correct response time. A. Results for six observers from Experiment 1. Each point corresponds to the error and correct mean response times for a given motion strength in a particular observer. Data from the two lowest nonzero motion strengths are shown. The diagonal line indicates equal response times. B. Similar plot for observer JP from the other experiments.
The bottom panel of Figure 15 shows the correct versus error response time for observer JP in the other relevant experiments. (It does not include Experiment 2, condition 3, because it had much longer mean response times, or Experiment 3, because it had almost no errors.) Once again there is about a 50-ms difference between correct and error mean response time. A similar analysis of monkey data from our laboratory shows error response times that are reliably slower than correct response times by about 90 ms (743 ms for correct vs. 833 ms for error responses; Roitman & Shadlen, 2002). The monkey data are clear on this point largely because they are based on 500 trials per condition compared to 100 trials per condition for the human data.
In conclusion, we estimate that under these conditions, errors are slower than correct responses by an amount on the order of 75 ms. This is consistent with the differences found in Green et al. (1983) and is a small amount compared to the effects of discriminability and the speed-accuracy tradeoff, which have effects of up to 1000 ms or more (e.g., Experiment 2). On the other hand, it is enough to require the consideration of alternative theories, some of which are discussed below. Despite this failure, we argue that the proportional-rate diffusion model deserves credit as a good approximation for the large effects of stimulus strength on mean correct response time.
In this work, we have focused on a simple version of the diffusion model among the possible sequential sampling models. This choice is motivated by the simplicity of this model’s analytic predictions and its close relation to both signal detection theory and the “ideal observer” sequential probability ratio test (SPRT) model (Laming, 1968; Stone, 1960; Wald, 1947). In the next section, we briefly review the most relevant alternative theories of response time. Nearly all generalize the simple diffusion model described thus far.
Our analysis of the chronometric and psychometric functions is based on Link’s relative judgment theory and the closely related wave-difference theory (Link, 1975, 1978a, 1978b, 1992; Link & Heath, 1975). In these theories, the decision is based on sampling the difference between two random variables in discrete time. Only very general constraints are placed on these difference distributions (see Luce, 1986; Smith, 1990). Link derived general predictions for this class of models with minimal scaling assumptions. He also extended the model to account for response bias, Weber’s Law, and other phenomena. However, he did not pursue the specific scaling assumption emphasized here.
For proportional scaling assumptions, the predictions of relative judgment theory are quite similar to the predictions of the proportional-rate diffusion model. For response time, they predict identical chronometric functions. Thus, the chronometric function prediction is quite general. For accuracy, relative judgment theory predicts a logistic function of stimulus strength but in a parameter related to the moment generating function of the difference distribution, rather than the mean. This parameter reduces to 2μ/σ² for a Gaussian difference distribution, which is equivalent to the diffusion model under our assumptions. In sum, Link’s analysis shows that the shape of the chronometric and psychometric functions is general to a range of sequential sampling models.
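The reduction to 2μ/σ² can be verified numerically. In random walk analyses of this kind, the relevant parameter is the nonzero value θ at which the moment generating function of the difference distribution satisfies E[exp(−θX)] = 1. The sketch below uses hypothetical values of μ and σ and ignores any overshoot of the bound. It finds that root by bisection for a Gaussian difference distribution and compares it with 2μ/σ².

```python
import math


def gaussian_log_mgf(t, mu, sigma):
    """log E[exp(t X)] for X ~ Normal(mu, sigma^2)."""
    return mu * t + 0.5 * (sigma * t) ** 2


def nonzero_root(mu, sigma, lo=1e-9, hi=1e3, tol=1e-12):
    """Bisection for theta > 0 with log E[exp(-theta X)] = 0, assuming mu > 0.

    The function is negative just above zero and positive for large theta,
    so the sign change brackets the nonzero root.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gaussian_log_mgf(-mid, mu, sigma) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def logistic_accuracy(theta, bound):
    """Probability of reaching the correct bound (+bound) before -bound."""
    return 1.0 / (1.0 + math.exp(-theta * bound))


if __name__ == "__main__":
    mu, sigma = 0.3, 1.0                    # hypothetical difference distribution
    theta = nonzero_root(mu, sigma)
    print(theta, 2.0 * mu / sigma ** 2)     # the two values should agree
    print(logistic_accuracy(theta, bound=2.0))
```

For a non-Gaussian difference distribution, only `gaussian_log_mgf` would need to be replaced; the logistic form of the accuracy prediction stays the same.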
Diffusion models with parameter variability
Ratcliff and colleagues have generalized the diffusion model to account for response time and accuracy in a variety of tasks (Ratcliff, 1978, 2002; Ratcliff & Rouder, 1998, 2000; Ratcliff & Smith, 2004). Their generalizations incorporate additional variability in the drift rate and the starting point. Increasing the trial-to-trial variability in the drift rate results in error response times that are slower than correct response times. Increasing the trial-to-trial variability in the starting point results in error response times that are faster than correct responses.
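The direction of the drift-rate effect can be illustrated with a short calculation. The sketch below assumes a symmetric diffusion with unit variance and bounds at ±A, for which a fixed drift μ gives accuracy 1/(1 + exp(−2Aμ)) and the same mean decision time, (A/μ)·tanh(Aμ), for correct and error responses. It then averages over a Gaussian distribution of drift rates by simple numerical integration. All parameter values are hypothetical and are not the fitted values reported in this article.

```python
import math


def p_correct(mu, bound):
    """Probability of reaching the upper (correct) bound for a fixed drift."""
    return 1.0 / (1.0 + math.exp(-2.0 * bound * mu))


def decision_time(mu, bound):
    """Mean decision time for a fixed drift; identical for both bounds."""
    if abs(mu) < 1e-12:
        return bound ** 2
    return bound / mu * math.tanh(bound * mu)


def mean_times(mean_drift, sd_drift, bound, n=4001, width=8.0):
    """Mean decision times (correct, error) when the drift rate varies from
    trial to trial as Normal(mean_drift, sd_drift^2)."""
    if sd_drift == 0.0:
        t = decision_time(mean_drift, bound)
        return t, t
    lo = mean_drift - width * sd_drift
    step = 2.0 * width * sd_drift / (n - 1)
    w_correct = w_error = t_correct = t_error = 0.0
    for i in range(n):
        mu = lo + i * step
        weight = math.exp(-0.5 * ((mu - mean_drift) / sd_drift) ** 2)  # unnormalized density
        pc = p_correct(mu, bound)
        t = decision_time(mu, bound)
        w_correct += weight * pc
        t_correct += weight * pc * t
        w_error += weight * (1.0 - pc)
        t_error += weight * (1.0 - pc) * t
    return t_correct / w_correct, t_error / w_error


if __name__ == "__main__":
    print(mean_times(1.0, 0.0, 1.0))   # no variability: equal times
    print(mean_times(1.0, 0.7, 1.0))   # with variability: errors are slower here
```

With these values, errors come disproportionately from trials on which the drift happened to be weak, and those trials are also the slow ones, so the mean error time exceeds the mean correct time.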
Adding a modest amount of parameter variability can provide an account of the slow errors observed here. For example, we fit each observer in Experiment 1 with a version of the proportional-rate diffusion model that had variability in the drift rate parameter. We assumed that this variability was independent of stimulus strength. This added one parameter σμ for the standard deviation of the drift rate. Averaged over observers, the estimated parameters were k = 25 ± 5, A′ = 0.74 ± 0.06, tR = 354 ± 16 ms, and σμ = 0.7 ± 0.3. The fit clearly improved for two of the six observers, and over all observers the improvement was significant (likelihood ratio test, χ²(6) = 22, p < .01). In this generalized model, the threshold ratio is not fixed to 3.5 and was found to average 3.2 ± 0.2 over observers. Thus, the features that are critical to the proportional-rate diffusion model are preserved in a diffusion model with modest parameter variability. In addition, parameter variability provides one possible account of the reduced threshold ratio observed for contrast tasks in Experiment 5. In sum, adding parameter variability provides a more complete description of the data, albeit at the price of greater complexity.
A different modification of the diffusion model is to change the nature of integration over time. The diffusion model assumes perfect memory in its integration process. This constraint can be relaxed by including a “leak” by which the accumulated evidence decays back to a neutral state (Busemeyer & Townsend, 1993; Smith, 1995, 2000). Arguments for and against a leak can be found in Usher and McClelland (2001) and Ratcliff and Smith (2004), respectively. In Experiment 2, we found some evidence of a loss of sensitivity for longer duration responses, consistent with a minor leak. More pointed efforts to detect such a leak can be found in Huk and Shadlen (2004).
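The effect of a leak on sensitivity can be sketched with a leaky integrator of the Ornstein-Uhlenbeck type, which is one common formalization and not necessarily the one used in the cited work. For a constant input μ, noise σ, and leak rate λ, the accumulated evidence after time T has mean μ(1 − exp(−λT))/λ and variance σ²(1 − exp(−2λT))/(2λ). The code compares the resulting threshold-versus-duration slope with that of perfect integration. Parameter values are hypothetical, and a fixed viewing duration is assumed rather than a bound.

```python
import math


def snr_perfect(mu, sigma, t):
    """Signal-to-noise ratio after perfect integration for duration t."""
    return mu * math.sqrt(t) / sigma


def snr_leaky(mu, sigma, leak, t):
    """Signal-to-noise ratio of a leaky integrator after duration t."""
    if leak == 0.0:
        return snr_perfect(mu, sigma, t)
    mean = mu * (1.0 - math.exp(-leak * t)) / leak
    sd = sigma * math.sqrt((1.0 - math.exp(-2.0 * leak * t)) / (2.0 * leak))
    return mean / sd


def threshold(sigma, leak, t, criterion=1.0):
    """Input strength needed to reach a criterion signal-to-noise ratio."""
    return criterion / snr_leaky(1.0, sigma, leak, t)


def local_slope(sigma, leak, t, factor=1.01):
    """Finite-difference log-log slope of threshold versus duration near t."""
    upper = math.log(threshold(sigma, leak, t * factor))
    lower = math.log(threshold(sigma, leak, t))
    return (upper - lower) / math.log(factor)


if __name__ == "__main__":
    for duration in (0.1, 1.0, 5.0):
        print(duration, local_slope(1.0, 0.0, duration), local_slope(1.0, 1.0, duration))
```

Without a leak the slope is −0.5 at every duration. With a leak the slope starts near −0.5 for durations that are short relative to 1/λ and flattens toward zero for long durations, which is the pattern a loss of sensitivity at longer decision times would produce.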
Nonstationary diffusion models
Another generalization of the diffusion model is to allow the drift rate or bound to vary as a function of time within a trial. For example, Smith (1995; see also Burbeck & Luce, 1982) has suggested that transient and sustained stimuli may result in sensory information with different time courses. Sustained stimuli may result in drift rates that rise to a constant value, whereas transient stimuli may have a drift rate that rises and falls as a brief pulse. An alternative generalization is to allow the bound to vary with time. Ditterich, Mazurek, Roitman, Palmer, and Shadlen (2001) considered nonstationary bounds to predict the details of response time distributions. In addition, these modifications have effects similar to those of incorporating parameter variability, in that they can also account for error response times being slower than correct response times.
Yet another alternative to the diffusion model is the family of race or accumulator models where evidence for each alternative is integrated separately rather than integrated as a single value of relative evidence. The simplest of these models assumes the accumulators are independent (e.g., Smith & Vickers, 1988; Reddi, Asrress, & Carpenter, 2003). Alternatively, one can allow some degree of correlated input into separate accumulators (Mazurek, Roitman, Ditterich, & Shadlen, 2003). Such a correlated-input race model has the diffusion model as a special case with perfectly negatively correlated inputs. Thus, a correlated-input race model can generalize the diffusion model as do the other alternatives considered here.
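The special case described above can be demonstrated with a small simulation. The sketch below is a generic two-accumulator race with Gaussian increments, not the specific model of Mazurek et al.; the correlation parameter, step size, and bound are hypothetical. With a correlation of −1 and opposite mean inputs, the second accumulator is the mirror image of the first, so the race is equivalent to a single accumulation of relative evidence between two symmetric bounds.

```python
import math
import random


def race_trial(rng, drift, rho, bound, dt=0.01, sigma=1.0, max_steps=100000):
    """Simulate one trial of a two-accumulator race.

    Returns (choice, decision_time, x1, x2); choice is 0 if no bound was reached.
    """
    x1 = x2 = 0.0
    scale = sigma * math.sqrt(dt)
    for step in range(1, max_steps + 1):
        e1 = rng.gauss(0.0, 1.0)
        z = rng.gauss(0.0, 1.0)
        # Second noise term constructed to have correlation rho with the first.
        e2 = rho * e1 + math.sqrt(1.0 - rho * rho) * z
        x1 += drift * dt + scale * e1     # evidence for alternative 1
        x2 += -drift * dt + scale * e2    # evidence for alternative 2
        if x1 >= bound or x2 >= bound:
            choice = 1 if x1 >= x2 else 2
            return choice, step * dt, x1, x2
    return 0, max_steps * dt, x1, x2


if __name__ == "__main__":
    rng = random.Random(1)
    for _ in range(5):
        choice, time, x1, x2 = race_trial(rng, drift=1.0, rho=-1.0, bound=1.0)
        # With rho = -1 the two accumulators sum to zero throughout the trial.
        print(choice, round(time, 2), round(x1 + x2, 12))
```

Setting `rho` to 0 gives independent accumulators, and intermediate values give partially correlated inputs, so the same routine spans the range of race models mentioned here.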
Independent sampling models
An extreme version of a leaky diffusion model has no memory and instead samples stimulus information independently over time. Such independent sampling is often known as probability summation over time (Watson, 1979). For a detection task, Maloney and Wandell (1984) derived an independent sampling model with a high threshold and stability assumption that predicts the Weibull psychometric function and a similarly shaped chronometric function (Wandell, Ahumada, & Welsh, 1984). While simple versions of such models predict an exponential response time distribution, Maloney and Wandell allowed nonstationary inputs, which result in a wider range of distributions. It remains to be seen if such an independent sampling model can be generalized to discrimination and used to account for the data presented here.
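One property that makes the Weibull function a natural fit for independent sampling can be illustrated numerically. The sketch assumes that each independent sample yields a detection with probability 1 − exp(−(x/α)^β) and that the observer responds at the first detection. It is a simplified stationary case, not the model of Maloney and Wandell, and guessing is omitted for brevity. Names and values are hypothetical.

```python
import math


def p_single(x, alpha, beta):
    """Detection probability for one sample (assumed Weibull form)."""
    return 1.0 - math.exp(-((x / alpha) ** beta))


def p_within(x, alpha, beta, n):
    """Probability of at least one detection in n independent samples."""
    return 1.0 - (1.0 - p_single(x, alpha, beta)) ** n


def summed_alpha(alpha, beta, n):
    """Threshold parameter of the equivalent Weibull function for n samples."""
    return alpha * n ** (-1.0 / beta)


def mean_samples_to_detect(x, alpha, beta):
    """Mean of the geometric waiting time until the first detection."""
    return 1.0 / p_single(x, alpha, beta)


if __name__ == "__main__":
    alpha, beta, n = 0.2, 3.0, 8
    for x in (0.05, 0.1, 0.2):
        # The two columns agree: summation over samples preserves the Weibull shape.
        print(p_within(x, alpha, beta, n), p_single(x, summed_alpha(alpha, beta, n), beta))
```

In this stationary case the waiting time is geometric, the discrete counterpart of the exponential response time distribution mentioned above, and the threshold parameter falls with the number of samples as n^(−1/β).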
Physiologically motivated theories
There are now a number of studies that seek the neural basis of simple perceptual decisions (e.g., Hanes & Schall, 1996; Platt & Glimcher, 1999; Ratcliff, Cherian, & Segraves, 2003; Roitman & Shadlen, 2002; Romo & Salinas, 2003). The motion task used here has previously been used in a series of studies exploring both motion perception and eye movement responses (e.g., Newsome et al., 1989). The results observed with this task have been described in a neurally based model developed by Mazurek et al. (2003).
We focus on two places in the chain of neural events: the middle temporal area (MT) and the lateral interparietal area (LIP). Several pieces of neurophysiological evidence suggest that neurons in area MT carry the sensory signals relevant to the motion task. Over 90% of the neurons in area MT are direction-selective, responding more strongly to a preferred direction than to the opposite null directions (Albright, 1984; Dubner & Zeki, 1971). Analyses of lesion, sensitivity, and microstimulation experiments have linked the activity of MT neurons with ps |