Volume 4, Number 12, Article 12, Pages 1136-1169 doi:10.1167/4.12.12 http://journalofvision.org/4/12/12/ ISSN 1534-7362
Crowding is unlike ordinary masking: Distinguishing feature integration from detection
Denis G. Pelli
Psychology & Neural Science, New York University, New York, NY, USA
Melanie Palomares
Psychology & Neural Science, New York University, New York, NY, USA
Najib J. Majaj
Center for Neural Science, New York University, New York, NY, USA
Abstract

A letter in the peripheral visual field is much harder to identify in the presence of nearby letters. This is “crowding.” Both crowding and ordinary masking are special cases of “masking,” which, in general, refers to any effect of a “mask” pattern on the discriminability of a signal. Here we characterize crowding, and propose a diagnostic test to distinguish it from ordinary masking. In ordinary masking, the signal disappears. In crowding, it remains visible, but is ambiguous, jumbled with its neighbors. Masks are usually effective only if they overlap the signal, but the crowding effect extends over a large region. The width of that region is proportional to signal eccentricity from the fovea and independent of signal size, mask size, mask contrast, signal and mask font, and number of masks. At 4 deg eccentricity, the threshold contrast for identification of a 0.32 deg signal letter is elevated (up to six-fold) by mask letters anywhere in a 2.3 deg region, 7 times wider than the signal. In ordinary masking, threshold contrast rises as a power function of mask contrast, with a shallow log-log slope of 0.5 to 1, whereas, in crowding, threshold is a sigmoidal function of mask contrast, with a steep log-log slope of 2 at close spacing. Most remarkably, although the threshold elevation decreases exponentially with spacing, the threshold and saturation contrasts of crowding are independent of spacing. Finally, ordinary masking is similar for detection and identification, but crowding occurs only for identification, not detection. More precisely, crowding occurs only in tasks that cannot be done based on a single detection by coarsely coded feature detectors. 
These results (and observers’ introspections) suggest that ordinary masking blocks feature detection, so the signal disappears, while crowding (like “illusory conjunction”) is excessive feature integration — detected features are integrated over an inappropriately large area because there are no smaller integration fields — so the integrated signal is ambiguous, jumbled with the mask. In illusory conjunction, observers see an object that is not there made up of features that are. A survey of the illusory conjunction literature finds that most of the illusory conjunction results are consistent with the spatial crowding described here, which depends on spatial proximity, independent of time pressure. The rest seem to arise through a distinct phenomenon that one might call “temporal crowding,” which depends on time pressure (“overloading attention”), independent of spatial proximity.




History
Received October 17, 2001; published December 30, 2004
Citation
Pelli, D. G., Palomares, M., & Majaj, N. J. (2004). Crowding is unlike ordinary masking: Distinguishing feature integration from detection. Journal of Vision, 4(12):12, 1136-1169, http://journalofvision.org/4/12/12/, doi:10.1167/4.12.12.
Keywords
crowding, masking, peripheral vision, feature integration, illusory conjunction, critical spacing, letter identification, object recognition, isolation field, integration field, second-order mechanisms


1. Introduction
Object identification involves the moderately well understood process of feature detection, followed by a mysterious “integration” process that combines the detected features to produce a classification decision. The purpose of this paper is to characterize “crowding.” Crowding is excessive integration, which spoils identification and reveals the inner workings. With this characterization in hand, one can address some longstanding questions about object identification, such as whether faces are recognized by parts and the roles of letter and word recognition in reading (Martelli, Pelli, & Majaj, in press; Su, Berger, Majaj, & Pelli, 2004).
Crowding and ordinary masking are special cases of masking. In general, “masking” refers to the impairment of the discriminability of a signal by another pattern. Ordinary masking, such as masking by gratings (Legge & Foley, 1980; Swift & Smith, 1983; Levi, Klein, & Hariharan, 2002) or noise (Stromeyer & Julesz, 1972; Pelli & Farell, 1999), is usually effective only when the mask overlaps the signal. However, in the normal periphery or the amblyopic fovea, neighboring letters with no overlap severely impair the identification of a signal letter (Korte, 1923; Ehlers, 1936, 1953; Bouma, 1970; Anstis, 1974; Flom, 1991). This particular masking phenomenon is called “crowding” (Stuart & Burian, 1962; for historical review, see Strasburger, Harvey, & Rentschler, 1991, or Strasburger, 2002). Crowding is not specific to letters. We will argue that ordinary masking occurs when signal and mask stimulate the same feature detector and that crowding occurs when signal and mask stimulate different feature detectors that both reach the same feature integrator (where features are combined to recognize an object).
Despite progress in vision research, we still can only barely begin to answer a simple question like, “How do I recognize the letter A?” The literature on grating detection, with the ideas of spatial frequency channels (feature detectors) and probability summation, offers a good answer to the easier question, “How do I tell whether the screen is blank?” The answer goes under many names, including “channels” and “probability summation.” We follow Graham (1980) in calling it “feature detection.” The observer has many independent units, “feature detectors,” each with a receptive field (linear weighting over space and time, summed to yield one number) followed by a nonlinear process that results in a sharply increasing probability of response with contrast.1, 2 The image that matches a detector’s receptive field is called its “feature.” All feature detectors operate independently and the observer detects a displayed image if and only if any of the detectors do (Brindley, 1960; Quick, 1974; Graham, 1980, 1989; Robson & Graham, 1981).
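The detection rule just described (the observer detects the image if and only if at least one independent detector responds) can be sketched numerically. This is a minimal illustration, not the authors' model: the Weibull form of each detector's response probability, the steepness value, and the function names are all assumptions introduced here.

```python
import math

def detector_response_probability(contrast, threshold, beta=3.5):
    """Probability that one feature detector responds.

    Assumption: a Weibull function of contrast, rising sharply with
    contrast; beta is an illustrative steepness, not a fitted value.
    """
    return 1.0 - math.exp(-((contrast / threshold) ** beta))

def observer_detection_probability(contrast, detector_thresholds, beta=3.5):
    """Observer detects if and only if at least one detector responds.

    Detectors are treated as independent, so the probability that none
    responds is the product of the individual miss probabilities.
    """
    p_none = 1.0
    for threshold in detector_thresholds:
        p_none *= 1.0 - detector_response_probability(contrast, threshold, beta)
    return 1.0 - p_none

# Hypothetical example: adding a second, equally sensitive detector
# raises the probability of detection at the same contrast.
p_one = observer_detection_probability(1.0, [1.0])
p_two = observer_detection_probability(1.0, [1.0, 1.0])
```

Under these assumptions, more detectors stimulated by the same image can only raise the probability of detection, which is the essence of probability summation.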
In the various relevant papers, the word “feature” sometimes refers, as above, to the elementary component of the visual analysis (e.g., Graham, 1980) and sometimes refers to the labeled value (e.g., “red” or “A” or “triangle”) of a stimulus dimension that the experimenter chose to vary (e.g., Treisman & Schmidt, 1982, p. 139). We will be referring to elementary features, except in the Section 4.7 discussion of illusory conjunctions and Feature Integration Theory.
Elementary-feature detection provides a good account of detecting simple targets (i.e., for which detection of a single feature suffices for a correct response). However, identifying (or detecting a second-order signal) usually requires combining the information from several feature detections to respond correctly (see Chubb, Olzak, & Derrington, 2001). This (nonlinear) assembly process is called “feature integration” (or “binding”). Feature integration may internally represent the combined features as an object, but we will not address that here. We will suggest that crowding is excessive feature integration, integrating over an inappropriately large area that includes the flanking mask as well as the signal.
This Introduction presents a simple intuition (Section 1.1) that brings together ideas about feature detection (1.2) with facts of ordinary masking (1.3) and crowding (1.5). Later, in Discussion, we will review the close connection between crowding and illusory conjunction (4.7).
1.1 Overview
This paper characterizes crowding, distinguishing it from ordinary masking. We believe that the term “crowding” should encompass not just the original task of identifying a letter among letters in the periphery (or amblyopic fovea), but also any other task with similar results: critical spacing proportional to eccentricity and independent of size. A diagnostic test is proposed in Discussion (Section 4.1).
Past attempts to characterize and explain crowding have each varied a few parameters in similar tasks. In this experimental and theoretical synthesis we have tried to be more comprehensive. As we attempt to put it all together into one story, there are many points of agreement between our proposed explanation and earlier suggestions, but there are also some important differences. What is new here arrived late in the process, forced upon us by the data, after a long period of stumbling in the dark.
Perhaps the most important new fact emerging from this union of old and new results is the effect of which task the observer is assigned. In ordinary masking the signal disappears, so the observer cannot say anything about it, and fails all tasks (Thomas, 1985b). Many investigators have assumed that this would be true of crowding as well (e.g., see Cavanagh, 2001). But, in fact, conditions of crowding that severely impair identification of a letter (reported here) or orientation of a grating (Wilkinson, Wilson, & Ellemberg, 1997) have little or no effect on the detectability of the target. Observers report seeing a jumbled target that incorporates features from the mask. We struggled with this detection/identification dichotomy for a long time, and failed in our attempts to crowd gratings, until we eventually realized that the dichotomy is more subtle than just detection versus identification. All the tasks susceptible to crowding are tasks that, with some plausible assumptions, require more than one feature-detection event (a “conjunction” of several feature detections). Tasks that require only a single feature-detection event are immune, or nearly so. This parallels the dichotomy found in searching for one feature versus a conjunction of features — a feature pops out and a conjunction does not3 — and is strong evidence that crowding interferes with feature integration, not feature detection. The multiple detections must be integrated, and that integration is susceptible to crowding; the single detection doesn’t need to be integrated, so there’s no crowding.
Previous authors, aware that ordinary masking is selective, have shown that crowding too is selective (e.g., Kooi, Toet, Tripathy, & Levi, 1994). Here we compile old and new results showing that the selectivity of crowding is vastly broader than that of ordinary masking. Ordinary masking reveals the narrow selectivity of a feature detector (the first stage), whereas crowding reveals the broad selectivity of a feature integrator (the second stage).
It is more-or-less established that in ordinary masking the same feature detector mediates the effects of mask and signal (Legge & Foley, 1980; Foley & Chen, 1999; Wilson & Kim, 1998).1 A new finding, the effect of mask contrast as a function of spacing (Section 3.6), provides strong evidence that, in crowding, distinct feature detectors mediate the effects of mask and signal.
We survey the literature on illusory conjunctions at the end of Discussion (Section 4.7), but the only prerequisite for reading that section is the vocabulary established here in the Introduction. Most of the illusory conjunction papers’ results are consistent with crowding, as defined here, but a few papers, including Treisman and Schmidt (1982), describe a different phenomenon that we will call “temporal crowding.”
1.2 Feature detection and integration
The familiar notion that the observer detects features (components of the image) independently and then integrates them to perceive an object goes back to Weber’s (1834, 1846) and Sherrington’s (1906) suggestions, based on their psychophysical evidence, that neural receptive fields mediate the sense of touch. Indeed, simply supposing that independent detection of features is a necessary first stage of vision (i.e., cannot be bypassed) implies that any observer response (e.g., object recognition) that communicates information about a combination of features must be based on an integration (combination) of several detected features (e.g., Selfridge, 1959; Neisser, 1967; Campbell & Robson, 1968; Thomas, Padilla, & Rourke, 1969; Rosch & Lloyd, 1978; Treisman & Gelade, 1980; Sagi & Julesz, 1984; Olzak & Thomas, 1986). Despite its appealing simplicity, feature detection has been hard to establish convincingly. The grating detection literature is convincing (e.g., Campbell & Robson, 1968; Robson & Graham, 1981; Graham, 1989), but that leaves open the possibility that other tasks and targets (e.g., identifying letters) might bypass feature detection. Judging whether or not a screen is blank, as one does in detection experiments, might not be representative of what the visual system can do. Some capabilities might appear only for important highly practiced tasks, like reading faces or text. Part of this concern is allayed by the finding that thresholds for identifying letters, across the entire range of size, font, and alphabet, are accounted for by a slight extension of the standard “probability summation” model of independent feature detection (Pelli, Burns, Farell, & Moore, in press). Finding, as predicted by feature detection, that efficiency for identification is inversely proportional to complexity (number of features), even when highly practiced, is strong evidence that observers cannot bypass the feature-detection bottleneck (Pelli, Farell, & Moore, 2003).
We all want to know how features are integrated, but findings to date provide only hints as to the nature of this computation. Perception of coherent motion of two-grating plaids is based on a nonlinear combination of the two grating components (Adelson & Movshon, 1982) and some MT neurons actually implement this combination rule (Movshon, Adelson, Gizzi, & Newsome, 1986). Speed discrimination is affected by whether the components are perceived to form an object (Verghese & Stone, 1995, 1996). Applying the classic summation paradigm to motion discrimination and texture segregation reveals the exponent of the nonlinear combination of multiple components (Morrone, Burr, & Vaina, 1995; Graham & Sutter, 1998). Accounts of texture discrimination suppose linear combination of nonlinearly transformed feature detection signals (for review, see Chubb et al., 2001; Landy & Graham, 2004). Visual search and crowding experiments have also contributed hints, as we will see below. Accounts of the feature integration that underlies identification of objects are more speculative. Much of the debate has distinguished the recognition-by-components approach championed by Biederman (1987) from the alignment approach championed by Ullman and Poggio (see Tarr & Bulthoff, 1998). Alas, putting together the hints from all these studies fails to provide clear guidance as to how to address the larger question of what kind of computation underlies object recognition.
1.3 Ordinary masking
Masking provides an important part of the evidence for feature detection. Masking goes beyond the narrow domain of the question, “Is the screen blank?” to examine the effect of an irrelevant background mask on visibility of the signal. In ordinary masking, it is generally supposed that the mask affects the visibility of the signal only to the extent that the mask stimulates the receptive fields of the feature detectors that pick up the signal. We will argue that crowding cannot be explained as ordinary masking (i.e., mediated by mask stimulation of the feature detector(s) that detect the signal).
Ordinary masking is most effective when the mask has more or less the same spatial frequency, orientation, and location as the signal (Legge & Foley, 1980; Phillips & Wilson, 1984; Levi, Klein, et al., 2002). Critical-band masking experiments have shown that the spatial frequency tuning of grating detection (Greis & Rohler, 1970; Stromeyer & Julesz, 1972; Solomon & Pelli, 1994) and letter identification (Solomon & Pelli, 1994; Majaj, Pelli, Kurshan, & Palomares, 2002; Chung, Levi, & Legge, 2001) is 1.6 octaves wide. And it is independent of eccentricity, having the same tuning in central and peripheral vision (Mullen & Losada, 1999).
Ordinary masking has very similar effects on detection and identification (Thomas, 1985a, 1985b). As we shall see, our results show that crowding affects only identification, not detection. (We would expect crowding to affect detection of second-order signals, but no one has tried it yet.) With no mask, threshold contrasts for identifying a signal are usually higher than for detecting it, but, for a wide range of signal size (Pelli et al., in press) and viewing eccentricities (Raghavan, 1995; Thomas, 1987), identification and detection thresholds are in a constant ratio (also see Graham, 1985). In critical band masking studies, channel frequencies for detection and discrimination (of letters and gratings) are the same (Majaj et al., 2002). Threshold contrasts for identification and detection have similar dependence on mask contrast (Raghavan, 1995; Pelli, Levi, & Chung, 2004). These characteristics of ordinary masking are evidence for the popular idea that ordinary masking impairs discriminability of the signal by directly stimulating the feature detector that mediates our judgments about the signal. The very different characteristics of crowding will require a different kind of explanation.
1.4 Scope
We restrict our scope to simultaneous mask and signal, of any duration. A flanker that is delayed or prolonged relative to the signal can produce “metacontrast” or “object substitution” masking (Breitmeyer, 1984; Enns & Di Lollo, 1997, 2000; Tata, 2002; Enns, 2004). These phenomena seem to be closely related to motion perception (Didner & Sperling, 1980; Reeves, 1982; Burr, 1984; Bischof & Di Lollo, 1995), and may be related to what we will call “temporal crowding.” They are not directly relevant to understanding (spatial) crowding, and will not be discussed here (see Huckauf & Heller, 2004).
1.5 Crowding
Our final conclusions rest on objective measurements: thresholds for detection and identification. However, the subjective crowding experience, all by itself, makes a strong case for a key point. Examine the two blocks of letters in this demo while fixating on the central cross:
[Demo image (letters.gif): a block of four letters on each side of a central fixation cross]
What you see on the left is a block of four As. What you see on the right is much harder to describe. It’s a block of four letter-like objects. But they aren’t clearly As or Bs; they’re in-between and unstable. Each letter may seem at times to be an A and sometimes a B, but most of the time it has a confusing hybrid A-B appearance that would be impossible to draw. We usually assume that visual object recognition segments the scene and accounts for each segment by hypothesizing an “object” with appropriate properties. One supposes that all the object’s properties are estimated from the same image segment. Surprisingly, this demo shows that a single object’s several properties are estimates from various regions, large and small. Each letter is an object. The perceived presence and locations of the letters distinguish four objects, arranged in a square. To resolve four items these properties must each be assessed over a more-or-less one-letter region. Yet each item’s shape has a hybrid A-B appearance, incorporating information from a region that includes several letters. (Using your finger to cover other letters in the demo above, you will find that to see one letter clearly you must cover the rest of the letters in the block.) This seriously undermines the notion of object recognition as a unitary process that takes in a region of the image and emits an “object” with properties. Instead, our demo shows that, in this case, the distinct properties of location (where) and shape (what) are estimates from very differently sized regions. Perhaps, despite its unitary appearance, an “object” is just a loose bundle of independently estimated properties. [This differs from the Wolfe & Bennett (1997) suggestion that loose bundling results from inattention. Our demo of loose bundling occurs with full attention.] 
This demo, like the rest of this paper, reveals a dichotomy between properties (e.g., presence or location) that may be estimated from a single detected feature and those (e.g., letter identity or shape) that require integration of several features.
    It is as if there is a pressure on both sides of the word that tends to compress it. Then the stronger, i.e. the more salient or dominant letters, are preserved and they ‘squash’ the weaker, i.e. the less salient letters, between them. (Korte [1923], translated by Uta Wolfe)
    It looks like one big mess. I keep seeing [the letter] ‘A’ even though there is no ‘A’ in the Sloan alphabet. I seem to take features of one letter and mix them up with those of another. (Observer JG)
    When it’s difficult, I see a unit that is a combination of letters and I can’t say how many there are. (Observer MLL)
    I know that there are three letters. But for some reason, I can’t identify the middle one, which looks like it’s being stretched and distorted by the outer flankers. (Observer MCP)
These are observers’ descriptions of how they see a letter that is flanked by other letters in the periphery. This was first described by Korte (1923), and was dubbed crowding by Stuart & Burian (1962). They and others showed that acuity is greatly impaired by crowding (Ehlers, 1936, 1953; Woodworth, 1938; Flom, Weymouth, & Kahneman, 1963; Bouma, 1970), which backs up the introspective descriptions by objective measurement of impaired form recognition.
For identifying a letter among letters, the spatial extent of crowding is roughly half the eccentricity (Bouma, 1970; Toet & Levi, 1992). For identifying a numeric character among numeric characters, Strasburger et al. (1991) reported a similar proportionality constant, 0.4, independent of character size (0.05 – 1.4 deg). Latham and Whitaker (1996) report similar results for a 3-bar acuity target among four such distractors of random orientation. Tripathy and Cavanagh (2002) report similar results for identifying the orientation of a T among “squared thetas.” Wilkinson et al. (1997), as well, report a proportionality constant of 0.4 for fine discrimination of the contrast or spatial frequency of a grating among gratings. Levi, Hariharan, and Klein (2002, p. 175) report a (center-to-center) proportionality constant of 0.5 for masking of an E by a bar, both made up of grating patches.
This scaling with eccentricity, independent of size, is utterly unlike ordinary masking, where critical spacing scales with signal size, independent of eccentricity. As we’ll see, the most dramatic difference — for us the defining difference (Section 4.1) — between crowding and ordinary masking is the complementary effects of signal size and eccentricity.
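The complementary scaling rules can be made concrete with a short sketch. It assumes the proportionality constant of roughly 0.5 quoted above for crowding; the multiple used for ordinary masking is a hypothetical placeholder, since the text gives different multiples for different stimuli. The function names are introduced here for illustration only.

```python
def crowding_critical_spacing(eccentricity_deg, proportionality=0.5):
    """Crowding: critical spacing scales with eccentricity, not signal size.

    The default of 0.5 is the rough "half the eccentricity" rule
    cited in the text.
    """
    return proportionality * eccentricity_deg

def ordinary_masking_critical_spacing(signal_size_deg, size_multiple):
    """Ordinary masking: critical spacing scales with signal size.

    size_multiple is a hypothetical, stimulus-dependent constant.
    """
    return size_multiple * signal_size_deg

# Doubling eccentricity doubles the crowding range, whatever the letter size.
near = crowding_critical_spacing(4.0)   # deg, signal at 4 deg eccentricity
far = crowding_critical_spacing(8.0)    # deg, signal at 8 deg eccentricity
```

Note that signal size does not appear in the first function and eccentricity does not appear in the second, which is the dissociation the diagnostic test relies on.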
Many lateral masking studies have varied size and eccentricity, but, unfortunately, typically not in a way that would distinguish crowding from ordinary masking. Under the rationale that acuity scaling would provide a more level playing field for comparing different eccentricities, most studies that varied signal size or eccentricity varied both together, roughly in proportion (e.g., Andriessen & Bouma, 1976; Loomis, 1978; Jacobs, 1979; Santee & Egeth, 1982b; Chung et al., 2001). Alas, proportional increase of the stimulus size and spacing with eccentricity would not be expected to affect either crowding or ordinary masking and thus does not distinguish the two kinds of effect. Chung et al. (2001) studied some of the properties of crowding of letters by letters to compare crowding with ordinary “pattern” masking of gratings by gratings. Filtering target and mask letters to one-octave bands, they identified the most effective mask frequency as a function of target frequency, and found that this agreed with the earlier literature on ordinary masking. At a large, near-critical spacing they found a shallow log-log slope (0.13 – 0.3) for the effect of mask contrast on threshold contrast for identifying the target, which they noted is much shallower than the slopes of 0.5 to 1 generally found in ordinary masking. Using ordinary, unfiltered letters we further investigate the contrast response function here (Figures 9-11, below).
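For readers less used to log-log slopes, the following sketch shows how such a slope relates to a power function of mask contrast. The gain, exponent, and contrast values are hypothetical and chosen only for illustration; they are not data from any of the studies cited.

```python
import math

def power_law_threshold(mask_contrast, gain, exponent):
    """Threshold contrast as a power function of mask contrast."""
    return gain * mask_contrast ** exponent

def log_log_slope(mask_a, threshold_a, mask_b, threshold_b):
    """Slope of log threshold versus log mask contrast between two points."""
    return (math.log10(threshold_b) - math.log10(threshold_a)) / (
        math.log10(mask_b) - math.log10(mask_a)
    )

# Hypothetical power law with exponent 0.5: the log-log slope recovers
# the exponent regardless of the gain.
t_low = power_law_threshold(0.1, gain=1.0, exponent=0.5)
t_high = power_law_threshold(0.4, gain=1.0, exponent=0.5)
slope = log_log_slope(0.1, t_low, 0.4, t_high)
```

A slope of 0.5 means that quadrupling the mask contrast doubles the threshold, whereas a slope of 2 means that doubling the mask contrast quadruples it.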
Levi, Klein, et al. (2002) and Levi, Hariharan, et al. (2002) used a tumbling E and a flanking bar that were both made up of grating patches. They separately varied eccentricity, grating frequency, and patch extent. In the fovea, the critical spacing was proportional to signal extent, consistent with ordinary masking. In the periphery, the critical spacing was proportional to eccentricity, consistent with crowding.
Another important difference between crowding and ordinary masking is that ordinary masking blocks both detection and identification — the signal disappears — whereas crowding affects only identification — the signal remains visible, but is jumbled with the mask. This dichotomy has not been spelled out in the earlier literature, although Wilkinson et al. (1997) noted a much weaker effect of crowding on detection than on identification: Their signals were still detectable when they could no longer be identified.
Because the range of crowding is roughly half the eccentricity, it extends only a few minutes of arc for foveal targets (Flom, Weymouth, et al., 1963; Bouma, 1970; Loomis, 1978; Jacobs, 1979; Levi, Klein, & Aitsebaomo, 1985; Toet & Levi, 1992; Wilkinson et al., 1997; Leat, Li, & Epp, 1999; Hess, Dakin, & Kapoor, 2000; Chung et al., 2001). Liu and Arditi (2000) found that letter-string length is underestimated when observers are asked to judge the number of acuity-sized letters in the fovea. Their descriptions of this foveal effect are similar to those by Korte (1923) and our observers of crowding in the periphery, but with the greatly reduced range, less than 5 arcmin, that one would expect from its proportionality to eccentricity. Thus crowding treats fovea and periphery alike, following one eccentricity rule throughout.
Lateral masking studies with larger signals find no effect of nonoverlapping flankers on foveal targets (Strasburger et al., 1991; Leat et al., 1999). Bondarko and Danilova (1996, 1997) showed that nonoverlapping bars slightly decrease acuity for a Landolt C signal in the fovea. In foveal tasks that do show effects of laterally displaced masks, the spatial extent of the lateral interference scales with the size of the signal: maximum effect at a spacing of 5 times the gap width of a Landolt C (Flom, 1991) and 3 times the wavelength of a grating (Polat & Sagi, 1993; Levi, 2000). Levi, Klein, et al. (2002) found that the critical spacing of a tumbling E and flanking bars (all made up of grating patches) is proportional to signal extent over a 50:1 range, independent of spatial frequency. This scaling with signal size is characteristic of ordinary masking and unlike crowding.
Because our experiments are done mostly with letters, we postpone until Discussion the rest of our review of crowding with other stimuli (Section 4.2). What we have reviewed so far tells us to look at the effects of spacing, eccentricity, contrast, and task. With those results in hand, we will be ready to tackle illusory conjunctions (Section 4.7).
1.6 Our study
We begin by replicating previous results on the spatial extent of crowding as a function of viewing eccentricity. We then explore the effects of varying signal and mask (size, contrast, complexity, and type: letter and grating) and task (identification and detection). (See Table 1.) The effects of spacing, eccentricity, size, contrast, and task distinguish crowding from ordinary masking. The other manipulations help characterize the selectivity of crowding. The selectivity of ordinary masking is that of the feature detector. Our results indicate that the selectivity of crowding is that of the feature integrator.

| Figure | Effect of | Task | Signal and flanker (font or grating) | Signal size (deg) | Flanker size (deg) | Ecc. (deg) | Observer |
| --- | --- | --- | --- | --- | --- | --- | --- |
|  | eccentricity | Identify | Sloan | 1 | 1 | 0, 2, 4, 8, 12, 20, 24 | MCP, SJR, SSA |
|  | fovea vs. periphery | Identify | Sloan | 0.32 | 0.32 | 0, 4 | MCP, AG, MLL |
|  | size | Identify | Sloan | 0.32, 0.5, 1, 2 | same as signal | 4 | MCP, AG, SSA |
|  | flanker size | Identify | Sloan | 0.32 | 0.32, 0.64, 0.96, 1.6, 3.2 | 4 | MCP, AG |
|  | font | Identify | 2x3 Checkers, Sloan, Bookman, Outline Sloan | 0.25, 0.32, 0.50, 0.32 | same as signal | 4 | MCP, MLL |
|  | # of flankers | Identify | Sloan | 0.32 | 0.32 | 6 | MS, MLM, MCP |
| 9, 11 | flanker contrast | Identify | Sloan | 0.32 | 0.32 | 4 | MCP, AG, MLL |
|  | task | Identify, detect | Sloan | 0.32 | 0.32 | 4 | MCP, AG, MLL |
|  | eccentricity | Detect | Sloan | 0.75 | 0.75 | 2, 4, 8 | MLM |
|  | size | Detect | Sloan | 0.75, 1.5, 3.0 | same as signal | 8 | MCP, MLL |
|  | extent | Identify | 1 c/deg grating | 2, 4, 8 | same as signal | 20 | MCP, AG |
|  | letter vs. grating | Identify, detect | Sloan, 8 c/deg grating | 0.32, 0.52 | 0.32, 0.52 | 4 | MCP, AG, MLL |
Table 1. The experiments. For gratings, “size” is the 1/e radius of the Gaussian envelope, and the observer “identified” the ±45° orientation. Regarding Figure 8, observer MCP was tested at 4 instead of 6 deg eccentricity.
The experiments were exploratory, trying to characterize the phenomenon, especially as a window into the mysterious feature integration process. The results indicate that the observer’s identification response is based on an amalgam of all the features detected in a large region we call the “integration field,” which is approximately centered on the signal (Toet & Levi, 1992). Most relevant to this conclusion are the effect of task and the combined effects of mask contrast and spacing.
2. Methods
2.1 Observers
Seven observers with normal or corrected-to-normal acuity performed these experiments binocularly (see Table 1). One observer (MCP) is an author. The other observers were paid for participating.
2.2 Tasks and stimuli
All experiments were performed on Apple Power Macintosh computers using MATLAB software with the Psychophysics Toolbox extensions (Brainard, 1997; Pelli, 1997). The background luminance was set to the middle of the monitor range, about 18 cd/m². Sloan letters were based on Louise Sloan’s design specified by the NAS-NRC (1980). (The Sloan font is available from http://psych.nyu.edu/pelli/software.html.) Sloan letters were usually 0.32 deg high and wide. Sinewave gratings were 1 or 8 c/deg with a circularly symmetric Gaussian envelope with a 1/e radius that we specify as “size.”
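As an illustration of what the 1/e radius means for the grating stimuli, here is a minimal sketch of a Gaussian-enveloped grating evaluated at a single point. It is not the authors' stimulus code (which used MATLAB and the Psychophysics Toolbox). The cosine phase, the orientation convention, and the function names are assumptions made for this example.

```python
import math

def gaussian_envelope(x_deg, y_deg, size_deg):
    """Circularly symmetric Gaussian that falls to 1/e at radius size_deg."""
    r_squared = x_deg ** 2 + y_deg ** 2
    return math.exp(-r_squared / size_deg ** 2)

def grating_patch_contrast(x_deg, y_deg, freq_cpd, size_deg,
                           orientation_deg=45.0, peak_contrast=1.0):
    """Local contrast of a sinewave grating windowed by the envelope.

    Assumptions: cosine phase centered on the patch, and orientation_deg
    giving the direction of luminance modulation.
    """
    theta = math.radians(orientation_deg)
    # Position along the direction in which the grating modulates.
    u = x_deg * math.cos(theta) + y_deg * math.sin(theta)
    carrier = math.cos(2.0 * math.pi * freq_cpd * u)
    return peak_contrast * carrier * gaussian_envelope(x_deg, y_deg, size_deg)

# At a distance equal to "size" from the center, the envelope is 1/e.
edge_value = gaussian_envelope(0.32, 0.0, 0.32)
center_value = grating_patch_contrast(0.0, 0.0, freq_cpd=8.0, size_deg=0.32)
```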
Observers viewed a gamma-corrected grayscale monitor (Pelli & Zhang, 1991). The fixation point was a 0.15 deg black square. The signal was always presented at the center of the display, so the position of the fixation point on the screen determined the eccentricity of the signal. For peripheral viewing conditions, the fixation point was displayed for the entire trial. For foveal viewing, the fixation point was presented for 200 ms, followed by a 200 ms blank and then the signal. The signal, flanked by two horizontally aligned high-contrast masks of either letters or gratings, appeared at the center of the screen for 200 ms (Figure 1). Moving the fixation point placed the signal at various eccentricities along the horizontal meridian in the right visual field. Letter contrast is defined as the ratio of luminance increment to background. Letter contrast can be greater than 1. Flanker contrast was usually 0.85. Each signal presentation was accompanied by a beep. Mask-to-signal spacing is measured center to center. Usually the signal and each flanking letter were independent random samples from the same alphabet. A response screen followed, showing all the possible signals (usually the 10 letters of the Sloan alphabet) at 80% contrast. Observers identified the signal by using a mouse-controlled cursor to point and click on their answer. Correct identification was rewarded with a beep.
Figure 1. Typical condition for crowding. The black square is a 0.15 deg fixation mark. The signal is a faint 0.32 deg Sloan letter at 4 deg in the right visual field. Two 85%-contrast masks (S, Z) flank a signal letter (R) with a signal-to-mask center-to-center spacing of 0.64 deg. Letter contrast is defined as the ratio of luminance increment to background. Letter contrast can be greater than 1. The signal contrast changes from trial to trial.
The signal duration (200 ms) is too brief for eye movements in response to the signal to help see it. We occasionally watched the observer’s eyes while the observer was doing the task to detect anticipatory eye movements, but we never saw any. The results presented in this paper (e.g., Figure 3a) reveal a more-than-tenfold threshold elevation and a steep dependence on spacing. Anticipatory eye movements would reduce the signal eccentricity by an amount that would vary between trials and among observers, which would blur that dependence. The steep dependence of threshold on spacing (e.g., Figure 3a) and the consistent critical spacing among observers (e.g., Figure 3b) therefore indicate that anticipatory eye movements were not a problem.
Threshold contrast was measured by a modified QUEST staircase procedure (Watson & Pelli, 1983; King-Smith, Grigsby, Vingrys, Benes, & Supowit, 1994) using an 82% criterion and β of 3.5 for 40-trial runs. Log thresholds were averaged over two runs for each condition.
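To make the 82% criterion and the averaging of log thresholds concrete, here is a minimal sketch of a Weibull psychometric function and its inversion. It is not the modified QUEST procedure itself. The function names, the omission of a lapse-rate term, and the guessing rate of 0.1 (one of 10 letters) are assumptions made for illustration; only β = 3.5 and the 82% criterion come from the text.

```python
import math

def weibull(contrast, alpha, beta=3.5, gamma=0.1):
    """Proportion correct at `contrast` for a Weibull with scale `alpha`.

    gamma is the guessing rate (assumed 0.1 for a 10-letter alphabet);
    no lapse-rate term is modeled in this sketch.
    """
    return 1.0 - (1.0 - gamma) * math.exp(-((contrast / alpha) ** beta))

def threshold_at_criterion(alpha, criterion=0.82, beta=3.5, gamma=0.1):
    """Contrast at which the Weibull above reaches `criterion` correct."""
    return alpha * (-math.log((1.0 - criterion) / (1.0 - gamma))) ** (1.0 / beta)

def mean_log_threshold(thresholds):
    """Average thresholds in log units, i.e., take their geometric mean."""
    mean_log = sum(math.log10(t) for t in thresholds) / len(thresholds)
    return 10.0 ** mean_log

# Two hypothetical run estimates for one condition, combined in log units.
combined = mean_log_threshold([0.1, 0.4])
```

Averaging in log units is why each plotted symbol can be described as a geometric average of two threshold estimates.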
In the detection task, the signal letter was randomly presented in one of two consecutive intervals. The flankers were displayed in both intervals, independently randomly selected for each interval. Observers indicated their choice of interval by clicking the mouse once for first and twice for second. Correct responses were rewarded with a beep.
2.3 Clipped line fit
Strasburger et al. (1991) suggested that threshold contrast for target identification is a good way to measure the effect of crowding, and we agree. Most of our data are threshold contrast plotted against spacing, and have a generally sigmoidal shape. We fit a clipped line to the data by eye. This fit has three parts: a horizontal ceiling, a falling slope, and a horizontal floor (Figure 2). Threshold elevation (a ratio) is measured from floor to ceiling. Critical spacing is the least spacing at which there is no threshold elevation in the fit (i.e., edge of the floor).
Figure 2. Clipped line fit: threshold contrast as a function of center-to-center spacing of signal and flanker. (At zero spacing the signal and flanker are superimposed, added on top of one another.) We fit a clipped line to each data set by eye. Two parameters of that fit are of interest. Threshold elevation is the ratio of thresholds at zero and infinite flanker spacing (i.e., ceiling:floor ratio). Critical spacing is the least spacing at which there is no threshold elevation (i.e., edge of the floor).
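The clipped line can be written down as a three-part function. The sketch below only evaluates such a function and does not perform a fit (the fits reported here were made by eye). The parameter name `ceiling_edge`, the interpolation in log threshold, and the example values are assumptions chosen for illustration.

```python
import math

def clipped_line(spacing, floor, ceiling, ceiling_edge, critical_spacing):
    """Threshold contrast from a ceiling / falling slope / floor function.

    Between the two knees the threshold is interpolated linearly in
    log10(threshold) against spacing (an assumption of this sketch).
    """
    if spacing <= ceiling_edge:
        return ceiling
    if spacing >= critical_spacing:
        return floor
    frac = (spacing - ceiling_edge) / (critical_spacing - ceiling_edge)
    log_t = math.log10(ceiling) + frac * (math.log10(floor) - math.log10(ceiling))
    return 10.0 ** log_t

def threshold_elevation(floor, ceiling):
    """Ceiling:floor ratio of the fit."""
    return ceiling / floor

# Illustrative values only: floor 0.1, ceiling 0.7, knees at 0.3 and 1.2 deg.
params = dict(floor=0.1, ceiling=0.7, ceiling_edge=0.3, critical_spacing=1.2)
at_zero = clipped_line(0.0, **params)    # on the ceiling
beyond = clipped_line(2.0, **params)     # on the floor
midway = clipped_line(0.75, **params)    # on the falling slope
```

With these illustrative values the threshold elevation is about seven-fold, and the critical spacing is simply the `critical_spacing` parameter, the edge of the floor.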
Figure 3. Effect of eccentricity. Each symbol is the geometric average of two threshold estimates. (a). Threshold as a function of flanker spacing for a 1 deg Sloan letter between flankers at various eccentricities. The solid lines are clipped-line fits, as explained in Figure 2. The horizontal line at the bottom left, below the graph, represents the width of the signal in deg. Contrast is the ratio of increment to background, and can exceed 1. (b). Critical spacing plotted against viewing eccentricity for three observers. Like Bouma (1970), we find that critical spacing is roughly half of viewing eccentricity. Observers MCP, SJR, and SSA.
3. Results
Figures 3–16 present our results. To help the reader make sense of it all, Table 2 presents the nine empirical differences between crowding and ordinary masking. We recommend focusing on the sheer strangeness of crowding. Our intuitions, based on familiarity with ordinary masking, were defied at every turn. The single most important result is Bouma’s (1970), greatly extended here, that critical spacing is roughly half the eccentricity (distance from fixation), independent of everything else.

| Fact | Ordinary masking | Crowding | Figures |
|---|---|---|---|
| a | Similar in fovea and periphery. | Normally evident only in the periphery (Korte, 1923; Stuart & Burian, 1962; Flom, Weymouth, et al., 1963; Bouma, 1970). | 3, 4 |
| b | Signal disappears, suppressed by mask. | Signal is visible but ambiguous, incorporating features from mask (Korte, 1923; Flom, Weymouth, et al., 1963; Andriessen & Bouma, 1976; Wolford & Shum, 1980; Wilkinson et al., 1997; Parkes, Lund, Angelucci, Solomon, & Morgan, 2001). | 1–10 |
| c | Occurs for any task and signal. | So far, specific to identification of letters (Flom, 1991), orientation of tumbling E (Levi, Hariharan, et al., 2002), and fine discrimination of contrast, spatial frequency, and orientation (Andriessen & Bouma, 1976; Wilkinson et al., 1997; Parkes et al., 2001). | 12–16 |
| d | Similar effect on identification and detection. | Little or no effect on detection (Wilkinson et al., 1997) and coarse discrimination. | 12–14, 16 |
| e | Narrow critical spacing, little or no effect of nonoverlapping mask. | Wide critical spacing can be more than 10 times bigger than a small signal (Korte, 1923; Stuart & Burian, 1962; Bouma, 1970; Toet & Levi, 1992; Levi, Hariharan, et al., 2002). | 3–5 |
| f | Critical spacing scales with signal, independent of eccentricity. | Critical spacing is roughly half of viewing eccentricity (Bouma, 1970; Toet & Levi, 1992), independent of signal size (Strasburger et al., 1991; Levi, Hariharan, et al., 2002), mask size, mask contrast, and number of masks. | 3–11, 13–15 |
| g | Spatiotemporal selectivity more or less consistent with a receptive field. | Remarkably unselective, showing equal effect over a wide range of flanker type (letter, black disk, or square; Eriksen & Hoffman, 1973; Loomis, 1978), flanker size (10:1), and flanker number (≥2). | 6, 8 |
| h | Shallow power-law contrast response (log-log slope of 0.5 to 1). | Steep sigmoidal contrast response. Log-log slope of 2 for close spacing. Log ceiling and log slope fall exponentially with spacing. | |
| i | Threshold mask contrast depends on spacing. No saturation. | Threshold and saturation mask contrasts are independent of spacing. | |
Table 2. Facts: summary of the differences between crowding and ordinary masking. We cite the authors of the known facts about crowding, many replicated here, and italicize our new findings. We take line f as the defining difference: critical spacing scales with eccentricity, not size. (a). The extremely-short-range foveal effect described by Liu and Arditi (2000) is likely to be crowding. (c). Andriessen and Bouma (1976) show a large crowding-like effect of flanking bars on fine discrimination of bar orientation, and a small effect on detection threshold, too small to account for the effect on orientation discrimination. Illusory conjunction provides evidence for crowding of conjunction of color vs. shape (Treisman & Schmidt, 1982). (d). The critical spacing for detecting a letter among letters can be as large as that for identification, but we call it ordinary masking, not crowding, because it scales with letter size (Figure 14b), not eccentricity (Figure 13b). Despite refutals of the feature vs. conjunction dichotomy in the search literature, we still expect a robust feature vs. conjunction dichotomy in crowding.³ (e). Figures 12, 16a, and 16b are examples of weak effects of nonoverlapping masks in ordinary masking. (f). At 0 deg eccentricity, Levi, Klein, et al. (2002) found that critical spacing is proportional to signal size over a 50:1 range. “Roughly half” can be as low as 0.3, as in Figure 5b. Fine (2003) also reported crowding to be independent of contrast. (g). We say “more or less consistent” because current feature detector models to explain ordinary masking have not just one but several similar receptive fields (to implement divisive inhibition) as noted earlier.¹ The spatiotemporal selectivity found by Chung et al. (2001) with filtered letters is like that for ordinary masking, unlike our summary for crowding, but it is not certain whether their paradigm elicited crowding or ordinary masking (see Section 4.6).
Many studies have documented systematic effects of the similarity of target and flanker (e.g., Estes, 1982; Ivry & Prinzmetal, 1991; Nazir, 1992; Kooi et al., 1994; Donk, 1999; Chung et al., 2001).
There is a minor caveat to Bouma’s rule, but it does not affect the basic intuition. The caveat is that critical spacing is asymmetric, greater in the peripheral than in the central direction from the target (Bouma 1970, 1973; Townsend, Taylor, & Brown, 1971; Banks, Larson, & Prinzmetal, 1979; Chastain & Lawson, 1979; Wolford & Shum, 1980; Toet & Levi, 1992). It is greater in radial directions (peripheral and central) than in circumferential directions (Chambers & Wolford, 1983; Toet & Levi, 1992). It is greater in the upper than in the lower visual field (Intriligator & Cavanagh, 2001). These details matter when comparing results across conditions, but they reinforce the basic intuition that the extent of crowding depends almost exclusively on the local anatomy of the visual field, independent of the signal, unlike ordinary masking, which is co-extensive with the signal, independent of location in the visual field.
3.1 Effects of spacing and eccentricity
One of the stranger aspects of crowding is Bouma’s (1970) finding that the critical spacing is proportional to eccentricity, which we replicate here. We measured the threshold contrast for identifying a 1 deg letter as a function of signal-to-mask spacing. The signal was at 0 to 24 deg eccentricity in the right visual field (see Table 1). There were two flankers, one to the left and one to the right of the signal. Figure 3a shows that the letter masks have a very strong effect, raising threshold tenfold. For each eccentricity, the clipped-line fit provides an estimate of the critical spacing. Figure 3b shows that critical spacing is proportional to eccentricity. Our data confirm the finding (Bouma, 1970; Strasburger et al., 1991; Toet & Levi, 1992; Levi, Hariharan, et al., 2002, p. 173) that the critical spacing is roughly half of the viewing eccentricity. (Bouma, 1970, was right to say “roughly” 0.5. For some of our data, this value drops as low as 0.3, as we will see below.) Andriessen and Bouma (1976) report a critical spacing of 0.4 of eccentricity for fine discrimination of line orientation. Wilkinson et al. (1997) report a critical spacing of 0.4 of eccentricity for fine discrimination of crowded grating contrast and spatial frequency, and slightly higher for fine discrimination of orientation.
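Bouma’s rule is simple enough to state as a one-line function. The sketch below is only an illustration of the proportionality. The function names are hypothetical, the factor is left adjustable because the reported values range from roughly 0.3 to 0.5, and the radial and circumferential asymmetries of critical spacing are ignored.

```python
def critical_spacing(eccentricity_deg, bouma_factor=0.5):
    """Approximate critical spacing (deg) as a fixed fraction of eccentricity."""
    return bouma_factor * eccentricity_deg

def is_within_critical_spacing(spacing_deg, eccentricity_deg, bouma_factor=0.5):
    """True if a flanker at `spacing_deg` (center to center) is expected to crowd."""
    return spacing_deg < critical_spacing(eccentricity_deg, bouma_factor)

# The condition of Figure 1: signal at 4 deg eccentricity, flankers at 0.64 deg.
crowded_typical = is_within_critical_spacing(0.64, 4.0)             # factor 0.5
crowded_conservative = is_within_critical_spacing(0.64, 4.0, 0.3)   # factor 0.3
```

Even with the most conservative factor of 0.3, the 0.64 deg spacing of Figure 1 lies well inside the predicted critical spacing at 4 deg eccentricity.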
Figure 4 shows threshold for observer AG in the presence of one mask, as a function of horizontal mask offset, for a 0.32 deg signal. The width of the critical region is the sum of the critical spacings, left and right. Separate curves show results at 0 and 4 deg eccentricity. In the fovea, the critical region (i.e., the sum of critical spacings left and right) is about as wide (0.40 deg) as the signal (0.32 deg). In the periphery, the critical spacings are 1.00 deg to the left and 1.25 deg to the right, for a total critical region width (2.25 deg) about 7 times the 0.32 deg width of the signal. This replicates the asymmetry of previous findings that, for a given signal location, crowding extends farther in the peripheral direction than in the central direction (Bouma 1970, 1973; Townsend et al., 1971; Banks et al., 1979; Chastain & Lawson, 1979; Wolford & Shum, 1980; Toet & Levi, 1992).
Figure 4. Fovea vs. periphery. Threshold for identifying a 0.32 deg Sloan letter (at 0 or 4 deg in right visual field) in the presence of a single flanker of the same size, as a function of spacing (i.e., flanker position). (Positive values are positions to the right of the signal. Negative values are to the left.) At 4 deg eccentricity, the critical region is 7 times wider than the letter (indicated by the horizontal bar). Observer AG. Not shown: similar results for observers MCP and MLL. Similar results at 6 deg eccentricity appear in Figure 8.
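The widths of the critical region quoted for observer AG follow from summing the left and right critical spacings and dividing by the 0.32 deg signal width. The symbols below ($s_L$, $s_R$, $W$, $w$) are shorthand introduced only for this worked example.

```latex
\begin{align*}
W_{\text{periphery}} &= s_L + s_R = 1.00 + 1.25 = 2.25\ \text{deg}, &
\frac{W_{\text{periphery}}}{w} &= \frac{2.25}{0.32} \approx 7.0, \\
W_{\text{fovea}} &= 0.40\ \text{deg}, &
\frac{W_{\text{fovea}}}{w} &= \frac{0.40}{0.32} = 1.25 .
\end{align*}
```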
It may seem surprising that a more peripheral mask is more effective than a more central mask equally distant from the target. However, critical spacing is proportional to eccentricity, suggesting that the relevant cortical representation of visual space is progressively more compressed at greater eccentricity. Thus the more-eccentric mask is effectively closer than the less-eccentric mask (i.e., at a smaller fraction of the ever-increasing critical spacing).
Section 3.5 will show that a single flanker is much less effective than flankers on both sides (Bouma, 1970).
3.2 Effect of size
Ordinary masking would lead one to expect that the critical spacing in crowding would be proportional to signal size, not eccentricity. What is the effect of size on critical spacing? Levi, Klein, et al. (2002) and Levi, Hariharan, et al. (2002) found that, at 0 deg eccentricity, the critical spacing is proportional to size over a 50:1 range, but that, in the periphery, critical spacing is proportional to eccentricity, independent of size. We measured threshold contrast for letters of various sizes at 4 deg viewing eccentricity. Figure 5a shows threshold contrast as a function of spacing for letter sizes of 0.32, 0.5, 1, and 2 deg. For these sizes, threshold is elevated 26-fold (geometric mean). Figure 5b shows that the critical spacing did not change with letter size, instead remaining constant at about 1.2 deg, which replicates the Strasburger et al. (1991) finding, for numerals, that the spatial extent of crowding is 1.2 deg at 4 deg eccentricity, independent of size. Threshold elevation increases as a function of size (Figure 5c) because, as Figure 5a shows, the ceiling remains fixed at about 0.7 while the floor drops with size. This is just the familiar fact that contrast sensitivity for letters depends on size (see Pelli & Farell, 1999).
Figure 5. Effect of size. Identification of a 0.32–2 deg Sloan letter (at 4 deg in the right visual field) between two flankers of the same size. (a). Threshold as a function of spacing. For observer AG, the threshold contrasts for all four sizes, 0.32, 0.5, 1, and 2 deg, nearly superimpose at spacings up to 1 deg. (b). Critical spacing vs. size, for three observers, showing no effect. Average (horizontal line) is 1.2 deg. (c). Threshold elevation increases somewhat with size: log-log slope of 0.6. This replicates the familiar finding that threshold contrast depends on size (Pelli & Farell, 1999).
3.3 Effect of flanker size
We also measured the effect of mask size on critical spacing. We kept signal size at 0.32 deg and varied mask size from 0.32 to 3.2 deg. We didn’t know what to expect. On the one hand, increasing the mask’s size increases its contrast energy, which we thought might increase the mask’s effect. (For a letter, contrast energy is the product of area and squared contrast.) On the other hand, enlarging the mask makes it less similar to the signal, which might lessen its effect. Surprisingly, Figure 6 shows that the threshold curves nearly superimpose, hardly affected by mask size, retaining a critical spacing of about 1.3 deg. (Figure 6a is for one observer; Figure 6b is for another.) Unlike ordinary masking, the crowding effect is not tuned to size. The range (spatial extent) of crowding is independent of signal size (Figure 5b) and mask size (Figure 6), depending solely on eccentricity (Figure 3b).
Figure 6. Effect of flanker size. Signal is 0.32 deg Sloan letter at 4 deg in the right visual field. The two flankers were 1 to 10 times the size of the signal. One fit was made to data for all mask sizes. (a). Critical spacing is 1.3 deg for MCP. (b). Critical spacing is 1.2 deg for AG. Horizontal line at the bottom left of the graph represents the width of the signal.
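To give a sense of how much the mask’s contrast energy changes over this range of sizes, the definition given above (area times squared contrast) can be applied to the smallest and largest masks. The calculation assumes that mask contrast $c$ is held fixed and that ink area $A$ grows with the square of letter size, as it does when the same letter shape is simply scaled; the symbols are introduced here for illustration.

```latex
E = A\,c^{2}, \qquad
\frac{E_{\text{large}}}{E_{\text{small}}}
  = \frac{A_{\text{large}}}{A_{\text{small}}}
  = \left(\frac{3.2}{0.32}\right)^{2} = 100 .
```

Under these assumptions, a hundredfold increase in mask contrast energy leaves the threshold curves of Figure 6 nearly unchanged.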
3.4 Effect of font
We wondered whether perimetric complexity (perimeter squared, divided by ink area; Pelli et al., in press) or some other aspect of letter shape is important for crowding. Figure 7a shows threshold as a function of spacing for several fonts, including a meaningless alphabet of twenty-six 2 x 3 checkers and another alphabet consisting of just two letters, N and Z, from the Sloan alphabet. These curves are quite similar to each other, differing from one another by large, but unimportant, vertical translations and small horizontal translations. The vertical shifts track the different threshold contrasts for different fonts, which is not of interest here. The small horizontal shifts are small differences in critical spacing, which ranged from 1 to 1.3 deg. Pelli et al. (in press) showed that efficiency for letter identification is inversely proportional to perimetric complexity of the font, but complexity seems to be irrelevant to crowding. Figures 7b and 7c plot critical spacing and threshold elevation as a function of complexity, showing no systematic effect of complexity.
Figure 7. Effect of font. Signal is 0.32 deg letter at 4 deg in the right visual field. (a). Threshold contrast as a function of spacing for various fonts: Bookman, 2 x 3 Checkers, Sloan, NZ Sloan, and Outline Sloan. (b). Critical spacing is independent of complexity. (c). Threshold elevation as a function of perimetric complexity, perimeter squared divided by area, showing that threshold elevation does not seem to be systematically related to complexity. Observer MLL. Not shown: similar results for observer MCP.
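Perimetric complexity, as defined above, is a dimensionless ratio that does not change when a shape is scaled. The sketch below computes it for two simple shapes. The function name is hypothetical and the shapes are stand-ins for illustration, not the fonts tested here.

```python
import math

def perimetric_complexity(perimeter, ink_area):
    """Perimeter squared divided by ink area (dimensionless)."""
    return perimeter ** 2 / ink_area

# A solid disk of radius 1 and solid squares of side 1 and side 3.
disk = perimetric_complexity(2.0 * math.pi * 1.0, math.pi * 1.0 ** 2)
unit_square = perimetric_complexity(4.0 * 1.0, 1.0 * 1.0)
big_square = perimetric_complexity(4.0 * 3.0, 3.0 * 3.0)
```

The disk gives 4π and both squares give 16, which shows that the measure depends on shape and not on size.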
3.5 Effect of number of flankers
Would adding more flankers increase the crowding effect? Figure 8a plots threshold for letter identification in the periphery with 1, 2, and 4 flankers. The signal and flankers are all Sloan letters, right-side up. Figure 8b shows that critical spacing is independent of number of flankers. It is about 0.4 of the eccentricity. Figure 8c shows that threshold elevation increased when flankers were increased from 1 to 2, but threshold was not further elevated when flankers were increased from 2 to 4. Consistent with this, Wilkinson et al. (1997) reported that reducing the number of flanking gratings from 14 down to 2 did not significantly reduce their effect on the discriminability of the signal. Toet and Levi (1992) report extensive measurements of the effect of two T flankers on judging orientation of a T target, adding that, in pilot measurements, they found no effect of a single flanker. However, Strasburger et al. (1991) did report an increased threshold elevation when they increased the number of flankers from 2 to 4.
Figure 8. Effect of number of flankers. Signal is 0.32 deg Sloan letter at 6 deg in the right visual field. Flankers are letters, too, also right-side up, but displaced vertically or horizontally. (a). Threshold contrast as a function of spacing, for 1, 2, or 4 flankers. Flanker position (e.g., “right”) is relative to signal position. The horizontal line at the bottom left of the graph represents the width of the signal. Note that, lacking data at zero spacing (because the flankers would have collided), it is not clear whether there is a ceiling at small spacing, so that part of the clipped-line fit is somewhat arbitrary. The one-flanker data shown here, for a signal at 6 deg ecc., are similar to the data shown in Figure 4, for a signal at 4 deg ecc. We replicate the well-established finding that the critical spacing is greater in the peripheral than in the central direction. (b). Critical spacing (estimated separately for each condition, but averaging results for 1-left and 1-right) as a function of number of flankers. (c). Threshold elevation (estimated separately for each condition) as a function of number of flankers. This last graph is tentative because it depends on the somewhat arbitrary ceilings of the clipped-line fits in panel a. Observers MS and MLM. Not shown: similar results for observer MCP at 4 deg eccentricity.
It makes sense that a single flanker would be much less effective than multiple flankers that surround the object. One imagines that when there is only one flanker the observer may use a large but offset integration field to pick off the exposed target. This strategy is not available when there are two or more flankers surrounding the target.
For a signal 6 deg to the right of fixation, we find a smaller critical spacing for flankers above and below the signal than for flankers to its left and right, which is consistent with Toet and Levi’s (1992) finding that the critical spacing is smaller along the circumference than along a radial ray from the fovea.
3.6 Effect of flanker contrast
The experiments presented above used flankers of a high contrast, 0.85. Figure 9a shows threshold signal contrast as a function of spacing for several mask contrasts. Figure 9b shows that critical spacing is independent of mask contrast. There is an outlier, the X representing a critical spacing of 0.5 deg at a mask contrast of 0.1 for observer MLL. This is based on the fit shown in panel a to the 0.1 mask contrast data (solid diamonds). Note that threshold is elevated only when the mask overlaps the signal.
Figure 9. Effect of flanker contrast for three observers identifying a 0.32 deg Sloan letter at 4 deg in the right visual field. (a). Threshold contrast as a function of spacing for several flanker contrasts. (b). Critical spacing as a function of flanker contrast. Mask contrasts below 0.1 did not elevate threshold so they have no critical spacing. Observers MCP, AG, and MLL.
Thus the anomalous point in Figure 9b seems to represent ordinary masking, not crowding. The rest of the data show no consistent effect of mask contrast on critical spacing: For one observer, critical spacing rises slightly with mask contrast, but it falls slightly for the other two observers. Fine (2003), too, reported crowding to be independent of contrast. So far, we have seen that critical spacing is independent of signal size, mask size, mask contrast, signal and mask font, and number of masks.
Figure 10 demonstrates the effect of mask contrast, showing an abrupt transition as mask contrast is increased. Once the mask becomes visible it soon saturates, producing its full effect on the signal.
Figure 10. Effect of flanker contrast. Starting at the top, in each row, fixate the black square, and try to identify the middle letter on the right. As you read down the chart, the contrast of the center letter is always 0.50, while the contrast of the two outer letters increases (0, 0.10, 0.15, 0.25, and 0.50). You’ll find that the central letter becomes much harder to identify as soon as the flankers are at all visible.
Based solely on this demo, one might wonder whether the crowding is determined by similarity. The flankers become more similar to the signal as their contrast approaches that of the signal. However, Chung et al. (2001) manipulated signal and flanker contrast to test this hypothesis, finding that, at least in their conditions, more mask contrast always increased masking, even when this made the masks less similar to the signal.
In another view of the same data, Figure 11a shows threshold contrast as a function of mask contrast for several spacings. For a 0.32 deg letter, the contrast response curves show that threshold elevation increases abruptly with mask contrast, going from none to full effect as the mask goes from 0.1, the threshold contrast for identifying an isolated letter, to about 3 times that, saturating at higher contrast. There are two critical mask contrasts. In our clipped-line fit, mask threshold is the mask contrast at which threshold contrast of the signal begins to increase (edge of floor). And mask saturation is the mask contrast at which threshold contrast of the signal stops increasing (edge of ceiling).
Figure 11. Effect of flanker contrast for three observers identifying a 0.32 deg Sloan letter at 4 deg in the right visual field. Same data as Figure 9. (a). Threshold contrast for identifying the target letter as a function of mask contrast for observer MLL. Clipped lines (shown) are fit to the (roughly sigmoidal) data, constrained to have equal threshold and saturation contrasts of the mask for all conditions. Threshold contrast rises at 0.1 mask contrast and saturates at 0.25 mask contrast for all spacings. Clipped lines (not shown) were also fit independently to the data for each condition for each observer, and the parameters of these fits are plotted in panel d. (b). Psychometric function. Proportion correct identification of a letter as a function of contrast. The knees (critical contrasts) of this psychometric function roughly match those of the contrast response function in panel a. This is a maximum likelihood fit of a Weibull function to the measured proportion correct (not shown) at several contrasts (see Pelli et al., in press; Strasburger, 2001). The lower asymptote is 1/10 because that is the chance of correctly guessing the identity of one of 10 letters. (c). The threshold elevation (left scale of panel c) and log-log slope (right scale) of the fits in panel a (and similar data for observers MCP and AG) are high at small spacings and fall exponentially with increased spacing. (d). Threshold and saturation contrasts of the mask as a function of spacing. Mask threshold is the first knee, where the signal threshold rises. Mask saturation is the second knee, where the signal threshold saturates. Each pair of points (solid and open) is based on an independent clipped-line fit (not shown) to the data for one condition and observer. The threshold contrast for identifying the mask may be estimated from that for the signal (0.1) at low (0.01) mask contrast (panel a). Observers MCP, AG, and MLL.
This contrast-response curve is quite unlike what is usually seen in ordinary masking. Here the function rises steeply and hits a hard ceiling, with no further increase over a wide range of high mask contrasts (0.25 – 1). In ordinary masking, the function rises with a log-log slope of 0.5 to 1 and continues to increase relentlessly. The log-log slope of the (clipped line) contrast-response function for crowding is 2 at the closest spacing and falls exponentially with spacing (Figure 11c, right hand scale). The function found here is more reminiscent of the sigmoidal form of a frequency-of-seeing curve, rising suddenly from floor to ceiling over a narrow range of contrast. For comparison, Figure 11b shows the observer’s proportion of correct identifications for an unflanked signal at this eccentricity as a function of signal contrast.
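The crowding contrast-response function at the closest spacing can be sketched as a rising clipped line in log-log coordinates. The function name and default parameters below are illustrative assumptions: a floor of 0.1 (the threshold for an isolated letter), knees at mask contrasts of 0.1 and 0.25, and a log-log slope of 2 between them.

```python
def signal_threshold(mask_contrast, floor=0.1, mask_threshold=0.1,
                     mask_saturation=0.25, loglog_slope=2.0):
    """Signal threshold contrast as a clipped power function of mask contrast.

    Flat below `mask_threshold`, a power law with exponent `loglog_slope`
    between the knees, and flat again above `mask_saturation`.
    """
    clipped = min(max(mask_contrast, mask_threshold), mask_saturation)
    return floor * (clipped / mask_threshold) ** loglog_slope

faint_mask = signal_threshold(0.01)      # below the first knee: on the floor
saturated = signal_threshold(0.25)       # at the second knee: on the ceiling
strong_mask = signal_threshold(0.85)     # far above saturation: same ceiling
elevation = saturated / faint_mask       # ceiling:floor ratio
```

Under these idealized parameters the ceiling:floor ratio is (0.25/0.1)², about six-fold, with no further rise at higher mask contrast. An ordinary-masking power law with slope 0.5 to 1 would instead keep climbing.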
Chung et al. (2001) measured the contrast-response function for a bandpass-filtered letter among similar letters at a single separation (2.2 deg) at an eccentricity of 5 deg, obtaining shallow log-log slopes (0.3 and 0.1) that are consistent with the less than 0.4 slope found here at our maximum separation (1.5 deg at an eccentricity of 4 deg). Testing at such large (near-critical) separations (about 0.4 of eccentricity), the threshold elevation and slope are nearly gone.
The series of functions plotted in Figure 11a reveal something quite remarkable. It is hardly surprising that the threshold elevation (on the vertical scale) is reduced at greater spacings, as shown in Figure 11c. But we were surprised to find that the critical mask contrasts (0.1 and 0.25 on the horizontal scale) are unaffected by the spacing. In Figure 11a every curve (one for each spacing) turns up at a mask contrast of 0.1 and saturates when the mask contrast reaches 0.25, no matter how far away the signal is. Figure 11d shows explicitly that the critical mask contrasts are independent of spacing. We will come back to this in Discussion.
3.7 Effect of task: identification and detection of letters and gratings
Most crowding studies have used identification tasks, whereas most masking studies have used detection tasks. To determine whether crowding depends on task, Figure 12 shows identification and detection thresholds for a letter among letters as a function of flanker spacing for two observers. (Figure 16a shows similar results for a third observer.) For identification, averaging across the three observers, the threshold elevation is large (ten-fold) and extends out to 1.3 deg (four signal widths). For detection, the threshold elevation is only three-fold but extends about as far (average is 1.5 deg).
Figure 12. Effect of task. Identification or detection of a letter between letter flankers. Signal is 0.32 deg Sloan letter at 4 deg in the right visual field. Threshold curves for identifying a letter are sigmoidal with an average threshold elevation of about 1 log unit and a critical spacing of 1.2 deg. (a). Observer MCP. (b). Observer MLL. There are some observer differences, but detection threshold is always lower, with less threshold elevation. The average critical spacing for detection is 1.5 deg. Horizontal line at the bottom left of the graph represents the width of the signal. Figure 16a shows similar results for a third observer.
To distinguish crowding from masking, we assessed the effect of eccentricity and size on critical spacing.
We measured the effect of eccentricity (2, 4, and 8 deg in right visual field) on detection thresholds for 0.75 deg Sloan letters. Figure 13a plots threshold as a function of spacing for each eccentricity. Figure 13b shows that the critical spacing for detection is independent of eccentricity, unlike the proportionality found for identification (Figure 3b).
Figure 13. Effect of eccentricity on detection. Detection of a letter among letters at several eccentricities in the right visual field. Signal and flankers are 0.75 deg Sloan. (a). Threshold as a function of spacing. (b). Critical spacing as a function of eccentricity. The critical spacing for letter detection is independent of eccentricity. This is characteristic of ordinary masking, whereas in crowding the critical spacing is proportional to eccentricity, as in Figure 3b. Observer MLM.
We measured detection thresholds for Sloan letters of three sizes, 0.75, 1.5, and 3 deg, at 8 deg in the right visual field. The results in Figure 14b show that the critical spacing for letter detection is proportional to size; critical spacing for letter identification is independent of size (Figure 5b).
Figure 14. Effect of size on detection. Detection of a letter among letters at several sizes. Signal and flankers have equal size. Signal is Sloan letter at 8 deg in the right visual field. (a). Threshold as a function of spacing. (b). Critical spacing as a function of size. The critical spacing for letter detection is proportional to letter size. This is characteristic of ordinary masking, whereas in crowding the critical spacing is independent of size, as in Figure 5b. Observer MCP. Not shown: similar results for observer MLL.
To us, this is the most telling difference: In ordinary masking (e.g., letter detection), the critical spacing is proportional to signal size (Figure 14b), independent of eccentricity (Figure 13b), whereas in crowding (e.g., letter identification) the critical spacing is proportional to eccentricity (Figure 3b), independent of size (Figure 5b).
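This diagnostic can be sketched as a comparison of two log-log slopes: critical spacing against eccentricity and critical spacing against signal size. The code below is a hypothetical illustration, not the analysis used here. The function names, the 0.5 cutoff, and all of the example numbers are assumptions, and the regression requires nonzero eccentricities because it works in log units.

```python
import math

def loglog_slope(xs, ys):
    """Least-squares slope of log10(y) against log10(x)."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    return sxy / sxx

def diagnose(slope_vs_eccentricity, slope_vs_size, cutoff=0.5):
    """Label a data set by what its critical spacing scales with."""
    with_ecc = slope_vs_eccentricity > cutoff
    with_size = slope_vs_size > cutoff
    if with_ecc and not with_size:
        return "crowding"
    if with_size and not with_ecc:
        return "ordinary masking"
    return "indeterminate"

# Made-up critical spacings (deg) that follow the two idealized patterns.
ident = diagnose(
    loglog_slope([2, 4, 8], [1.0, 2.0, 4.0]),                   # half of eccentricity
    loglog_slope([0.32, 0.5, 1.0, 2.0], [1.2, 1.2, 1.2, 1.2]),  # flat in size
)
detect = diagnose(
    loglog_slope([2, 4, 8], [1.5, 1.5, 1.5]),                   # flat in eccentricity
    loglog_slope([0.75, 1.5, 3.0], [1.5, 3.0, 6.0]),            # proportional to size
)
```

With these made-up values the identification pattern is labeled crowding and the detection pattern ordinary masking. Real data would need a cutoff chosen with measurement noise in mind.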
We also changed the envelope size of 1 c/deg gratings in a ±45° orientation discrimination task at 20 deg viewing eccentricity (Figure 15a). We saw earlier (Figure 5b) that changing the size of letters did not affect critical spacing. However, for gratings, the critical spacing scales with the size of the envelope (Figure 15b). There is no mystery here: The gratings mask each other only when they overlap; at their critical spacing they are abutting.
Figure 15. Effect of grating extent. Size is the 1/e radius of the Gaussian envelope. Grating (at 20 deg in the right visual field) flanked by two gratings. (a). Threshold contrast for identification of ±45° orientation of 1 c/deg grating between two flanking gratings as a function of spacing, for several envelope sizes. Signal and flanking gratings had same spatial frequency and same size envelope. (b). Critical spacing as a function of envelope size. Observer MCP. Not shown: similar results for observer AG.
3.8 Effect of letter vs. grating
Here we tried letters (0.32 deg Sloan) and gratings (8 c/deg) in every combination of target and flanker. Majaj et al. (2002) show that identification of letters is mediated by a channel with a center frequency determined by the stroke frequency of the letter. For a 0.32 deg Sloan letter the stroke frequency is 1.6/0.32 = 5 c/deg, and, by their formula, the channel frequency is 6.3 c/deg, which is very close to the 8 c/deg spatial frequency of the grating we used. Thus the identification of letter and grating in this experiment was mediated by channels tuned to similar spatial frequencies.
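The arithmetic in this paragraph can be checked with a short sketch. The stroke-frequency step (1.6 strokes per letter width divided by letter width) is taken from the text. The channel-frequency formula is not given here, so the 2/3 power law below is an assumption: it is our reading of the Majaj et al. (2002) relation, chosen because it reproduces the 6.3 c/deg value quoted above. The function names are hypothetical.

```python
import math


def stroke_frequency(strokes_per_width, letter_width_deg):
    """Stroke frequency in c/deg: strokes per letter width / width in deg."""
    return strokes_per_width / letter_width_deg


def channel_frequency(stroke_freq_cpd):
    """Channel frequency in c/deg.

    Assumed power law (not stated in this section):
    f_channel / 10 = (f_stroke / 10) ** (2 / 3), frequencies in c/deg.
    """
    return 10.0 * (stroke_freq_cpd / 10.0) ** (2.0 / 3.0)


def octaves_apart(f1_cpd, f2_cpd):
    """Absolute separation of two frequencies in octaves."""
    return abs(math.log2(f1_cpd / f2_cpd))


f_stroke = stroke_frequency(1.6, 0.32)   # 0.32 deg Sloan letter
f_channel = channel_frequency(f_stroke)  # letter channel
gap = octaves_apart(8.0, f_channel)      # versus the 8 c/deg grating

print(round(f_stroke, 2), round(f_channel, 1), round(gap, 2))
```

Under this assumed formula, the letter channel and the 8 c/deg grating differ by roughly a third of an octave, which is consistent with the claim that the two are mediated by channels tuned to similar spatial frequencies.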
We measured thresholds for detection and identification of 8 c/deg sinewave gratings. Signal and flanker gratings were each randomly tilted ±45° on each trial. In the detection task the observer was required to choose which of two intervals contained the signal grating (ignoring orientation). In the identification task there was only one interval and the observer was asked, on the response screen, to identify its +45° or -45° orientation.
Figures 16c and 16d show that neither grating nor letter flankers raised the grating signal’s threshold unless they overlapped it. (Letter size is 0.32 deg; grating size is 0.52 deg; see Table 1.) Grating threshold elevation at all spacings is similar for both tasks (detection and identification) and flanker types (letter and grating). Compared with identifying a letter among letters, the grating curves show no ceiling and have a small critical spacing (about one signal width). The grating’s narrow critical spacing — threshold is elevated only when the flanker overlaps the grating — suggests ordinary masking, not crowding.
fig16.gif
Figure 16. Effect of letter vs. grating. Identification or detection of a letter or grating flanked by letters or gratings. Signal at 4 deg in the right visual field. (a). Letter flanked by letters. Averaging across Figures 12ab and 16a, the critical spacing is about 5 letter widths (1.5 deg) for both identification and detection. (b). Letter flanked by gratings. Critical spacing is 2 times the width of the signal for identification, and 5 times the width of the signal for detection. (c). Grating flanked by letters. (d). Grating flanked by gratings. The results show that threshold is elevated only when the flankers overlap the signal. Sloan letters were 0.32 deg wide and sinewave gratings were 8 c/deg with a 0.52 deg Gaussian window (radius at 1/e). Horizontal line at lower left corner represents the width of the signal. There were always two flankers, to the right and left of the signal. Observer AG. Not shown: similar results for observers MCP and MLL.
When we originally got the grating results reported in Figures 16c and 16d, we were led to think, wrongly as it turns out, that gratings are immune to crowding. Our identification task was too coarse. We had asked the observer to distinguish orientations 90° apart. Ordinary masking studies have shown that we see gratings by means of feature detectors that have an orientation bandwidth of ±15° to ±30° (Phillips & Wilson, 1984). Thus orthogonal gratings are detected by distinct feature detectors, and we would expect the label of a single feature detector to suffice for identifying the coarse orientation of the grating. Indeed, Thomas and Gille (1979) reported that two gratings differing in orientation by 20° to 30° are identified just as accurately as they are detected. And the thresholds for detection and identification in Figures 16c and 16d seem to be identical. This is the logic that Watson and Robson (1981) applied to frequency identification. When the two signals stimulated different detectors, observers could identify at the threshold for detection. When the same feature detector picks up both signals, then the observer cannot identify based on a single feature detection and requires at least two detectors to be active. The ratio of the detector responses would presumably be a good basis for fine discrimination. (Treisman, 1991, makes the same point for other stimulus dimensions.) Thus a parametric change in the task, from a coarse (>2:1) to a fine (<2:1) frequency discrimination, results in a qualitative change in the observer’s computational algorithm, from single- to multi-feature detection and integration (also see Verghese & Nakayama, 1994, and Discussion, Section 4.3).
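The coarse versus fine distinction for frequency discrimination can be written as a simple rule. This is a minimal sketch of the logic in the paragraph above, not a model of the observer. The 2:1 criterion is the one quoted in the text; the function names and the strategy labels are hypothetical.

```python
def is_coarse_frequency_task(f1_cpd, f2_cpd, ratio_criterion=2.0):
    """True if the two frequencies differ by more than the criterion ratio.

    The 2:1 default follows the coarse (>2:1) versus fine (<2:1)
    distinction in the text; treating it as a sharp cutoff is a
    simplifying assumption.
    """
    ratio = max(f1_cpd, f2_cpd) / min(f1_cpd, f2_cpd)
    return ratio > ratio_criterion


def predicted_strategy(coarse):
    """Map task coarseness to the computation the text attributes to it."""
    if coarse:
        # Signals fall on different detectors: one detection suffices.
        return "single-feature detection"
    # Signals share a detector: responses must be compared.
    return "multi-feature detection and integration"


for f1, f2 in [(2.0, 8.0), (6.0, 8.0)]:   # hypothetical frequency pairs
    print(f1, f2, predicted_strategy(is_coarse_frequency_task(f1, f2)))
```

On this account, only the second kind of task should be susceptible to crowding.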
4. Discussion
Our seven theoretical conclusions about the difference between crowding and ordinary masking are listed in Table 3 and discussed in Sections 4.1–4.7. The discussion of illusory conjunctions comes last (Section 4.7), but its only prerequisite is the vocabulary established in the Introduction (Section 1). We begin the discussion by proposing a definition.

| Theory | Ordinary masking | Crowding | Facts (Table 2) | Section |
|---|---|---|---|---|
| a | Critical spacing is proportional to size and independent of eccentricity. | Critical spacing is proportional to eccentricity (Bouma, 1970) and independent of size (Strasburger et al., 1991; Levi, Hariharan, et al., 2002). | f | |
| b | Occurs for any task. | Specific to tasks that could not be performed based on a single detection by coarsely coded feature detectors. | b–d | |
| c | Same feature detector mediates the effects of mask and signal. | Distinct feature detectors mediate the effects of mask and signal. | g–i | |
| d | Eccentricity doesn’t matter. | In the periphery, the observer uses an inappropriately large integration field because smaller fields are absent. | a–i | |
| e | Impairs feature detection. | Impairs feature integration (Flom, Weymouth, et al., 1963; Wolford & Shum, 1980; He et al., 1996; Parkes et al., 2001; Chung et al., 2001; Levi, Hariharan, et al., 2002). | a–i | |
| f | Selectivity is that of the feature detector. | Selectivity is that of the feature integrator. | g | |
| g | No signal feature is detected, so the signal is invisible. | Features of both signal and mask are detected and combined, so the signal is visible, but jumbled with the mask (Korte, 1923; Wolford & Shum, 1980; Parkes et al., 2001; Levi, Hariharan, et al., 2002). | b–d | |
Table 3. Theory: summary of the differences between crowding and ordinary masking. We cite the authors of existing theories about crowding, and italicize our new ideas. (b). Treisman (1991) makes a similar suggestion for illusory conjunctions. (c). This idea is implicit in the models that Wolford and Shum (1980), Treisman and Schmidt (1982), Wilkinson et al. (1997), and Parkes et al. (2001) use to explain their results. (f). Current feature detector models have several receptive fields, to implement divisive inhibition, but the differences in selectivity of these various fields are too small to matter here. (g). Treisman and Schmidt (1982) make a similar suggestion for illusory conjunction.
Using published and new results, we have established that the original crowding phenomenon — impaired identification of a letter among letters in the periphery — is unlike ordinary masking. We suggest that the term “crowding” be applied to any phenomenon that exhibits the critical-spacing dependence reported by Bouma (1970).
When defining a term already in use, the desire to sharpen must be tempered by the need to respect established usage. Crowding was discovered in the course of measuring letter acuity in patients with central field loss (Korte, 1923) or amblyopia (Ehlers, 1936) [footnote 4]. Stuart and Burian (1962) coined the term “crowding” for the impairment of identification of a peripheral letter by neighboring letters. Since then the term has been used primarily, but not exclusively, to refer to lateral masking of letters by letters. Most writings on crowding — and this manuscript is no exception