1 Introduction
In fMRI, a functional map is an important representation of how cognitive function is related to neuroanatomy. Such maps provide a topographic representation of the brain regions that are (and are not) systematically responsive to differing values of a cognitive variable. The size, shape, number and location of the “blobs” (i.e., voxel clusters meeting some statistical relevance criterion) on the functional maps are the basis for inferences about the neural substrates of the cognitive process. Given the importance of functional maps, there is a continuing need to scrutinize the sensitivity, precision and technical assumptions of the mapping procedure itself. The topic of the current technical note is the mapping procedure used to generate the widely used information maps (Kriegeskorte et al., 2006).
The motivation for information mapping is the statistical concern that a region’s responses to the cognitive variable under study might take a complex multivariate form. For example, a group of multiple voxels might conjointly respond in a task-relevant manner even though individual voxels may not detectably do so (Haxby et al., 2001; Cox and Savoy, 2003; Haynes, 2006; Norman et al., 2006; Mur et al., 2008; Tong, 2010; Wagner and Rissman, 2010; Formisano and Kriegeskorte, 2012; Serences and Saproo, 2012). Such distributed response patterns might be effectively undetectable with conventional univariate statistical tests restricted to individual voxel responses, but detectable with an explicit multivariate test for multivoxel response patterns, i.e., using some Multivoxel Pattern Analysis (MVPA) technique. To address this concern in the context of functional mapping, Kriegeskorte et al. (2006) proposed a simple procedure to enable sophisticated MVPA methods to be readily applied to detect and map brain regions that contain information about the experimental conditions, irrespective of whether the informative responses are univariate or multivariate.
In the proposed procedure, the unit of evaluation is not the single voxel but a “searchlight”: the group of voxels contained in a spherical neighborhood of a given radius around a single voxel. The searchlight statistic is a measure of whether the conjoint responses of this group of voxels contain information about the experimental conditions being tested. Based on these abstractions, an information map is generated as follows: the searchlight statistic is evaluated for searchlights centered at every voxel in the brain; and the statistic’s value for each searchlight is mapped to the central voxel of that searchlight. The resulting topographic representation generated by the searchlight procedure has been referred to as the information map. Such searchlight-based information maps are now routinely reported in studies that employ MVPA methods (for example, Haynes et al., 2007; Soon et al., 2008; Johnson et al., 2009; Poldrack et al., 2009; Chadwick et al., 2010; Oosterhof et al., 2010; Nestor et al., 2011; Alink et al., 2011; Golomb and Kanwisher, 2011; Peelen and Kastner, 2011; Stokes et al., 2011; Woolgar et al., 2011; Morgan et al., 2011; Oosterhof et al., 2012; Connolly et al., 2012; Kaplan and Meyer, 2012).
Notwithstanding their popularity, interpreting a searchlight-based information map presents a variety of challenges (Kriegeskorte et al., 2006; Poldrack et al., 2009; Pereira and Botvinick, 2011; Jimura and Poldrack, 2012). One such challenge is posed by the topographic ambiguity of the information map. Recall that a searchlight statistic computed on the responses of an entire multivoxel searchlight is mapped to a single voxel on the information map, namely, that searchlight’s central voxel. This mapping protocol is applied to searchlights across the brain irrespective of the number or the spatial locations of the information-carrying voxels within each searchlight. Consequently, the spatial position of an informative voxel on the information map is only a coarse index to the actual location of the informative “pattern” within that voxel’s searchlight. Furthermore, since a searchlight has a unique central voxel, an informative voxel on the information map is not indicative of the actual number of voxels constituting the informative pattern within that voxel’s searchlight neighborhood. Given these properties of the information map, we asked: what, if anything, can be reliably inferred about the size and shape of a multivoxel pattern from its corresponding signature on the information map?
Previous studies have treated this question as a qualitative concern requiring cautious interpretation. Nonetheless, here we show that information maps are in fact subject to several crisply quantitative geometric constraints that strongly govern how such maps can be interpreted.
Our analytical results are based on a simple geometric intuition. Since a multivoxel searchlight is defined at every voxel across the brain, searchlights centered at different voxels systematically overlap each other, i.e., have voxels in common. Using overlapping searchlights is crucial to obtain a continuous topographic coverage, especially when the locations and spatial extents of the task-responsive voxel neighborhoods are unknown a priori. We observed that due to these overlaps, multiple searchlights would be deemed informative merely by virtue of sharing the same task-relevant multivoxel response patterns. Thus we reasoned that the size and shape of a multivoxel group’s signature on the information map should be defined by exactly those voxels whose searchlight neighborhoods contain that group. Using this observation and simple geometric reasoning, we formally deduce some key properties of the relationship between an informative pattern and its corresponding signature on the information map.
Based on our formal analysis, we prove here that, for any searchlight radius, a single task-responsive voxel produces a larger signature on the information map as compared to a distributed multivoxel response pattern. Furthermore, the number of informative searchlights over the brain can increase as a function of searchlight radius, without necessarily revealing any new information and even in the complete absence of any multivariate response patterns. Importantly, these properties are largely independent of the type of machine-learning algorithm or the testing protocol used to compute the searchlight statistic.
2 Model
2.1 Definition: The searchlight decomposition
The basis of the searchlight analysis is the geometric structure of the voxel-space in which the brain images are defined. The voxel-space $V$ is defined here as the set of all voxels augmented with a geometric structure defining the relative spatial position of the voxels in $V$, and a distance measure between these voxels. For analytical convenience, we treat the voxel-space as being uniform and connected as described below.
A $d$-dimensional voxel-space $V$ is deemed to be uniform if every voxel has a neighboring voxel in all principal directions. Additionally, we assume that the voxel-space is connected. Specifically, there is a path connecting every pair of voxels $v_i$ and $v_j$ in $V$, with a path defined here to be an ordered sequence of voxels in which each voxel is a neighbor of its predecessor along one of the principal directions. These simplifying assumptions are intended to emphasize the general geometric principles entailed by the searchlight method while deliberately ignoring the special cases associated with (i) the boundaries of $V$ where a searchlight may be truncated; and (ii) distinctions between gray-matter and white-matter voxels and any masking of the latter from the searchlights. Although we refer to searchlights as being volumes in a voxel-space having dimensionality $d = 3$, the properties derived here are agnostic to the specific value of $d$ and apply to surfaces ($d = 2$) where the searchlights are discs (as in Oosterhof et al., 2011; Chen et al., 2011).
The key abstraction defined by the searchlight method is a decomposition of the voxel-space $V$ into subsets of voxels based on a geometric criterion. Given a voxel-space $V$, we define a searchlight voxel-decomposition using the following indexing function
$$S : V \times \mathbb{R} \rightarrow \mathcal{P}(V) \qquad (1)$$
where $\mathcal{P}(V)$ is the powerset of $V$, namely, the set of all subsets of $V$. This indexing function takes two inputs: the identity of a voxel $v$ in the voxel-space $V$, and a real value $r$ specifying the searchlight’s radius. The searchlight indexing function uses these parameters in conjunction with the geometric structure of $V$ to extract and output a set of voxels $S(v, r) \subseteq V$. A voxel $u$ is a member of $S(v, r)$ if and only if the distance between $u$ and $v$ is less than or equal to $r$. For convenience, we henceforth write $S_r(v)$ to denote the searchlight $S(v, r)$. The resulting searchlight voxel-decomposition of $V$ for a given radius $r$ is defined as
$$\mathcal{S}_r = \{\, S_r(v) : v \in V \,\} \qquad (2)$$
For clarity, we restrict our usage of the term “searchlight” to the cases when the value of the radius $r$ of a searchlight is such that each $S_r(v)$ is a multivoxel entity that is not identical with $V$, that is, $1 < |S_r(v)| < |V|$, for any $v \in V$. We refer to the univariate case where $|S_r(v)| = 1$ as the univoxel decomposition.
A schematic of the searchlight indexing scheme is shown in Figure 1.
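As a minimal sketch of the indexing function described above, the following Python snippet enumerates the voxels of one searchlight on an unbounded integer lattice, which stands in for a uniform voxel-space. The function name `searchlight`, the integer voxel coordinates, and the choice of Euclidean distance with the radius measured in voxel units are illustrative assumptions and are not taken from the original method description.

```python
from itertools import product

def searchlight(center, radius):
    """Return the set of voxels whose Euclidean distance to `center` is <= radius.

    Voxels are integer coordinate tuples on an unbounded lattice (assumption);
    `radius` is expressed in voxel units (assumption).
    """
    k = int(radius)  # largest integer offset that can still lie within the radius
    return {
        tuple(c + d for c, d in zip(center, offset))
        for offset in product(range(-k, k + 1), repeat=len(center))
        if sum(d * d for d in offset) <= radius * radius
    }

# A radius of one voxel in 3D keeps the center and its six face neighbors.
sphere = searchlight((10, 10, 10), 1)
print(len(sphere))              # 7
print((10, 10, 11) in sphere)   # True
print((10, 11, 11) in sphere)   # False: distance sqrt(2) exceeds the radius
```

Calling the same function with a two-element `center` yields the disc-shaped searchlights of the surface case.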
2.2 Definition: Informativeness function
The searchlight statistic is a measure of whether the voxels in the searchlight, as a unit, exhibit differences in their conjoint responses to the experimental conditions. More generally, it is a measure of whether the searchlight contains information about the experimental condition, i.e., whether the searchlight is informative. As with the radius of the searchlight, the specific statistical procedure used to compute the searchlight statistic is a discretionary choice made by the researcher (for example, see Pereira and Botvinick, 2011).
To describe the searchlight statistic in a procedure-independent manner, we use a binary indicator function, which we refer to as the informativeness function, $I$. Given a subset of voxels $X \subseteq V$ of the voxel-space $V$, the function $I(X)$ returns a value of $1$ if the responses of $X$ are deemed to be informative; or $0$ if they are not, based on some appropriately specified statistical criterion.
Evaluating the informativeness function on the responses corresponding to each searchlight $S_r(v)$ (the searchlight of radius $r$ centered at voxel $v$) in the decomposition defines the overall information set for a particular radius $r$
$$\mathcal{I}_r = \{\, I(S_r(v)) : v \in V \,\} \qquad (3)$$
The information map is the object obtained when the information set defined above is augmented with the geometric structure of the voxel-space by mapping the informativeness value $I(S_r(v))$ of each searchlight to its corresponding central voxel $v$.
The performance measure of interest here is the total number of informative searchlights for a particular searchlight decomposition
$$N_r = \sum_{v \in V} I(S_r(v)) \qquad (4)$$
2.3 Linking assumptions
Two simple properties link the structure of the searchlight decomposition to the structure of the information set .
The first property is that, by virtue of the regularity of their shape and relative positioning, searchlights in the decomposition can overlap. Consequently, the same voxels in the voxel-space can be included or sampled by multiple searchlights. The second property is that since a sphere is an arbitrarily chosen and regular shape, it is unlikely that every voxel is necessarily task-relevant in every informative searchlight. Consequently, the informativeness of the responses in some particular searchlight volume can be alternatively and accurately interpreted as indicating that some group of voxels in that searchlight volume exhibits task-dependent responses. These two properties can be combined as follows. Let a group of task-relevant voxels be contained in some searchlight. Since searchlights share voxels, if some other searchlight also contains this group, then that other searchlight should also contain task-relevant information as it includes the same task-relevant voxels.
Based on this observation, we make two linking assumptions about the behavior of the procedures used to compute the searchlight statistic and hence the informativeness function. The first is that we restrict the focus of our analysis to the common multivariate procedure that does not include geometric information about the relative spatial positions of the voxels in a searchlight while computing that searchlight’s informativeness. The second is the Superset informativeness (SIN) assumption which postulates that:
Superset informativeness assumption: If a group of voxels is informative then every searchlight that contains that group is also informative.
That is, according to the SIN assumption, if the informativeness function returns 1 for a group of voxels, then it also returns 1 for every searchlight in the voxel-space that contains that group. Unless otherwise stated, we will overload the symbol for the informativeness function to denote one that explicitly satisfies these two model requirements.
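One way to make the SIN assumption concrete is an idealized informativeness function that returns 1 whenever a searchlight contains every voxel of a known task-relevant group. The sketch below assumes such an oracle, which real statistical procedures need not match; it is also stricter than SIN itself, because it returns 0 for every voxel set that contains no complete group. The names `searchlight` and `informative`, the lattice coordinates, and the example group are hypothetical.

```python
from itertools import product

def searchlight(center, radius):
    """Voxels within `radius` (voxel units, Euclidean) of `center` on an integer lattice."""
    k = int(radius)
    return {
        tuple(c + d for c, d in zip(center, offset))
        for offset in product(range(-k, k + 1), repeat=len(center))
        if sum(d * d for d in offset) <= radius * radius
    }

def informative(voxels, relevant_groups):
    """Idealized oracle: 1 if `voxels` contains every voxel of some task-relevant group."""
    return int(any(group <= voxels for group in relevant_groups))

pair = frozenset({(0, 0), (1, 0)})   # hypothetical task-relevant group of two voxels
inside = searchlight((0, 0), 1)      # contains both voxels of the pair
partial = searchlight((-1, 0), 1)    # contains (0, 0) but not (1, 0)
print(informative(inside, [pair]))   # 1
print(informative(partial, [pair]))  # 0
```

Any superset of `inside`, such as a larger searchlight at the same center, is also reported as informative, which is the behavior the assumption postulates.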
Although the SIN assumption is based on a sound deduction, the empirical requirement that it poses may not necessarily be satisfied in practice. Specifically, even if a group of voxels is known to be informative, the statistical procedure used to evaluate informativeness might fail to detect that a searchlight is informative even if it contains that group. Such a Type II error (i.e., failing to reject a false null hypothesis that the searchlight is uninformative) might occur for any of a variety of reasons, for example, the use of an inappropriate machine-learning algorithm (Pereira and Botvinick, 2011), insufficient power due to a limited number of samples, and so on. In this regard, the SIN assumption treats the multivoxel pattern analysis techniques as being more sensitive and reliable than might actually be the case in practice. That is, the SIN assumption allows us to establish the information map’s properties in the best case, independent of the performance idiosyncrasies of the specific multivariate method being used.
3 Analytical results
Our focus in the current section is to establish how the structure of the sampling bias arises from the searchlight decomposition. We first prove that due to the geometric regularities of a searchlight decomposition, single voxels and multivoxel groups are sampled with different frequencies, i.e., included in a different number of searchlights. Specifically, single voxels are included in more searchlights than multivoxel groups. This sampling difference is independent of the searchlight radius. We then extend these results to prove that the frequency with which voxel groups are sampled increases with the radius of the searchlights, irrespective of the number of voxels in the group. Finally, we prove that the information map mirrors these sampling biases in an optimistic manner, i.e., in a manner that is not necessarily warranted by the data.
3.1 Single voxels and multivoxel groups are sampled with different frequencies
The regularity in the shape of the searchlights and their relative positions in the voxel-space define a systematic relationship between each voxel and the searchlights that contain that voxel. Firstly, if a voxel is a member of the searchlight centered at a second voxel, then by symmetry, the second voxel is a member of the searchlight centered at the first (Lemma 1). Secondly, two distinct voxels are not simultaneously included in every searchlight that contains either of these voxels (Lemma 2).
Lemma 1.
If a voxel is a member of then the voxel is a member of , where and .
Proof.
Consider a searchlight centered at voxel . Since a searchlight is defined at every voxel in (Equation 2), it follows that there is a searchlight defined at every voxel in . By definition, a voxel is a member of if and only if the distance between and is less than or equal to the radius . Since there is a searchlight centered at , and the distance between and is less than or equal to , it follows that is a member of searchlight . Therefore, if is a member of then is a member of . ∎
Lemma 2.
For any two nonidentical voxels and , where and , there necessarily exists a searchlight that contains but not and a different searchlight that contains but not .
Proof.
This claim can be proved in two steps based on the distance between and .
First consider the case where the distance between and is greater than , that is, the diameter of a searchlight. By definition, a searchlight contains voxels that have a distance less than or equal to from that searchlight’s central voxel. Due to the spherical shape of the searchlight, the maximum distance between any two voxels in a searchlight is equal to . If the distance between and is greater than , there does not exist any searchlight of radius that contains both and as members. Thus, it follows that there exists some searchlight that contains but not ; and some other searchlight that contains but not .
Now consider the second case where the distance between and is less than or equal to . Since the distance between these two voxels is less than the maximum distance between some two voxels in a searchlight, in a uniform voxel-space there necessarily exists some searchlight that contains both and as members. Contrary to the proposition, let us assume that both and are contained in every searchlight that contains either or . That is, if a searchlight contains , then it necessarily contains , and vice versa. Recall that, from Lemma 1, a voxel is contained in every searchlight where . Now, based on the contradictory assumption, it implies that is also contained in every such searchlight where . By the same reasoning, should be contained in every searchlight where . If these conditions hold true, then it implies that every voxel in is also contained in ; and every voxel in is also contained in . If this is the case, then the searchlights and are identical as they contain exactly the same voxels. This relationship, however, contradicts the requirement that . Thus, the assumption that and are both contained in every searchlight that contains either or cannot be true.
Therefore, there necessarily exists a searchlight that contains that does not contain , and some other searchlight that contains but not . ∎
Armed with the properties described by Lemmas 1 and 2, we can now determine the number of searchlights that include a given individual voxel.
Theorem 3.
A voxel is contained in exactly different searchlights, where is the number of voxels contained in the searchlight .
Proof.
From Lemma 1, a voxel is contained in each searchlight , if and only if is a voxel in . Let be the number of voxels in . Therefore, is present in each of these searchlights. ∎
For simplicity, we treat as being the same for every searchlight, and write to indicate the canonical number of voxels contained in a spherical volume of radius , for a given resolution of the voxelspace.
From Theorem 3, we see that the radius, a parameter chosen by the researcher, directly specifies how often information in a particular voxel is sampled by multiple searchlights. For voxels of size 3 mm × 3 mm × 3 mm, the number of voxels contained in searchlights of different radii is shown in Figure 2. As can be seen, the number of voxels in a searchlight grows rapidly with the radius of the searchlight, and consequently so does the number of searchlights that include a particular voxel.
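The relationship stated by Theorem 3 can be checked numerically with a brute-force count. The sketch below assumes isotropic 3 mm voxels on an unbounded integer lattice and Euclidean distance; the radii in the loop are hypothetical and are not the values plotted in Figure 2, and `searchlight` and `count_containing` are illustrative names.

```python
from itertools import product

VOXEL_MM = 3.0  # isotropic voxel edge length, as in the text

def searchlight(center, radius_mm, voxel_mm=VOXEL_MM):
    """Voxels whose centers lie within `radius_mm` of the voxel at `center`."""
    r = radius_mm / voxel_mm          # radius in voxel units
    k = int(r)
    return {
        tuple(c + d for c, d in zip(center, offset))
        for offset in product(range(-k, k + 1), repeat=len(center))
        if sum(d * d for d in offset) <= r * r
    }

def count_containing(voxel, radius_mm, voxel_mm=VOXEL_MM):
    """Brute-force count of the searchlights (one per center) that contain `voxel`."""
    k = int(radius_mm / voxel_mm) + 1  # window wide enough to hold every candidate center
    centers = product(*(range(c - k, c + k + 1) for c in voxel))
    return sum(voxel in searchlight(center, radius_mm, voxel_mm) for center in centers)

for radius_mm in (3, 6, 9):  # hypothetical radii
    size = len(searchlight((0, 0, 0), radius_mm))
    print(radius_mm, size, count_containing((0, 0, 0), radius_mm))
```

For each radius the last two printed numbers agree: the number of searchlights that include a voxel equals the number of voxels in a searchlight.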
Since searchlights are intended to identify multivoxel response patterns, we extend the singlevoxel property in Theorem 3 to quantify the membership of a group of multiple voxels placing no constraint on the relative spatial locations of the voxels in the group.
Theorem 4.
A group of voxels containing more than one voxel is contained in strictly less than searchlights.
Proof.
A voxel is contained in searchlights, from Theorem 3. Consequently, every voxel in is each contained in searchlights. From Lemma 2, for any two voxels and , there is necessarily a searchlight that contains and not , and vice versa. Therefore, of the searchlights containing , there necessarily exists at least one searchlight that contains but not . Thus, the number of searchlights that simultaneously contain both and must be less than . Since contains multiple voxels, any pair of voxels in must be simultaneously contained in less than searchlights. Therefore, all the voxels in cannot be simultaneously contained in searchlights, and must be contained in strictly less than searchlights. ∎
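The contrast between Theorems 3 and 4 can be illustrated by counting, for a fixed radius, the searchlights that contain a single voxel versus those that contain a pair of voxels. This is a sketch on a 2D integer lattice with the radius in voxel units; the helper `centers_containing` and the example voxels are hypothetical.

```python
from itertools import product

def searchlight(center, radius):
    """Voxels within `radius` (voxel units, Euclidean) of `center` on an integer lattice."""
    k = int(radius)
    return {
        tuple(c + d for c, d in zip(center, offset))
        for offset in product(range(-k, k + 1), repeat=len(center))
        if sum(d * d for d in offset) <= radius * radius
    }

def centers_containing(group, radius):
    """Brute force: every center whose searchlight contains all voxels in `group`."""
    k = int(radius) + 1
    dims = range(len(next(iter(group))))
    lows = [min(v[i] for v in group) - k for i in dims]
    highs = [max(v[i] for v in group) + k for i in dims]
    window = product(*(range(lo, hi + 1) for lo, hi in zip(lows, highs)))
    return {c for c in window if group <= searchlight(c, radius)}

radius = 2
single = {(0, 0)}
pair = {(0, 0), (1, 0)}
print(len(searchlight((0, 0), radius)))          # 13 voxels per searchlight
print(len(centers_containing(single, radius)))   # 13
print(len(centers_containing(pair, radius)))     # 8
```

Even for two adjacent voxels, the pair is sampled by fewer searchlights than either voxel alone.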
3.2 The sampling frequency of voxel(s) increases with searchlight radius
Although singlevoxels and multivoxel groups are included in different numbers of searchlights for any radius , we now show that the absolute number of searchlights that include either a singlevoxel or a multivoxel group increases with the radius of the searchlight.
Lemma 5.
A searchlight of radius is fully contained in more than one searchlight of radius , where and for all .
Proof.
Consider two searchlights centered at the same voxel – one that has a radius , and the other having radius . By definition, since , all the voxels in are members of , and there exists at least one voxel in that is not in .
Now, consider the searchlight , having radius . Due to the spherical shape of searchlights, the maximum distance between a voxel in and some voxel in is equal to . Therefore, all other voxels in must have distances less than or equal to .
Since the distance between these two maximally distant voxels and is equal to , the voxel must be contained in a searchlight of radius that is centered at , namely, . Since all other voxels in have a distance less than or equal to from , it follows that every voxel in is also contained in the searchlight . Thus every voxel in is contained in at least two searchlights having radius , namely, and . Therefore, a searchlight is contained in more than one searchlight of radius , where and . ∎
Using Lemma 5, we can now prove a general scaling property: irrespective of the size of a voxel group, the frequency with which it is sampled by different searchlights increases with the radius of the searchlight.
Theorem 6.
A group of voxels is contained in more searchlights of radius than searchlights of radius , where , and , for all .
Proof.
Let and be the number of searchlights of radius and that contain .
Since for every , it follows, by transitivity, that if for some voxel , then . Therefore, the number of searchlights of radius that contain cannot be strictly less than that for , that is, .
By the transitivity of the subset relation, if and for some , then it follows that . From Lemma 5, a searchlight of radius is contained in multiple searchlights of radius where and (for all ). Since there is more than one searchlight of radius containing , for every searchlight for which holds true, it implies that .
From Theorems 3 and 4, the number of searchlights of radius that can contain is less than or equal to . Consequently, in a uniform and connected voxelspace, it follows that there exist two adjacent voxels and in such that is a subset of but is not a subset of . From Lemma 5, searchlights of radius centered at voxels within from fully contain all voxels in . Since , it implies that the distance of to is less than or equal to . Therefore, and consequently . Since and , it implies that there exists at least one voxel at which a searchlight of radius contains , but where a searchlight of radius does not contain . Consequently, must be strictly greater than , that is, the group of voxels is contained in more searchlights of radius than . ∎
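The scaling property of Theorem 6 can be observed directly by counting the searchlights that contain a fixed voxel group at several radii. The sketch below uses a hypothetical three-voxel group on a 2D integer lattice with radii in voxel units; it illustrates the claim for one example and is not a substitute for the proof.

```python
from itertools import product

def searchlight(center, radius):
    """Voxels within `radius` (voxel units, Euclidean) of `center` on an integer lattice."""
    k = int(radius)
    return {
        tuple(c + d for c, d in zip(center, offset))
        for offset in product(range(-k, k + 1), repeat=len(center))
        if sum(d * d for d in offset) <= radius * radius
    }

def n_containing(group, radius):
    """Number of searchlights of the given radius that contain every voxel in `group`."""
    k = int(radius) + 1
    dims = range(len(next(iter(group))))
    lows = [min(v[i] for v in group) - k for i in dims]
    highs = [max(v[i] for v in group) + k for i in dims]
    window = product(*(range(lo, hi + 1) for lo, hi in zip(lows, highs)))
    return sum(group <= searchlight(c, radius) for c in window)

group = {(0, 0), (1, 0), (0, 1)}   # hypothetical L-shaped voxel group
counts = [n_containing(group, radius) for radius in (1, 2, 3, 4)]
print(counts)
```

The printed counts grow with the radius although the group itself is unchanged.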
Theorem 6 above establishes that the number of searchlights that include either a voxel or group of voxels increases monotonically with the radius of the searchlight. How then does this scaling of the sampling bias influence the properties of the information map?
3.3 An optimistic bias in the information map
Recall that the total number of informative searchlights (Equation 4) is an index of the sensitivity of the searchlight method in detecting multivoxel response patterns for a particular searchlight decomposition. We now prove that as a direct consequence of how the sampling bias scales with the searchlight radius, this number also increases strictly monotonically with increasing searchlight radius.
Theorem 7.
For two searchlight radii, and , where and for every , if then .
Proof.
Since for every , by the SIN assumption, it follows that if then , for any voxel . Therefore, the number of informative searchlights of radius cannot be strictly less than that for , that is, , for any value of .
From Lemma 5, a searchlight of radius is contained in multiple searchlights of radius where and (for all ). For every searchlight for which , there is more than one searchlight of radius containing . Since each informative searchlight of radius is a subset of multiple searchlights of radius , by the SIN assumption, it implies that .
Let . Since , there necessarily exist two adjacent voxels and such that and . By the same logic used to prove Theorem 6, searchlights of radius centered at voxels within from fully contain all voxels in . Consequently, by the SIN assumption, . This implies that a searchlight centered at voxel is informative if it has a radius but not if it has a radius . Therefore . ∎
What does Theorem 7 have to do with optimism? The monotonic increase in the number of informative searchlights is due to an increase in the sampling bias, which in turn is due to the use of a multivoxel searchlight. Specifically, it is possible to obtain an increased “sensitivity” of the information map simply by increasing the radius of the multivoxel searchlights, with no reference to the statistical properties of the voxel responses, i.e., whether they in fact exhibit multivariate response differences.
4 An illustration
In this section, we present simulations to provide a concrete intuition for the analytical results above, and their implications. For ease of demonstration, the voxelspace for all simulations consisted of a single axial slice having two principal directions. All the voxels in this voxelspace were populated with simulated response information from two fictitious experimental conditions and . These simulated data were subjected to the searchlightprocedure to produce information maps. The radius of the searchlights used for the searchlight decomposition was varied systematically to produce a corresponding information map for each radius value. The radius took the values: mm, mm, mm, mm and mm, corresponding to searchlights containing voxels, voxels, voxels, voxels and voxels respectively.
The simulated response data differed in the number and relative spatial location of the voxels that were responsive to the experimental conditions. In the first of these simulations, discussed next, a single voxel contained task-relevant information while all the remaining voxels did not.
4.1 The needle-in-the-haystack effect
Suppose there exists some voxel in , say , that exhibits a response difference to the experimental conditions such that the informativeness function identifies as being taskrelevant, that is, . Since , by the SIN assumption it follows that each of the searchlights that contain should also be deemed to be informative as well. Recall that, according to Theorem 3, each voxel in is contained in exactly searchlights where . It then follows that the signalcarrying voxel should be contained in searchlights, each being centered at a voxel in . Thus, a single signalcarrying voxel (a “needle”) should produce a cluster having voxels on the information map (a “haystack”).
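The prediction above can be sketched without any statistical machinery by replacing the classifier with an idealized oracle that satisfies the SIN assumption. In the snippet below, the 21 × 21 slice, the needle position, and the radii (in voxel units) are hypothetical choices and are not the parameters of the simulations reported here.

```python
from itertools import product

def searchlight(center, radius):
    """Voxels within `radius` (voxel units, Euclidean) of `center` on an integer lattice."""
    k = int(radius)
    return {
        tuple(c + d for c, d in zip(center, offset))
        for offset in product(range(-k, k + 1), repeat=len(center))
        if sum(d * d for d in offset) <= radius * radius
    }

SIZE = 21            # hypothetical slice of 21 x 21 voxels
needle = (10, 10)    # single signal-carrying voxel, far from the slice boundary

results = {}
for radius in (1, 2, 3, 4):
    # Under the SIN oracle, a searchlight is informative iff it contains the needle;
    # the informative value is mapped to the searchlight's central voxel.
    info_map = {c for c in product(range(SIZE), repeat=2) if needle in searchlight(c, radius)}
    results[radius] = (len(info_map), info_map == searchlight(needle, radius))
    print(radius, *results[radius])
```

For every radius, the informative cluster coincides with the searchlight centered at the needle, and its size grows with the radius even though the underlying signal is a single voxel.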
To simulate this “needle-in-the-haystack” effect, the task-relevant responses of in conditions and took the form illustrated in Figure 3(a). The responses to both conditions were drawn randomly from a normal distribution with standard deviation . The voxel ’s mean response to condition was ; and for condition . The responses of all other (non task-relevant) voxels were drawn from normal distributions having where . To maximize the sensitivity of the searchlight statistic and emulate the requirements of the SIN assumption, a total of samples were drawn for each condition. The spatial position of voxel is shown in blue in Figure 3(b). The voxel was placed far from the boundaries of the slice to avoid truncations of the searchlights and to emulate a uniform voxel-space in the vicinity of .
With this setup, the searchlight decomposition and testing procedure was implemented using the PyMVPA toolbox (Hanke et al., 2009). Each searchlight’s informativeness was determined by evaluating the decodability of its responses, i.e., testing for the existence of a model that accurately classifies a sample’s membership in each condition based on the searchlight’s responses (Pereira and Botvinick, 2011; Pereira et al., 2009). Decodability was tested using a linear Support Vector Machine (SVM) with a soft-margin regularization parameter, . The searchlight statistic was the mean classification accuracy obtained using a Leave-One-Out (LOO) cross-validation procedure.
Figure 4(a) shows the information maps obtained (thresholded at ). In the upper panel, going from left to right in order of increasing radius, we see that there is a single high-accuracy cluster (red-colored voxels) centered at the signal-carrying voxel , and this cluster grows in size with increasing radius. The lower panel shows an expanded view of this high-accuracy cluster, thresholded at . Consistent with the predictions described above, for each radius, the size and shape of these clusters on the information map correspond exactly to the size, shape and location of the searchlight centered at voxel . Furthermore, consistent with Theorem 7, the number of informative searchlights identified () increases in a monotonic manner with the radius of the searchlight, even though there is no difference in the actual information present or even any multivoxel response patterns to speak of.
Figure 4(b) shows the values on the information map from a single 1D segment running horizontally through the voxel through the diameter of the searchlights centered at . The voxel is assigned a value . Consistent with the SIN assumption, the accuracies on the information map do not exhibit a smooth degradation as a function of the distance from . Critically, this pedestallike profile is unlike the profile that would be expected if the searchlights were the equivalent of a “spatial smoothing” kernel on the information map.
What is the comparable effect on the information map when the taskrelevant signal is distributed over multiple voxels? We next consider this scenario.
4.2 The haystack-in-the-needle effect
Suppose there are two voxels, and in , that conjointly exhibit a response difference to the experimental conditions. However, neither voxel by itself shows a taskrelevant difference. That is, and . By the SIN assumption, every searchlight that contains both and should be informative, but searchlights that contain either or alone would not necessarily be informative. Recall that, according to Theorem 4, a group of multiple voxels (i.e., having more than one voxel) is contained in strictly less than searchlights. It then follows that the signalcarrying voxel group should produce a cluster having less than voxels on the information map, i.e., a multivoxel “haystack” should produce a “needle”like cluster, unlike the needleinthehaystack scenario in Section 4.1 above.
To simulate this “haystackintheneedle” effect, the taskrelevant responses in the two voxels and took the form shown in Figure 5(a). The responses to each condition were drawn randomly from a normal distribution having standard deviation . Each voxel’s mean response to conditions and are shown as dotted lines. The voxel had an identical mean response to both conditions and , specifically, (the horizontal dotted line); while voxel ’s mean response to condition was and to condition was (indicated by each of the dotted vertical lines). Importantly, the responses of voxel and to both conditions were correlated negatively. The response of voxel on condition , denoted as was equal to , the response of voxel to condition . Similarly, for condition , . The simulated responses of all other voxels were drawn from distributions having and , and were uncorrelated with the responses in either voxel or . As with the previous simulation above, a total of samples were drawn for each condition. With signals of this form, the conjoint responses of voxels and to conditions and are linearly separable (see Figure 5(a)). However, and cannot be distinguished from the responses in , but should be weakly discriminable from the responses in .
The relative spatial positions of and , indicated as blue squares, are shown in Figure 5(b). We considered two cases, where and were separated by voxels in one case; and by voxels in the other. When and have a separation of voxels, there is no one searchlight of radius mm that can contain both of these voxels. With a separation of voxels, there are no searchlights of radius mm, mm, or mm that can contain both and . With this setup, the searchlight decomposition and testing procedure was simulated in the same manner as in Section 4.1.
Figure 6 shows the portions of the information maps in the vicinity of voxels and (thresholded at ). In all the information maps, the above-threshold cluster takes the size and shape of the corresponding searchlight and is centered at voxel , namely, the voxel exhibiting a weak response difference to conditions and . This “needle-in-the-haystack” organization is consistent with the simulations in Section 4.1, and is invariant to the number of voxels separating and .
Now, observe that the clusters in several, but not all, of the information maps contain subclusters consisting of voxels having high classification accuracies (indicated in red). These voxels on the information map correspond to the centers of searchlights that contain both and . As required by Theorem 4, for each radius, the number of high-accuracy voxels in the cluster is less than . Due to the geometric constraint defined by the separation between and , the presence of any high-accuracy voxels at all in an information map depends on the radius of the searchlights used. For example, information maps obtained with searchlights of radius mm do not contain any high-accuracy voxels for either separation (top row), while the information maps for searchlights of radius mm contain high-accuracy voxels for the voxel separation but not for the voxel separation.
Figures 6(a) and (b) show the 1D cross-section of the information map through the horizontal diameter of the clusters for the voxel and voxel separations, respectively. As is evident, there is a “smearing”, rather than smoothing, of the accuracies with growing radius values, as in Figure 4(b). Furthermore, when a searchlight is large enough to include both and , there is a large increase in the classification accuracy.
The above simulations confirm the basic statistical premise motivating the searchlight procedure, namely, the ability of a multivoxel pattern analysis method to detect distributed response patterns. However, for any radius, the clusters produced by multivoxel response patterns are smaller than those produced by single-voxel response differences. Consistent with Theorem 7, the number of informative searchlights identified increases monotonically with the radius of the searchlight.
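The dependence of cluster size on radius and voxel separation can be illustrated with a small counting sketch. It is only a geometric illustration under stated assumptions (an unbounded grid, radii expressed in voxel units, Euclidean spherical searchlights); the radii and separations below are hypothetical values chosen for the example, not those used in the simulations, and `count_centers` is an illustrative name.

```python
import itertools

def count_centers(voxels, radius):
    """Count searchlight centers lying within `radius` of every voxel in `voxels`."""
    r = int(radius)
    x0, y0, z0 = voxels[0]
    count = 0
    # Any valid center must lie within `radius` of the first voxel,
    # so it is enough to scan that voxel's bounding box.
    for dx, dy, dz in itertools.product(range(-r, r + 1), repeat=3):
        c = (x0 + dx, y0 + dy, z0 + dz)
        if all(sum((ci - vi) ** 2 for ci, vi in zip(c, v)) <= radius ** 2
               for v in voxels):
            count += 1
    return count

radii = [1, 2, 3, 4]      # hypothetical radii, in voxel units
separations = [2, 4]      # hypothetical voxel separations

# Cluster size produced by a single informative voxel, per radius.
single = [count_centers([(0, 0, 0)], r) for r in radii]

# Cluster size produced by a voxel pair that is informative only jointly.
pairs = {s: [count_centers([(0, 0, 0), (s, 0, 0)], r) for r in radii]
         for s in separations}

for s in separations:
    for r, n1, n2 in zip(radii, single, pairs[s]):
        print(f"separation={s} radius={r} single={n1} pair={n2}")
```

In this sketch both counts grow with the radius, the pair count stays below the single-voxel count at every radius, and it is zero whenever the two voxels are farther apart than one searchlight diameter.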
4.3 Whole-brain inflation maps
The previous two simulations demonstrated signal-dependent effects caused by the sampling bias inherent in the searchlight decomposition. However, according to Theorem 7, there should be a monotonic increase in the number of informative searchlights as a function of radius, irrespective of the actual distribution of task-relevant voxels or voxel groups across the brain. This monotonic scaling of the size of the “blobs” on the information map makes plausible a rather unusual scenario: an information map where every searchlight in the brain is deemed to be informative.
This scenario was motivated by results recently reported by Poldrack et al. (2009). In that study, information maps were generated using searchlights of radius mm and mm. Rather remarkably, with a radius of mm, only one region in the information map (the bilateral dorsolateral prefrontal cortex) was found to be uninformative while every other searchlight was informative. This whole-brain coverage was, however, not the case with the mm searchlights. Given the inflationary relationship between (the number of informative searchlights) and searchlight radius established in the previous sections, we asked: could a fully informative whole-brain map arise (i.e., ) by random chance with a suitably chosen searchlight radius?
This question can be formulated as a covering problem. Consider a finite 3D voxel space corresponding to one containing the brain, approximated as a cubic volume of size , where is the number of voxels along the principal direction . Suppose there is a minimum covering set of searchlights such that every voxel in is contained in some searchlight in . Recall that a single-voxel signal can produce a cluster having voxels on the information map, due to Theorem 3. If the central voxel of each of the searchlights in was informative, it would follow that searchlights centered at every voxel in every one of the searchlights in would also be informative. Since every voxel in is present in some searchlight in , this implies that a rather sparse distribution of informative single voxels specified by could produce an information map where every searchlight in would be informative (with the proviso that the SIN assumption holds true).
The sparsity of these informative singlevoxels can be readily approximated if we use cubical volumes as a proxy for the spherical shape of the searchlight volumes. A cube of side voxels would fully contain a sphere having radius , and would be fully contained in a sphere of radius . With this simplification, the minimal number of searchlight cubes required to cover the voxel space is readily approximated as the volume of the voxel space divided by the volume of each searchlight cube, that is, .
For voxels of size mm, we approximate the size of the voxel space with the following values voxels, voxels and voxels. Figure 7 shows the minimum number of cubical volumes required to cover as a function of , where took values .
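The covering-set approximation described above (volume of the voxel space divided by the volume of one searchlight cube) can be sketched as follows. The voxel-space dimensions and cube sides used here are hypothetical placeholders, not the values used for Figure 7, and both function names are illustrative. The first function rounds up along each axis so that partially filled cubes at the boundary are counted; the second is the plain volume ratio.

```python
import math

def covering_set_size(shape, k):
    """Number of side-k cubes needed to tile a box of `shape` voxels."""
    return math.prod(math.ceil(n / k) for n in shape)

def covering_set_size_approx(shape, k):
    """Volume-ratio approximation: box volume divided by cube volume."""
    return math.prod(shape) / k ** 3

shape = (64, 64, 32)        # hypothetical voxel-space dimensions
sides = [1, 3, 5, 7, 9]     # hypothetical cube sides, in voxels

sizes = [covering_set_size(shape, k) for k in sides]
for k, n in zip(sides, sizes):
    frac = n / math.prod(shape)
    approx = covering_set_size_approx(shape, k)
    print(f"k={k}: {n} cubes (approx {approx:.1f}), {100 * frac:.2f}% of voxels")
```

With a cube of side 1 the covering set is the whole voxel space, and the size drops roughly with the cube of the side length as the cubes grow.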
A searchlight cube of side is equivalent to a single voxel, so the size of the covering set is equal to the total number of voxels in , namely, . However, increasing values of produce a rapid decrease in the size of the covering set. For voxels, a cubical volume that would fully contain a spherical searchlight of radius mm, a total of equally spaced signal-carrying voxels can produce an information map where every searchlight is informative. However, for a cubical volume with side voxels, corresponding to spherical volumes of radius mm, a mere voxels are required for such a fully informative map. Stated differently, an information map with a single task-relevant cluster made up of every voxel in can be generated from a mere regularly spaced voxels of the voxels in , that is, voxels that enable the conditions to be distinguished whether due to the presence of true signal or by random chance. This potential for a small number of single voxels (i.e., of ) to drive the structure of the entire information map simply by the choice of the searchlight radius is an important consideration when drawing neurobiological interpretations.
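The covering argument can also be verified directly on a small grid. The sketch below is an illustration under explicit assumptions: cubic searchlights of odd side `k` stand in for spheres, the SIN assumption is taken to hold (a searchlight is informative whenever it contains a signal-carrying voxel), and the grid dimensions are hypothetical multiples of `k`. The function name is illustrative.

```python
import numpy as np

def fully_informative_map(shape, k):
    """Place one signal voxel at the center of each side-k cube tiling `shape`.

    Returns (number of signal voxels, boolean map of informative searchlight
    centers), assuming cubic searchlights of odd side k and the SIN assumption.
    """
    if k % 2 != 1:
        raise ValueError("k must be odd so that each cube has a central voxel")
    h = k // 2
    signal = np.zeros(shape, dtype=bool)
    signal[h::k, h::k, h::k] = True          # regularly spaced signal voxels

    informative = np.zeros(shape, dtype=bool)
    for x, y, z in np.argwhere(signal):
        # A searchlight centered at c contains signal voxel s exactly when
        # c lies in the side-k cube centered at s.
        informative[max(x - h, 0):x + h + 1,
                    max(y - h, 0):y + h + 1,
                    max(z - h, 0):z + h + 1] = True
    return int(signal.sum()), informative

shape = (15, 15, 15)    # hypothetical voxel space, a multiple of k per axis
k = 5                   # hypothetical cube side, in voxels

n_signal, informative = fully_informative_map(shape, k)
coverage = float(informative.mean())
print(n_signal, informative.size, coverage)
```

In this toy configuration a few dozen regularly spaced voxels out of several thousand are enough to mark every searchlight center as informative, which is the scenario discussed in the text.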
5 Discussion
Knowledge of the actual information-carrying voxels in each informative searchlight would make the information map irrelevant. These informative voxels could be reported directly, resolving the overcounting that arises from their inclusion in multiple searchlights. One possible implementation would be to identify task-relevant voxels in each searchlight, and then combine these identified voxels across searchlights. However, requiring the identification of the actual informative voxels in each searchlight could reduce the generality of the searchlight method. When pattern classifiers are used to compute the searchlight statistic, each voxel (or feature) in a searchlight is typically assigned a weight, and the weighted combination of the multivoxel responses is used to make a classification decision. However, the basis for assigning weights to individual features is highly dependent on the specific machine learning algorithm and its inductive assumptions (Mitchell, 1980; Wolpert, 1996; Guyon et al., 2002; Pereira et al., 2009). Consequently, appropriate techniques would be required to allow results to be compared across studies that use different MVPA techniques.
Until such advances are made, the analytical framework described above provides several constraints on alternate interpretations of the information map. Our results present a strong argument against measuring the sensitivity of information mapping by a count of the number of informative searchlights. The seemingly high sensitivity of the searchlight method, as judged by such a performance measure, has in part a rather trivial explanation: it follows from the obligatory geometric properties of the searchlight method discussed above, rather than from the underlying neural organization, the sophisticated machine-learning algorithms used to analyze multivoxel response patterns, or the widely discussed merits of multivariate statistical evaluations. Indeed, the upshot of the optimistic scaling of this performance measure is that it is maximal when explicitly assuming a highly sensitive and robust MVPA technique, namely one satisfying the superset informativeness (SIN) assumption.
6 Acknowledgments
This work was supported by the U.S. Army Research Office through the Institute for Collaborative Biotechnologies under Contract No. W911NF09D0001.
7 References
 Alink et al. (2011) Alink, A., Euler, F., Kriegeskorte, N., Singer, W., Kohler, A., 2011. Auditory motion direction encoding in auditory cortex and highlevel visual cortex. Human Brain Mapping.
 Chadwick et al. (2010) Chadwick, M. J., Hassabis, D., Weiskopf, N., Maguire, E. A., 2010. Decoding individual episodic memory traces in the human hippocampus. Current Biology 20 (6), 544–7.
 Chen et al. (2011) Chen, Y., Namburi, P., Elliott, L. T., Heinzle, J., Soon, C. S., Chee, M. W. L., Haynes, J.D., 2011. Cortical surfacebased searchlight decoding. Neuroimage 56 (2), 582–92.
 Connolly et al. (2012) Connolly, A. C., Guntupalli, J. S., Gors, J., Hanke, M., Halchenko, Y. O., Wu, Y.C., Abdi, H., Haxby, J. V., 2012. The representation of biological classes in the human brain. Journal of Neuroscience 32 (8), 2608–18.
 Cox and Savoy (2003) Cox, D. D., Savoy, R. L., Jun 2003. Functional magnetic resonance imaging (fmri) ”brain reading”: detecting and classifying distributed patterns of fmri activity in human visual cortex. Neuroimage 19 (2 Pt 1), 261–70.
 Formisano and Kriegeskorte (2012) Formisano, E., Kriegeskorte, N., 2012. Seeing patterns through the hemodynamic veil  the future of patterninformation fMRI. Neuroimage 62 (2), 1249–56.
 Golomb and Kanwisher (2011) Golomb, J. D., Kanwisher, N., 2011. Higher level visual cortex represents retinotopic, not spatiotopic, object location. Cerebral Cortex Epub.
 Guyon et al. (2002) Guyon, I., Weston, J., Barnhill, S., Vapnik, V., 2002. Gene selection for cancer classification using support vector machines. Machine learning 46 (1), 389–422.
 Hanke et al. (2009) Hanke, M., Halchenko, Y. O., Sederberg, P. B., Hanson, S. J., Haxby, J. V., Pollmann, S., 2009. PyMVPA: A python toolbox for multivariate pattern analysis of fMRI data. Neuroinformatics 7 (1), 37–53.
 Haxby et al. (2001) Haxby, J. V., Gobbini, M. I., Furey, M. L., Ishai, A., Schouten, J. L., Pietrini, P., 2001. Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science 293 (5539), 2425–30.
 Haynes (2006) Haynes, J., 2006. Decoding mental states from brain activity in humans. Nature Reviews Neuroscience 7 (7), 523–534.
 Haynes et al. (2007) Haynes, J.D., Sakai, K., Rees, G., Gilbert, S., Frith, C., Passingham, R. E., Feb. 2007. Reading Hidden Intentions in the Human Brain. Current Biology 17 (4), 323–328.
 Jimura and Poldrack (2012) Jimura, K., Poldrack, R. A., 2012. Analyses of regionalaverage activation and multivoxel pattern information tell complementary stories. Neuropsychologia 50 (4), 544–52.

 Johnson et al. (2009) Johnson, J. D., McDuff, S. G. R., Rugg, M. D., Norman, K. A., 2009. Recollection, familiarity, and cortical reinstatement: a multivoxel pattern analysis. Neuron 63 (5), 697–708.
 Kaplan and Meyer (2012) Kaplan, J. T., Meyer, K., 2012. Multivariate pattern analysis reveals common neural patterns across individuals during touch observation. Neuroimage 60 (1), 204–12.
 Kriegeskorte et al. (2006) Kriegeskorte, N., Goebel, R., Bandettini, P., 2006. Informationbased functional brain mapping. Proceedings of the National Academy of Sciences 103 (10), 3863–8.
 Mitchell (1980) Mitchell, T., 1980. The need for biases in learning generalizations. Tech. Rep. CBMTR5110, Department of Computer Science, Rutgers University.
 Morgan et al. (2011) Morgan, L. K., Macevoy, S. P., Aguirre, G. K., Epstein, R. A., 2011. Distances between realworld locations are represented in the human hippocampus. Journal of Neuroscience 31 (4), 1238–45.
 Mur et al. (2008) Mur, M., Bandettini, P. A., Kriegeskorte, N., 2008. Revealing representational content with patterninformation fMRI–an introductory guide. Social Cognitive and Affective Neuroscience 4 (1), 101–109.
 Nestor et al. (2011) Nestor, A., Plaut, D. C., Behrmann, M., 2011. Unraveling the distributed neural code of facial identity through spatiotemporal pattern analysis. Proceedings of the National Academy of Sciences 108 (24), 9998–10003.
 Norman et al. (2006) Norman, K. A., Polyn, S. M., Detre, G. J., Haxby, J. V., Sep. 2006. Beyond mindreading: multivoxel pattern analysis of fMRI data. Trends in Cognitive Sciences 10 (9), 424–430.
 Oosterhof et al. (2012) Oosterhof, N. N., Tipper, S. P., Downing, P. E., Jan 2012. Viewpoint (in)dependence of action representations: An MVPA study. J Cogn Neurosci.
 Oosterhof et al. (2011) Oosterhof, N. N., Wiestler, T., Downing, P. E., Diedrichsen, J., May 2011. A comparison of volumebased and surfacebased multivoxel pattern analysis. Neuroimage 56 (2), 593–600.
 Oosterhof et al. (2010) Oosterhof, N. N., Wiggett, A. J., Diedrichsen, J., Tipper, S. P., Downing, P. E., Aug 2010. Surfacebased information mapping reveals crossmodal visionaction representations in human parietal and occipitotemporal cortex. Journal of Neurophysiology 104 (2), 1077–89.
 Peelen and Kastner (2011) Peelen, M. V., Kastner, S., Jul 2011. A neural basis for realworld visual search in human occipitotemporal cortex. Proceedings of the National Academy of Sciences 108 (29), 12125–30.
 Pereira and Botvinick (2011) Pereira, F., Botvinick, M., May 2011. Information mapping with pattern classifiers: A comparative study. NeuroImage 56 (2), 476–496.
 Pereira et al. (2009) Pereira, F., Mitchell, T., Botvinick, M., Mar. 2009. Machine learning classifiers and fMRI: A tutorial overview. NeuroImage 45 (1), S199–S209.
 Poldrack et al. (2009) Poldrack, R. A., Halchenko, Y. O., Hanson, S. J., Nov. 2009. Decoding the LargeScale Structure of Brain Function by Classifying Mental States Across Individuals. Psychological Science 20 (11), 1364–1372.
 Serences and Saproo (2012) Serences, J. T., Saproo, S., Mar 2012. Computational advances towards linking bold and behavior. Neuropsychologia 50 (4), 435–46.
 Soon et al. (2008) Soon, C. S., Brass, M., Heinze, H.J., Haynes, J.D., Apr. 2008. Unconscious determinants of free decisions in the human brain. Nature Neuroscience 11 (5), 543–545.
 Stokes et al. (2011) Stokes, M., Saraiva, A., Rohenkohl, G., Nobre, A. C., Jun 2011. Imagery for shapes activates positioninvariant representations in human visual cortex. Neuroimage 56 (3), 1540–5.
 Tong (2010) Tong, F., Dec. 2010. Pattern Classification Analysis. Annual Review of Psychology 63 (1), 110301102248092.

 Wagner and Rissman (2010) Wagner, A. D., Rissman, J., Dec. 2010. Distributed representations in memory: insights from functional brain imaging. Annual Review of Psychology 63 (1), 110301102248092.
 Wolpert (1996) Wolpert, D., 1996. The lack of a priori distinctions between learning algorithms. Neural computation 8 (7), 1341–1390.
 Woolgar et al. (2011) Woolgar, A., Thompson, R., Bor, D., Duncan, J., May 2011. Multivoxel coding of stimuli, rules, and responses in human frontoparietal cortex. Neuroimage 56 (2), 744–52.