Functional magnetic resonance imaging (fMRI) offers a rich source of data for studying the neural basis of cognition. Here, we describe the Brain Imaging Analysis Kit (BrainIAK), an open-source, free Python package that provides computationally optimized solutions to key problems in advanced fMRI analysis. A variety of techniques are presently included in BrainIAK: intersubject correlation (ISC) and intersubject functional connectivity (ISFC), functional alignment via the shared response model (SRM), full correlation matrix analysis (FCMA), a Bayesian version of representational similarity analysis (BRSA), event segmentation using hidden Markov models, topographic factor analysis (TFA), inverted encoding models (IEMs), an fMRI data simulator that uses noise characteristics from real data (fmrisim), and some emerging methods. These techniques have been optimized to leverage the efficiencies of high-performance compute (HPC) clusters, and the same code can be seamlessly transferred from a laptop to a cluster. For each of the aforementioned techniques, we describe the data analysis problem that the technique is meant to solve and how it solves that problem; we also include an example Jupyter notebook for each technique and an annotated bibliography of papers that have used and/or described that technique. In addition to the sections describing various analysis techniques in BrainIAK, we have included sections describing the future applications of BrainIAK to real-time fMRI, tutorials that we have developed and shared online to facilitate learning the techniques in BrainIAK, computational innovations in BrainIAK, and how to contribute to BrainIAK. We hope that this manuscript helps readers to understand how BrainIAK might be useful in their research.
Keywords: MVPA, fMRI analysis, high-performance computing, machine learning, fMRI simulator, tutorials
Cognitive neuroscientists have come a long way in using functional magnetic resonance imaging (fMRI) to help answer questions about cognitive processing in the brain. A variety of methods have been developed, ranging from univariate techniques to multivariate pattern analysis (MVPA) methods [1–4]. A large number of toolboxes are now available that implement these pattern analysis methods, including, for example the Princeton MVPA Toolbox , the Decoding Toolbox , CoSMoMVPA , Nilearn , and PyMVPA  (for a full list see https://github.com/ohbm/hackathon2019/blob/master/Tutorial_Resources.md). Scientists can choose which toolbox to use based on the analysis that they wish to perform and the programming language they wish to use.
In this work, we describe the Brain Imaging Analysis Kit (BrainIAK (RRID:SCR 014824), https://brainiak.org), an open-source Python package that implements computationally optimized solutions to key problems in advanced fMRI data analysis, focusing on analysis steps that take place after data have been preprocessed and put in matrix form. BrainIAK can be viewed as a “Swiss army knife” for advanced fMRI analysis, where we are constantly striving to add new tools. Presently, BrainIAK includes methods for running intersubject correlation (ISC)  and intersubject functional correlation (ISFC) [11, 12], functional alignment via the shared response model (SRM) , Bayesian Representational Similarity Analysis (RSA) [14, 15], event segmentation , dimensionality reduction via topographic factor analysis (TFA) , and inverted encoding models (IEMs) [18, 19].
To avoid duplication across packages, BrainIAK leverages available methods in other packages – it is well integrated with Nilearn (https://nilearn.github.io/index.html)  and extensively uses scikit-learn (https://scikit-learn.org/)  for machine learning algorithms. The functions in BrainIAK are optimized to run on high-performance compute (HPC) clusters for efficient execution on large datasets. The same code can be executed on a laptop or an HPC cluster, saving significant time in refactoring the code to run in an HPC environment. BrainIAK also includes a detailed set of tutorials  that are didactic in nature; the tutorials include very detailed steps and helper functions that facilitate learning and implementing some of the methods, including materials relevant to running on HPC clusters. Scientists can also use BrainIAK’s simulator  to create model-based patterns of activity at the voxel level, without going through the expensive and time-consuming process of data collection. The package is released with an open-source license and is free to use on a variety of platforms. The BrainIAK package welcomes contributions from the community, and new methods are continuously added to the package.
METHODS IN BRAINIAK
In the following sections, we present an overview of each of the methods presently included in BrainIAK and an accompanying example notebook. For each method, we list the data analysis problem that it is meant to solve and how it solves that problem. The notebooks also contain an annotated bibliography for each method, listing papers that have described and/or used this method. These example notebooks are not as didactic as the tutorials. Instead, the notebooks we provide here are integrated with the BrainIAK documentation, provide an overview of the technique, and allow users to quickly access code snippets for the method. Also, the notebooks include methods that are not covered in the tutorials such as Bayesian RSA [14, 15], TFA , IEMs [18, 19], BrainIAK’s simulator , and matrix-normal models . All example notebooks are available at https://github.com/brainiak/brainiak-aperture, along with instructions on how to run them.
The Problem: Measuring the Brain’s Response to Naturalistic Stimuli
One of the traditional goals of fMRI research is to measure the brain’s response to a particular stimulus, task, or other experimental manipulation. Typically, this approach relies on tightly controlled experimental designs – by contrasting two stimuli or tasks, or parametrically varying a particular experimental variable, we can isolate brain responses to the variable of interest. Experimentally isolating particular variables can reduce ecological validity; in response to this, cognitive neuroscientists have begun to adopt more naturalistic paradigms [25–30]. However, using naturalistic stimuli comes with its own set of challenges – in particular, if the stimuli are too complex to be modeled using a small set of regressors, the standard approach of relating a design matrix to the fMRI signal may not be practical.
ISC analysis takes a different approach to this problem – instead of trying to fully describe the stimulus in a design matrix, ISC measures stimulus-evoked responses to naturalistic stimuli by isolating brain activity shared across subjects receiving the same stimulus [10, 12]. When experimental participants are presented with a stimulus such as a movie or a spoken story, their brain activity can be conceptually decomposed into at least two components: (1) a stimulus-related component that is synchronized across subjects due to the use of a common stimulus; and (2) a subject-specific component capturing both idiosyncratic stimulus-related signals (e.g., unique memory and interpretation) and nonstimulus-related signals (e.g., physiological noise; Figure 1A). ISC analysis measures the former (shared, stimulus-related) component, filtering out the latter (idiosyncratic) component (Figure 1B).
This shared signal can be driven by different features of the stimulus in different brain regions. For example, when listening to a spoken story, ISC in early auditory areas may be driven by acoustic features of the stimulus, whereas ISC in the association cortex may be driven by higher-level linguistic features of the stimulus. In this sense, ISC is agnostic to the content of the stimulus and serves as a measure of reliability of stimulus-evoked responses across subjects (or as a “noise ceiling” for model-based prediction across subjects [12, 31, 32]). This is particularly useful for complex, naturalistic stimuli where exhaustively modeling stimulus features may be difficult. This also allows us to leverage naturalistic stimuli to ask novel questions about brain organization. For example, high ISCs extend from early auditory areas to high-level association cortices during story-listening. However, if we temporally scramble elements of the story stimulus, this disrupts the narrative content of the story; in this case, we still observe high ISC in early auditory areas, but less so in higher-level cortices, suggesting that certain association areas encode temporally evolving narrative content [33, 34].
Several variations on ISC have been developed at both the implementational and conceptual levels. For example, ISCs may be computed in either a pairwise or leave-one-out fashion, both of which have associated statistical tests [12, 35, 36]. An important conceptual advance has been to compute ISC across brain areas using ISFC analysis [11, Figure 1D]. ISFC analysis allows us to estimate functional connectivity (FC) networks analogous to traditional within-subject FC analysis (Figure 1C). However, unlike traditional within-subject FC analysis, ISFC analysis isolates stimulus-driven connectivity and is robust to idiosyncratic noise due to head motion and physiological fluctuations . Both ISC and ISFC can be computed using a sliding window to measure coarse fluctuations in the shared signal over time. Finally, rather than computing ISC on response time series, we can also apply the logic of ISC to multivoxel pattern analysis . Intersubject pattern correlation analysis captures spatially distributed shared response patterns across subjects at each time point (e.g., ). Computing spatial ISC between all time points (the spatial analogue of ISFC) enables us to discover whether certain spatial response patterns are consistent or reemerge over time .
The accompanying notebook applies ISC analysis to an example fMRI story-listening dataset from the “Narratives” data collection [39, 40]. To reduce computational demands, we compute ISC on a time series averaged within each parcel extracted from a functional cortical parcellation . We first demonstrate high ISC values extending from low-level auditory cortex to higher-level cortical areas during story listening. However, when listening to a temporally scrambled version of the stimulus, ISC is dramatically reduced in higher-level cortex areas, suggesting that these areas encode temporally evolving features of the stimulus (e.g., narrative context). We next perform a similar comparison between intact and scrambled story stimuli using traditional within-subject FC and ISFC analysis. The networks estimated using within-subject FC are similar across the two types of stimuli, while ISFC analysis yields very different networks for the intact and scrambled stories. BrainIAK also offers several nonparametric statistical tests for ISC and ISFC analysis, some of which are discussed in the notebook.
The computational demands of ISC/ISFC analyses scale with the number of subjects, voxels, and timepoints (TRs); however, the memory demands of pairwise ISC analysis will increase more precipitously with the number of subjects. A small-scale (e.g., parcellation-based) ISC analysis with 30 subjects, 1,000 parcels, and a 300-TR duration runs in a couple of seconds on a typical personal computer. On the other hand, whole-brain voxelwise ISC analysis with 50,000 voxels may require 10 or more minutes to run and require several GB of memory. For large-scale ISC analyses, we recommend running the analysis on a distributed computing cluster. Basic ISC/ISFC analysis (as implemented in BrainIAK) requires a single process to operate on data across all subjects. However, some additional preprocessing can allow for parallelization across subjects. For example, in the leave-one-out approach, precomputing the average time series excluding each subject can allow the ISC computation to proceed in parallel; in the pairwise approach, ISC for each pair of subjects can be computed in parallel and then recombined. Note that ISC analysis proceeds independently for each brain variable (e.g., voxel or parcel), so ISC analysis can also be parallelized across voxels; for example, a whole-brain voxelwise ISC analysis with 50,000 voxels can be divided into 50 parallel jobs each running ISC analysis on a subset of 1,000 voxels.
ISFC analysis computes the correlation between all pairs of parcels or networks, and therefore, computational demand increases primarily with the number of voxels. Similar to ISC analysis, smaller-scale analyses (e.g., 30 subjects, 1,000 parcels, and 300 TRs) are easily computed on a personal computer, whereas whole-brain voxelwise analyses may require a computing cluster.
Shared Response Model
The Problem: Aligning Brain Data across Participants
One of the main obstacles in leveraging brain activity across subjects is the considerable heterogeneity of functional topographies from individual to individual. Variability in functional–anatomical correspondence across individuals means that even high-performing anatomical alignment does not ensure fine-grained functional alignment (e.g., ). As an example, multivoxel pattern analysis models that perform well within subjects often degrade in performance when evaluated across subjects (e.g., [43, 44]).
SRM , alongside other methods of hyperalignment [45–47], aims to resolve this alignment problem by aligning on the basis of functional data. SRM estimation is driven by the commonality in functional responses induced by a shared stimulus (e.g., watching a movie). Unlike ISC analysis, which presupposes (often very coarse) functional correspondence, SRM isolates the shared response while accommodating misalignment across subjects. SRM decomposes multisubject fMRI data into a lower-dimensional shared space and subject-specific transformation matrices for projecting from each subject’s idiosyncratic voxel space into the shared space (Figure 2). Each of these topographic transformations effectively rotates and reduces each subject’s voxel space to find a subspace of shared features where the multivariate trajectory of responses to the stimulus is best aligned. These shared features do not correspond to individual voxels; rather, they are distributed across the full voxel space of each subject; each shared feature can be understood as a weighted sum of many voxels.
Transformations estimated from one subset of data can be used to project unseen data into the shared space. Projecting data into shared space increases both temporal and spatial ISC (by design), and in many cases improves between-subject model performance to the level of within-subject performance. Between-subject models with SRM can, in some cases, exceed the performance of within-subject models because (a) the reduced-dimension shared space can highlight stimulus-related variance by filtering out noisy or nonstimulus-related features, and (b) the between-subject model can effectively leverage a larger volume of data after functional alignment than is available for any single subject (e.g., [13, 48]). Denoised individual-subject data can be reconstructed by projecting data from the reduced-dimension shared space back into any given subject’s brain space. Furthermore, in cases where each subject’s unique response is of more interest than the shared signal, SRM can be used to factor out the shared component, thereby isolating the idiosyncratic response for each subject .
Building on the initial probabilistic SRM formulation [13, 49], several variants of SRM have been developed to address related challenges. For example, a fast SRM implementation has been introduced for rapidly analyzing large datasets with reduced memory demands . The robust SRM algorithm tolerates subject-specific outlying response elements , and the semisupervised SRM capitalizes on categorical stimulus labels when available . Finally, estimating the SRM from FC data rather than response time series circumvents the need for a single-shared stimulus across subjects; connectivity SRM allows us to derive a single-shared response space across different stimuli with a shared connectivity profile .
The accompanying notebook applies the SRM to an example fMRI story-listening dataset from the “Narratives” data collection . We apply the SRM within a temporal–parietal region of interest (ROI) comprising the auditory association cortex from a functional cortical parcellation  and explore the components of the resulting model. We evaluate the SRM using between-subject time-segment classification. This analysis reveals that the SRM yields a considerable improvement in between-subject classification beyond anatomical alignment.