See uploaded file ‘response’ - I submit here a response and the updated manuscript using the new platform (i.e. appearing as new submission). The handling editor Alexandre Gramfort.
ElectroEncephaloGraphy robust statistical linear modelling using a single weight per trial
Cyril Pernet1,2, Guillaume Rousselet3, Ignacio Suay Mas1,
Ramon Martinez4, Rand Wilcox5 & Arnaud Delorme4,6,7
1 Centre for Clinical Brain Sciences, University of Edinburgh, United Kingdom, 2 Neurobiology Research Unit, Copenhagen University Hospital, Rigshospitalet, Denmark, 3 Centre for Cognitive Neuroimaging, Institute of Neuroscience and Psychology, University of Glasgow, United Kingdom, 4 Swartz Center for Computational Neurosciences, University of California, United States of America, 5 Department of Psychology, University of Southern California, United States of America, 6 Centre de Recherche Cerveau et Cognition, Université Toulouse III—Paul Sabatier, Toulouse, France, 7 Centre National de la Recherche Scientifique, Centre de Recherche Cerveau et Cognition, Toulouse, France
Being able to remove or weigh down the influence of outlier data is desirable for any statistical model. While Magnetic and ElectroEncephaloGraphic (MEEG) data are often averaged across trials per condition, it is becoming common practice to use information from all trials to build statistical linear models. Individual trials can, however, have considerable weight and thus bias inferential results (effect sizes as well as thresholded t/F/p maps). Here, rather than looking for univariate outliers, defined independently at each measurement point, we apply the principal component projection (PCP) method at each channel, deriving a single weight per trial at each channel independently. Using both synthetic data and open EEG data, we show (1) that PCP is efficient at detecting a large variety of outlying trials; (2) how PCP-based weights can be implemented in the context of the general linear model (GLM) with accurate control of type 1 family-wise error rate; and (3) that our PCP-based Weighted Least Square (WLS) approach increases the statistical power of group analyses as well as a much slower Iterative Reweighted Least Squares (IRLS), although the weighting scheme is markedly different. Together, our results show that WLS based on PCP weights derived from whole trial profiles is an efficient method to weigh down the influence of outlier EEG data in linear models.
Keywords: ElectroEncephaloGraphy, single trials, statistical outliers, Weighted Least Squares, General Linear Model
Data availability: all data used are publicly available (CC0), all code (simulations and data analyzes) is also available online in the LIMO MEEG GitHub repository (MIT license).
MEEG data analysis can be subdivided into two main parts: preprocessing and processing. Data preprocessing corresponds to any manipulation and transformation of the data, such as artefacts removal and attenuation or change in data representations (spectral transformation, source modelling). Data processing refers to mathematical procedures that do not change the data, i.e., statistical analysis and statistical modelling (1). Here, we present a method that improves data processing by weighting down trials that have different dynamics from the bulk of the data, within the context of an experimental design. We investigated single-trial weighting for full scalp/channel statistical linear hierarchical modelling, providing a more robust statistical analysis method to the community.
After data preprocessing, MEEG data are typically epoched to form 3 or 4-dimensional matrices of, e.g., channel x time x trials and channel x frequency x time x trials. Several neuroimaging packages are dedicated to the statistical analyses of such large multidimensional data, often using linear methods. For instance, in the LIMO MEEG toolbox (2), each channel, frequency, and time frame is analyzed independently using the general linear model, an approach referred to as mass-univariate analysis. Ordinary Least Squares (OLS) are used to find model parameters that minimize the error between the model and the data. For least squares estimates to have good statistical properties, it is however expected that the error covariance off-diagonals are zeros, such that Cov(e) = σ2I, I being the identity matrix (3), assuming observations are independent and identically distributed. It is well established that deviations from that assumption lead to substantial power reduction and to an increase in the false-positive rate. When OLS assumptions are violated, robust techniques offer reliable solutions to restore power and control the false positive rate. Weighted Least Squares (WLS) is one such robust method that uses different weights across trials, such that Cov(e) = σ2V, with V a diagonal matrix:
When applied to MEEG data, a standard mass-univariate WLS entails obtaining a weight for each trial but also each dimension analyzed, i.e. channels, frequencies and time frames. Following such procedure, a trial could be considered as an outlier or be assigned a low weight, for a single frequency or time frame, which is implausible given the well-known correlations of MEEG data over space, frequencies and time. We propose here that a single or a few consecutive data points should never be flagged as outliers or weighted down, and that a single weight per trial (and channel) should be derived instead, with weights taking into account the whole temporal or spectral profile. In the following, we demonstrate how the Principal Component Projection method (PCP (4)) can be used in this context, and how those weights can then be used in the context of the general linear model (WLS), applied here to event-related potentials. Such weighting should not be taken to bypass data preprocessing. Measurements and physiological artefacts are typically several orders of magnitude stronger than the signal of interest, spanning consecutive trials, and require dedicated algorithms. After data preprocessing, residual signal artefacts might however remain. In addition, natural and unintended experimental variations might also exist among all trials, and the weighting scheme aims to account for these.
Trial-based Weighted Least Squares
The new WLS solution consists of computing 1st level GLM beta estimates using weights from the Principal Component Projection algorithm (PCP - (4)). As such, the estimation procedure follows the usual steps of weighting schemes, like a standard Iterative Reweighted Least Squares (IRLS) solution:
After the OLS solution is computed, an adjustment is performed on residuals by multiplying them by
1/1-hwhere h is a vector of Leverage points (i.e. the diagonal of the hat matrix H = X(X'X)–1X'where X is the design matrix reflecting experimental conditions). This adjustment is necessary because leverage points are the most influential on the regression space, i.e. they tend to have low residual values (5).
Residuals are then standardized using a robust estimator of dispersion, the median absolute deviation to the median (MAD), and re-adjusted by the tuning function. Here we used the bisquare function. A series of weights is next obtained either based on distances (PCP method) or minimizing the error term (IRLS) with, in both cases, high weights for data points having high residuals (with a correction for Leverage).
The solution is then computed following equation 3.
Unlike IRLS, WLS weights are not derived for each time frames independently but by looking at the multivariate distances in the principal component space (step 2 above) and thus deriving a unique weight for all time frames. An illustration of the method is shown in figure 1. Trial weights are computed as a distance among trials projected onto the main (> = 99%) principal components space. Here, the principal components computed over the f time frames are those directions that maximize the variance across trials for uncorrelated (orthogonal) time periods (figure 1B). Outlier trials are points in the f-dimensional space which are far away from the bulk. By virtue of the PCA, these outlier trials become more visible along the principal component axes than in the original data space. Weights (figure 1E) for each trial are obtained using both the Euclidean norm (figure 1C, distance location) and the kurtosis weighted Euclidean norm (figure 1D, distance scatter) in this reduced PCA space (see (4) for details). Weights are then applied to each trial (at each data point), on the original data (i.e. there is no data reduction). We exploit this simple technique because it is computationally fast given the rich dimensional space of EEG data and because it does not assume the data to originate from a particular distribution. The PCP algorithm is implemented in the limo_pcout.m function, and the WLS solution in the limo_WLS.m function, both distributed with the LIMO MEEG toolbox (https://limo-eeg-toolbox.github.io/limo_meeg/)..