I've just spent two days talking machine learning concepts with some of the excellent folks behind the Nilearn Python project. We had some fruitful conversations, both about the possibilities and limitations of approaches implemented by the Nilearn package, and I wanted to get a few of these ideas down here.
Multivariate analyses
In contrast to the more classical univariate analysis (e.g., as implemented in the SPM software), multivariate analysis (also known as multivariate pattern analysis, or MVPA) treats all voxels in an image or ROI as elements in a single statistical model. So, while SPM runs a separate general linear model for each voxel independently, and subsequently deals with the multiple comparisons problem, MVPA puts all voxels into (for instance) a single GLM, and attempts to solve that in one shot. A typical problem looks like this:
\[ \mathbf{y} = \mathbf{\beta} \cdot \mathbf{X} + \varepsilon \]
In this equation, \(\mathbf{y}\) is our response vector, or the thing we want to predict. \(\mathbf{X}\) is our voxel-wise feature matrix, which can be an fMRI BOLD time series, or any other set of image intensity values we want to use to predict \(\mathbf{y}\). \(\beta\) are the model weights we are trying to fit to the data. As an example, suppose we have an experiment in which different categories of visual stimuli are presented, such as houses and faces. We acquire 2 mm resolution fMRI with 150 time points, while alternately presenting these stimuli. We want to know whether the fMRI BOLD response we observe while presenting these stimuli can be used to discriminate between them. This is known as a classification problem.
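To make this concrete, here is a minimal sketch of how \(\mathbf{X}\) and \(\mathbf{y}\) might be assembled with Nilearn. The file names ("bold.nii.gz", "labels.csv") and the "stimulus" column are placeholders for your own data, and depending on your Nilearn version, `NiftiMasker` may live in `nilearn.input_data` rather than `nilearn.maskers`.

```python
import pandas as pd
from nilearn.maskers import NiftiMasker

# Extract an (n_timepoints, n_voxels) matrix from the 4D BOLD image,
# restricted to in-brain voxels and standardized per voxel.
masker = NiftiMasker(standardize=True)
X = masker.fit_transform("bold.nii.gz")   # e.g., (150 time points, ~100,000 voxels)

# The response vector: one label ("face" or "house") per time point.
y = pd.read_csv("labels.csv")["stimulus"].values
```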
There are many approaches to classification, but our big problem is this: for each subject, we have far more features ( \(p > 100{,}000\) ) than we have time points ( \(n=150\) ). Such a problem is called ill-posed, because there are many more parameters ( \( \beta \) ) than there are observations ( \(\mathbf{y}\) ). Because of this, the usual estimation methods, such as ordinary least squares, are not applicable. Instead, we need to formulate the problem as an optimization problem, such that we are trying to minimize the error of the model prediction while also minimizing the complexity of the model. The general problem, known as linear least squares, looks like this:
\[ \mathop{\textrm{argmin}_{\beta}} ( {\left\lVert \mathbf{y} - \beta \cdot \mathbf{X} \right\rVert}_2^2 ) \]
in which we are trying to adjust the \(\beta\) weights in order to minimize the prediction error (\(\varepsilon\) in the above equation) between our observations \(\mathbf{y}\) and our model \(\beta \cdot X\), using the squared \(L_2\) norm, or Euclidean distance.
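The reason ordinary least squares breaks down here is easy to demonstrate: with far more features than observations, there are infinitely many \(\beta\) vectors that fit the training data exactly, so a perfect fit tells us nothing by itself. A toy illustration in NumPy (dimensions scaled down for speed):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 150, 10_000                      # time points vs. voxels (toy scale)
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)              # pure noise, unrelated to X

# lstsq returns the minimum-norm solution among the infinitely many
# betas that achieve (essentially) zero training error.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(X @ beta, y))         # True: a "perfect" fit to random noise
```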
Regularization
Regularization is used to constrain the \(\beta\) weights, and in particular to impose sparsity on the solution. In other words, since we assume that many, if not most, of our features (voxels) will not be discriminative for faces and houses, we can reduce the complexity of our problem by introducing a penalization term into our optimization problem. To reduce complexity, we want to minimize the overall magnitude (the norm) of the \(\beta\) weights; indeed, we would like many of them to be reduced to zero, and effectively eliminated from the model. Two well-known regularization approaches are ridge regression, using the (squared) \(L_2\) norm:
\[ \mathop{\textrm{argmin}_{\beta}} ( {\left\lVert \mathbf{y} - \beta \cdot \mathbf{X} \right\rVert}_2^2 + \lambda \cdot {\left\lVert \beta \right\rVert}_2^2 ) \]
And LASSO, using the \(L_1\) norm:
\[ \mathop{\textrm{argmin}_{\beta}} ( {\left\lVert \mathbf{y} - \beta \cdot \mathbf{X} \right\rVert}_2^2 + \lambda \cdot {\left\lVert \beta \right\rVert}_1 ) \]
In both cases, we've introduced the tuning parameter \( \lambda \), which can be used to control the degree of sparsity we want to impose. They really only differ in the type of norm operation applied. Ridge regression, which uses the \(L_2\) norm, results in a smooth change in \(\beta\) weights over values of \(\lambda\), and as a result reduces them only to very small, but non-zero values. For LASSO on the other hand, the \(L_1\) norm reduces many \(\beta\) weights to zero, and tends to retain only a few very highly predictive voxels.
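The difference is easy to see on synthetic data with scikit-learn, where the \(\lambda\) of the equations above corresponds to the estimators' `alpha` argument (the specific values here are arbitrary):

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.standard_normal((150, 1000))
beta_true = np.zeros(1000)
beta_true[:10] = 2.0                            # only 10 truly informative features
y = X @ beta_true + rng.standard_normal(150)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.5).fit(X, y)

print("non-zero ridge weights:", np.sum(ridge.coef_ != 0))   # typically all 1000
print("non-zero LASSO weights:", np.sum(lasso.coef_ != 0))   # typically a few dozen at most
```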
LASSO is a valuable approach in genome-wide association studies (GWAS), as it is often desirable to isolate a small subset of genetic sequences as predictive of a variable of interest. However, when applied to neuroimaging data, it will tend to retain only one voxel from a cluster of voxels with similar time series, penalizing the rest to zero. In other words, it produces solutions which are too sparse to represent realistic neuroimaging patterns, where we expect spatial smoothness. On the other hand, ridge regression can produce smooth patterns, but it does not introduce sparsity, since most \(\beta\) weights retain non-zero values.
A compromise between the two is the elastic net. It is, quite simply, a weighted average of these two regularization terms, in which the trade-off between smoothness and sparsity is expressed by the parameter \(\alpha\):
\[ \mathop{\textrm{argmin}_{\beta}} ( {\left\lVert \mathbf{y} - \beta \cdot \mathbf{X} \right\rVert}_2^2 + \lambda \cdot [ (1-\alpha) \cdot {\left\lVert \beta \right\rVert}_1 + \alpha \cdot {\left\lVert \beta \right\rVert}_2^2 ] ) \]
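Scikit-learn implements this penalty directly, although with different names: our \(\lambda\) becomes `alpha`, and the L1/L2 mix is controlled by `l1_ratio` (1.0 is pure LASSO, 0.0 is pure ridge), so beware the naming collision with the \(\alpha\) in the equation above. A minimal sketch on synthetic data:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
X = rng.standard_normal((150, 1000))
y = X[:, :10].sum(axis=1) + rng.standard_normal(150)

# Equal weighting of the L1 and L2 terms; values are illustrative only.
enet = ElasticNet(alpha=0.5, l1_ratio=0.5, max_iter=5000).fit(X, y)
print("non-zero elastic net weights:", int((enet.coef_ != 0).sum()))
```

`ElasticNet` is the regression form; for a two-class problem like faces versus houses, the same penalty is available through `LogisticRegression(penalty="elasticnet", solver="saga", l1_ratio=...)`.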
An extension of the elastic net approach, called GraphNet, further manipulates the regularization term in order to explicitly impose spatial smoothness on the solution.
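Nilearn exposes GraphNet through its SpaceNet decoders. A hedged sketch, with placeholder file names and labels, and with the caveat that the exact module path and arguments may vary across Nilearn versions:

```python
import pandas as pd
from nilearn.decoding import SpaceNetClassifier

labels = pd.read_csv("labels.csv")["stimulus"].values   # one label per volume

# penalty="graph-net" adds the spatial regularization term; "tv-l1" is the
# total-variation alternative offered by the same class.
decoder = SpaceNetClassifier(penalty="graph-net", standardize=True)
decoder.fit("bold.nii.gz", labels)

# decoder.coef_img_ holds the fitted beta weights as a Nifti image.
```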
Cross-validation
Another big problem with fitting our ill-posed multivariate linear model is over-fitting, in which we find a solution that fits the training data well, but whose performance does not generalize to independent samples. Since generalizability is the goal of any empirical research, we need to address the issue of over-fitting. The standard solution is cross-validation, in which we use independent subsets of our sample, or folds, as "training" and "testing" samples. In \(k\)-fold cross-validation, we divide the sample into \(k\) folds, and for each fold we fit the elastic net model on the remaining data and then evaluate its prediction error on the held-out fold. The average of these prediction errors gives a measure of the generalizability of the model.
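With scikit-learn, \(k\)-fold cross-validation is a one-liner; the synthetic data below merely stand in for the real \(\mathbf{X}\) and \(\mathbf{y}\):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.standard_normal((150, 1000))
y = rng.integers(0, 2, size=150)               # stand-in for face/house labels

clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, X, y, cv=KFold(n_splits=5))
print("mean cross-validated accuracy:", scores.mean())
```

For fMRI time series, the folds are usually defined by acquisition runs or sessions (e.g., with `LeaveOneGroupOut`) rather than arbitrary splits, so that temporally autocorrelated samples do not leak between training and testing sets.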
In practice, we also want to perform this cross-validation procedure across different values of our tuning parameters \(\alpha\) and \(\lambda\). This is called nested cross-validation, and comparing prediction errors across many values of these parameters allows us to select the best model for our classification purpose.
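A nested scheme can be sketched by placing a grid search inside the outer cross-validation loop; the estimator and the parameter grids below are illustrative, not recommendations:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.standard_normal((150, 1000))
y = rng.integers(0, 2, size=150)

# Inner loop: choose the tuning parameters by grid search.
inner = GridSearchCV(
    SGDClassifier(loss="log_loss", penalty="elasticnet", max_iter=2000),
    param_grid={"alpha": [1e-4, 1e-3, 1e-2],      # plays the role of lambda
                "l1_ratio": [0.1, 0.5, 0.9]},     # plays the role of alpha
    cv=KFold(n_splits=3),
)

# Outer loop: estimate the generalization error of the whole procedure.
outer_scores = cross_val_score(inner, X, y, cv=KFold(n_splits=5))
print("nested CV accuracy:", outer_scores.mean())
```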
Some caveats about interpretation
Say you've collected your data, run it through Nilearn's GraphNet optimization, and obtained a cross-validated predictive model that can discern whether the input was faces or houses with 90% accuracy. That's pretty cool. But as a cognitive neuroscientist, you want to dig deeper. What can this model tell us about the neural mechanisms of this discrimination? The obvious route to answering this question is to look at the \(\beta\) weights, since there is one per voxel, and higher weights clearly contribute more to the model prediction, while zero weights contribute nothing. Mapping these values back into image space gives us a weight map of discriminative voxels.
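In Nilearn, such a map can be produced by pushing the fitted weight vector back through the masker; a minimal sketch, again with placeholder file names and a plain logistic regression standing in for whichever decoder was actually fit:

```python
import pandas as pd
from nilearn import plotting
from nilearn.maskers import NiftiMasker
from sklearn.linear_model import LogisticRegression

masker = NiftiMasker(standardize=True)
X = masker.fit_transform("bold.nii.gz")                   # placeholder data
y = pd.read_csv("labels.csv")["stimulus"].values

clf = LogisticRegression(max_iter=1000).fit(X, y)

# inverse_transform puts the 1D weight vector back into 3D brain space.
weight_img = masker.inverse_transform(clf.coef_[0])
plotting.plot_stat_map(weight_img, title="discriminative weights")
```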
The trick is how to interpret this. We would like to be able to make statements such as "the yellow voxels represent regions that are activated more strongly for faces than for houses". But we cannot actually derive such an inference from our results. Why? In contrast to univariate methods, where every voxel is modelled independently, our multivariate result cannot be interpreted in terms of solitary voxels. Because each \(\beta\) is dependent on the others in the model, removing one voxel from the model will affect the optimal result for all the others, and it is the combination of information across terms that gives the model its predictive power. We can say that the voxels with high absolute \(\beta\) weights have more discriminatory power for faces versus houses, but we can't go much further on the basis of these results alone.
Some remedies for this limitation? Firstly, it is always a good idea to consider complementary analyses. For instance, while univariate models do not have the same predictive power as multivariate ones, they do have the advantage of permitting inferences on single voxels. So, if we ran a parallel univariate analysis in which we identified some of the same voxels as being more strongly activated for faces than for houses, we could present these results in combination and support more interesting inferences about the mechanisms of this perceptual function. It is also possible to use the results of the multivariate analysis to define an ROI for the univariate one, with one major caveat: to avoid the "double dipping" conundrum, we would have to do this on an independent sample (e.g., a split-half design, if your sample size is large enough for this to be feasible).
Another possibility is the use of a searchlight analysis. Here, we define spheres of a certain radius around a specific voxel, and perform our multivariate analysis only for the voxels in this sphere. This "searchlight" can then be applied to all voxels of interest, and the resulting predictive accuracy of each sub-model can be assigned to its center voxel. This approach is something of a hybrid between multi- and univariate approaches, in that it can be used to support stronger statements about individual voxels (and their immediate neighbours), while still retaining some of the power of the multivariate approach. It is justified by the assumption of smoothness in neuroimaging data, although the ideal searchlight radius is unknown (but can conceivably be estimated through model selection in a similar manner to \(\lambda\) and \(\alpha\), above).
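Nilearn also provides a searchlight implementation; a hedged sketch, with the mask, radius, and file names as placeholders, and the usual caveat that arguments may differ slightly across versions:

```python
import pandas as pd
from nilearn.decoding import SearchLight
from sklearn.model_selection import KFold

y = pd.read_csv("labels.csv")["stimulus"].values          # one label per volume

searchlight = SearchLight(
    mask_img="brain_mask.nii.gz",   # restrict sphere centers to in-brain voxels
    radius=6.0,                     # sphere radius in mm (an arbitrary choice)
    cv=KFold(n_splits=5),
)
searchlight.fit("bold.nii.gz", y)

# searchlight.scores_ is a 3D array with one cross-validated accuracy
# per sphere-center voxel.
```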
In conclusion, this method was exciting to learn about, and the Nilearn package provides a very user-friendly and well-documented means of performing these sorts of analyses on your data. I certainly plan to incorporate these methods into my future research designs.