Trajectory-based differential expression analysis for single cell proteomics data with msqrob2


Author(s): Stijn Vandenbulcke, Christophe Vanderaa, Lieven Clement

Affiliation(s): UGent



Many biological processes are dynamic, e.g. cell differentiation, tissue development, and responses to external stimuli. Traditionally, they were studied with time-course experiments. With the advent of single cell technologies, however, they can also be unraveled by taking a snapshot of the transcriptome or proteome of hundreds to millions of single cells in a cell population, each of which is at a distinct point in the dynamic process. Trajectory inference methods can then be used to infer lineages and pseudotimes in a reduced dimensional space. An important step in the data analysis is to prioritise features (genes, transcripts, peptides or proteins) that are associated with time or pseudotime, as well as features for which the trends differ upon stimulation or between lineages. Flexible models are needed for this purpose because the trends are often non-linear. To this end, we previously developed tradeSeq, a Bioconductor package that fits flexible negative binomial additive models to infer genes that are differentially expressed along a scRNA-seq trajectory. Smoothers model the trends along (pseudo)time in a data-driven way and allow for inference on the association with (pseudo)time as well as on trend differences between lineages or upon stimulation. These functionalities, however, still have to be ported to single cell proteomics (SCP). msqrob2 is a natural framework for this, because it can exploit the link between smoothers, ridge regression and mixed models. In this contribution, we present our msqrob2 extension that enables trend estimation in time-course designs and post-trajectory inference through the implementation of smoothers. By building upon the existing capabilities of msqrob2, i.e. robust ridge regression and mixed models, users can also address the complex hierarchical correlation structure of SCP data, which allows them to analyse complex experiments and obtain a comprehensive view of their bulk and single cell proteomics data.
We illustrate our method in a simulation study and on a real single cell proteomics dataset (Schoof et al., 2021) to prioritise proteins involved in cellular differentiation in an acute myeloid leukaemia model.
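
The link between smoothers, ridge regression and mixed models mentioned above can be illustrated with a minimal sketch. It is not the msqrob2 implementation or its API: the truncated-linear spline basis, the function names, the simulated data and the fixed penalty `lam` are all assumptions chosen for illustration. The smoother is fitted as a ridge regression in which only the spline (hinge) coefficients are penalised. In the mixed-model view, those coefficients would be treated as random effects and the penalty would correspond to the ratio of the residual variance to the random-effect variance, rather than being fixed by hand as it is here.

```python
import numpy as np

def truncated_linear_basis(t, knots):
    """Design matrix with columns: intercept, t, and (t - k)_+ for each knot k."""
    t = np.asarray(t, dtype=float)
    hinge = np.maximum(t[:, None] - knots[None, :], 0.0)
    return np.column_stack([np.ones_like(t), t, hinge])

def fit_penalised_spline(t, y, knots, lam):
    """Ridge fit that penalises only the hinge coefficients (illustrative helper)."""
    X = truncated_linear_basis(t, knots)
    # Intercept and linear slope are unpenalised ("fixed effects");
    # hinge coefficients are shrunken towards zero ("random effects").
    D = np.diag([0.0, 0.0] + [1.0] * len(knots))
    beta = np.linalg.solve(X.T @ X + lam * D, X.T @ y)
    return beta, X @ beta

# Simulated (hypothetical) log-intensities of one protein along pseudotime.
rng = np.random.default_rng(1)
pseudotime = np.sort(rng.uniform(0.0, 1.0, 200))
log_intensity = np.sin(2 * np.pi * pseudotime) + rng.normal(0.0, 0.3, 200)

knots = np.linspace(0.1, 0.9, 9)
beta, fitted = fit_penalised_spline(pseudotime, log_intensity, knots, lam=0.01)

# A very large penalty shrinks the hinge terms, so the trend approaches a straight line.
beta_stiff, fitted_stiff = fit_penalised_spline(pseudotime, log_intensity, knots, lam=1e6)
```

Small values of `lam` give a flexible, non-linear trend, whereas large values shrink the fit towards a straight line. Estimating the penalty from the data through the variance components is what would make the trend estimation data driven.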