SAMURAI: Shallow Analysis of copy nuMber alterations Using a Reproducible And Integrated bioinformatics pipeline


Author(s): Sara Potente, Sergio Marchini, Dino Paladin, Diego Boscarino, Luca Beltrame, Chiara Romualdi

Affiliation(s): University of Padua



Introduction: Shallow Whole Genome Sequencing (sWGS) has become a cost-effective method for genomic analysis, particularly for identifying copy number alterations (CNAs). However, the lack of standardized pipelines for sWGS data analysis is a significant obstacle to the robustness and reproducibility of results. To address this gap, we have developed SAMURAI (Shallow Analysis of copy nuMber alterations Using a Reproducible And Integrated bioinformatics pipeline).
Methods: SAMURAI was built following the nf-core community guidelines and consists of a series of independent blocks covering data pre-processing, copy number analysis, and reporting. The pipeline is fully containerized to ensure reproducibility and scalability.
Results: The SAMURAI pre-processing module handles data preparation; users then select between solid and liquid biopsy (e.g. circulating tumor DNA) analysis, with tools tailored to each sample type. The pipeline follows best practices and uses state-of-the-art algorithms to detect CNAs. SAMURAI integrates several packages for copy number analysis from low-pass whole genome sequencing, mostly R packages (QDNAseq [1], ASCAT.sc [2], CINSignatureQuantification [3], ichorCNA [4], Maftools [5]), together with custom scripts that rearrange and harmonize tool outputs so that they are ready for downstream analysis. A final report then provides detailed results to support data interpretation and further downstream analyses. The effectiveness of the pipeline was confirmed through tests on simulated and real clinical samples.
Discussion: SAMURAI allows researchers to effectively detect copy number alterations from sWGS data, especially in cancer research, enhancing the understanding of disease biology and potential therapeutic targets. Moreover, because its core is the copy number calling block, SAMURAI can also serve as a step-by-step guide to analyzing copy number alterations from sWGS data using R/Bioconductor.
Conclusion: SAMURAI offers a reliable and versatile pipeline for analyzing CNAs from sWGS data. Its adherence to community guidelines and its containerized software ensure reproducibility and scalability, making it a helpful tool for researchers in diverse environments and potentially contributing to advances in precision medicine.
References
1. Scheinin I, Sie D, Bengtsson H, et al. DNA copy number analysis of fresh and formalin-fixed specimens by shallow whole-genome sequencing with identification and exclusion of problematic regions in the genome assembly. Genome Res. 2014; 24:2022–2032.
2. Van Loo P. ASCAT.sc. 2021.
3. Drews RM, Hernando B, Tarabichi M, et al. A pan-cancer compendium of chromosomal instability. Nature 2022; 606:976–983.
4. Adalsteinsson VA, Ha G, Freeman SS, et al. Scalable whole-exome sequencing of cell-free DNA reveals high concordance with metastatic tumors. Nat. Commun. 2017; 8:1324.
5. Mayakonda A, Lin D-C, Assenov Y, et al. Maftools: efficient and comprehensive analysis of somatic variants in cancer. Genome Res. 2018; 28:1747–1756.