Enhancing Reproducibility: A Comprehensive Workflow for Integrating and Visualizing Biological Networks
Author(s): Florian Auer, Hryhorii Chereda, Júlia Perera-Bel, Frank Kramer
Affiliation(s): IT-Infrastructure for Translational Medical Research, Faculty of Applied Computer Science, University of Augsburg, Augsburg, Germany
Reliable research findings are the cornerstone of scientific progress, but with the deepening reproducibility crisis there is a growing need for robust methods to document both the data used and the entire workflow. Network biology in particular faces significant challenges in achieving reproducible research, because it involves diverse data types, such as protein interactions and gene expression, drawn from various resources. This complexity makes it difficult to capture and share the exact data used and the specific steps taken for integration. Reproducibility also suffers when only final results are reported and the crucial intermediate steps and decision points of the analysis are omitted. This lack of transparency makes it hard to understand the rationale behind the final outcome. Furthermore, visualizing biological network results is crucial for comprehending complex relationships between genes, proteins, or other entities, allowing researchers to identify patterns and trends that might otherwise be missed. Sharing these visualizations within a collaborative project, however, can be a significant hurdle. Static image formats offer a basic solution, but they rule out interactivity and exploration. Interactive visualization tools can offer deeper insights, yet they often require specific software installations or technical expertise, potentially excluding collaborators less familiar with such platforms. This highlights the need for solutions that balance the power of interactive exploration with accessibility for all team members, regardless of their technical background. In this workflow, we present a suite of tools designed to facilitate reproducible network data integration and visualization. The tools take complementary approaches, each focusing on a different part of the workflow and addressing a different level of technical proficiency.
Using a breast cancer dataset as a case study, we demonstrate the generation of patient-specific subnetworks, showcasing the practical application of our methodology. Key components of our approach include the Network Data Exchange (NDEx) platform for collaborative network sharing and the ndexr package for programmatic interaction with NDEx from within R. ndexr uses our RCX package to handle the biological network data, ensuring consistency in data structure and visualization across platforms. The RCX package provides functions to create, modify, validate, and visualize networks in the Cytoscape exchange (CX) format, which NDEx uses for transmission, and to convert them to standard R data types and objects. RCX is compatible with objects of the igraph and Bioconductor graph packages, as well as with the RCy3 package for programmatic interaction with Cytoscape, a widely used tool with advanced features for network analysis and visualization. Furthermore, we incorporate web-based tools into the workflow to accompany the analysis and visualization. NDExEdit is a browser-based application for data-dependent network visualization that enables users to interactively explore network attributes and apply visual styles based on the data. VisAVis is a tool for the interactive exploration of integrated patient subnetworks. It allows networks to be compared between patient groups and facilitates the identification of relevant genes and pathways. Both tools are connected to the NDEx platform and can retrieve networks directly from it. This facilitates the sharing of network-based results between collaborators while preserving the privacy of the contained data through the NDEx user and visibility management. VisAVis also builds on extensions to the RCX package that are provided through the RCX Extension Hub, which additionally offers support for custom CX aspects and further functionality.
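To illustrate what converting a CX network into standard data types involves, the following is a minimal Python sketch and not the RCX implementation. It assumes the aspect-oriented JSON layout of CX, in which a document is a list of fragments keyed by aspect name, with "@id" and "n" on nodes and "s", "t", and "i" on edges. The gene names, the attribute value, and all function names are hypothetical and chosen only for illustration.

```python
import json
from collections import defaultdict

# Tiny hand-written CX-style document (illustrative data, not from the study).
CX_TEXT = """
[
  {"nodes": [{"@id": 0, "n": "TP53"}, {"@id": 1, "n": "MDM2"}]},
  {"nodes": [{"@id": 2, "n": "BRCA1"}]},
  {"edges": [{"@id": 0, "s": 0, "t": 1, "i": "interacts-with"},
             {"@id": 1, "s": 2, "t": 0, "i": "interacts-with"}]},
  {"nodeAttributes": [{"po": 0, "n": "source", "v": "hypothetical-db"}]}
]
"""


def merge_aspects(fragments):
    """Group elements by aspect name; an aspect may be split across fragments."""
    aspects = defaultdict(list)
    for fragment in fragments:
        for name, elements in fragment.items():
            if isinstance(elements, list):
                aspects[name].extend(elements)
    return dict(aspects)


def to_edge_table(aspects):
    """Return (source name, target name, interaction) tuples, one per edge."""
    names = {n["@id"]: n.get("n", str(n["@id"])) for n in aspects.get("nodes", [])}
    return [
        (names.get(e["s"], e["s"]), names.get(e["t"], e["t"]), e.get("i"))
        for e in aspects.get("edges", [])
    ]


def to_node_attributes(aspects):
    """Return {node id: {attribute name: value}} from the nodeAttributes aspect."""
    table = defaultdict(dict)
    for attr in aspects.get("nodeAttributes", []):
        table[attr["po"]][attr["n"]] = attr["v"]
    return dict(table)


aspects = merge_aspects(json.loads(CX_TEXT))
edge_table = to_edge_table(aspects)
node_attrs = to_node_attributes(aspects)
for row in edge_table:
    print(row)
```

Keeping each aspect as its own table mirrors the idea that annotations such as the origin of the data travel with the network and are not discarded during conversion.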
By integrating these tools and platforms, we establish a robust and reproducible workflow for network data integration and visualization in breast cancer research. The data and their origin are documented within the integrated networks, so that the steps performed can be traced back. Our approach emphasizes not only the reporting of results but also the documentation of the entire workflow, including intermediate steps. In this way, the methodology promotes transparency, comprehensibility, and collaboration in network biology research, advancing our understanding of complex biological systems.