
The multi-omics approach overcomes the limitations of single-omics analysis and enables a comprehensive understanding of biological systems from multiple perspectives. Beyond basic biology, it is widely used together with clinical information in diagnosis, prognosis prediction, and the mechanistic research that underpins treatment (1, 2). Recently, synergistic effects have been demonstrated by combining not only the genome, epigenome, and transcriptome, but also single-cell data and artificial intelligence (3, 4). Accordingly, many methodologies for integrating omics data have been developed (5-7).
Next-generation sequencing (NGS), along with microarrays, has changed the paradigm of biology over the past decade (8). Multi-omics research has been further accelerated by the development of these sequencing and array-based technologies. Various types of omics data are now produced through high-throughput sequencing or array methods and applied in the healthcare sector (9-15). However, data produced in this way are very large and difficult to handle, which makes them hard to share or reuse. Therefore, platforms for managing such data efficiently and integrating data of various characteristics are being developed (16). In the early days, analysis tools focused mainly on a single omics data type. More recently, as larger and more varied omics datasets are produced, web-based bioinformatic platforms that integrate inter-omics analysis have been developed for complex biological interpretation (17-20).
Based on omics data (transcriptome, epigenome, and genome), we constructed the Multi-Omics Analysis Sandbox Toolkit (MOAST) to flexibly link clinical information and omics data, fine-tune results through iterative operations, and ultimately facilitate the discovery of clinically relevant biological pathways or gene sets as biomarker candidates. For reproducibility and ease of collaboration, we included functionalities to generate configurations of analysis schemes and share them efficiently between users. MOAST can be accessed at http://analysis.moast.xyz. It will serve as a valuable tool for assessing the impact of diverse elements on target diseases and swiftly uncovering potential biomarker candidates.
MOAST is a web server platform developed for the exploration, integrated analysis, and visualization of datasets ranging from single-omics to multi-omics. The toolkit enables analysis with a few mouse clicks. It helps users find omics marker candidates that are strongly associated with clinical groups of interest. After selecting data to be analyzed based on clinical information of interest, users can perform expression, methylation, and variant analysis, or proceed with a user-defined set such as a gene symbol list.
The overall conceptual structure and flow of the software are illustrated in Fig. 1. Module “A” lets users select samples and designate groups using clinical data. Groups can be formed freely by applying filters to the available clinical information. Sample groups are stored and reused in units called “PRESET”. Module “B” is the step of selecting a gene set through omics analysis or user-defined selection. The selected gene set is stored as a unit called “study” and used for analyses and sharing within the platform. Module “C1” derives a gene list for each option of three omics analyses (expression, methylation, and mutation). The meaning of each result can be grasped intuitively using visualization tools. Results for each option are stored in visual or tabular form, which makes them easy to compare with other results. Module “C2” constructs a gene list from the user's interests or from public databases such as GO terms or KEGG pathways. In the “add_gene_set” section, users can refine the selected genes into a desired gene list using survival analysis in previously defined samples. Module “D” is a visualization module that displays single-omics analysis results and gene sets imported via “add_gene_set” so that gene sets can be compared visually within the same “preset”. Users can set cut-offs to verify their own single-omics results alongside figures for additional data sets from public databases. Even with different gene sets, users can perform comparative analysis within the same preset, group, or sample, and use drag-and-drop to inspect expression, methylation, mutation, and other data for a specific gene set. Module “E” allows users to select the one gene set they wish to analyze when a preset (group and sample information) contains multiple studies (gene sets). Module “F” performs correlation analysis between single-omics data for the gene set selected in module “E”.
Based on the expression value, the user can examine significance along multiple dimensions, including methylation ratio and the presence or absence of mutations. Among the remaining auxiliary modules, module “G” provides functionalities to generate scores based on multi-omics data and perform survival analysis to derive the clinical significance of biomarker candidates. Lastly, modules “H1” and “H2” are designed for primer design and primer record keeping, respectively. They are linked to a webpage for designing mutation marker PCR primers for ARMS-blocker-Tm PCR (21) based on Primer3 (22) and methylation marker PCR primers based on MSP-HTPrimer (23).
An online web server implementation of MOAST is available at http://analysis.moast.xyz. Currently, gene expression, DNA methylation, and annotated mutation data from the TCGA COAD (n = 50) and READ (n = 50) cohorts are loaded together with clinical data, including histological type (Colon Adenocarcinoma, Colon Mucinous Carcinoma, Rectal Adenocarcinoma, and Rectal Mucinous Carcinoma), sex, age at initial pathologic diagnosis, and duration of disease-free survival.
By selecting “Data selection” in the “Analysis” menu, users can check basic statistics of the registered clinical and genetic information. Through the “Data Sets” table, samples can be selected based on the clinical information of interest and assigned to the desired group (Fig. 2A). Samples in each group can be checked in the “Data Set Configuration” panel at the bottom of the page. Group contents can be saved or deleted with the group buttons. After the groups are set, clinical statistics for the selected group can be checked in the top panel (Fig. 2B). The “Next” button proceeds to the next step (single-omics or gene set analysis) based on the confirmed groups, which is described in more detail in the following section.
After data selection and group assignment, each single-omics analysis (expression, methylation, and variant) can be executed by selecting “Single Omics analysis” in a pop-up window. Expression analysis for DEGs can be performed on count or TPM-normalized data using edgeR or DESeq2 (24, 25). The list of candidates can then be checked for each option. The pattern of the result can be grasped intuitively through visualization tools such as a volcano plot (Fig. 3A). In the methylation tab at the top of the page, DMP analysis of methylation data is executed with the ChAMP program (26). Resulting DEGs or DMPs can be visualized as heatmaps after applying filtering criteria (Fig. 3B). In the Variant tab, SNV or indel results are provided in OncoPrint style (Fig. 3C) together with a Kaplan-Meier plot. Once omics analysis is complete, visualization plots for candidates are displayed at the bottom of the screen for each omics data type, and the corresponding annotation information can be checked. Each result table and plot can be downloaded, and links for further examination of individual genes and GO analysis are presented (Fig. 3D). For methylation and mutational variants, primers can be designed based on ARMS-blocker-Tm PCR or MSP-HTPrimer after annotation and used for experimental validation (Supplementary Fig. 1). Additionally, gene sets of interest can be analyzed through “Add gene Set” in the “Gene Set selection” pop-up menu. For gene set analysis, a gene set of interest can be entered in the Geneset List window, or a gene list for a desired term from a public database such as GO (Gene_Ontology) can be used as input. Resulting gene sets can be visualized simultaneously on a single page for swift comparison. To this end, we added functionality to transfer the sample order obtained by clustering one omics type directly to other omics types and gene sets (Fig. 3E).
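As an illustration of the order-transfer idea, the following Python sketch clusters samples on one omics layer and applies the resulting sample order to another layer. It is a minimal sketch under stated assumptions, not the MOAST implementation: the linkage method (average linkage on Euclidean distance), the function names `leaf_order` and `reorder`, and the toy sample and feature values are all hypothetical.

```python
from math import dist


def leaf_order(profiles):
    """Agglomerative clustering with average linkage (an assumed choice).

    profiles: dict mapping sample ID -> list of feature values.
    Returns sample IDs ordered so that merged clusters stay adjacent.
    """
    clusters = [[sample] for sample in profiles]

    def avg_dist(a, b):
        # Mean pairwise Euclidean distance between two clusters.
        total = sum(dist(profiles[x], profiles[y]) for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        pairs = (
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        i, j = min(pairs, key=lambda p: avg_dist(clusters[p[0]], clusters[p[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0]


def reorder(matrix, sample_ids, order):
    """Reorder the columns of a feature-by-sample matrix to follow `order`."""
    idx = [sample_ids.index(sample) for sample in order]
    return [[row[k] for k in idx] for row in matrix]


# Hypothetical expression profiles (two genes per sample).
expression = {
    "S1": [1.0, 1.1],
    "S2": [5.0, 5.2],
    "S3": [1.2, 0.9],
    "S4": [5.1, 4.9],
}
# Hypothetical methylation matrix: one probe, columns in the original order.
sample_ids = ["S1", "S2", "S3", "S4"]
methylation = [[0.9, 0.1, 0.8, 0.2]]

order = leaf_order(expression)
methylation_sorted = reorder(methylation, sample_ids, order)
print(order)
print(methylation_sorted)
```

Because the order is computed once and applied to every other matrix, sample-matched heatmaps of different omics layers share the same column layout, which is the property the order-transfer feature relies on.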
Moreover, correlation analysis between multi-omics data using the preset/study module allows users to uncover potential mechanisms of genetic and epigenetic regulations underlying gene expression differences (Fig. 3F).
To facilitate the discovery of clinically relevant biomarker candidates and marker panels, we built a module dedicated to score calculation and survival analysis. This module can be accessed from module “C2”, the user-defined gene set module (Fig. 4A). With a selected custom gene set, users can load omics data into the workspace in the form of a spreadsheet (Fig. 4B). Scores based on custom formulas can be added easily using spreadsheet functions. Built-in functions for calculating principal components and risk scores (27) are also supported. Together with the expression, methylation, or mutation status of a single feature (gene/probe/mutation), these scores can then be subjected to Kaplan-Meier survival analysis (Fig. 4C).
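The scoring and survival steps can be sketched in Python as follows. This is a hedged illustration only: the weighted-sum form of the risk score, the median split, and the gene names, weights, and follow-up values are assumptions made for the example, and significance testing between groups (for instance a log-rank test) is omitted.

```python
from statistics import median


def risk_score(values, weights):
    """Weighted sum of feature values (assumed form of a risk score)."""
    return sum(weights[g] * values[g] for g in weights)


def split_by_median(scores):
    """Split sample IDs into high and low groups around the median score."""
    cut = median(scores.values())
    high = [s for s, v in scores.items() if v > cut]
    low = [s for s, v in scores.items() if v <= cut]
    return high, low


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as [(time, survival)] at each event time.

    times: follow-up durations; events: True if the event was observed,
    False if the sample was censored.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical weights and expression values.
weights = {"GENE_A": 0.5, "GENE_B": -0.25}
expression = {
    "S1": {"GENE_A": 4.0, "GENE_B": 2.0},
    "S2": {"GENE_A": 1.0, "GENE_B": 4.0},
    "S3": {"GENE_A": 6.0, "GENE_B": 0.0},
    "S4": {"GENE_A": 2.0, "GENE_B": 2.0},
}
# Hypothetical follow-up: (duration, event observed).
follow_up = {"S1": (5, True), "S2": (20, False), "S3": (8, True), "S4": (12, False)}

scores = {s: risk_score(v, weights) for s, v in expression.items()}
high, low = split_by_median(scores)
high_curve = kaplan_meier(
    [follow_up[s][0] for s in high], [follow_up[s][1] for s in high]
)
print(scores, high, low, high_curve)
```

A custom formula entered by a user plays the role of `risk_score` here; any function that maps a sample's omics values to a single number can feed the same grouping and survival steps.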
Sample and group information are saved in the form of a preset, which makes it possible to reload the dataset during reanalysis without selecting the samples again. A gene set derived from omics analysis or selected directly by the user is saved in the form of a study. Results for each gene set can be compared across multiple studies within the same preset, so that patterns in each dataset can be checked at a glance. When single-omics analysis is conducted after selecting a gene set, information on the programs and analysis options used can also be saved. Analysis methods and explanations for the gene set can be recorded by adding comments. In addition, visualization results such as plots can be saved individually in JSON format and restored for comparison with data from other studies in the preset.
Presets and studies can be shared with users other than their creator. The data owner can share each preset with a desired user, who can then check the shared sample list, group information, and program options. For iterative analysis, these saved items can be reused from the “Work History” tab. The data structure is easy to follow because items are connected as a tree in the order preset-study-omics. After adding new analyses or re-sorting data in an existing preset, the result can be saved and re-analyzed as a new preset.
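The preset-study-omics tree lends itself to a simple nested record that can be serialized for sharing. The Python sketch below shows one possible shape; the class names, field names, and example values are hypothetical and are not taken from the MOAST code base.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class OmicsAnalysis:
    omics: str       # e.g. "expression", "methylation", "variant"
    program: str     # analysis program used
    options: dict = field(default_factory=dict)


@dataclass
class Study:
    name: str
    genes: list
    comment: str = ""
    analyses: list = field(default_factory=list)


@dataclass
class Preset:
    name: str
    groups: dict     # group label -> list of sample IDs
    studies: list = field(default_factory=list)


def preset_to_json(preset):
    """Serialize the whole preset-study-omics tree to a JSON string."""
    return json.dumps(asdict(preset), indent=2)


def preset_from_json(text):
    """Rebuild the tree from JSON produced by preset_to_json."""
    raw = json.loads(text)
    studies = [
        Study(
            s["name"],
            s["genes"],
            s["comment"],
            [OmicsAnalysis(**a) for a in s["analyses"]],
        )
        for s in raw["studies"]
    ]
    return Preset(raw["name"], raw["groups"], studies)


# Hypothetical preset with one study and one recorded analysis.
preset = Preset(
    name="colon_vs_rectal",
    groups={"group1": ["S1", "S2"], "group2": ["S3", "S4"]},
    studies=[
        Study(
            name="deg_candidates",
            genes=["GENE_A", "GENE_B"],
            comment="filtered by P-value and mean difference",
            analyses=[OmicsAnalysis("expression", "DESeq2", {"unit": "count"})],
        )
    ],
)

restored = preset_from_json(preset_to_json(preset))
print(restored == preset)
```

Keeping program names and options inside the serialized record is what allows a recipient to repeat the same analysis on the shared sample groups.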
It is worth noting that there have been other endeavors to provide researchers with accessible means for the analysis of multi-omics data. First, OmicsNet2.0 (28) is a versatile bioinformatic web tool that provides comprehensive network analysis functionalities for multi-omic integration, promoting a holistic understanding of biological systems. In addition, cBioPortal (29) is a widely used bioinformatic analysis platform aimed at easy and extensive exploration and visualization of complex cancer multi-omics data. Lastly, UCSC Xena (30) is a powerful and user-friendly online platform that enables correlative analyses of multiple large-scale genomic datasets through an intuitive interface.
Nonetheless, MOAST provides utilities that are currently unavailable in those tools (Supplementary Table 1). Although cBioPortal, UCSC Xena, and MOAST all provide sorting operations on omics data, only MOAST provides hierarchical clustering options and carries the dendrogram-based sample order over to other datasets in sample-matched heatmaps, which gives an integrative and intuitive view of correlative structures between multi-omics data. For survival analysis, MOAST goes a step further by allowing users to define custom score functions, which can then be assessed for their clinical significance in prognostication. OmicsNet2.0 prioritizes biomarkers based on network analyses but does not accept a data matrix spanning multiple patients. MOAST users can therefore bring their selected biomarkers to OmicsNet2.0 to further characterize them in the context of biological networks.
MOAST is broadly divided into three conceptual parts: 1) creating an analysis set using clinical and genetic information, 2) performing omics analysis with various options on the desired dataset, and 3) sharing the results. These parts are implemented as 10 modules in the software.
The first module, “A”, allows users to navigate freely through the data and easily designate groups based on clinical data. Sample groups can be stored and used in “PRESET” units for reusability. However, grouping is currently limited to two groups and can be used only within the loaded dataset. Users can then decide whether to define gene sets based on differential analyses or on their own interests, which gives them several ways to use the software. For instance, a gene set defined in a user's previous studies can be used to examine whether its patterns are reproduced in an independent dataset registered in the software. In module “C1”, users can perform differential analyses of single-omics data and define gene sets by filtering on P-values and mean differences with a few mouse clicks. With module “C2”, on the other hand, users can iteratively examine gene sets of interest, such as genes related to TGF-β pathways, and filter out genes that defy the overall pattern in independent datasets and are therefore likely to be false positives from the earlier analysis that produced those gene sets.
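The filtering step in module “C1” amounts to applying two cut-offs to a differential analysis result table. A minimal Python sketch is given below; the column names, default thresholds, and example rows are hypothetical, and whether raw or adjusted P-values are used is an assumption left to the caller.

```python
def select_genes(results, max_p=0.05, min_abs_diff=1.0):
    """Return gene symbols passing both cut-offs, most significant first.

    results: list of dicts with keys "gene", "p_value", and "mean_diff".
    """
    passing = [
        r
        for r in results
        if r["p_value"] <= max_p and abs(r["mean_diff"]) >= min_abs_diff
    ]
    return [r["gene"] for r in sorted(passing, key=lambda r: r["p_value"])]


# Hypothetical differential analysis output.
results = [
    {"gene": "GENE_A", "p_value": 0.001, "mean_diff": 2.3},
    {"gene": "GENE_B", "p_value": 0.20, "mean_diff": 3.1},
    {"gene": "GENE_C", "p_value": 0.01, "mean_diff": -0.4},
    {"gene": "GENE_D", "p_value": 0.0004, "mean_diff": -1.8},
]

gene_set = select_genes(results)
print(gene_set)
```

Taking the absolute value of the mean difference keeps both up- and down-regulated features, which matches the symmetric cut-offs typically drawn on a volcano plot.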
All of these gene sets can be visualized and compared simultaneously in module “D”. This helps users intuitively grasp correlative structures between different gene sets and omics data types. Based on this, users can select a single gene set of interest for multi-omics integrative analysis in module “E”, which leads to module “F”, where correlation analysis results between single-omics data are presented. Correlations between genetic or epigenetic alteration status and expression levels of matched genes suggest potential regulatory relationships and may thus shed light on mechanistic aspects of the disease of interest.
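As a concrete example of such a correlation, the Python sketch below computes a per-gene Pearson coefficient between expression and methylation across matched samples. The choice of Pearson correlation, the function names, and the toy values are assumptions for illustration; the statistic used by module “F” is not specified here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Returns NaN when either sequence has zero variance.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / sqrt(var_x * var_y)


def gene_correlations(expression, methylation):
    """Correlate the two layers for every gene present in both.

    Both arguments map gene symbol -> values in the same sample order.
    """
    shared = sorted(expression.keys() & methylation.keys())
    return {g: pearson(expression[g], methylation[g]) for g in shared}


# Hypothetical sample-matched values for two genes across four samples.
expression = {"GENE_A": [8.0, 6.0, 4.0, 2.0], "GENE_B": [1.0, 2.0, 3.0, 4.0]}
methylation = {"GENE_A": [0.1, 0.3, 0.5, 0.7], "GENE_B": [0.2, 0.2, 0.9, 0.3]}

correlations = gene_correlations(expression, methylation)
print(correlations)
```

In this toy input, GENE_A shows a strongly negative coefficient, the pattern one would expect if promoter methylation silences expression, whereas GENE_B shows only a weak positive relationship.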
For the identification of putative cancer biomarkers, it is crucial to associate them with clinical survival information and find the optimal markers or combinations of markers to predict it. For this reason, we included a dedicated module “G” for clinical analyses, where built-in formulas (risk scores or other summary statistics) or custom formulas can be used to derive omics-based scores. These scores can be tested statistically for clinical significance using Kaplan-Meier analysis. Thanks to the iterative nature of the software, it is convenient to filter for the core gene sets that are crucial to the clinical score. However, module “G” is still limited in that only a single omics type is supported at a time, which we plan to resolve in the near future. Finally, the primer designer module “H1” can be used to swiftly find potential PCR primers for assaying discovered biomarkers. For ease of management, we also added a primer management module “H2”, where designed primers are saved with experimental records such as success/fail status, images of gel electrophoresis results, and notes on experiments.
Although a series of platforms have been developed to facilitate multi-omics analysis, including OmicsNet2.0, cBioPortal, and UCSC Xena, we believe that MOAST allows operations that, to the best of our knowledge, have been unavailable so far. These include hierarchical clustering of omics data, transfer of cluster information to different multi-omics datasets, and survival analysis with user-defined formulas. Given the lack of network analysis features in MOAST, it will be beneficial to further characterize selected biomarkers in OmicsNet2.0 to gain a comprehensive view of biological systems in action.
Taken together, our software provides a nearly complete solution for swiftly finding omics-based potential biomarkers and developing experimental PCR assays. Although it is still limited in various aspects, some of which were sacrificed in favor of user-friendliness, the software is expected to have a significant impact on biomedical research and industry, given its efficient workflow and ease of use.
This multi-omics analysis requires uploading large files and intensive database processing. To this end, high-specification hardware was used in a Linux environment, and the performance of MySQL and Tomcat was tuned. Docker, an operating-system-level virtualization platform, was used to resolve version compatibility issues between software components, and Docker Compose was used to configure containers for the database, web application server, and FastAPI. For ease of use by researchers, the system was developed as a web service based on the Spring Framework (31).
For multi-omics analysis, various bioinformatic tools were used for expression, methylation, and variant analysis. R packages were installed on the web application server to secure interoperability. MSP-HTPrimer (23), used for DNA methylation primer design, was available in this work only on Windows and not on Linux, so Oracle VirtualBox was used to resolve this dependency. The CanvasXpress library (32) was used to visualize clinical and omics results.
The source code of MOAST is available from https://github.com/HyunjoongKim/moast, with the exception of proprietary parts of the code.
This research was supported by the Bio & Medical Technology Development Program of the National Research Foundation (NRF) funded by the Ministry of Science & ICT (grant number: NRF-2017M3A9A7050614 and NRF-2017M3A9A7050610). The results shown here are in whole or part based upon data generated by the TCGA Research Network: https://www.cancer.gov/tcga.
The authors have no conflicting interests.