Integration
================================

Necessary inputs
----------------

The pipeline can integrate different types of omics data simultaneously.
The method currently provided is Multiple Co-Inertia Analysis (MCIA).
You can either run MCIA with your own data or choose this module as an
additional step in your analyses. Check ``params_mcia.yml``.

You need to provide the path (``pathmcia`` variable) in which you store
the different omics data, and a samples info file describing the
metadata of each sample. The path you provide should consist of
sub-directories named after the omics types you have. For example, if
you have genes, proteins and lipids, your MCIA directory will look like:

.. code:: plaintext

   - mcia
     - genes
       - genes.txt
     - proteins
       - proteins.txt
     - lipids
       - lipids.txt

Alternatively, if MCIA is used with pipeline-generated data you should
declare these variables as follows:

.. code:: bash

   params{
       outdir: '/home/results/'
       ### If outdir = results then pathmcia should be /complete/path/to/results/mcia/
       ### and biotrans_all_path should be /complete/path/to/results/prepareforbio/
       pathmcia = '/home/results/mcia/' # should be a string of the format /complete/path/to/results/mcia/
       biotrans_all_path : "/home/results/prepareforbio/" # should be a string of the format /complete/path/to/results/prepareforbio/
   }

Additionally, you need to change parameters in
`params_mcia.yml <../params_mcia.yml>`__ appropriately:

.. code:: bash

   params{
       runmcia         = true
       samplesinfomcia = 'path of your samples info file'
       a1lim           = limits of your X space, Default : '0, Inf'
       a2lim           = limits of your Y space, Default : '0, Inf'
   }

.. Change the location of the files appropriately

Important
~~~~~~~~~

.. rubric:: Sample names have to be in the first column or in a column called sampleID and **need to match** the column names of your count matrix.
   :name: sample-names-have-to-be-in-the-first-column-or-in-a-column-called-sampleid-and-need-to-match-the-column-names-of-your-count-matrix.

.. rubric:: If you have column names other than **condition** you need to declare the names in the params_mcia.yml.
   :name: if-you-have-column-names-other-than-condition-you-need-to-declare-the-names-in-the-params_mcia.yml.

.. rubric:: Sample names have to be identical between different omics types!

Running the pipeline
--------------------

The general command to run the pipeline is:

.. code:: bash

   nextflow run multiomicsintegrator -params-file multiomicsintegrator/params_mcia.yml -profile docker

This will launch the pipeline with the ``docker`` configuration profile.
See below for more information about profiles.

Note that the pipeline will create the following files in your working
directory:

.. code:: bash

   work            # Directory containing the nextflow working files
   <outdir>        # Location of where you want your results (defined by outdir)
   .nextflow_log   # Log file from Nextflow
   # Other nextflow hidden files, eg. history of pipeline runs and old logs.

Functionality
-------------

MCIA utilizes techniques such as covariance analysis, optimization, and
dimensionality reduction to integrate diverse omics datasets. It starts
by centering and scaling the data, then identifies shared patterns
through joint covariance analysis. Through optimization, it determines
coefficients for each feature, quantifying their contribution to the
shared structures. The method constructs latent variables (or
components) that represent these shared patterns. If needed,
dimensionality reduction is applied for a more concise interpretation.
This approach allows MCIA to capture commonalities and distinctions in
multi-omics data, offering insights into complex biological
relationships.

Additionally, we offer the possibility of functional integration of
data to cover scenarios in which MCIA cannot be applied. Namely, we
offer:

LipiDB
~~~~~~~~~~~~~~~~~~~

LipidR will produce differentially expressed features for each category of lipids.
Subsequently, LipiDB, using KEGGREST and a local database, will find
genes associated with these differentially expressed lipids, for each
category. The input is the result of lipidR: a txt file listing
deregulated lipids along with their logFC and pval (columns should be in
that order). The results are provided as a text file and a heatmap.

If the user wants to run LipiDB alone they need to declare it in
nextflow.config:

.. code:: bash

   params{
       lipidb_alone = true
   }

and this is the command to run it alone:

.. code:: bash

   nextflow run multiomicsintegrator/modules/local/annotate_lipids/main.nf -c multiomicsintegrator/nextflow.config -profile docker

multiMiR
~~~~~~~~~

MultiMiR is a database that stores predicted and experimentally
validated targets of miRNAs. As input it takes a txt file containing
differentially expressed miRNAs in a single column. The output consists
of two files, one containing only the targets and one storing the miRNAs
with their targets.

If the user wants to run multiMiR alone they need to declare it in
nextflow.config:

.. code:: bash

   params{
       multimir_alone = true
   }

and this is the command to run it alone:

.. code:: bash

   nextflow run multiomicsintegrator/modules/local/multimir/main.nf -c multiomicsintegrator/nextflow.config -profile docker

Exploratory analysis
~~~~~~~~~~~~~~~~~~~~

The pipeline automatically produces a heatmap of differentially
expressed features and their presence across the available omics layers.
As input it takes differentially expressed features and, optionally, the
results from multiMiR and LipiDB.

If the user wants to run the exploratory analysis alone they need to
declare it in nextflow.config:

.. code:: bash

   params{
       preparedf_alone                       = true
       preparedf_alone_genes                 = '[Logical, do you have genes?]'
       preparedf_alone_mirna                 = '[Logical, do you have miRNA?]'
       preparedf_alone_proteins              = '[Logical, do you have proteins?]'
       preparedf_alone_lipids                = '[Logical, do you have lipids?]'
       preparedf_alone_isoforms              = '[Logical, do you have isoforms?]'
       preparedf_alone_integrated            = '[Logical, have you applied mcia?]'
       preparedf_alone_integratedafterlipids = '[Logical, have you applied mcia and lipidomic analysis?]'
       preparedf_alone_path                  = '[Directory of the inputs]'
       preparedf_alone_alg_genes             = '[Algorithm used for genes]'
       preparedf_alone_alg_mirna             = '[Algorithm used for miRNA]'
       preparedf_alone_alg_proteins          = '[Algorithm used for proteins]'
       preparedf_alone_pval                  = '[pvalue cut off]'
   }

**Extremely important:** The files should follow the same naming system
as the output of MOI, for example ``genes_defeatures.txt`` for genes.

This is the command to run it alone:

.. code:: bash

   nextflow run multiomicsintegrator/modules/local/prepare_for_bio_alone/main.nf -c multiomicsintegrator/nextflow.config -profile docker

Correlation analysis
~~~~~~~~~~~~~~~~~~~~~

To estimate the correlation between differentially expressed features we
suggest using the count matrices of the differentially expressed
features.

.. code:: bash

   params{
       correlation_alone = false
       cor_m1            = "${projectDir}/results/mirna/rankprod/mirna_defeatures_expression.txt"
       cor_m2            = "${projectDir}/results/genes/rankprod/genes_defeatures_expression.txt"
       cor_method        = "pearson" // method of correlation. available: pearson, spearman
       cor_corc          = 0.8       // cutoff of correlation
       cor_pvalc         = 0.1       // pval cutoff of correlation
   }

Additionally, we offer a **comparative_analysis** tool, which estimates
the semantic distance (e.g., the similarity of their pathways) between
two feature signatures. The input is a txt file in which each column
stores one distinct feature signature. Available parameters are:

.. code:: bash

   params{
       comparative_alone = [logical, if you want to run it as a standalone module, default : false]
       biocomp_input     = ['Input']
       biocomp_organism  = "hsapiens"     // Organism
       biocomp_keytype   = "gene_symbol"  // Type of keys. Available: gene_symbol, ensembl, ncbi
       biocomp_ontology  = "GO"           // Ontology. Other options: MGIMP, Reactome
   }

If the user wishes to run correlation or comparative_analysis as
standalone modules they need to modify the nextflow.config file and run
the command:

.. code:: bash

   nextflow run multiomicsintegrator/modules/local/correlation/main.nf -c multiomicsintegrator/nextflow.config -profile docker

or

.. code:: bash

   nextflow run multiomicsintegrator/modules/local/comparative_analysis/main.nf -c multiomicsintegrator/nextflow.config -profile docker

OmnipathR
~~~~~~~~~~~~

OmnipathR is a knowledge database that stores multiple levels of
biological information. In MOI, OmnipathR can run as part of the
pipeline or as a standalone tool. As part of the pipeline it takes the
hub genes and forms a network of protein-protein interactions. Moreover,
it can annotate the hub features based on their role (e.g., ligand,
transcription factor etc.) in the signaling pathway they reside in. By
leveraging this information it can then reconstruct the pathways that
exist in the network, an aspect crucial in signaling-specific contexts.
Detailed information on how to run the tool is listed below:

.. code:: bash

   params {
       omnipath_biotrans                  = '[directory containing the outputs of biotranslator, relative to outdir]'
       omnipath_choose                    = '[choose_omics, choose_role]'
       omnipath_choose_type               = '[logical, specify if additional annotation is desired]'
       omnipath_additional_info_bool      = '[Logical, whether you want additional annotation]'
       omnipath_additional_info_val       = '[Must be present in get_omnipath_resources(), e.g., "SignaLink pathway"]'
       omnipath_additional_info_attribute = '[Must be in get_omnipath_resources(omnipath_annot), e.g., "TGF" (omnipath_annot is declared above)]'
   }

If the user wants to run the tool as a standalone module for a single
omics type they need one extra argument:

.. code:: bash

   params{
       omnipath_alone = '[logical, T]'
   }

The command to run the tool as a standalone module is:

.. code:: bash

   nextflow run multiomicsintegrator/modules/local/omnipath/main.nf -c multiomicsintegrator/nextflow.config -profile docker

Moreover, if the user has multiple omics types and wants to integrate
them after the differential expression step rather than after pathway
enrichment analysis, they need to supply an additional file with the
columns Gene (gene symbol) and Category (omics type). This file is
automatically produced by MOI and is called ``genes_across_omics.txt``.

.. code:: bash

   params{
       omnipath_biotrans                  = '[directory containing the outputs of biotranslator, relative to outdir]'
       omnipath_integrated_gao            = '[path of file genes_across_omics]'
       omnipath_choose                    = '[choose_omics, choose_role]'
       omnipath_choose_type               = '[logical, do you want additional annotation]'
       omnipath_additional_info_bool      = '[Logical, whether you want additional annotation]'
       omnipath_additional_info_val       = '[Must be present in get_omnipath_resources(), for example "SignaLink pathway"]'
       omnipath_additional_info_attribute = '[Must be in get_omnipath_resources(omnipath_annot), for example "TGF" (omnipath_annot is declared above)]'
   }

If the user wants to run the tool as a standalone module on integrated
omics they need one extra argument:

.. code:: bash

   params{
       omnipath_integrated_alone = '[logical, T]'
   }

The command to run it as a standalone module is:

.. code:: bash

   nextflow run multiomicsintegrator/modules/local/omnipath_integrated/main.nf -c multiomicsintegrator/nextflow.config -profile docker

Additional omics types
~~~~~~~~~~~~~~~~~~~~~~~~~

MOI can be extended to other omics types as well. If supplied with
abundance matrices (for example, glycomics), MOI can integrate them with
MCIA after performing basic filtering and normalization steps. If the
data are translated to the gene level, MOI can integrate them with the
exploratory analysis tool, multiMiR and LipiDB as explained above. In
addition, data translated to the gene level can be integrated with
high-level approaches such as biotranslator, the comparative analysis
tool or OmnipathR. The user treats these data as if they were gene data.
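
An additional omics layer integrated with MCIA still has to use sample
names identical to those of the other layers, so a quick check before
launching the pipeline may help. The following is a minimal sketch and
not part of MOI. It assumes the ``mcia/<omics>/<omics>.txt`` layout
described under Necessary inputs, tab-delimited matrices, and a header
row whose first cell labels the feature column while the remaining cells
are sample names. The function names ``read_sample_names`` and
``compare_sample_names`` are hypothetical.

.. code:: python

   import csv
   from pathlib import Path


   def read_sample_names(matrix_path, delimiter="\t"):
       """Return the sample names found in the header row of a matrix.

       Assumption (not MOI's specification): the first header cell labels
       the feature column and every following cell is a sample name.
       """
       with open(matrix_path, newline="") as handle:
           header = next(csv.reader(handle, delimiter=delimiter))
       return header[1:]


   def compare_sample_names(mcia_dir, delimiter="\t"):
       """Map each omics sub-directory to the samples it is missing.

       A sample counts as missing when it appears in at least one other
       omics layer. One txt matrix per sub-directory is assumed.
       """
       samples = {}
       for matrix in sorted(Path(mcia_dir).glob("*/*.txt")):
           samples[matrix.parent.name] = set(read_sample_names(matrix, delimiter))
       all_samples = set().union(*samples.values()) if samples else set()
       return {omics: sorted(all_samples - names) for omics, names in samples.items()}

Calling ``compare_sample_names("/home/results/mcia/")`` returns a
dictionary keyed by omics type. An empty list for every key suggests the
layers share the same sample names; any non-empty list names the samples
that layer lacks.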