MSstatsBioNet Introduction
Anthony Wu
MSstatsBioNet.Rmd
Installation
Run this code below to install MSstatsBioNet from bioconductor
if (!require("BiocManager", quietly = TRUE)) {
install.packages("BiocManager")
}
BiocManager::install("MSstatsBioNet")
Purpose of MSstatsBioNet
The MSstatsBioNet
package is a member of the
MSstats
family of packages. It contains a set of functions
for interpretation of mass spectrometry (MS) statistical analysis
results in the context of protein-protein interaction networks. The
package is designed to be used in conjunction with the
MSstats
package.
MSstats Convert from Upstream Dataset
library(MSstatsConvert)
input <- system.file("tinytest/raw_data/Metamorpheus/AllQuantifiedPeaks.tsv",
package = "MSstatsConvert"
)
input <- data.table::fread(input)
annot <- system.file("tinytest/raw_data/Metamorpheus/Annotation.tsv",
package = "MSstatsConvert"
)
annot <- data.table::fread(annot)
msstats_imported <- MetamorpheusToMSstatsFormat(input,
annotation = annot,
use_log_file = FALSE
)
#> INFO [2024-11-06 17:09:17] ** Raw data from Metamorpheus imported successfully.
#> INFO [2024-11-06 17:09:17] ** Raw data from Metamorpheus cleaned successfully.
#> INFO [2024-11-06 17:09:17] ** Using provided annotation.
#> INFO [2024-11-06 17:09:17] ** Run labels were standardized to remove symbols such as '.' or '%'.
#> INFO [2024-11-06 17:09:17] ** The following options are used:
#> - Features will be defined by the columns: PeptideSequence, PrecursorCharge
#> - Shared peptides will be removed.
#> - Proteins with single feature will not be removed.
#> - Features with less than 3 measurements across runs will be removed.
#> INFO [2024-11-06 17:09:17] ** Features with all missing measurements across runs are removed.
#> INFO [2024-11-06 17:09:17] ** Shared peptides are removed.
#> INFO [2024-11-06 17:09:17] ** Multiple measurements in a feature and a run are summarized by summaryforMultipleRows: max
#> INFO [2024-11-06 17:09:17] ** Features with one or two measurements across runs are removed.
#> INFO [2024-11-06 17:09:17] ** Run annotation merged with quantification data.
#> INFO [2024-11-06 17:09:17] ** Features with one or two measurements across runs are removed.
#> INFO [2024-11-06 17:09:17] ** Fractionation handled.
#> INFO [2024-11-06 17:09:17] ** Updated quantification data to make balanced design. Missing values are marked by NA
#> INFO [2024-11-06 17:09:17] ** Finished preprocessing. The dataset is ready to be processed by the dataProcess function.
head(msstats_imported)
#> ProteinName PeptideSequence
#> 1 O43707 AC[Common Fixed:Carbamidomethyl on C]LISLGYDVENDRQGEAEFNR
#> 2 O43707 AC[Common Fixed:Carbamidomethyl on C]LISLGYDVENDRQGEAEFNR
#> 3 O43707 AC[Common Fixed:Carbamidomethyl on C]LISLGYDVENDRQGEAEFNR
#> 4 O43707 AC[Common Fixed:Carbamidomethyl on C]LISLGYDVENDRQGEAEFNR
#> 5 B9A064 AGVETTKPSK
#> 6 B9A064 AGVETTKPSK
#> PrecursorCharge FragmentIon ProductCharge IsotopeLabelType Condition
#> 1 3 NA NA L neg
#> 2 3 NA NA L neg
#> 3 3 NA NA L pos
#> 4 3 NA NA L pos
#> 5 2 NA NA L neg
#> 6 2 NA NA L neg
#> BioReplicate Run Fraction Intensity
#> 1 1 NEG1-calib 1 NA
#> 2 2 NEG2-calib 1 655481.2
#> 3 3 POS1-calib 1 496891.9
#> 4 4 POS2-calib 1 213691.0
#> 5 1 NEG1-calib 1 4201663.3
#> 6 2 NEG2-calib 1 4427577.4
We will first convert the input data to a format that can be
processed by MSstats. In this example, the
MetamorpheusToMSstatsFormat
function is used to convert the
input data from Metamorpheus to MSstats format. The MSstats format will
contain the necessary information for downstream analysis, such as
peptide information, abundance values, run ID, and experimental
annotation information.
MSstats Process and GroupComparison
library(MSstats)
#>
#> Attaching package: 'MSstats'
#> The following objects are masked from 'package:MSstatsConvert':
#>
#> DIANNtoMSstatsFormat, DIAUmpiretoMSstatsFormat,
#> FragPipetoMSstatsFormat, MaxQtoMSstatsFormat,
#> OpenMStoMSstatsFormat, OpenSWATHtoMSstatsFormat, PDtoMSstatsFormat,
#> ProgenesistoMSstatsFormat, SkylinetoMSstatsFormat,
#> SpectronauttoMSstatsFormat
#> The following object is masked from 'package:grDevices':
#>
#> savePlot
QuantData <- dataProcess(msstats_imported, use_log_file = FALSE)
#> INFO [2024-11-06 17:09:19] ** Log2 intensities under cutoff = 16.018 were considered as censored missing values.
#> INFO [2024-11-06 17:09:19] ** Log2 intensities = NA were considered as censored missing values.
#> INFO [2024-11-06 17:09:19] ** Use all features that the dataset originally has.
#> INFO [2024-11-06 17:09:19]
#> # proteins: 7
#> # peptides per protein: 1-26
#> # features per peptide: 1-1
#> INFO [2024-11-06 17:09:19] Some proteins have only one feature:
#> O00391 ...
#> INFO [2024-11-06 17:09:19]
#> neg pos
#> # runs 2 2
#> # bioreplicates 2 2
#> # tech. replicates 1 1
#> INFO [2024-11-06 17:09:19] == Start the summarization per subplot...
#> | | | 0% | |========== | 14% | |==================== | 29% | |============================== | 43% | |======================================== | 57% | |================================================== | 71% | |============================================================ | 86% | |======================================================================| 100%
#> INFO [2024-11-06 17:09:19] == Summarization is done.
groupComparisonResult <- groupComparison(
contrast.matrix = "pairwise",
data = QuantData,
use_log_file = FALSE
)
#> INFO [2024-11-06 17:09:19] == Start to test and get inference in whole plot ...
#> | | | 0% | |========== | 14% | |==================== | 29% | |============================== | 43% | |======================================== | 57% | |================================================== | 71% | |============================================================ | 86% | |======================================================================| 100%
#> INFO [2024-11-06 17:09:19] == Comparisons for all proteins are done.
Next, we will preprocess the data using the dataProcess
function and perform a statistical analysis using the
groupComparison
function. The output of
groupComparison
is a table containing a list of
differentially abundant proteins with their associated p-values and log
fold changes.
MSstatsBioNet Analysis
ID Conversion
First, we need to convert the group comparison results to a format
that can be processed by INDRA. The getSubnetworkFromIndra
function requires a table containing HGNC IDs. We can use the following
mapping to convert from Uniprot to HGNC ID. This mapping was obtained
from this Python
uniprot client
In the future, this uniprot client will be added as a separate function to allow users to perform this conversion in R in a streamlined way.
library(MSstatsBioNet)
uniprot_to_hgnc_mapping <- c(
"B9A064" = "38476",
"O00391" = "9756",
"O14818" = "9536",
"O43598" = "21218",
"O43707" = "166",
"P16050" = "11390",
"P84243" = "4765"
)
groupComparisonResult$ComparisonResult$HgncId <- uniprot_to_hgnc_mapping[
groupComparisonResult$ComparisonResult$Protein
]
Subnetwork Query
The package provides a function getSubnetworkFromIndra
that retrieves a subnetwork of proteins from the INDRA database based on
differential abundance analysis results.
subnetwork <- getSubnetworkFromIndra(groupComparisonResult$ComparisonResult)
#> Warning in getSubnetworkFromIndra(groupComparisonResult$ComparisonResult): NOTICE: This function includes third-party software components
#> that are licensed under the BSD 2-Clause License. Please ensure to
#> include the third-party licensing agreements if redistributing this
#> package or utilizing the results based on this package.
#> See the LICENSE file for more details.
This package is distributed under the Artistic-2.0 license. However, its dependencies may have different licenses. In this example, getSubnetworkFromIndra depends on INDRA, which is distributed under the BSD 2-Clause license. Furthermore, INDRA’s knowledge sources may have different licenses for commercial applications. Please refer to the INDRA README for more information on its knowledge sources and their associated licenses.
Visualize Networks
The function visualizeNetworks
then takes the output of
getSubnetworkFromIndra
and visualizes the subnetwork. The
function requires Cytoscape desktop to be open for the visualization to
work.
visualizeNetworks(subnetwork$nodes, subnetwork$edges)
#> Warning in visualizeNetworks(subnetwork$nodes, subnetwork$edges): Visualization
#> is not available in non-interactive mode.
In the network diagram displayed in Cytoscape, you should see two arrows connecting two nodes, P16050 and P84243. These arrows represent the interactions between these two proteins, notably activation and phosphorylation.
Session info
sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 22.04.5 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
#>
#> locale:
#> [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
#> [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
#> [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
#> [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] MSstatsBioNet_0.99.6 MSstats_4.14.0 MSstatsConvert_1.16.0
#> [4] BiocStyle_2.34.0
#>
#> loaded via a namespace (and not attached):
#> [1] tidyselect_1.2.1 viridisLite_0.4.2 IRdisplay_1.1
#> [4] dplyr_1.1.4 bitops_1.0-9 RCurl_1.98-1.16
#> [7] fastmap_1.2.0 lazyeval_0.2.2 base64url_1.4
#> [10] XML_3.99-0.17 digest_0.6.37 lifecycle_1.0.4
#> [13] survival_3.7-0 statmod_1.5.0 r2r_0.1.1
#> [16] magrittr_2.0.3 compiler_4.4.2 rlang_1.1.4
#> [19] sass_0.4.9 tools_4.4.2 utf8_1.2.4
#> [22] yaml_2.3.10 data.table_1.16.2 knitr_1.48
#> [25] htmlwidgets_1.6.4 curl_5.2.3 marray_1.84.0
#> [28] RColorBrewer_1.1-3 repr_1.1.7 KernSmooth_2.23-24
#> [31] pbdZMQ_0.3-13 purrr_1.0.2 BiocGenerics_0.52.0
#> [34] desc_1.4.3 stats4_4.4.2 grid_4.4.2
#> [37] preprocessCore_1.68.0 fansi_1.0.6 caTools_1.18.3
#> [40] colorspace_2.1-1 log4r_0.4.4 ggplot2_3.5.1
#> [43] scales_1.3.0 gtools_3.9.5 MASS_7.3-61
#> [46] cli_3.6.3 rmarkdown_2.29 crayon_1.5.3
#> [49] ragg_1.3.3 generics_0.1.3 httr_1.4.7
#> [52] minqa_1.2.8 cachem_1.1.0 splines_4.4.2
#> [55] parallel_4.4.2 BiocManager_1.30.25 base64enc_0.1-3
#> [58] vctrs_0.6.5 boot_1.3-31 Matrix_1.7-1
#> [61] jsonlite_1.8.9 bookdown_0.41 ggrepel_0.9.6
#> [64] systemfonts_1.1.0 limma_3.62.1 plotly_4.10.4
#> [67] jquerylib_0.1.4 tidyr_1.3.1 glue_1.8.0
#> [70] nloptr_2.1.1 pkgdown_2.1.1 RJSONIO_1.3-1.9
#> [73] stringi_1.8.4 gtable_0.3.6 lme4_1.1-35.5
#> [76] munsell_0.5.1 tibble_3.2.1 pillar_1.9.0
#> [79] htmltools_0.5.8.1 gplots_3.2.0 RCy3_2.26.0
#> [82] graph_1.84.0 IRkernel_1.3.2 R6_2.5.1
#> [85] textshaping_0.4.0 evaluate_1.0.1 lattice_0.22-6
#> [88] backports_1.5.0 bslib_0.8.0 Rcpp_1.0.13-1
#> [91] uuid_1.2-1 nlme_3.1-166 checkmate_2.3.2
#> [94] xfun_0.49 fs_1.6.5 pkgconfig_2.0.3