PTM Analysis
Anthony Wu
PTM-Analysis.Rmd
Installation
Run the code below to install MSstatsBioNet from Bioconductor.
if (!require("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}
BiocManager::install("MSstatsBioNet")
Dataset
We use a subset of the dataset from this paper. The table is the output of the MSstatsPTM function groupComparisonPTM, filtered down to the columns that are actually needed.
input = data.table::fread(system.file(
"extdata/garrido-2024.csv",
package = "MSstatsBioNet"
))
head(input)
#> Protein Label log2FC SE DF pvalue
#> <char> <char> <num> <num> <int> <num>
#> 1: P00533_S1039_S1042 t0 vs t1 -0.3200363 0.14994312 9 0.061580951
#> 2: P00533_S1064 t0 vs t1 0.3566531 0.08915347 9 0.003108364
#> 3: P00533_S991_S995 t0 vs t1 -0.1229037 0.10858298 9 0.286937655
#> 4: P00533_T693 t0 vs t1 -0.0233444 0.17724459 9 0.898113182
#> 5: P00533_T693_S695 t0 vs t1 -0.1659957 0.15000754 9 0.297173769
#> 6: P00533_Y1110 t0 vs t1 0.2106324 0.09279031 9 0.049364434
#> adj.pvalue issue
#> <num> <lgcl>
#> 1: 0.28024590 NA
#> 2: 0.06863598 NA
#> 3: 0.57907374 NA
#> 4: 0.96083634 NA
#> 5: 0.58809108 NA
#> 6: 0.25258914 NA
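Each `Protein` identifier in this table appears to combine a UniProt accession with one or more modification sites, joined by underscores (for example, `P00533_S1039_S1042`). As a minimal sketch, assuming that layout holds for every row, the two parts can be separated with base R. The helper name `split_ptm_id` is hypothetical and is not part of MSstatsBioNet.

```r
# Hypothetical helper (not part of MSstatsBioNet): split a PTM identifier
# of the assumed form "<accession>_<site>[_<site>...]" into its two parts.
# An identifier without any "_" is returned unchanged in both columns.
split_ptm_id <- function(ptm_id) {
    data.frame(
        accession = sub("_.*$", "", ptm_id),  # text before the first "_"
        sites = sub("^[^_]*_", "", ptm_id),   # text after the first "_"
        stringsAsFactors = FALSE
    )
}

# Identifiers copied from the head(input) output above
ids <- c("P00533_S1039_S1042", "P00533_T693", "P00533_Y1110")
parts <- split_ptm_id(ids)
parts
```

Keeping the site string intact (rather than splitting it further) preserves multiply modified peptides such as `S1039_S1042` as a single unit.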
ID Conversion
First, we need to convert the group comparison results to a format that can be processed by INDRA. We can use the annotateProteinInfoFromIndra function to obtain these mappings.
In the example below, we convert UniProt IDs to their corresponding HGNC IDs. The function also adds other information, such as the HGNC gene name and protein function (for example, whether the protein is a kinase).
library(MSstatsBioNet)
#> Loading required package: MSstats
#>
#> Attaching package: 'MSstats'
#> The following object is masked from 'package:grDevices':
#>
#> savePlot
annotated_df = annotateProteinInfoFromIndra(input, "Uniprot")
head(annotated_df)
#> Protein Label log2FC SE DF pvalue
#> <char> <char> <num> <num> <int> <num>
#> 1: P00533_S1039_S1042 t0 vs t1 -0.3200363 0.14994312 9 0.061580951
#> 2: P00533_S1064 t0 vs t1 0.3566531 0.08915347 9 0.003108364
#> 3: P00533_S991_S995 t0 vs t1 -0.1229037 0.10858298 9 0.286937655
#> 4: P00533_T693 t0 vs t1 -0.0233444 0.17724459 9 0.898113182
#> 5: P00533_T693_S695 t0 vs t1 -0.1659957 0.15000754 9 0.297173769
#> 6: P00533_Y1110 t0 vs t1 0.2106324 0.09279031 9 0.049364434
#> adj.pvalue issue GlobalProtein UniprotId HgncId HgncName
#> <num> <lgcl> <char> <char> <char> <char>
#> 1: 0.28024590 NA P00533 P00533 3236 EGFR
#> 2: 0.06863598 NA P00533 P00533 3236 EGFR
#> 3: 0.57907374 NA P00533 P00533 3236 EGFR
#> 4: 0.96083634 NA P00533 P00533 3236 EGFR
#> 5: 0.58809108 NA P00533 P00533 3236 EGFR
#> 6: 0.25258914 NA P00533 P00533 3236 EGFR
#> IsTranscriptionFactor IsKinase IsPhosphatase
#> <lgcl> <lgcl> <lgcl>
#> 1: FALSE TRUE FALSE
#> 2: FALSE TRUE FALSE
#> 3: FALSE TRUE FALSE
#> 4: FALSE TRUE FALSE
#> 5: FALSE TRUE FALSE
#> 6: FALSE TRUE FALSE
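Once the annotation columns are attached, they can be used like any other column. The sketch below rebuilds a few rows by hand from the `head(annotated_df)` output above (a subset of columns only) and keeps the sites on annotated kinases whose unadjusted p-value is below 0.05. The threshold and the name `annotated_example` are illustrative assumptions, not part of the package workflow.

```r
# Rows copied by hand from the head(annotated_df) output above
annotated_example <- data.frame(
    Protein = c("P00533_S1039_S1042", "P00533_S1064", "P00533_Y1110"),
    log2FC = c(-0.3200363, 0.3566531, 0.2106324),
    pvalue = c(0.061580951, 0.003108364, 0.049364434),
    HgncName = c("EGFR", "EGFR", "EGFR"),
    IsKinase = c(TRUE, TRUE, TRUE),
    stringsAsFactors = FALSE
)

# Keep sites on annotated kinases with an unadjusted p-value below 0.05
# (illustrative threshold)
kinase_hits <- subset(annotated_example, IsKinase & pvalue < 0.05)
kinase_hits$Protein
```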
Subnetwork Query
The package provides the function getSubnetworkFromIndra, which retrieves a subnetwork of proteins from the INDRA database based on differential abundance analysis results. This function may help in finding off-target subnetworks.
subnetwork <- getSubnetworkFromIndra(
    annotated_df,
    pvalueCutoff = 0.05,
    statement_types = c("Phosphorylation"),
    logfc_cutoff = 1,
    force_include_proteins = c("P00533_Y1110")
)
#> Warning in getSubnetworkFromIndra(annotated_df, pvalueCutoff = 0.05, statement_types = c("Phosphorylation"), : NOTICE: This function includes third-party software components
#> that are licensed under the BSD 2-Clause License. Please ensure to
#> include the third-party licensing agreements if redistributing this
#> package or utilizing the results based on this package.
#> See the LICENSE file for more details.
head(subnetwork$nodes)
#> id logFC adj.pvalue hgncName Site
#> <char> <num> <num> <char> <char>
#> 1: P00533 0.2106324 0.2525891386 EGFR Y1110
#> 2: P28482 1.1379894 0.0138506355 MAPK1 Y187
#> 3: Q13480 1.4252333 0.0050364009 GAB1 S650_S651_Y659
#> 4: Q13480 1.3470807 0.0447363036 GAB1 S651_Y659
#> 5: Q13480 2.5847134 0.0032951498 GAB1 Y627_T638
#> 6: Q13480 2.9336603 0.0003464872 GAB1 Y659
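The first node above (EGFR, `Y1110`) has a `logFC` of about 0.21 and an adjusted p-value of about 0.25, so it would not pass the cutoffs on its own; it is presumably present because it was listed in `force_include_proteins`. The sketch below illustrates that kind of filter in base R. It is an assumption about the general logic (including the use of the adjusted p-value and strict inequalities), not the package's implementation, and `keep_site` is a hypothetical helper.

```r
# Hypothetical helper: TRUE for rows that pass both cutoffs or are forced in.
# Assumes the p-value cutoff applies to adj.pvalue and the fold-change cutoff
# to the absolute log2 fold change.
keep_site <- function(df, pvalue_cutoff, logfc_cutoff,
                      force_include = character(0)) {
    passes <- df$adj.pvalue < pvalue_cutoff & abs(df$log2FC) > logfc_cutoff
    passes | df$Protein %in% force_include
}

# Values copied from the outputs above. "P28482_Y187" is reconstructed from
# the id and Site columns of the node table, which is an assumption about
# how the original identifier was written.
sites <- data.frame(
    Protein = c("P00533_S1064", "P00533_Y1110", "P28482_Y187"),
    log2FC = c(0.3566531, 0.2106324, 1.1379894),
    adj.pvalue = c(0.06863598, 0.25258914, 0.0138506355),
    stringsAsFactors = FALSE
)

kept <- keep_site(sites, pvalue_cutoff = 0.05, logfc_cutoff = 1,
                  force_include = "P00533_Y1110")
sites$Protein[kept]
```

Under these assumptions, `P00533_S1064` is dropped (small fold change, adjusted p-value above 0.05), `P28482_Y187` passes both cutoffs, and `P00533_Y1110` is retained only because it is forced in.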
head(subnetwork$edges)
#> source target interaction evidenceCount paperCount
#> 1 Q13480 P00533 Phosphorylation 1 1
#> 2 P00533 P28482 Phosphorylation 7 3
#> 3 P00533 Q13480 Phosphorylation 1 1
#> 4 P00533 Q13480 Phosphorylation 26 4
#> 5 P00533 Q13480 Phosphorylation 14 2
#> 6 P28482 Q13480 Phosphorylation 3 1
#> evidenceLink
#> 1 https://db.indra.bio/statements/from_agents?subject=4066@HGNC&object=3236@HGNC&format=html
#> 2 https://db.indra.bio/statements/from_agents?subject=3236@HGNC&object=6871@HGNC&format=html
#> 3 https://db.indra.bio/statements/from_agents?subject=3236@HGNC&object=4066@HGNC&format=html
#> 4 https://db.indra.bio/statements/from_agents?subject=3236@HGNC&object=4066@HGNC&format=html
#> 5 https://db.indra.bio/statements/from_agents?subject=3236@HGNC&object=4066@HGNC&format=html
#> 6 https://db.indra.bio/statements/from_agents?subject=6871@HGNC&object=4066@HGNC&format=html
#> sourceCounts site
#> 1 {"sparser": 1} <NA>
#> 2 {"reach": 1} <NA>
#> 3 {"rlimsp": 1} Y317
#> 4 {"reach": 1} <NA>
#> 5 {"bel": 2} Y659
#> 6 {"hprd": 2, "signor": 1} S454
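Several rows above share the same `source` and `target` and differ only in `site`. As a sketch of one way to summarize them, the example below sums `evidenceCount` per protein pair with base R's `aggregate`. The data frame is a hand-copied subset of the `head(subnetwork$edges)` output, and summing counts across sites is an illustrative choice rather than something the package prescribes.

```r
# Columns copied by hand from the head(subnetwork$edges) output above
edges_example <- data.frame(
    source = c("Q13480", "P00533", "P00533", "P00533", "P00533", "P28482"),
    target = c("P00533", "P28482", "Q13480", "Q13480", "Q13480", "Q13480"),
    evidenceCount = c(1, 7, 1, 26, 14, 3),
    stringsAsFactors = FALSE
)

# Total evidence per directed source -> target pair
pair_totals <- aggregate(evidenceCount ~ source + target,
                         data = edges_example, FUN = sum)

# Show the best-supported pairs first
pair_totals[order(-pair_totals$evidenceCount), ]
```

Because edges are directed, `Q13480 -> P00533` and `P00533 -> Q13480` are kept as separate pairs.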
Network Visualization
Visualize the subnetwork in your browser.
previewNetworkInBrowser(subnetwork$nodes, subnetwork$edges, displayLabelType = "hgncName")
#> Network visualization exported to: /tmp/RtmpzomPVv/file1f20277e3fcc.html
Session info
sessionInfo()
#> R version 4.5.1 (2025-06-13)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.3 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
#> [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
#> [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
#> [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] MSstatsBioNet_1.1.7 MSstats_4.16.1 BiocStyle_2.36.0
#>
#> loaded via a namespace (and not attached):
#> [1] tidyselect_1.2.1 viridisLite_0.4.2 IRdisplay_1.1
#> [4] dplyr_1.1.4 farver_2.1.2 bitops_1.0-9
#> [7] fastmap_1.2.0 lazyeval_0.2.2 RCurl_1.98-1.17
#> [10] base64url_1.4 XML_3.99-0.19 digest_0.6.37
#> [13] lifecycle_1.0.4 survival_3.8-3 statmod_1.5.0
#> [16] r2r_0.1.2 magrittr_2.0.3 compiler_4.5.1
#> [19] rlang_1.1.6 sass_0.4.10 tools_4.5.1
#> [22] yaml_2.3.10 data.table_1.17.8 knitr_1.50
#> [25] htmlwidgets_1.6.4 curl_7.0.0 MSstatsConvert_1.18.1
#> [28] marray_1.86.0 repr_1.1.7 RColorBrewer_1.1-3
#> [31] KernSmooth_2.23-26 pbdZMQ_0.3-14 purrr_1.1.0
#> [34] BiocGenerics_0.54.0 desc_1.4.3 stats4_4.5.1
#> [37] grid_4.5.1 preprocessCore_1.70.0 caTools_1.18.3
#> [40] log4r_0.4.4 ggplot2_3.5.2 scales_1.4.0
#> [43] gtools_3.9.5 MASS_7.3-65 cli_3.6.5
#> [46] rmarkdown_2.29 crayon_1.5.3 ragg_1.5.0
#> [49] reformulas_0.4.1 generics_0.1.4 httr_1.4.7
#> [52] minqa_1.2.8 cachem_1.1.0 splines_4.5.1
#> [55] parallel_4.5.1 BiocManager_1.30.26 base64enc_0.1-3
#> [58] vctrs_0.6.5 boot_1.3-31 Matrix_1.7-3
#> [61] jsonlite_2.0.0 bookdown_0.44 ggrepel_0.9.6
#> [64] systemfonts_1.2.3 limma_3.64.3 plotly_4.11.0
#> [67] tidyr_1.3.1 jquerylib_0.1.4 glue_1.8.0
#> [70] nloptr_2.2.1 pkgdown_2.1.3 RJSONIO_2.0.0
#> [73] stringi_1.8.7 gtable_0.3.6 lme4_1.1-37
#> [76] tibble_3.3.0 pillar_1.11.0 htmltools_0.5.8.1
#> [79] gplots_3.2.0 RCy3_2.28.1 graph_1.86.0
#> [82] IRkernel_1.3.2 R6_2.6.1 textshaping_1.0.3
#> [85] Rdpack_2.6.4 evaluate_1.0.5 lattice_0.22-7
#> [88] rbibutils_2.3 backports_1.5.0 bslib_0.9.0
#> [91] Rcpp_1.1.0 uuid_1.2-1 nlme_3.1-168
#> [94] checkmate_2.3.3 xfun_0.53 fs_1.6.6
#> [97] pkgconfig_2.0.3