Convert Spectronaut output into MSstatsPTM format
Source: R/converters.R
Converts label-free Spectronaut data into MSstatsPTM format. Requires PSM output from Spectronaut and a custom annotation file that maps each run name to its condition and bioreplicate. Can optionally take a separate PSM file for a global profiling run. If no global profiling run is provided, the function can extract the unmodified peptides from the PTM PSM file and use them as a global profiling run (not recommended).
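As a rough illustration of the annotation file described above, the following base R sketch builds a small table with the three columns the converter expects (Run, Condition, BioReplicate). The run names are borrowed from the Examples section; the condition and bioreplicate labels, and the `input_runs` vector standing in for the run column of a Spectronaut export, are toy values chosen for this sketch only.

```r
# Run names as they might appear in the Spectronaut export (toy subset).
input_runs <- c("20180815_QE3_nLC3_AH_DIA_Honly_ind_01",
                "20180815_QE3_nLC3_AH_DIA_Honly_ind_02",
                "20180815_QE3_nLC3_AH_DIA_Yonly_ind_01",
                "20180815_QE3_nLC3_AH_DIA_Yonly_ind_02")

# One row per run, with the condition derived from the file name (an assumption
# that only holds for this naming scheme).
annotation <- data.frame(
  Run = input_runs,
  Condition = ifelse(grepl("Honly", input_runs), "H100_Y0", "H0_Y100"),
  stringsAsFactors = FALSE
)

# Illustrative BioReplicate labels: condition name plus a running index
# within each condition.
annotation$BioReplicate <- paste0(
  annotation$Condition, "_",
  ave(seq_along(input_runs), annotation$Condition, FUN = seq_along)
)

# Basic sanity checks before handing the table to the converter.
required <- c("Run", "Condition", "BioReplicate")
stopifnot(all(required %in% colnames(annotation)))    # required columns present
stopifnot(!anyDuplicated(annotation$Run))             # each run listed once
stopifnot(length(setdiff(input_runs, annotation$Run)) == 0)  # every run annotated

annotation
```

A table like this could be written to a text file or passed directly as a data frame, as the Examples section does with spectronaut_annotation.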
Usage
SpectronauttoMSstatsPTMFormat(
input,
annotation = NULL,
fasta_path = NULL,
protein_input = NULL,
annotation_protein = NULL,
use_unmod_peptides = FALSE,
intensity = "PeakArea",
mod_id = "\\[Phospho \\(STY\\)\\]",
fasta_protein_name = "uniprot_iso",
remove_other_mods = TRUE,
filter_with_Qvalue = TRUE,
qvalue_cutoff = 0.01,
useUniquePeptide = TRUE,
removeFewMeasurements = TRUE,
removeProtein_with1Feature = FALSE,
summaryforMultipleRows = max,
use_log_file = TRUE,
append = FALSE,
verbose = TRUE,
log_file_path = NULL
)
Arguments
- input
name of Spectronaut PTM output, which is long-format. ProteinName, PeptideSequence, PrecursorCharge, FragmentIon, ProductCharge, IsotopeLabelType, Condition, BioReplicate, Run, Intensity, F.ExcludedFromQuantification are required. Rows with F.ExcludedFromQuantification=True will be removed.
- annotation
name of 'annotation.txt' data which includes Condition, BioReplicate, Run. If annotation is already complete in Spectronaut, use annotation=NULL (default). It will use the annotation information from input.
- fasta_path
string containing path to the corresponding fasta file for the modified peptide dataset.
- protein_input
name of Spectronaut global protein output, in the same format as the input parameter.
- annotation_protein
name of annotation file for global protein data, in the same format as above.
- use_unmod_peptides
If protein_input is not provided, unmodified peptides can be extracted from input to be used in place of a global profiling run. Default is FALSE.
- intensity
'PeakArea' (default) uses the non-normalized peak area. 'NormalizedPeakArea' uses the peak area normalized by Spectronaut.
- mod_id
Character that indicates the modification of interest. Default is "\\[Phospho \\(STY\\)\\]". Note that \\ must be included before special characters.
- fasta_protein_name
Name of fasta column that matches the protein name in the evidence file. Default is uniprot_iso.
- remove_other_mods
Remove peptides which include modifications other than the one listed in mod_id. Default is TRUE. For example, in an experiment targeting phosphorylation, setting this parameter to TRUE would remove peptides like (Acetyl (Protein N-term))AAAAPDSRVS(Phospho (STY))EEENLK. Set this parameter to FALSE to keep peptides with extraneous modifications.
- filter_with_Qvalue
TRUE (default) filters out intensities whose value in the EG.Qvalue column is greater than qvalue_cutoff. Those intensities are replaced with zero and treated as censored missing values for imputation purposes.
- qvalue_cutoff
Cutoff for EG.Qvalue. Default is 0.01.
- useUniquePeptide
TRUE (default) removes peptides that are assigned to more than one protein. The workflow assumes that each peptide is unique to a single protein.
- removeFewMeasurements
TRUE (default) will remove the features that have 1 or 2 measurements across runs.
- removeProtein_with1Feature
TRUE will remove proteins that have only 1 feature, where a feature is the combination of peptide, precursor charge, fragment and charge. FALSE is the default.
- summaryforMultipleRows
max (default) or sum. When there are multiple measurements for a given feature and run, use the highest intensity or the sum of the intensities.
- use_log_file
logical. If TRUE, information about data processing will be saved to a file.
- append
logical. If TRUE, information about data processing will be added to an existing log file.
- verbose
logical. If TRUE, information about data processing will be printed to the console.
- log_file_path
character. Path to a file to which information about data processing will be saved. If not provided, such a file will be created automatically. If 'append = TRUE', has to be a valid path to a file.
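The mod_id and remove_other_mods arguments both depend on pattern matching against the modified peptide sequence. The base R sketch below illustrates why the escaping matters and what removing "other" modifications could look like. It is not the package's internal implementation; the helper variables (`has_target`, `has_other`, `keep`) are hypothetical, the first three sequences are copied from the Examples output, and the fourth is a made-up unmodified peptide.

```r
seqs <- c(
  "_SKPIPIM[Oxidation (M)]PAS[Phospho (STY)]PQK_",
  "_ALS[Phospho (STY)]PVIPLIPR_",
  "_AQQELVPPQQQ[Deamidation (NQ)]AS[Phospho (STY)]PPQLPK_",
  "_TGSYGALAEITASK_"   # toy unmodified peptide, for illustration only
)

# Escaped pattern: brackets and parentheses are matched literally.
mod_id <- "\\[Phospho \\(STY\\)\\]"
has_target <- grepl(mod_id, seqs)

# Unescaped pattern: "[...]" is read as a character class, so any sequence
# containing one of the listed characters (P, S, T, Y, ...) matches,
# including the unmodified peptide.
unescaped <- grepl("[Phospho (STY)]", seqs)

# One possible way to flag extra modifications: strip the target
# modification, then look for any remaining bracketed annotation.
stripped <- gsub(mod_id, "", seqs)
has_other <- grepl("\\[", stripped)

# Peptides that carry the target modification and nothing else.
keep <- has_target & !has_other

data.frame(seqs, has_target, unescaped, has_other, keep)
```

In this toy set only the second sequence is kept: the first and third carry an additional Oxidation or Deamidation annotation, and the fourth carries no phosphorylation at all, even though the unescaped pattern would have matched it.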
Examples
head(spectronaut_input)
#> R.Condition R.FileName R.Replicate PG.Genes
#> 1 Honly 20180815_QE3_nLC3_AH_DIA_Honly_ind_01 1 DNM1L
#> 2 Honly 20180815_QE3_nLC3_AH_DIA_Honly_ind_01 1 BIN1
#> 3 Honly 20180815_QE3_nLC3_AH_DIA_Honly_ind_01 1 BIN1
#> 4 Honly 20180815_QE3_nLC3_AH_DIA_Honly_ind_01 1 KMT2D
#> 5 Honly 20180815_QE3_nLC3_AH_DIA_Honly_ind_01 1 PPP1R12A
#> 6 Honly 20180815_QE3_nLC3_AH_DIA_Honly_ind_01 1 SEC16A
#> PG.ProteinDescriptions PG.ProteinGroups PG.ProteinNames
#> 1 Dynamin-1-like protein O00429 DNM1L_HUMAN
#> 2 Myc box-dependent-interacting protein 1 O00499 BIN1_HUMAN
#> 3 Myc box-dependent-interacting protein 1 O00499 BIN1_HUMAN
#> 4 Histone-lysine N-methyltransferase 2D O14686 KMT2D_HUMAN
#> 5 Protein phosphatase 1 regulatory subunit 12A O14974 MYPT1_HUMAN
#> 6 Protein transport protein Sec16A O15027 SC16A_HUMAN
#> PEP.PeptidePosition EG.IsDecoy
#> 1 607 FALSE
#> 2 293 FALSE
#> 3 293 FALSE
#> 4 4736 FALSE
#> 5 443 FALSE
#> 6 879 FALSE
#> EG.PrecursorId
#> 1 _SKPIPIM[Oxidation (M)]PAS[Phospho (STY)]PQK_.2
#> 2 _GNKSPS[Phospho (STY)]PPDGSPAATPEIR_.3
#> 3 _GNKS[Phospho (STY)]PSPPDGSPAATPEIR_.3
#> 4 _ALS[Phospho (STY)]PVIPLIPR_.2
#> 5 _TGS[Phospho (STY)]YGALAEITASK_.2
#> 6 _AQQELVPPQQQ[Deamidation (NQ)]AS[Phospho (STY)]PPQLPK_.3
#> EG.PTMAssayCandidateScore EG.PTMAssayProbability EG.PTMLocalizationConfidence
#> 1 29.064455 0.9999999 0.9999999
#> 2 6.009665 0.4966855 0.4966855
#> 3 6.009665 0.4966855 0.4966855
#> 4 NaN NaN 1.0000000
#> 5 24.307762 0.5848936 0.5848936
#> 6 15.286304 0.3310838 0.6655419
#> EG.PTMLocalizationProbabilities
#> 1 _S[Phospho (STY): 0%]KPIPIM[Oxidation (M): 100%]PAS[Phospho (STY): 100%]PQK_
#> 2 _GNKS[Phospho (STY): 49.7%]PS[Phospho (STY): 49.7%]PPDGS[Phospho (STY): 0.3%]PAAT[Phospho (STY): 0.3%]PEIR_
#> 3 _GNKS[Phospho (STY): 49.7%]PS[Phospho (STY): 49.7%]PPDGS[Phospho (STY): 0.3%]PAAT[Phospho (STY): 0.3%]PEIR_
#> 4 _ALS[Phospho (STY): 100%]PVIPLIPR_
#> 5 _T[Phospho (STY): 41.5%]GS[Phospho (STY): 58.5%]Y[Phospho (STY): 0%]GALAEIT[Phospho (STY): 0%]AS[Phospho (STY): 0%]K_
#> 6 _AQ[Deamidation (NQ): 0%]Q[Deamidation (NQ): 0%]ELVPPQ[Deamidation (NQ): 33.1%]Q[Deamidation (NQ): 33.1%]Q[Deamidation (NQ): 33.1%]AS[Phospho (STY): 100%]PPQ[Deamidation (NQ): 0.7%]LPK_
#> EG.NormalizationFactor EG.TotalQuantity..Settings. FG.Charge F.Charge
#> 1 1558039.8 24796966912 2 2
#> 2 1766846.4 9006234624 3 3
#> 3 1766846.4 9006234624 3 3
#> 4 887979.7 26086424576 2 2
#> 5 1128734.0 98855960576 2 2
#> 6 1189332.0 356193000000 3 3
#> EG.ModifiedSequence F.FrgIon F.FrgLossType
#> 1 _SKPIPIM[Oxidation (M)]PAS[Phospho (STY)]PQK_ NA noloss
#> 2 _GNKSPS[Phospho (STY)]PPDGSPAATPEIR_ NA noloss
#> 3 _GNKS[Phospho (STY)]PSPPDGSPAATPEIR_ NA noloss
#> 4 _ALS[Phospho (STY)]PVIPLIPR_ NA noloss
#> 5 _TGS[Phospho (STY)]YGALAEITASK_ NA noloss
#> 6 _AQQELVPPQQQ[Deamidation (NQ)]AS[Phospho (STY)]PPQLPK_ NA noloss
#> F.ExcludedFromQuantification F.PeakArea
#> 1 FALSE 24796966912
#> 2 FALSE 9006234624
#> 3 FALSE 9006234624
#> 4 FALSE 26086424576
#> 5 FALSE 98855960576
#> 6 FALSE 356193000000
head(spectronaut_annotation)
#> Run Fraction TechRepMixture Condition
#> 1 20180815_QE3_nLC3_AH_DIA_Honly_ind_01 1 1 H100_Y0
#> 2 20180815_QE3_nLC3_AH_DIA_Honly_ind_02 1 1 H100_Y0
#> 3 20180815_QE3_nLC3_AH_DIA_Honly_ind_03 1 1 H100_Y0
#> 4 20180815_QE3_nLC3_AH_DIA_Yonly_ind_01 1 1 H0_Y100
#> 5 20180815_QE3_nLC3_AH_DIA_Yonly_ind_02 1 1 H0_Y100
#> 6 20180815_QE3_nLC3_AH_DIA_Yonly_ind_03 1 1 H0_Y100
#> BioReplicate
#> 1 H100_Y0_04
#> 2 H100_Y0_05
#> 3 H100_Y0_06
#> 4 H0_Y100_01
#> 5 H0_Y100_02
#> 6 H0_Y100_03
msstats_input = SpectronauttoMSstatsPTMFormat(spectronaut_input,
annotation=spectronaut_annotation,
fasta_path=system.file("extdata", "spectronaut_fasta.fasta", package="MSstatsPTM"),
use_unmod_peptides=TRUE,
mod_id = "\\[Phospho \\(STY\\)\\]",
fasta_protein_name = "uniprot_iso"
)
#> INFO [2026-04-09 15:19:22] ** Raw data from Spectronaut imported successfully.
#> INFO [2026-04-09 15:19:22] ** Raw data from Spectronaut cleaned successfully.
#> INFO [2026-04-09 15:19:22] ** Using provided annotation.
#> INFO [2026-04-09 15:19:22] ** Run labels were standardized to remove symbols such as '.' or '%'.
#> INFO [2026-04-09 15:19:22] ** The following options are used:
#> - Features will be defined by the columns: PeptideSequence, PrecursorCharge, FragmentIon, ProductCharge
#> - Shared peptides will be removed.
#> - Proteins with single feature will not be removed.
#> - Features with less than 3 measurements across runs will be removed.
#> INFO [2026-04-09 15:19:22] ** Intensities with values of FExcludedFromQuantification equal to TRUE are replaced with NA
#> WARN [2026-04-09 15:19:22] ** PGQvalue not found in input columns.
#> WARN [2026-04-09 15:19:22] ** EGQvalue not found in input columns.
#> INFO [2026-04-09 15:19:22] ** Features with all missing measurements across runs are removed.
#> INFO [2026-04-09 15:19:22] ** Shared peptides are removed.
#> INFO [2026-04-09 15:19:22] ** Multiple measurements in a feature and a run are summarized by summaryforMultipleRows: max
#> INFO [2026-04-09 15:19:22] ** Features with one or two measurements across runs are removed.
#> INFO [2026-04-09 15:19:22] ** Run annotation merged with quantification data.
#> INFO [2026-04-09 15:19:22] ** Features with one or two measurements across runs are removed.
#> INFO [2026-04-09 15:19:22] ** Fractionation handled.
#> INFO [2026-04-09 15:19:22] ** Updated quantification data to make balanced design. Missing values are marked by NA
#> INFO [2026-04-09 15:19:22] ** Finished preprocessing. The dataset is ready to be processed by the dataProcess function.
head(msstats_input$PTM)
#> ProteinName PeptideSequence PrecursorCharge FragmentIon
#> 37 P09938_S22 AAADALSDLEIKDS[Phospho (STY)]K 2 <NA>
#> 38 P09938_S22 AAADALSDLEIKDS[Phospho (STY)]K 2 <NA>
#> 39 P09938_S22 AAADALSDLEIKDS[Phospho (STY)]K 2 <NA>
#> 40 P09938_S22 AAADALSDLEIKDS[Phospho (STY)]K 2 <NA>
#> 41 P09938_S22 AAADALSDLEIKDS[Phospho (STY)]K 2 <NA>
#> 42 P09938_S22 AAADALSDLEIKDS[Phospho (STY)]K 2 <NA>
#> ProductCharge IsotopeLabelType Condition BioReplicate
#> 37 2 L H100_Y0 H100_Y0_04
#> 38 2 L H100_Y0 H100_Y0_05
#> 39 2 L H100_Y0 H100_Y0_06
#> 40 2 L H0_Y100 H0_Y100_01
#> 41 2 L H0_Y100 H0_Y100_02
#> 42 2 L H0_Y100 H0_Y100_03
#> Run Fraction Intensity
#> 37 20180815_QE3_nLC3_AH_DIA_Honly_ind_01 1 NA
#> 38 20180815_QE3_nLC3_AH_DIA_Honly_ind_02 1 NA
#> 39 20180815_QE3_nLC3_AH_DIA_Honly_ind_03 1 NA
#> 40 20180815_QE3_nLC3_AH_DIA_Yonly_ind_01 1 201390.72
#> 41 20180815_QE3_nLC3_AH_DIA_Yonly_ind_02 1 75962.33
#> 42 20180815_QE3_nLC3_AH_DIA_Yonly_ind_03 1 281808.72
head(msstats_input$PROTEIN)
#> ProteinName PeptideSequence PrecursorCharge FragmentIon ProductCharge
#> 1 P36578 AAAAAAALQAK 2 <NA> 2
#> 2 P36578 AAAAAAALQAK 2 <NA> 2
#> 3 P36578 AAAAAAALQAK 2 <NA> 2
#> 4 P36578 AAAAAAALQAK 2 <NA> 2
#> 5 P36578 AAAAAAALQAK 2 <NA> 2
#> 6 P36578 AAAAAAALQAK 2 <NA> 2
#> IsotopeLabelType Condition BioReplicate Run
#> 1 L H100_Y0 H100_Y0_04 20180815_QE3_nLC3_AH_DIA_Honly_ind_01
#> 2 L H100_Y0 H100_Y0_05 20180815_QE3_nLC3_AH_DIA_Honly_ind_02
#> 3 L H100_Y0 H100_Y0_06 20180815_QE3_nLC3_AH_DIA_Honly_ind_03
#> 4 L H0_Y100 H0_Y100_01 20180815_QE3_nLC3_AH_DIA_Yonly_ind_01
#> 5 L H0_Y100 H0_Y100_02 20180815_QE3_nLC3_AH_DIA_Yonly_ind_02
#> 6 L H0_Y100 H0_Y100_03 20180815_QE3_nLC3_AH_DIA_Yonly_ind_03
#> Fraction Intensity
#> 1 1 NA
#> 2 1 NA
#> 3 1 NA
#> 4 1 NA
#> 5 1 NA
#> 6 1 NA
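The returned object is a list with a PTM table and a PROTEIN table, both in long format. One quick check after conversion is to count how many intensities were actually observed per site and condition. The sketch below does this in base R on a toy data frame that reproduces the six PTM rows printed above; the `ptm` and `observed` names are hypothetical, and reading ProteinName as a protein accession followed by a site label is an interpretation of the printed values, not something this page states.

```r
# Toy copy of the head(msstats_input$PTM) rows shown above.
ptm <- data.frame(
  ProteinName = "P09938_S22",
  Condition = rep(c("H100_Y0", "H0_Y100"), each = 3),
  BioReplicate = c("H100_Y0_04", "H100_Y0_05", "H100_Y0_06",
                   "H0_Y100_01", "H0_Y100_02", "H0_Y100_03"),
  Intensity = c(NA, NA, NA, 201390.72, 75962.33, 281808.72),
  stringsAsFactors = FALSE
)

# Count non-missing intensities per ProteinName and Condition.
observed <- aggregate(
  !is.na(ptm$Intensity),
  by = list(ProteinName = ptm$ProteinName, Condition = ptm$Condition),
  FUN = sum
)
names(observed)[3] <- "n_observed"

# Split the identifier at the first underscore (assumed accession_site layout).
observed$Protein <- sub("_.*$", "", observed$ProteinName)
observed$Site <- sub("^[^_]*_", "", observed$ProteinName)

observed
```

For these rows the site has three observed intensities in H0_Y100 and none in H100_Y0, which is the kind of imbalance worth noticing before summarization and modeling.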