Convert SpectroMine output into the required input format for MSstatsTMT.

SpectroMinetoMSstatsTMTFormat(
  input,
  annotation,
  filter_with_Qvalue = TRUE,
  qvalue_cutoff = 0.01,
  useUniquePeptide = TRUE,
  rmPSM_withMissing_withinRun = FALSE,
  rmPSM_withfewMea_withinRun = TRUE,
  rmProtein_with1Feature = FALSE,
  summaryforMultipleRows = sum
)

Arguments

input

Data frame of the SpectroMine PSM output, read in from the PSM sheet.

annotation

Data frame that contains the columns Run, Fraction, TechRepMixture, Mixture, Channel, BioReplicate and Condition. Refer to the example 'annotation.mine' for the meaning of each column.
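As a rough illustration of the expected layout, the sketch below builds a toy annotation with one row per Run and Channel (as in annotation.mine) and checks that every required column is present. `check_annotation()`, the file names, sample labels and conditions are all hypothetical; MSstatsTMT does its own validation internally.

```r
# Hypothetical helper (not part of MSstatsTMT): stop if any column required
# by the annotation argument is missing.
required_cols <- c("Run", "Fraction", "TechRepMixture", "Mixture",
                   "Channel", "BioReplicate", "Condition")

check_annotation <- function(annotation) {
  missing_cols <- setdiff(required_cols, colnames(annotation))
  if (length(missing_cols) > 0) {
    stop("annotation is missing column(s): ",
         paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

# Toy annotation: one mixture, one technical replicate, two fractions
# (one raw file each) and two channels. All labels are made up.
annotation_toy <- data.frame(
  Run            = rep(c("mix1_frac1.raw", "mix1_frac2.raw"), each = 2),
  Fraction       = rep(1:2, each = 2),
  TechRepMixture = 1,
  Mixture        = 1,
  Channel        = rep(c("TMT6_126", "TMT6_127"), times = 2),
  BioReplicate   = rep(c("S1", "S2"), times = 2),
  Condition      = rep(c("Ctrl", "Treat"), times = 2)
)

check_annotation(annotation_toy)
```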

filter_with_Qvalue

TRUE (default) filters out intensities whose value in the EG.Qvalue column is greater than qvalue_cutoff. Those intensities are replaced with NA and treated as censored missing values for imputation.

qvalue_cutoff

Cutoff for EG.Qvalue. Default is 0.01.
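The q-value filter keeps the rows and only blanks their intensities. A minimal sketch of that idea, assuming hypothetical columns `Qvalue` and `Intensity` (the function is an illustration, not MSstatsTMT's implementation):

```r
# Minimal sketch, not MSstatsTMT code. Column names `Qvalue` and `Intensity`
# are hypothetical stand-ins for the q-value and intensity columns.
filter_by_qvalue <- function(df, qvalue_cutoff = 0.01) {
  # which() skips rows whose q-value is itself NA
  above <- which(df$Qvalue > qvalue_cutoff)
  # Keep the rows but set their intensities to NA (missing values)
  df$Intensity[above] <- NA
  df
}

psm <- data.frame(
  PSM       = c("AAAK_2", "CCCK_3", "DDDK_2"),
  Qvalue    = c(0, 0.002, 0.05),
  Intensity = c(382.1, 33554.2, 17143.5)
)
filtered <- filter_by_qvalue(psm, qvalue_cutoff = 0.01)
filtered  # DDDK_2 (q-value 0.05) keeps its row, but its Intensity is NA
```

Replacing with NA rather than dropping the row is what lets later steps treat these values as censored missing values.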

useUniquePeptide

TRUE (default) removes peptides that are assigned to more than one protein, so that only peptides unique to a single protein are used.
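A minimal sketch of unique-peptide filtering on toy data. The column names follow the converted output shown in the Examples (ProteinName, PeptideSequence); protein and peptide names are made up, and the function is an illustration rather than the package's code.

```r
# Minimal sketch with toy data; not MSstatsTMT code.
keep_unique_peptides <- function(df) {
  # Number of distinct proteins each peptide is assigned to
  n_prot <- tapply(df$ProteinName, df$PeptideSequence,
                   function(p) length(unique(p)))
  unique_peps <- names(n_prot)[n_prot == 1]
  df[df$PeptideSequence %in% unique_peps, , drop = FALSE]
}

toy <- data.frame(
  ProteinName     = c("ProtA", "ProtA", "ProtB", "ProtC"),
  PeptideSequence = c("AAAK", "CCCK", "CCCK", "DDDK")
)
kept <- keep_unique_peptides(toy)
kept  # CCCK is shared by ProtA and ProtB, so both of its rows are dropped
```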

rmPSM_withMissing_withinRun

TRUE removes PSMs that have any missing value within a Run. Default is FALSE.
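A minimal sketch of this option on long-format toy data (one row per PSM, Run and Channel). The layout mirrors the converted output in the Examples; the function itself is hypothetical.

```r
# Minimal sketch with toy data; not MSstatsTMT code.
remove_psm_with_missing <- function(df) {
  # TRUE on every row whose PSM/Run group contains at least one NA intensity
  has_na <- ave(is.na(df$Intensity), df$PSM, df$Run, FUN = any)
  df[!has_na, , drop = FALSE]
}

long <- data.frame(
  PSM       = rep(c("AAAK_2", "CCCK_3"), each = 3),
  Run       = "1_1",
  Channel   = rep(c("TMT6_126", "TMT6_127", "TMT6_128"), times = 2),
  Intensity = c(382.1, NA, 477.7, 33554.2, 33671.5, 40525.4)
)
complete_only <- remove_psm_with_missing(long)
complete_only  # only the three CCCK_3 rows remain
```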

rmPSM_withfewMea_withinRun

Used only when rmPSM_withMissing_withinRun = FALSE. TRUE (default) removes features that have only 1 or 2 measurements within a Run.
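A minimal sketch of the "1 or 2 measurements" rule on long-format toy data, assuming a feature is identified by its PSM label and counted per Run. Features with no measurements at all are left alone here, since the example log reports all-NA rows being removed in a separate step.

```r
# Minimal sketch with toy data; not MSstatsTMT code.
remove_few_measurements <- function(df) {
  # Number of non-missing intensities per PSM and Run, repeated on each row
  n_obs <- ave(as.integer(!is.na(df$Intensity)), df$PSM, df$Run, FUN = sum)
  df[!(n_obs %in% c(1, 2)), , drop = FALSE]
}

long <- data.frame(
  PSM       = rep(c("AAAK_2", "CCCK_3"), each = 4),
  Run       = "1_1",
  Channel   = rep(c("TMT6_126", "TMT6_127", "TMT6_128", "TMT6_129"), times = 2),
  Intensity = c(382.1, NA, NA, NA, 33554.2, 33671.5, 40525.4, NA)
)
kept <- remove_few_measurements(long)
kept  # AAAK_2 (1 measurement) is removed; CCCK_3 (3 measurements) is kept
```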

rmProtein_with1Feature

TRUE removes proteins that have only one feature (a single peptide and charge combination). Default is FALSE.
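A minimal sketch of this filter on toy data. It assumes, as in the converted output, that a feature is a peptide/charge combination encoded in the PSM column; the function and all names are hypothetical.

```r
# Minimal sketch with toy data; not MSstatsTMT code.
remove_single_feature_proteins <- function(df) {
  # Number of distinct features (PSM labels) per protein
  n_feat <- tapply(df$PSM, df$ProteinName, function(x) length(unique(x)))
  keep <- names(n_feat)[n_feat > 1]
  df[df$ProteinName %in% keep, , drop = FALSE]
}

toy <- data.frame(
  ProteinName = c("ProtA", "ProtA", "ProtB", "ProtB"),
  PSM         = c("AAAK_2", "AAAK_3", "CCCK_2", "CCCK_2")
)
kept <- remove_single_feature_proteins(toy)
kept  # ProtB has a single feature (CCCK_2, seen twice) and is dropped
```

The same peptide at two charge states counts as two features, which is why ProtA survives.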

summaryforMultipleRows

sum (default) or max. When a feature has multiple measurements (rows) in the same run, keep the row with the largest sum or the largest maximum of its intensities.
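A minimal sketch of the row selection, assuming the sum or max is taken over each row's channel intensities in a wide table (one row per PSM spectrum, one column per channel). The `Feature` column, toy values and function name are hypothetical.

```r
# Minimal sketch with toy data; not MSstatsTMT code.
pick_row_per_feature <- function(wide, channels, summary_fun = sum) {
  # Score each row by summarizing its channel intensities
  score <- apply(wide[, channels, drop = FALSE], 1, summary_fun, na.rm = TRUE)
  key <- paste(wide$Feature, wide$Run, sep = "|")
  # Sort by feature/run with the best score first, then keep the first row per key
  ord <- order(key, -score)
  ranked <- wide[ord, , drop = FALSE]
  ranked[!duplicated(key[ord]), , drop = FALSE]
}

channels <- c("TMT6_126", "TMT6_127", "TMT6_128")
wide <- data.frame(
  Feature  = c("AAAK_2", "AAAK_2", "CCCK_3"),
  Run      = "1_1",
  TMT6_126 = c(100, 250, 500),
  TMT6_127 = c(100, 10, 600),
  TMT6_128 = c(100, 10, 700)
)
by_sum <- pick_row_per_feature(wide, channels, sum)  # AAAK_2: row 1 wins (sum 300 vs 270)
by_max <- pick_row_per_feature(wide, channels, max)  # AAAK_2: row 2 wins (max 250 vs 100)
```

The toy rows are chosen so that sum and max disagree, which shows why the choice of summaryforMultipleRows can matter.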

Value

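Input for the proteinSummarization function: a data frame with the columns ProteinName, PeptideSequence, Charge, PSM, Mixture, TechRepMixture, Run, Channel, BioReplicate, Condition and Intensity, as shown in the example below.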

Examples

head(raw.mine)
#>   R.MS3.Used R.BlockName R.QuantificationMethod
#> 1       True    All Runs               TMT6Plex
#> 2       True    All Runs               TMT6Plex
#> 3       True    All Runs               TMT6Plex
#> 4       True    All Runs               TMT6Plex
#> 5       True    All Runs               TMT6Plex
#> 6       True    All Runs               TMT6Plex
#>   R.FileName PG.Genes PG.Organisms
#> 1 ch_19Jan2017_SM-1-1_Sp-6-2_CID-OT-MS3-Short_HpH_1.raw EGLN1 Homo sapiens
#> 2 ch_19Jan2017_SM-1-1_Sp-6-2_CID-OT-MS3-Short_HpH_1.raw SEPT11 Homo sapiens
#> 3 ch_19Jan2017_SM-1-1_Sp-6-2_CID-OT-MS3-Short_HpH_1.raw SEPT11 Homo sapiens
#> 4 ch_19Jan2017_SM-1-1_Sp-6-2_CID-OT-MS3-Short_HpH_1.raw SEPT11 Homo sapiens
#> 5 ch_19Jan2017_SM-1-1_Sp-6-2_CID-OT-MS3-Short_HpH_1.raw SEPT11 Homo sapiens
#> 6 ch_19Jan2017_SM-1-1_Sp-6-2_CID-OT-MS3-Short_HpH_1.raw NAP1L4 Homo sapiens
#>   PG.ProteinAccessions PG.ProteinDescriptions PG.ProteinNames
#> 1 Q9GZT9 Egl nine homolog 1 EGLN1_HUMAN
#> 2 Q9NVA2 Septin-11 SEP11_HUMAN
#> 3 Q9NVA2 Septin-11 SEP11_HUMAN
#> 4 Q9NVA2 Septin-11 SEP11_HUMAN
#> 5 Q9NVA2 Septin-11 SEP11_HUMAN
#> 6 Q99733 Nucleosome assembly protein 1-like 4 NP1L4_HUMAN
#>   PG.UniprotIds PG.Coverage PG.QValue PEP.IsProteinGroupSpecific
#> 1        Q9GZT9        5.6%         0                       TRUE
#> 2        Q9NVA2        9.1%         0                       TRUE
#> 3        Q9NVA2        9.1%         0                      FALSE
#> 4        Q9NVA2        9.1%         0                       TRUE
#> 5        Q9NVA2        9.1%         0                       TRUE
#> 6        Q99733        5.1%         0                       TRUE
#>   PEP.IsProteotypic PEP.StrippedSequence PEP.QValue
#> 1              TRUE AAAGGQGSAVAAEAEPGKEEPPAR 0.0000000000
#> 2              TRUE KELEEEVNNFQK 0.0001986492
#> 3             FALSE SLDLVTMK 0.0000000000
#> 4              TRUE AAAQLLQSQAQQSGAQQTK 0.0000000000
#> 5              TRUE AAAQLLQSQAQQSGAQQTK 0.0000000000
#> 6              TRUE VLAALQER 0.0024595109
#>   PEP.IsUsedForQuantification PP.Charge
#> 1                        True         3
#> 2                        True         3
#> 3                        True         2
#> 4                        True         3
#> 5                        True         2
#> 6                        True         2
#>   P.MoleculeID PSM.TMT6_126..Raw.
#> 1 _[TMT_Nter]AAAGGQGSAVAAEAEPGK[TMT_Lys]EEPPAR_ 382.1107
#> 2 _[TMT_Nter]K[TMT_Lys]ELEEEVNNFQK[TMT_Lys]_ 33554.1900
#> 3 _[TMT_Nter]SLDLVTMK[TMT_Lys]_ 44713.6300
#> 4 _[TMT_Nter]AAAQLLQSQAQQSGAQQTK[TMT_Lys]_ 20877.8700
#> 5 _[TMT_Nter]AAAQLLQSQAQQSGAQQTK[TMT_Lys]_ 506.1669
#> 6 _[TMT_Nter]VLAALQER_ 17143.5200
#>   PSM.TMT6_127..Raw. PSM.TMT6_128..Raw. PSM.TMT6_129..Raw. PSM.TMT6_130..Raw.
#> 1           392.9243           477.7064           989.6951           695.7537
#> 2         33671.5400         40525.3900         43739.8600         40697.4400
#> 3         51052.8400         58457.6400         47213.1300         58004.6700
#> 4         19930.0100         24963.5400         29021.8300         19510.8600
#> 5          1019.8020           543.2091           555.3629           501.8123
#> 6          4284.8930         26957.7100         15610.6900         21208.9500
#>   PSM.TMT6_131..Raw. PSM.IsUsedForQuantification PSM.NrOfMatchedChannelIons
#> 1           107.6282                        True                          6
#> 2         17299.1300                        True                          6
#> 3         23622.9500                        True                          6
#> 4         15952.0600                        True                          6
#> 5           268.5246                        True                          6
#> 6          7701.3510                        True                          6
#>     PSM.Qvalue
#> 1 0.000000e+00
#> 2 2.146015e-04
#> 3 0.000000e+00
#> 4 5.613247e-05
#> 5 0.000000e+00
#> 6 1.951193e-03
head(annotation.mine)
#>   Run TechRepMixture Fraction
#> 1 ch_19Jan2017_SM-1-1_Sp-6-2_CID-OT-MS3-Short_HpH_1.raw 1 1
#> 2 ch_19Jan2017_SM-1-1_Sp-6-2_CID-OT-MS3-Short_HpH_2.raw 1 2
#> 3 ch_19Jan2017_SM-1-1_Sp-6-2_CID-OT-MS3-Short_HpH_3.raw 1 3
#> 4 ch_19Jan2017_SM-1-1_Sp-6-2_CID-OT-MS3-Short_HpH_4.raw 1 4
#> 5 ch_19Jan2017_SM-1-1_Sp-6-2_CID-OT-MS3-Short_HpH_5.raw 1 5
#> 6 ch_19Jan2017_SM-1-1_Sp-6-2_CID-OT-MS3-Short_HpH_6.raw 1 6
#>    Channel Condition Mixture BioReplicate
#> 1 TMT6_126         3       1            1
#> 2 TMT6_126         3       1            1
#> 3 TMT6_126         3       1            1
#> 4 TMT6_126         3       1            1
#> 5 TMT6_126         3       1            1
#> 6 TMT6_126         3       1            1
input.mine <- SpectroMinetoMSstatsTMTFormat(raw.mine, annotation.mine)
#> ** Intensities with great than 0.01 in PG.QValue are replaced with NA.
#> ** Intensities with great than 0.01 in EG.Qvalue are replaced with NA.
#> ** 0 rows have all NAs are removed.
#> ** All peptides are unique peptides in proteins.
#> ** 0 features have 1 or 2 intensities across runs and are removed.
#> ** Multiple measurements in a feature and a run are summarized by summaryforMultipleRows.
#> ** For peptides overlapped between fractions of 1_1, use the fraction with maximal average abundance.
#> ** Fractions belonging to same mixture have been combined.
head(input.mine)
#>   ProteinName PeptideSequence Charge
#> 1 Q9GZT9 _[TMT_Nter]AAAGGQGSAVAAEAEPGK[TMT_Lys]EEPPAR_ 3
#> 2 Q9NVA2 _[TMT_Nter]K[TMT_Lys]ELEEEVNNFQK[TMT_Lys]_ 3
#> 3 Q9NVA2 _[TMT_Nter]SLDLVTMK[TMT_Lys]_ 2
#> 4 Q9NVA2 _[TMT_Nter]AAAQLLQSQAQQSGAQQTK[TMT_Lys]_ 3
#> 5 Q9NVA2 _[TMT_Nter]AAAQLLQSQAQQSGAQQTK[TMT_Lys]_ 2
#> 6 P06753 _[TMT_Nter]AADAEAEVASLNRR_ 3
#>   PSM Mixture TechRepMixture Run
#> 1 _[TMT_Nter]AAAGGQGSAVAAEAEPGK[TMT_Lys]EEPPAR__3 1 1 1_1
#> 2 _[TMT_Nter]K[TMT_Lys]ELEEEVNNFQK[TMT_Lys]__3 1 1 1_1
#> 3 _[TMT_Nter]SLDLVTMK[TMT_Lys]__2 1 1 1_1
#> 4 _[TMT_Nter]AAAQLLQSQAQQSGAQQTK[TMT_Lys]__3 1 1 1_1
#> 5 _[TMT_Nter]AAAQLLQSQAQQSGAQQTK[TMT_Lys]__2 1 1 1_1
#> 6 _[TMT_Nter]AADAEAEVASLNRR__3 1 1 1_1
#>    Channel BioReplicate Condition  Intensity
#> 1 TMT6_126            1         3   382.1107
#> 2 TMT6_126            1         3 33554.1900
#> 3 TMT6_126            1         3 44713.6300
#> 4 TMT6_126            1         3 20877.8700
#> 5 TMT6_126            1         3   506.1669
#> 6 TMT6_126            1         3 10065.2800
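Two things in the converted output can be read off the example: the PSM column is the peptide sequence and charge joined by "_" (hence the double underscore after SpectroMine's trailing "_"), and after fractions are combined the Run column becomes Mixture_TechRepMixture ("1_1"). The log also says that, for peptides found in several fractions, the fraction with maximal average abundance is used. The sketch below imitates those steps on toy data, assuming "average abundance" means the mean intensity over that PSM's channels in a fraction; the function name and data are hypothetical, not the package's code.

```r
# Minimal sketch under stated assumptions; toy data, not MSstatsTMT code.
combine_fractions <- function(df) {
  # Feature label: peptide sequence and charge joined by "_" (like the PSM column)
  df$PSM <- paste(df$PeptideSequence, df$Charge, sep = "_")
  # Run label after combining: Mixture_TechRepMixture (like Run = "1_1")
  df$Run <- paste(df$Mixture, df$TechRepMixture, sep = "_")
  # Mean intensity of each PSM within each fraction of each run
  avg <- aggregate(Intensity ~ PSM + Run + Fraction, data = df, FUN = mean)
  # For each PSM/Run pair, keep the fraction with the largest mean
  avg <- avg[order(avg$PSM, avg$Run, -avg$Intensity), ]
  best <- avg[!duplicated(avg[, c("PSM", "Run")]), c("PSM", "Run", "Fraction")]
  combined <- merge(df, best, by = c("PSM", "Run", "Fraction"))
  combined$Fraction <- NULL
  combined
}

toy <- data.frame(
  PeptideSequence = c("AAAK", "AAAK", "AAAK", "AAAK", "CCCK", "CCCK"),
  Charge          = c(2, 2, 2, 2, 3, 3),
  Mixture         = 1,
  TechRepMixture  = 1,
  Fraction        = c(1, 1, 2, 2, 1, 1),
  Channel         = rep(c("TMT6_126", "TMT6_127"), times = 3),
  Intensity       = c(100, 200, 1000, 2000, 50, 60)
)
combined <- combine_fractions(toy)
combined  # AAAK_2 keeps only fraction 2; CCCK_3 keeps fraction 1; Run is "1_1"
```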