Convert Proteome Discoverer output into the input format required by MSstatsTMT.

PDtoMSstatsTMTFormat(
  input,
  annotation,
  which.proteinid = "Protein.Accessions",
  useNumProteinsColumn = TRUE,
  useUniquePeptide = TRUE,
  rmPSM_withMissing_withinRun = FALSE,
  rmPSM_withfewMea_withinRun = TRUE,
  rmProtein_with1Feature = FALSE,
  summaryforMultipleRows = sum
)

Arguments

input

Data frame containing the PSM-level output of Proteome Discoverer.

annotation

Data frame containing the columns Run, Fraction, TechRepMixture, Mixture, Channel, BioReplicate and Condition. Refer to the example 'annotation.pd' for the meaning of each column.
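
For illustration, the sketch below builds a small annotation data frame for one hypothetical TMT10 mixture and checks that the seven documented columns are present. The run name, condition labels and the helper names `required` and `missing_cols` are invented; the BioReplicate labels follow the Mixture_Condition pattern visible in 'annotation.pd'.

```r
# Hypothetical annotation for a single fraction of one TMT10 mixture.
# Run name and condition labels are invented for illustration.
channels <- c("126", "127N", "127C", "128N", "128C",
              "129N", "129C", "130N", "130C", "131")
annotation <- data.frame(
  Run            = "Mixture1_01.raw",
  Fraction       = 1,
  TechRepMixture = 1,
  Mixture        = "Mixture1",
  Channel        = channels,
  Condition      = rep(c("Ctrl", "Treat"), each = 5),
  stringsAsFactors = FALSE
)
# BioReplicate labels follow the Mixture_Condition pattern of annotation.pd
annotation$BioReplicate <- paste(annotation$Mixture, annotation$Condition, sep = "_")

# Check that every documented column is present
required <- c("Run", "Fraction", "TechRepMixture", "Mixture",
              "Channel", "BioReplicate", "Condition")
missing_cols <- setdiff(required, colnames(annotation))
if (length(missing_cols) > 0) {
  stop("Missing annotation columns: ", paste(missing_cols, collapse = ", "))
}
```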

which.proteinid

Column to use for protein names: 'Protein.Accessions' (default). 'Master.Protein.Accessions' can be used instead to obtain a single protein name per PSM.
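
A toy illustration of how the choice of column changes the protein names. It assumes, as is typical of Proteome Discoverer exports, that 'Protein.Accessions' can list several accessions separated by "; " while 'Master.Protein.Accessions' usually holds one. The rows and the helper `lists_several` are hypothetical.

```r
# Invented PSM rows; the two column names match those in raw.pd.
psm <- data.frame(
  Protein.Accessions        = c("ProtA", "ProtB; ProtC"),
  Master.Protein.Accessions = c("ProtA", "ProtB"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE for names that list more than one accession
lists_several <- function(ids) grepl(";", ids, fixed = TRUE)

which.proteinid <- "Master.Protein.Accessions"
protein_names <- psm[[which.proteinid]]

multi_default <- lists_several(psm[["Protein.Accessions"]])  # FALSE TRUE
multi_master  <- lists_several(protein_names)                # FALSE FALSE
```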

useNumProteinsColumn

TRUE (default) removes shared peptides based on the '# Proteins' column in the PSM sheet.

useUniquePeptide

TRUE (default) removes peptides that are assigned to more than one protein, so that only peptides unique to a single protein are used.
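
A base-R sketch of the two filtering ideas described by useNumProteinsColumn and useUniquePeptide, on invented data. It assumes the '# Proteins' column appears as X..Proteins after R's name mangling, as in raw.pd. This is not MSstatsTMT's internal code, which may differ in detail.

```r
# Invented PSM table: PEPB is shared between ProtB and ProtC.
psm <- data.frame(
  Annotated.Sequence = c("PEPA", "PEPB", "PEPB", "PEPC"),
  Protein.Accessions = c("ProtA", "ProtB", "ProtC", "ProtD"),
  X..Proteins        = c(1, 2, 2, 1),
  stringsAsFactors = FALSE
)

# Idea behind useNumProteinsColumn = TRUE: keep PSMs that the search
# reports as matching exactly one protein.
by_count <- psm[psm$X..Proteins == 1, ]

# Idea behind useUniquePeptide = TRUE: keep sequences that map to a
# single distinct protein name anywhere in the table.
n_proteins <- tapply(psm$Protein.Accessions, psm$Annotated.Sequence,
                     function(p) length(unique(p)))
unique_seqs <- names(n_proteins)[n_proteins == 1]
by_mapping  <- psm[psm$Annotated.Sequence %in% unique_seqs, ]
```

Both rules drop PEPB here; they can disagree when the '# Proteins' count reflects proteins that are absent from the exported table.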

rmPSM_withMissing_withinRun

TRUE removes PSMs with any missing value within each Run. Default is FALSE.
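
A minimal sketch of the rmPSM_withMissing_withinRun = TRUE rule on invented long-format data (one row per PSM, Run and Channel, like input.pd). Grouping by PSM and Run is our reading of "within each Run", not a statement about the package internals.

```r
# Invented long-format data: pepA_2 has one missing channel in run1.
long <- data.frame(
  PSM       = rep(c("pepA_2", "pepB_3"), each = 4),
  Run       = "run1",
  Channel   = rep(c("126", "127N", "127C", "128N"), times = 2),
  Intensity = c(100, 120, NA, 110,  50, 55, 60, 58),
  stringsAsFactors = FALSE
)

# TRUE on every row of a PSM/run group with at least one missing intensity
has_missing <- ave(is.na(long$Intensity), long$PSM, long$Run, FUN = any)
kept <- long[!has_missing, ]
```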

rmPSM_withfewMea_withinRun

Only used when rmPSM_withMissing_withinRun = FALSE. TRUE (default) removes features that have only 1 or 2 measurements within each Run.
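
A sketch of the rmPSM_withfewMea_withinRun = TRUE rule under the same assumptions: count the non-missing channel intensities of each PSM in each run and drop groups where that count is 1 or 2. The data are invented.

```r
long <- data.frame(
  PSM       = rep(c("pepA_2", "pepB_3", "pepC_2"), each = 4),
  Run       = "run1",
  Channel   = rep(c("126", "127N", "127C", "128N"), times = 3),
  Intensity = c(100, NA, NA, NA,   # 1 measurement
                 50, 55, NA, NA,   # 2 measurements
                 70, 72, NA, 75),  # 3 measurements
  stringsAsFactors = FALSE
)

# Number of non-missing intensities for each PSM within each run
n_obs <- ave(as.numeric(!is.na(long$Intensity)), long$PSM, long$Run, FUN = sum)

# Drop PSM/run groups with only 1 or 2 measurements
kept <- long[!(n_obs %in% c(1, 2)), ]
```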

rmProtein_with1Feature

TRUE removes proteins that have only one feature (a single peptide and charge combination). Default is FALSE.
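
A sketch of the rmProtein_with1Feature = TRUE rule, assuming a feature is a peptide sequence plus charge state as the description suggests. The proteins and peptides are invented.

```r
feat <- data.frame(
  ProteinName     = c("ProtA", "ProtA", "ProtB", "ProtC", "ProtC"),
  PeptideSequence = c("PEPA", "PEPB", "PEPC", "PEPD", "PEPD"),
  Charge          = c(2, 3, 2, 2, 3),
  stringsAsFactors = FALSE
)
# A feature is a peptide sequence plus charge state
feat$Feature <- paste(feat$PeptideSequence, feat$Charge, sep = "_")

# Number of distinct features per protein; keep proteins with more than one
n_features <- tapply(feat$Feature, feat$ProteinName,
                     function(f) length(unique(f)))
keep_proteins <- names(n_features)[n_features > 1]
kept <- feat[feat$ProteinName %in% keep_proteins, ]
```

ProtC is kept even though it has a single peptide, because that peptide is seen in two charge states and therefore counts as two features.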

summaryforMultipleRows

sum (default) or max. When a feature has multiple measurements within a run, the measurement with the largest sum or maximal value is selected.
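
A sketch of one reading of this rule: within each PSM/run group, score every row by the sum (or max) of its channel intensities and keep the highest-scoring row. The channel column names (ch126, ...) and the helpers `pick_row` and `select_rows` are hypothetical; MSstatsTMT's internals may differ.

```r
# Two invented rows for the same PSM in the same run, three channels each
dup <- data.frame(
  PSM    = c("pepA_2", "pepA_2"),
  Run    = c("run1", "run1"),
  ch126  = c(100, 300),
  ch127N = c(500, 250),
  ch127C = c(50, 200),
  stringsAsFactors = FALSE
)
channel_cols <- c("ch126", "ch127N", "ch127C")

# Keep the row whose channel intensities give the largest summary value
pick_row <- function(df, cols, summary_fun) {
  score <- apply(as.matrix(df[, cols]), 1, summary_fun, na.rm = TRUE)
  df[which.max(score), , drop = FALSE]
}

# Apply pick_row separately to every PSM/run group
select_rows <- function(df, cols, summary_fun) {
  groups <- split(df, list(df$PSM, df$Run), drop = TRUE)
  do.call(rbind, lapply(groups, pick_row, cols = cols, summary_fun = summary_fun))
}

by_sum <- select_rows(dup, channel_cols, sum)  # row 2: sums are 650 vs 750
by_max <- select_rows(dup, channel_cols, max)  # row 1: maxima are 500 vs 300
```

The two options can select different rows: the second row has the larger total, while the first has the single largest channel intensity.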

Value

A data frame in the format required as input for the proteinSummarization function.

Examples

head(raw.pd)
#>    Checked Confidence Identifying.Node PSM.Ambiguity
#> 1: FALSE High Mascot (O4) Unambiguous
#> 2: FALSE High Mascot (K2) Unambiguous
#> 3: FALSE High Mascot (K2) Unambiguous
#> 4: FALSE High Mascot (F2) Selected
#> 5: FALSE High Mascot (K2) Unambiguous
#> 6: FALSE High Mascot (K2) Unambiguous
#>    Annotated.Sequence
#> 1: [K].gFQQILAGEYDHLPEQAFYMVGPIEEAVAk.[A]
#> 2: [R].qYPWGVAEVENGEHcDFTILr.[N]
#> 3: [R].dkPSVEPVEEYDYEDLk.[E]
#> 4: [R].hEHQVMLmr.[Q]
#> 5: [R].dNLTLWTADNAGEEGGEAPQEPQS.[-]
#> 6: [R].aLVAIGTHDLDTLSGPFTYTAk.[R]
#>    Modifications Marked.as
#> 1: N-Term(TMT6plex); K30(TMT6plex) NA
#> 2: N-Term(TMT6plex); C15(Carbamidomethyl); R21(Label:13C(6)15N(4)) NA
#> 3: N-Term(TMT6plex); K2(Label); K17(Label) NA
#> 4: N-Term(TMT6plex); M8(Oxidation); R9(Label:13C(6)15N(4)) NA
#> 5: N-Term(TMT6plex) NA
#> 6: N-Term(TMT6plex); K22(Label) NA
#>    X..Protein.Groups X..Proteins Master.Protein.Accessions
#> 1: 1 1 P06576
#> 2: 1 1 Q16181
#> 3: 1 1 Q9Y450
#> 4: 1 1 Q15233
#> 5: 1 1 P31947
#> 6: 1 1 Q9NSD9
#>    Master.Protein.Descriptions
#> 1: ATP synthase subunit beta, mitochondrial OS=Homo sapiens GN=ATP5B PE=1 SV=3
#> 2: Septin-7 OS=Homo sapiens GN=SEPT7 PE=1 SV=2
#> 3: HBS1-like protein OS=Homo sapiens GN=HBS1L PE=1 SV=1
#> 4: Non-POU domain-containing octamer-binding protein OS=Homo sapiens GN=NONO PE=1 SV=4
#> 5: 14-3-3 protein sigma OS=Homo sapiens GN=SFN PE=1 SV=1
#> 6: Phenylalanine--tRNA ligase beta subunit OS=Homo sapiens GN=FARSB PE=1 SV=3
#>    Protein.Accessions
#> 1: P06576
#> 2: Q16181
#> 3: Q9Y450
#> 4: Q15233
#> 5: P31947
#> 6: Q9NSD9
#>    Protein.Descriptions
#> 1: ATP synthase subunit beta, mitochondrial OS=Homo sapiens GN=ATP5B PE=1 SV=3
#> 2: Septin-7 OS=Homo sapiens GN=SEPT7 PE=1 SV=2
#> 3: HBS1-like protein OS=Homo sapiens GN=HBS1L PE=1 SV=1
#> 4: Non-POU domain-containing octamer-binding protein OS=Homo sapiens GN=NONO PE=1 SV=4
#> 5: 14-3-3 protein sigma OS=Homo sapiens GN=SFN PE=1 SV=1
#> 6: Phenylalanine--tRNA ligase beta subunit OS=Homo sapiens GN=FARSB PE=1 SV=3
#>    X..Missed.Cleavages Charge DeltaScore DeltaCn Rank Search.Engine.Rank
#> 1: 0 3 1.0000 0 1 1
#> 2: 0 3 1.0000 0 1 1
#> 3: 1 3 0.9730 0 1 1
#> 4: 0 4 0.5250 0 1 1
#> 5: 0 3 1.0000 0 1 1
#> 6: 0 3 0.9783 0 1 1
#>    m.z..Da. MH...Da. Theo..MH...Da. DeltaM..ppm. Deltam.z..Da. Activation.Type
#> 1: 1270.3249 3808.960 3808.966 -1.51 -0.00192 CID
#> 2: 920.4493 2759.333 2759.332 0.31 0.00028 CID
#> 3: 920.1605 2758.467 2758.461 2.08 0.00192 CID
#> 4: 359.6898 1435.737 1435.738 -0.04 -0.00002 CID
#> 5: 920.0943 2758.268 2758.264 1.53 0.00141 CID
#> 6: 919.8502 2757.536 2757.532 1.48 0.00136 CID
#>    MS.Order Isolation.Interference.... Average.Reporter.S.N
#> 1: MS2 47.955590 8.7
#> 2: MS2 9.377507 8.1
#> 3: MS2 38.317050 17.8
#> 4: MS2 21.390040 36.5
#> 5: MS2 0.000000 16.7
#> 6: MS2 30.619960 26.7
#>    Ion.Inject.Time..ms. RT..min. First.Scan
#> 1: 50.000 212.2487 112815
#> 2: 3.242 164.7507 87392
#> 3: 13.596 143.4534 74786
#> 4: 50.000 21.6426 6458
#> 5: 6.723 174.1863 92950
#> 6: 8.958 176.4863 94294
#>    Spectrum.File File.ID Abundance..126
#> 1: 161117_SILAC_HeLa_UPS1_TMT10_Mixture1_03.raw F1 2548.326
#> 2: 161117_SILAC_HeLa_UPS1_TMT10_Mixture3_03.raw F5 22861.765
#> 3: 161117_SILAC_HeLa_UPS1_TMT10_Mixture3_03.raw F5 25504.083
#> 4: 161117_SILAC_HeLa_UPS1_TMT10_Mixture4_02.raw F10 13493.228
#> 5: 161117_SILAC_HeLa_UPS1_TMT10_Mixture3_03.raw F5 64582.786
#> 6: 161117_SILAC_HeLa_UPS1_TMT10_Mixture3_03.raw F5 35404.709
#>    Abundance..127N Abundance..127C Abundance..128N Abundance..128C
#> 1: 3231.929 2760.839 4111.639 3127.254
#> 2: 25817.946 23349.498 29449.609 25995.929
#> 3: 27740.450 25144.974 25754.579 29923.176
#> 4: 14674.490 11187.900 12831.495 13839.426
#> 5: 50576.417 47126.037 56285.129 46257.310
#> 6: 31905.852 30993.941 36854.351 37506.001
#>    Abundance..129N Abundance..129C Abundance..130N Abundance..130C
#> 1: 1874.163 2831.423 2298.401 3798.876
#> 2: 22955.769 30578.971 30660.488 38728.853
#> 3: 34097.637 31650.255 27632.692 23886.881
#> 4: 12441.353 13450.885 14777.844 13039.995
#> 5: 52634.885 49716.850 60660.574 55830.488
#> 6: 25703.444 38626.598 35447.942 33788.409
#>    Abundance..131 Quan.Info Ions.Score Identity.Strict Identity.Relaxed
#> 1: 3739.067 NA 90 28 21
#> 2: 25047.280 NA 76 24 17
#> 3: 35331.092 NA 74 30 23
#> 4: 12057.121 NA 40 25 18
#> 5: 40280.577 NA 38 21 14
#> 6: 32031.516 NA 46 29 22
#>    Expectation.Value Percolator.q.Value Percolator.PEP
#> 1: 7.038672e-09 0 1.396e-05
#> 2: 6.298627e-08 0 3.349e-07
#> 3: 4.318385e-07 0 9.922e-07
#> 4: 3.351211e-04 0 1.175e-04
#> 5: 2.152501e-04 0 1.383e-05
#> 6: 2.060469e-04 0 7.198e-05
head(annotation.pd)
#>   Run Fraction TechRepMixture Channel
#> 1 161117_SILAC_HeLa_UPS1_TMT10_Mixture1_01.raw 1 1 126
#> 2 161117_SILAC_HeLa_UPS1_TMT10_Mixture1_01.raw 1 1 127N
#> 3 161117_SILAC_HeLa_UPS1_TMT10_Mixture1_01.raw 1 1 127C
#> 4 161117_SILAC_HeLa_UPS1_TMT10_Mixture1_01.raw 1 1 128N
#> 5 161117_SILAC_HeLa_UPS1_TMT10_Mixture1_01.raw 1 1 128C
#> 6 161117_SILAC_HeLa_UPS1_TMT10_Mixture1_01.raw 1 1 129N
#>   Condition Mixture BioReplicate
#> 1 Norm Mixture1 Mixture1_Norm
#> 2 0.667 Mixture1 Mixture1_0.667
#> 3 0.125 Mixture1 Mixture1_0.125
#> 4 0.5 Mixture1 Mixture1_0.5
#> 5 1 Mixture1 Mixture1_1
#> 6 0.125 Mixture1 Mixture1_0.125
input.pd <- PDtoMSstatsTMTFormat(raw.pd, annotation.pd)
#> ** Shared PSMs (assigned in multiple proteins) are removed.
#> ** 55 features have 1 or 2 intensities within a run and are removed.
#> ** Multiple measurements in a feature and a run are summarized by summaryforMultipleRows.
head(input.pd)
#>   ProteinName PeptideSequence Charge
#> 1 P04406 [K].lISWYDNEFGYSNR.[V] 2
#> 2 Q9NSD9 [K].irPFAVAAVLr.[N] 3
#> 3 P04406 [K].lVINGNPITIFQErDPSk.[I] 3
#> 4 P04406 [R].vVDLmAHMASkE.[-] 3
#> 5 P06576 [R].dQEGQDVLLFIDNIFR.[F] 3
#> 6 P06576 [R].iPSAVGYQPTLATDMGTMQEr.[I] 3
#>   PSM Mixture TechRepMixture
#> 1 [K].lISWYDNEFGYSNR.[V]_2 Mixture1 1
#> 2 [K].irPFAVAAVLr.[N]_3 Mixture1 1
#> 3 [K].lVINGNPITIFQErDPSk.[I]_3 Mixture1 1
#> 4 [R].vVDLmAHMASkE.[-]_3 Mixture1 1
#> 5 [R].dQEGQDVLLFIDNIFR.[F]_3 Mixture1 1
#> 6 [R].iPSAVGYQPTLATDMGTMQEr.[I]_3 Mixture1 1
#>   Run Channel Condition BioReplicate
#> 1 161117_SILAC_HeLa_UPS1_TMT10_Mixture1_01.raw 126 Norm Mixture1_Norm
#> 2 161117_SILAC_HeLa_UPS1_TMT10_Mixture1_01.raw 126 Norm Mixture1_Norm
#> 3 161117_SILAC_HeLa_UPS1_TMT10_Mixture1_01.raw 126 Norm Mixture1_Norm
#> 4 161117_SILAC_HeLa_UPS1_TMT10_Mixture1_01.raw 126 Norm Mixture1_Norm
#> 5 161117_SILAC_HeLa_UPS1_TMT10_Mixture1_01.raw 126 Norm Mixture1_Norm
#> 6 161117_SILAC_HeLa_UPS1_TMT10_Mixture1_01.raw 126 Norm Mixture1_Norm
#>   Intensity
#> 1 8348.351
#> 2 28327.492
#> 3 1275010.965
#> 4 80589.877
#> 5 2231.389
#> 6 144854.307