MSstatsTMT.Rmd
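This vignette introduces MSstatsTMT and describes the options of each of its functions.
MSstatsTMT performs statistical testing in three steps:
1. Convert the output of a spectral processing tool into the MSstatsTMT input format with `PDtoMSstatsTMTFormat`, `MaxQtoMSstatsTMTFormat`, `SpectroMinetoMSstatsTMTFormat` or `OpenMStoMSstatsTMTFormat`.
2. Summarize peptide-level data into protein-level data with `proteinSummarization`.
3. Test for differentially abundant proteins with `groupComparisonTMT`.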
`PDtoMSstatsTMTFormat` preprocesses PSM data from Proteome Discoverer and converts it into the input format required by MSstatsTMT. Its arguments are:
- `input`: name of the data object holding the Proteome Discoverer PSM output (the PSM sheet).
- `annotation`: data frame containing the columns `Run`, `Fraction`, `TechRepMixture`, `Channel`, `Condition`, `BioReplicate` and `Mixture`.
- `which.proteinid`: column used for the protein name. `Protein.Accessions` is the default; `Master.Protein.Accessions` can be used instead.
- `useNumProteinsColumn`: `TRUE` (default) removes shared peptides using the `# Proteins` column of the PSM sheet.
- `useUniquePeptide`: `TRUE` (default) removes peptides that are assigned to more than one protein. Each protein is assumed to be quantified from its unique peptides.
- `rmPSM_withMissing_withinRun`: `TRUE` removes any PSM that has a missing value within a run. Default is `FALSE`.
- `rmPSM_withfewMea_withinRun`: applies only when `rmPSM_withMissing_withinRun = FALSE`. `TRUE` (default) removes features that have only 1 or 2 measurements within a run.
- `removeProtein_with1Peptide`: `TRUE` removes proteins that have only one peptide and charge. Default is `FALSE`.
- `summaryforMultipleRows`: `sum` (default) or `max`. When a PSM has multiple measurements in a run, select the measurement with the largest summation or the largest maximal value.
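Because the annotation file is written by hand, it can help to check it for the required columns before calling a converter. The following is a minimal base-R sketch of such a check. The helper `missing_annotation_cols` and the toy values are hypothetical and are not part of MSstatsTMT.

```r
# Columns that the argument list above names as required in `annotation`.
required_cols <- c("Run", "Fraction", "TechRepMixture", "Channel",
                   "Condition", "BioReplicate", "Mixture")

# Hypothetical helper (not an MSstatsTMT function): names of missing columns.
missing_annotation_cols <- function(annotation) {
  setdiff(required_cols, colnames(annotation))
}

# Toy annotation for one run with two channels; values are illustrative only.
toy_annotation <- data.frame(
  Run = "Mixture1_01.raw",
  Fraction = 1,
  TechRepMixture = 1,
  Channel = c("126", "127N"),
  Condition = c("Norm", "0.667"),
  BioReplicate = c("Mixture1_Norm", "Mixture1_0.667"),
  Mixture = "Mixture1",
  stringsAsFactors = FALSE
)

missing_annotation_cols(toy_annotation)        # character(0): nothing missing
missing_annotation_cols(toy_annotation[, -7])  # "Mixture"
```

An empty result means every required column is present; any returned name points at a column to add to the annotation file.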
# read in PD PSM sheet
# raw.pd <- read.delim("161117_SILAC_HeLa_UPS1_TMT10_5Mixtures_3TechRep_UPSdB_Multiconsensus_PD22_Intensity_PSMs.txt")
head(raw.pd)
#> Checked Confidence Identifying.Node PSM.Ambiguity
#> 1: FALSE High Mascot (O4) Unambiguous
#> 2: FALSE High Mascot (K2) Unambiguous
#> 3: FALSE High Mascot (K2) Unambiguous
#> 4: FALSE High Mascot (F2) Selected
#> 5: FALSE High Mascot (K2) Unambiguous
#> 6: FALSE High Mascot (K2) Unambiguous
#> Annotated.Sequence
#> 1: [K].gFQQILAGEYDHLPEQAFYMVGPIEEAVAk.[A]
#> 2: [R].qYPWGVAEVENGEHcDFTILr.[N]
#> 3: [R].dkPSVEPVEEYDYEDLk.[E]
#> 4: [R].hEHQVMLmr.[Q]
#> 5: [R].dNLTLWTADNAGEEGGEAPQEPQS.[-]
#> 6: [R].aLVAIGTHDLDTLSGPFTYTAk.[R]
#> Modifications Marked.as
#> 1: N-Term(TMT6plex); K30(TMT6plex) NA
#> 2: N-Term(TMT6plex); C15(Carbamidomethyl); R21(Label:13C(6)15N(4)) NA
#> 3: N-Term(TMT6plex); K2(Label); K17(Label) NA
#> 4: N-Term(TMT6plex); M8(Oxidation); R9(Label:13C(6)15N(4)) NA
#> 5: N-Term(TMT6plex) NA
#> 6: N-Term(TMT6plex); K22(Label) NA
#> X..Protein.Groups X..Proteins Master.Protein.Accessions
#> 1: 1 1 P06576
#> 2: 1 1 Q16181
#> 3: 1 1 Q9Y450
#> 4: 1 1 Q15233
#> 5: 1 1 P31947
#> 6: 1 1 Q9NSD9
#> Master.Protein.Descriptions
#> 1: ATP synthase subunit beta, mitochondrial OS=Homo sapiens GN=ATP5B PE=1 SV=3
#> 2: Septin-7 OS=Homo sapiens GN=SEPT7 PE=1 SV=2
#> 3: HBS1-like protein OS=Homo sapiens GN=HBS1L PE=1 SV=1
#> 4: Non-POU domain-containing octamer-binding protein OS=Homo sapiens GN=NONO PE=1 SV=4
#> 5: 14-3-3 protein sigma OS=Homo sapiens GN=SFN PE=1 SV=1
#> 6: Phenylalanine--tRNA ligase beta subunit OS=Homo sapiens GN=FARSB PE=1 SV=3
#> Protein.Accessions
#> 1: P06576
#> 2: Q16181
#> 3: Q9Y450
#> 4: Q15233
#> 5: P31947
#> 6: Q9NSD9
#> Protein.Descriptions
#> 1: ATP synthase subunit beta, mitochondrial OS=Homo sapiens GN=ATP5B PE=1 SV=3
#> 2: Septin-7 OS=Homo sapiens GN=SEPT7 PE=1 SV=2
#> 3: HBS1-like protein OS=Homo sapiens GN=HBS1L PE=1 SV=1
#> 4: Non-POU domain-containing octamer-binding protein OS=Homo sapiens GN=NONO PE=1 SV=4
#> 5: 14-3-3 protein sigma OS=Homo sapiens GN=SFN PE=1 SV=1
#> 6: Phenylalanine--tRNA ligase beta subunit OS=Homo sapiens GN=FARSB PE=1 SV=3
#> X..Missed.Cleavages Charge DeltaScore DeltaCn Rank Search.Engine.Rank
#> 1: 0 3 1.0000 0 1 1
#> 2: 0 3 1.0000 0 1 1
#> 3: 1 3 0.9730 0 1 1
#> 4: 0 4 0.5250 0 1 1
#> 5: 0 3 1.0000 0 1 1
#> 6: 0 3 0.9783 0 1 1
#> m.z..Da. MH...Da. Theo..MH...Da. DeltaM..ppm. Deltam.z..Da. Activation.Type
#> 1: 1270.3249 3808.960 3808.966 -1.51 -0.00192 CID
#> 2: 920.4493 2759.333 2759.332 0.31 0.00028 CID
#> 3: 920.1605 2758.467 2758.461 2.08 0.00192 CID
#> 4: 359.6898 1435.737 1435.738 -0.04 -0.00002 CID
#> 5: 920.0943 2758.268 2758.264 1.53 0.00141 CID
#> 6: 919.8502 2757.536 2757.532 1.48 0.00136 CID
#> MS.Order Isolation.Interference.... Average.Reporter.S.N
#> 1: MS2 47.955590 8.7
#> 2: MS2 9.377507 8.1
#> 3: MS2 38.317050 17.8
#> 4: MS2 21.390040 36.5
#> 5: MS2 0.000000 16.7
#> 6: MS2 30.619960 26.7
#> Ion.Inject.Time..ms. RT..min. First.Scan
#> 1: 50.000 212.2487 112815
#> 2: 3.242 164.7507 87392
#> 3: 13.596 143.4534 74786
#> 4: 50.000 21.6426 6458
#> 5: 6.723 174.1863 92950
#> 6: 8.958 176.4863 94294
#> Spectrum.File File.ID Abundance..126
#> 1: 161117_SILAC_HeLa_UPS1_TMT10_Mixture1_03.raw F1 2548.326
#> 2: 161117_SILAC_HeLa_UPS1_TMT10_Mixture3_03.raw F5 22861.765
#> 3: 161117_SILAC_HeLa_UPS1_TMT10_Mixture3_03.raw F5 25504.083
#> 4: 161117_SILAC_HeLa_UPS1_TMT10_Mixture4_02.raw F10 13493.228
#> 5: 161117_SILAC_HeLa_UPS1_TMT10_Mixture3_03.raw F5 64582.786
#> 6: 161117_SILAC_HeLa_UPS1_TMT10_Mixture3_03.raw F5 35404.709
#> Abundance..127N Abundance..127C Abundance..128N Abundance..128C
#> 1: 3231.929 2760.839 4111.639 3127.254
#> 2: 25817.946 23349.498 29449.609 25995.929
#> 3: 27740.450 25144.974 25754.579 29923.176
#> 4: 14674.490 11187.900 12831.495 13839.426
#> 5: 50576.417 47126.037 56285.129 46257.310
#> 6: 31905.852 30993.941 36854.351 37506.001
#> Abundance..129N Abundance..129C Abundance..130N Abundance..130C
#> 1: 1874.163 2831.423 2298.401 3798.876
#> 2: 22955.769 30578.971 30660.488 38728.853
#> 3: 34097.637 31650.255 27632.692 23886.881
#> 4: 12441.353 13450.885 14777.844 13039.995
#> 5: 52634.885 49716.850 60660.574 55830.488
#> 6: 25703.444 38626.598 35447.942 33788.409
#> Abundance..131 Quan.Info Ions.Score Identity.Strict Identity.Relaxed
#> 1: 3739.067 NA 90 28 21
#> 2: 25047.280 NA 76 24 17
#> 3: 35331.092 NA 74 30 23
#> 4: 12057.121 NA 40 25 18
#> 5: 40280.577 NA 38 21 14
#> 6: 32031.516 NA 46 29 22
#> Expectation.Value Percolator.q.Value Percolator.PEP
#> 1: 7.038672e-09 0 1.396e-05
#> 2: 6.298627e-08 0 3.349e-07
#> 3: 4.318385e-07 0 9.922e-07
#> 4: 3.351211e-04 0 1.175e-04
#> 5: 2.152501e-04 0 1.383e-05
#> 6: 2.060469e-04 0 7.198e-05
# Read in annotation including condition and biological replicates per run and channel.
# Users should make this annotation file. It is not the output from Proteome Discoverer.
# annotation.pd <- read.csv(file="PD_Annotation.csv", header=TRUE)
head(annotation.pd)
#> Run Fraction TechRepMixture Channel
#> 1 161117_SILAC_HeLa_UPS1_TMT10_Mixture1_01.raw 1 1 126
#> 2 161117_SILAC_HeLa_UPS1_TMT10_Mixture1_01.raw 1 1 127N
#> 3 161117_SILAC_HeLa_UPS1_TMT10_Mixture1_01.raw 1 1 127C
#> 4 161117_SILAC_HeLa_UPS1_TMT10_Mixture1_01.raw 1 1 128N
#> 5 161117_SILAC_HeLa_UPS1_TMT10_Mixture1_01.raw 1 1 128C
#> 6 161117_SILAC_HeLa_UPS1_TMT10_Mixture1_01.raw 1 1 129N
#> Condition Mixture BioReplicate
#> 1 Norm Mixture1 Mixture1_Norm
#> 2 0.667 Mixture1 Mixture1_0.667
#> 3 0.125 Mixture1 Mixture1_0.125
#> 4 0.5 Mixture1 Mixture1_0.5
#> 5 1 Mixture1 Mixture1_1
#> 6 0.125 Mixture1 Mixture1_0.125
# do not remove PSM with missing values within one run
input.pd <- PDtoMSstatsTMTFormat(raw.pd, annotation.pd)
#> ** Shared PSMs (assigned in multiple proteins) are removed.
#> ** 55 features have 1 or 2 intensities within a run and are removed.
#> ** Multiple measurements in a feature and a run are summarized by summaryforMultipleRows.
head(input.pd)
#> ProteinName PeptideSequence Charge
#> 1 P04406 [K].lISWYDNEFGYSNR.[V] 2
#> 2 Q9NSD9 [K].irPFAVAAVLr.[N] 3
#> 3 P04406 [K].lVINGNPITIFQErDPSk.[I] 3
#> 4 P04406 [R].vVDLmAHMASkE.[-] 3
#> 5 P06576 [R].dQEGQDVLLFIDNIFR.[F] 3
#> 6 P06576 [R].iPSAVGYQPTLATDMGTMQEr.[I] 3
#> PSM Mixture TechRepMixture
#> 1 [K].lISWYDNEFGYSNR.[V]_2 Mixture1 1
#> 2 [K].irPFAVAAVLr.[N]_3 Mixture1 1
#> 3 [K].lVINGNPITIFQErDPSk.[I]_3 Mixture1 1
#> 4 [R].vVDLmAHMASkE.[-]_3 Mixture1 1
#> 5 [R].dQEGQDVLLFIDNIFR.[F]_3 Mixture1 1
#> 6 [R].iPSAVGYQPTLATDMGTMQEr.[I]_3 Mixture1 1
#> Run Channel Condition BioReplicate
#> 1 161117_SILAC_HeLa_UPS1_TMT10_Mixture1_01.raw 126 Norm Mixture1_Norm
#> 2 161117_SILAC_HeLa_UPS1_TMT10_Mixture1_01.raw 126 Norm Mixture1_Norm
#> 3 161117_SILAC_HeLa_UPS1_TMT10_Mixture1_01.raw 126 Norm Mixture1_Norm
#> 4 161117_SILAC_HeLa_UPS1_TMT10_Mixture1_01.raw 126 Norm Mixture1_Norm
#> 5 161117_SILAC_HeLa_UPS1_TMT10_Mixture1_01.raw 126 Norm Mixture1_Norm
#> 6 161117_SILAC_HeLa_UPS1_TMT10_Mixture1_01.raw 126 Norm Mixture1_Norm
#> Intensity
#> 1 8348.351
#> 2 28327.492
#> 3 1275010.965
#> 4 80589.877
#> 5 2231.389
#> 6 144854.307
# remove PSM with missing values within one run
input.pd.no.miss <- PDtoMSstatsTMTFormat(raw.pd, annotation.pd,
                                         rmPSM_withMissing_withinRun = TRUE)
#> ** Shared PSMs (assigned in multiple proteins) are removed.
#> ** Rows which has any missing value within a run were removed from that run.
#> ** 0 features have 1 or 2 intensities within a run and are removed.
#> ** Multiple measurements in a feature and a run are summarized by summaryforMultipleRows.
head(input.pd.no.miss)
#> ProteinName PeptideSequence Charge PSM
#> 1 P12277 [K].lAVEALSSLDGDLAGr.[Y] 3 [K].lAVEALSSLDGDLAGr.[Y]_3
#> 2 P04406 [K].lVINGNPITIFQEr.[D] 3 [K].lVINGNPITIFQEr.[D]_3
#> 3 Q16181 [K].dVTNNVHYENYr.[S] 3 [K].dVTNNVHYENYr.[S]_3
#> 4 P04406 [K].qASEGPLk.[G] 2 [K].qASEGPLk.[G]_2
#> 5 Q15233 [R].rQQEEMMr.[R] 3 [R].rQQEEMMr.[R]_3
#> 6 P06576 [R].dQEGQDVLLFIDNIFr.[F] 3 [R].dQEGQDVLLFIDNIFr.[F]_3
#> Mixture TechRepMixture Run Channel
#> 1 Mixture1 1 161117_SILAC_HeLa_UPS1_TMT10_Mixture1_01.raw 126
#> 2 Mixture1 1 161117_SILAC_HeLa_UPS1_TMT10_Mixture1_01.raw 126
#> 3 Mixture1 1 161117_SILAC_HeLa_UPS1_TMT10_Mixture1_01.raw 126
#> 4 Mixture1 1 161117_SILAC_HeLa_UPS1_TMT10_Mixture1_01.raw 126
#> 5 Mixture1 1 161117_SILAC_HeLa_UPS1_TMT10_Mixture1_01.raw 126
#> 6 Mixture1 1 161117_SILAC_HeLa_UPS1_TMT10_Mixture1_01.raw 126
#> Condition BioReplicate Intensity
#> 1 Norm Mixture1_Norm 23037.057
#> 2 Norm Mixture1_Norm 349661.432
#> 3 Norm Mixture1_Norm 40699.454
#> 4 Norm Mixture1_Norm 13882.684
#> 5 Norm Mixture1_Norm 9302.419
#> 6 Norm Mixture1_Norm 12261.325
`MaxQtoMSstatsTMTFormat` preprocesses PSM-level data from MaxQuant and converts it into the input format required by MSstatsTMT. Its arguments are:
- `evidence`: name of the `evidence.txt` data, which contains the PSM-level data.
- `proteinGroups`: name of the `proteinGroups.txt` data, which contains detailed information about the protein identifications.
- `annotation`: data frame containing the columns `Run`, `Fraction`, `TechRepMixture`, `Channel`, `Condition`, `BioReplicate` and `Mixture`.
- `which.proteinid`: column used for the protein name. `Proteins` is the default; `Leading.proteins` or `Leading.razor.proteins` can be used instead, but those may include shared peptides.
- `rmProt_Only.identified.by.site`: `TRUE` removes proteins marked with '+' in the `Only.identified.by.site` column of `proteinGroups.txt`, that is, proteins identified only by a modification site. Default is `FALSE`.
- `useUniquePeptide`: `TRUE` (default) removes peptides that are assigned to more than one protein. Each protein is assumed to be quantified from its unique peptides.
- `rmPSM_withMissing_withinRun`: `TRUE` removes any PSM that has a missing value within a run. Default is `FALSE`.
- `rmPSM_withfewMea_withinRun`: applies only when `rmPSM_withMissing_withinRun = FALSE`. `TRUE` (default) removes features that have only 1 or 2 measurements within a run.
- `removeProtein_with1Peptide`: `TRUE` removes proteins that have only one peptide and charge. Default is `FALSE`.
- `summaryforMultipleRows`: `sum` (default) or `max`. When a PSM has multiple measurements in a run, select the measurement with the largest summation or the largest maximal value.
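The two missing-value filters, `rmPSM_withMissing_withinRun` and `rmPSM_withfewMea_withinRun`, differ in how strict they are. The base-R sketch below illustrates that difference on a toy long-format table. It is one reading of the argument descriptions, not the package's implementation, and the table and its values are made up for illustration.

```r
# Toy long-format table: one row per PSM, run and channel (illustrative values).
psm <- data.frame(
  PSM = rep(c("pepA_2", "pepB_3", "pepC_2"), each = 4),
  Run = "run1",
  Channel = rep(c("126", "127N", "127C", "128N"), times = 3),
  Intensity = c(100, 120, 90, 110,  # pepA_2: all 4 channels observed
                NA, 200, NA, NA,    # pepB_3: 1 channel observed
                50, NA, 60, 70),    # pepC_2: 3 channels observed
  stringsAsFactors = FALSE
)

# Number of observed (non-missing) intensities per PSM within each run.
psm$n_obs <- ave(as.integer(!is.na(psm$Intensity)), psm$PSM, psm$Run, FUN = sum)
n_channels <- length(unique(psm$Channel))

# Analogue of rmPSM_withMissing_withinRun = TRUE: keep only complete PSMs.
complete_only <- psm[psm$n_obs == n_channels, ]

# Analogue of rmPSM_withfewMea_withinRun = TRUE: drop PSMs with 1 or 2 measurements.
enough_obs <- psm[psm$n_obs > 2, ]

unique(complete_only$PSM)  # "pepA_2"
unique(enough_obs$PSM)     # "pepA_2" "pepC_2"
```

The stricter filter discards `pepC_2` because of a single missing channel, while the default filter keeps it and only drops the sparsely measured `pepB_3`.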
# Read in MaxQuant files
# proteinGroups <- read.table("proteinGroups.txt", sep="\t", header=TRUE)
# evidence <- read.table("evidence.txt", sep="\t", header=TRUE)
# Users should make this annotation file. It is not the output from MaxQuant.
# annotation.mq <- read.csv(file="MQ_Annotation.csv", header=TRUE)
input.mq <- MaxQtoMSstatsTMTFormat(evidence, proteinGroups, annotation.mq)
#> ** + Contaminant, + Reverse, + Only.identified.by.site, proteins are removed.
#> ** PSMs, that have all zero intensities across channels in each run, are removed.
#> ** 2 features have 1 or 2 intensities across runs and are removed.
#> ** Multiple measurements in a feature and a run are summarized by summaryforMultipleRows.
head(input.mq)
#> ProteinName PeptideSequence Charge PSM
#> 1 O15042 AAAEIYEEFLAAFEGSDGNK(ly) 3 AAAEIYEEFLAAFEGSDGNK(ly)_3
#> 2 Q9P258 DGQILPVPNVVVR(ar) 3 DGQILPVPNVVVR(ar)_3
#> 3 Q96P70 ICPFTIAIFLK(ly) 3 ICPFTIAIFLK(ly)_3
#> 4 P36578 FCIWTESAFR(ar)K(ly) 3 FCIWTESAFR(ar)K(ly)_3
#> 5 Q9P258 AAAAAWEEPSSGNGTAR(ar) 2 AAAAAWEEPSSGNGTAR(ar)_2
#> 6 Q96P70 VWTANPQQFVEDEDDDTFSYTVR(ar) 3 VWTANPQQFVEDEDDDTFSYTVR(ar)_3
#> Mixture TechRepMixture Run Channel
#> 1 Mixture1 1 161117_SILAC_HeLa_UPS1_TMT10_Mixture1_01 channel.0
#> 2 Mixture1 1 161117_SILAC_HeLa_UPS1_TMT10_Mixture1_01 channel.0
#> 3 Mixture1 1 161117_SILAC_HeLa_UPS1_TMT10_Mixture1_01 channel.0
#> 4 Mixture1 1 161117_SILAC_HeLa_UPS1_TMT10_Mixture1_01 channel.0
#> 5 Mixture1 1 161117_SILAC_HeLa_UPS1_TMT10_Mixture1_01 channel.0
#> 6 Mixture1 1 161117_SILAC_HeLa_UPS1_TMT10_Mixture1_01 channel.0
#> BioReplicate Condition Intensity
#> 1 Mixture1_Norm Norm 1031.50
#> 2 Mixture1_Norm Norm 2219.20
#> 3 Mixture1_Norm Norm 478.17
#> 4 Mixture1_Norm Norm 534.43
#> 5 Mixture1_Norm Norm 866.26
#> 6 Mixture1_Norm Norm 388.78
`SpectroMinetoMSstatsTMTFormat` preprocesses PSM data from SpectroMine and converts it into the input format required by MSstatsTMT. Its arguments are:
- `input`: name of the data object holding the SpectroMine PSM output (the PSM sheet).
- `annotation`: data frame containing the columns `Run`, `Fraction`, `TechRepMixture`, `Channel`, `Condition`, `BioReplicate` and `Mixture`.
- `filter_with_Qvalue`: `TRUE` (default) filters out intensities whose value in the `EG.Qvalue` column is greater than `qvalue_cutoff`. Those intensities are replaced with `NA` and treated as censored missing values for imputation.
- `qvalue_cutoff`: cutoff for `EG.Qvalue`. Default is 0.01.
- `useUniquePeptide`: `TRUE` (default) removes peptides that are assigned to more than one protein. Each protein is assumed to be quantified from its unique peptides.
- `rmPSM_withMissing_withinRun`: `TRUE` removes any PSM that has a missing value within a run. Default is `FALSE`.
- `rmPSM_withfewMea_withinRun`: applies only when `rmPSM_withMissing_withinRun = FALSE`. `TRUE` (default) removes features that have only 1 or 2 measurements within a run.
- `removeProtein_with1Peptide`: `TRUE` removes proteins that have only one peptide and charge. Default is `FALSE`.
- `summaryforMultipleRows`: `sum` (default) or `max`. When a PSM has multiple measurements in a run, select the measurement with the largest summation or the largest maximal value.
- `remove_norm_channel`: `TRUE` (default) removes `Norm` channels from the protein-level data.
- `remove_empty_channel`: `TRUE` (default) removes `Empty` channels from the protein-level data.
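The Q-value filter described for `filter_with_Qvalue` and `qvalue_cutoff` can be pictured as follows. This is a minimal base-R sketch on a toy table, not the converter's actual code. `EG.Qvalue` is the column named above, while the other column names and all values are assumptions made for illustration.

```r
# Toy PSM report; only the EG.Qvalue column name comes from the text above.
report <- data.frame(
  PSM = c("pep1_2", "pep2_3", "pep3_2"),
  EG.Qvalue = c(0.001, 0.05, 0.01),
  Intensity = c(1500, 800, 2300)
)

qvalue_cutoff <- 0.01

# Intensities whose q-value exceeds the cutoff are replaced with NA,
# so that they can later be treated as censored missing values.
report$Intensity[report$EG.Qvalue > qvalue_cutoff] <- NA

report$Intensity  # 1500 NA 2300
```

Note that a q-value exactly equal to the cutoff is kept in this sketch, because only values greater than the cutoff are filtered.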
# Read in SpectroMine PSM report
# raw.mine <- read.csv('20180831_095547_CID-OT-MS3-Short_PSM Report_20180831_103118.xls', sep="\t")
# Users should make this annotation file. It is not the output from SpectroMine
# annotation.mine <- read.csv(file="Mine_Annotation.csv", header=TRUE)
input.mine <- SpectroMinetoMSstatsTMTFormat(raw.mine, annotation.mine)
#> ** Intensities with great than 0.01 in PG.QValue are replaced with NA.
#> ** Intensities with great than 0.01 in EG.Qvalue are replaced with NA.
#> ** 0 rows have all NAs are removed.
#> ** All peptides are unique peptides in proteins.
#> ** 0 features have 1 or 2 intensities across runs and are removed.
#> ** Multiple measurements in a feature and a run are summarized by summaryforMultipleRows.
#> ** For peptides overlapped between fractions of 1_1, use the fraction with maximal average abundance.
#> ** Fractions belonging to same mixture have been combined.
head(input.mine)
#> ProteinName PeptideSequence Charge
#> 1 Q9GZT9 _[TMT_Nter]AAAGGQGSAVAAEAEPGK[TMT_Lys]EEPPAR_ 3
#> 2 Q9NVA2 _[TMT_Nter]K[TMT_Lys]ELEEEVNNFQK[TMT_Lys]_ 3
#> 3 Q9NVA2 _[TMT_Nter]SLDLVTMK[TMT_Lys]_ 2
#> 4 Q9NVA2 _[TMT_Nter]AAAQLLQSQAQQSGAQQTK[TMT_Lys]_ 3
#> 5 Q9NVA2 _[TMT_Nter]AAAQLLQSQAQQSGAQQTK[TMT_Lys]_ 2
#> 6 P06753 _[TMT_Nter]AADAEAEVASLNRR_ 3
#> PSM Mixture TechRepMixture Run
#> 1 _[TMT_Nter]AAAGGQGSAVAAEAEPGK[TMT_Lys]EEPPAR__3 1 1 1_1
#> 2 _[TMT_Nter]K[TMT_Lys]ELEEEVNNFQK[TMT_Lys]__3 1 1 1_1
#> 3 _[TMT_Nter]SLDLVTMK[TMT_Lys]__2 1 1 1_1
#> 4 _[TMT_Nter]AAAQLLQSQAQQSGAQQTK[TMT_Lys]__3 1 1 1_1
#> 5 _[TMT_Nter]AAAQLLQSQAQQSGAQQTK[TMT_Lys]__2 1 1 1_1
#> 6 _[TMT_Nter]AADAEAEVASLNRR__3 1 1 1_1
#> Channel BioReplicate Condition Intensity
#> 1 TMT6_126 1 3 382.1107
#> 2 TMT6_126 1 3 33554.1900
#> 3 TMT6_126 1 3 44713.6300
#> 4 TMT6_126 1 3 20877.8700
#> 5 TMT6_126 1 3 506.1669
#> 6 TMT6_126 1 3 10065.2800
`OpenMStoMSstatsTMTFormat` preprocesses the MSstatsTMT report from OpenMS and converts it into the input format required by MSstatsTMT. Its arguments are:
- `input`: name of the data object holding the MSstatsTMT report from OpenMS (read from the csv file).
- `useUniquePeptide`: `TRUE` (default) removes peptides that are assigned to more than one protein. Each protein is assumed to be quantified from its unique peptides.
- `rmPSM_withMissing_withinRun`: `TRUE` removes any PSM that has a missing value within a run. Default is `FALSE`.
- `rmPSM_withfewMea_withinRun`: applies only when `rmPSM_withMissing_withinRun = FALSE`. `TRUE` (default) removes features that have only 1 or 2 measurements within a run.
- `removeProtein_with1Peptide`: `TRUE` removes proteins that have only one peptide and charge. Default is `FALSE`.
- `summaryforMultipleRows`: `sum` (default) or `max`. When a PSM has multiple measurements in a run, select the measurement with the largest summation or the largest maximal value.
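The `summaryforMultipleRows` option decides which row to keep when the same PSM is measured more than once in a run. The base-R sketch below shows one plausible reading of that rule on a toy wide-format table. The helper `pick_row`, the channel column names and the values are hypothetical, and the package may resolve duplicates differently in detail.

```r
# Two measurements (rows) of the same PSM in the same run, three channels each.
dup <- data.frame(
  PSM = c("pepA_2", "pepA_2"),
  Run = c("run1", "run1"),
  ch126 = c(100, 10),
  ch127N = c(100, 10),
  ch127C = c(100, 250)
)
channels <- c("ch126", "ch127N", "ch127C")

# Hypothetical helper: keep the row whose channel intensities give the
# largest value of `stat` (sum or max), ignoring missing values.
pick_row <- function(rows, stat = sum) {
  score <- apply(rows[, channels], 1, stat, na.rm = TRUE)
  rows[which.max(score), ]
}

pick_row(dup, sum)$ch127C  # 100: row 1 has the larger sum (300 vs 270)
pick_row(dup, max)$ch127C  # 250: row 2 has the larger maximum (250 vs 100)
```

The toy rows are chosen so that the two options disagree, which shows that the choice can change which measurement enters the analysis.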
# read in MSstatsTMT report from OpenMS
# raw.om <- read.csv("OpenMS_20200222/20200225_MSstatsTMT_OpenMS_Export.csv")
head(raw.om)
#> RetentionTime ProteinName PeptideSequence Charge
#> 1 2924.491 sp|P11679|K2C8_MOUSE .(TMT6plex)AEAETMYQIK(TMT6plex) 2
#> 2 2924.491 sp|P11679|K2C8_MOUSE .(TMT6plex)AEAETMYQIK(TMT6plex) 2
#> 3 2924.491 sp|P11679|K2C8_MOUSE .(TMT6plex)AEAETMYQIK(TMT6plex) 2
#> 4 2924.491 sp|P11679|K2C8_MOUSE .(TMT6plex)AEAETMYQIK(TMT6plex) 2
#> 5 2924.491 sp|P11679|K2C8_MOUSE .(TMT6plex)AEAETMYQIK(TMT6plex) 2
#> 6 2924.491 sp|P11679|K2C8_MOUSE .(TMT6plex)AEAETMYQIK(TMT6plex) 2
#> Channel Condition BioReplicate Run Mixture TechRepMixture Fraction
#> 1 1 Long_LF 1 1_1_3 1 1_1 3
#> 2 2 Long_LF 2 1_1_3 1 1_1 3
#> 3 3 Long_M 3 1_1_3 1 1_1 3
#> 4 6 Long_M 6 1_1_3 1 1_1 3
#> 5 5 Norm 5 1_1_3 1 1_1 3
#> 6 9 Norm 9 1_1_3 1 1_1 3
#> Intensity
#> 1 5727.319
#> 2 6985.365
#> 3 4553.897
#> 4 5937.782
#> 5 5151.292
#> 6 6800.128
#> Reference
#> 1 PAMI-176_Mouse_A-J_TMT_40ug_22pctACN_25cm_120min_20160223_OT.mzML_controllerType=0 controllerNumber=1 scan=11324
#> 2 PAMI-176_Mouse_A-J_TMT_40ug_22pctACN_25cm_120min_20160223_OT.mzML_controllerType=0 controllerNumber=1 scan=11324
#> 3 PAMI-176_Mouse_A-J_TMT_40ug_22pctACN_25cm_120min_20160223_OT.mzML_controllerType=0 controllerNumber=1 scan=11324
#> 4 PAMI-176_Mouse_A-J_TMT_40ug_22pctACN_25cm_120min_20160223_OT.mzML_controllerType=0 controllerNumber=1 scan=11324
#> 5 PAMI-176_Mouse_A-J_TMT_40ug_22pctACN_25cm_120min_20160223_OT.mzML_controllerType=0 controllerNumber=1 scan=11324
#> 6 PAMI-176_Mouse_A-J_TMT_40ug_22pctACN_25cm_120min_20160223_OT.mzML_controllerType=0 controllerNumber=1 scan=11324
# the function only requires one input file
input.om <- OpenMStoMSstatsTMTFormat(raw.om)
#> Joining, by = c("RetentionTime", "ProteinName", "PeptideSequence", "Charge", "Run", "Reference")
#> ** PSMs, that have all zero intensities across channels in each run, are removed.
#> Joining, by = c("RetentionTime", "ProteinName", "PeptideSequence", "Charge", "Run", "Reference")
#> ** 2 features have 1 or 2 intensities across runs are removed.
#> Joining, by = c("Run", "Channel")
#> ** PSMs have been aggregated to peptide ions.
#> ** For peptides overlapped between fractions of 2_2_2, use the fraction with maximal average abundance.
#> ** For peptides overlapped between fractions of 3_3_3, use the fraction with maximal average abundance.
#> ** Fractions belonging to same mixture have been combined.
head(input.om)
#> ProteinName PeptideSequence Charge
#> 1 sp|O08663|MAP2_MOUSE .(TMT6plex)GQEC(Carbamidomethyl)EYPPTQDGR 2
#> 2 sp|O08663|MAP2_MOUSE .(TMT6plex)GQEC(Carbamidomethyl)EYPPTQDGR 2
#> 3 sp|O08663|MAP2_MOUSE .(TMT6plex)GQEC(Carbamidomethyl)EYPPTQDGR 2
#> 4 sp|O08663|MAP2_MOUSE .(TMT6plex)GQEC(Carbamidomethyl)EYPPTQDGR 2
#> 5 sp|O08663|MAP2_MOUSE .(TMT6plex)GQEC(Carbamidomethyl)EYPPTQDGR 2
#> 6 sp|O08663|MAP2_MOUSE .(TMT6plex)GQEC(Carbamidomethyl)EYPPTQDGR 2
#> PSM Mixture TechRepMixture Run
#> 1 .(TMT6plex)GQEC(Carbamidomethyl)EYPPTQDGR_2 1 1_1 1_1_1
#> 2 .(TMT6plex)GQEC(Carbamidomethyl)EYPPTQDGR_2 1 1_1 1_1_1
#> 3 .(TMT6plex)GQEC(Carbamidomethyl)EYPPTQDGR_2 1 1_1 1_1_1
#> 4 .(TMT6plex)GQEC(Carbamidomethyl)EYPPTQDGR_2 1 1_1 1_1_1
#> 5 .(TMT6plex)GQEC(Carbamidomethyl)EYPPTQDGR_2 1 1_1 1_1_1
#> 6 .(TMT6plex)GQEC(Carbamidomethyl)EYPPTQDGR_2 1 1_1 1_1_1
#> Channel Condition BioReplicate Intensity
#> 1 1 Long_LF 1 18748.36
#> 2 10 Short_LF 10 15084.31
#> 3 2 Long_LF 2 19591.20
#> 4 3 Long_M 3 17800.54
#> 5 4 Short_LF 4 21316.78
#> 6 5 Norm 5 17607.60
Protein-level summarization from peptide-level quantification should be performed before testing for differentially abundant proteins. `proteinSummarization` first applies global median normalization to the peptide-level quantification data (equalizing the medians across all channels and MS runs), then summarizes the peptide-level data into protein-level data, and finally normalizes between MS runs using the normalization channels. The `msstats` summarization method assumes that missing values are censored and imputes them before summarizing peptide-level data into protein-level data. The other methods, `MedianPolish`, `Median` and `LogSum`, do not impute missing values.
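Global median normalization, as described above, shifts the data so that every channel and MS run has the same median. The base-R sketch below illustrates the idea on toy log2 intensities. It is not the package's implementation: the column names are made up, and using the median of the group medians as the common target is an assumption of this sketch.

```r
# Toy peptide-level data: log2 intensities for two run/channel combinations.
pep <- data.frame(
  Run = rep(c("run1", "run2"), each = 3),
  Channel = "126",
  log2Intensity = c(10, 11, 12,  # median 11
                    13, 14, 15)  # median 14
)

# One group per run/channel combination.
group <- interaction(pep$Run, pep$Channel, drop = TRUE)

# Median of each group, repeated for every row of that group.
group_median <- ave(pep$log2Intensity, group,
                    FUN = function(x) median(x, na.rm = TRUE))

# Common target (assumption of this sketch): the median of the group medians.
target <- median(tapply(pep$log2Intensity, group, median, na.rm = TRUE))

# Shift each group so that its median equals the target.
pep$normalized <- pep$log2Intensity - group_median + target

tapply(pep$normalized, group, median)  # both medians are 12.5
```

After the shift, differences between peptides within a run and channel are unchanged; only the overall level of each run and channel moves.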
`proteinSummarization` takes the following arguments:
- `data`: name of the output of the `PDtoMSstatsTMTFormat` function, or peptide-level quantified data from other tools. It should have columns named `Protein`, `PSM`, `TechRepMixture`, `Mixture`, `Run`, `Channel`, `Condition`, `BioReplicate` and `Intensity`.
- `method`: method used to summarize to the protein level. Four methods are available: `msstats` (default), `MedianPolish`, `Median` and `LogSum`.
- `global_norm`: global median normalization of peptide-level data (equalizing the medians across all channels and MS runs). Default is `TRUE`. It is performed before protein-level summarization.
- `reference_norm`: reference-channel-based normalization between MS runs. `TRUE` (default) requires at least one reference channel in each MS run, annotated as `Norm` in the `Condition` column. It is performed after protein-level summarization. `FALSE` skips this normalization step. If the data has only one run, use `reference_norm = FALSE`.
- `remove_norm_channel`: `TRUE` (default) removes `Norm` channels from the protein-level data.
- `remove_empty_channel`: `TRUE` (default) removes `Empty` channels from the protein-level data.
- `MBimpute`: applies only when `method = "msstats"`. `TRUE` (default) imputes missing values with an accelerated failure model. `FALSE` imputes missing values with the minimum value for each peptide precursor ion.
- `maxQuantileforCensored`: missing values are assumed to be censored. `maxQuantileforCensored` is the maximum quantile used to decide which missing values are censored, for instance 0.999. Default is `NULL`.
# use MSstats for protein summarization
quant.msstats <- proteinSummarization(input.pd,
                                      method = "msstats",
                                      global_norm = TRUE,
                                      reference_norm = TRUE,
                                      remove_norm_channel = TRUE,
                                      remove_empty_channel = TRUE)
#>
#> Summary of Features :
#> count
#> # of Protein 10
#> # of Peptides/Protein 4-29
#> # of Transitions/Peptide 1-1
#>
#> Summary of Samples :
#> 0.125 0.5 0.667 1 Norm
#> # of MS runs 2 2 2 2 2
#> # of Biological Replicates 1 1 1 1 1
#> # of Technical Replicates 2 2 2 2 2
#>
#>
#> Summary of Features :
#> count
#> # of Protein 10
#> # of Peptides/Protein 3-33
#> # of Transitions/Peptide 1-1
#>
#> Summary of Samples :
#> 0.125 0.5 0.667 1 Norm
#> # of MS runs 2 2 2 2 2
#> # of Biological Replicates 1 1 1 1 1
#> # of Technical Replicates 2 2 2 2 2
#>
#>
#> Summary of Features :
#> count
#> # of Protein 10
#> # of Peptides/Protein 3-29
#> # of Transitions/Peptide 1-1
#>
#> Summary of Samples :
#> 0.125 0.5 0.667 1 Norm
#> # of MS runs 2 2 2 2 2
#> # of Biological Replicates 1 1 1 1 1
#> # of Technical Replicates 2 2 2 2 2
#>
#>
#> Summary of Features :
#> count
#> # of Protein 10
#> # of Peptides/Protein 1-28
#> # of Transitions/Peptide 1-1
#>
#> Summary of Samples :
#> 0.125 0.5 0.667 1 Norm
#> # of MS runs 2 2 2 2 2
#> # of Biological Replicates 1 1 1 1 1
#> # of Technical Replicates 2 2 2 2 2
#>
#>
#> Summary of Features :
#> count
#> # of Protein 10
#> # of Peptides/Protein 1-30
#> # of Transitions/Peptide 1-1
#>
#> Summary of Samples :
#> 0.125 0.5 0.667 1 Norm
#> # of MS runs 2 2 2 2 2
#> # of Biological Replicates 1 1 1 1 1
#> # of Technical Replicates 2 2 2 2 2
#>
#>
#> Summary of Features :
#> count
#> # of Protein 10
#> # of Peptides/Protein 2-30
#> # of Transitions/Peptide 1-1
#>
#> Summary of Samples :
#> 0.125 0.5 0.667 1 Norm
#> # of MS runs 2 2 2 2 2
#> # of Biological Replicates 1 1 1 1 1
#> # of Technical Replicates 2 2 2 2 2
#>
#>
#> Summary of Features :
#> count
#> # of Protein 10
#> # of Peptides/Protein 4-31
#> # of Transitions/Peptide 1-1
#>
#> Summary of Samples :
#> 0.125 0.5 0.667 1 Norm
#> # of MS runs 2 2 2 2 2
#> # of Biological Replicates 1 1 1 1 1
#> # of Technical Replicates 2 2 2 2 2
#>
#>
#> Summary of Features :
#> count
#> # of Protein 10
#> # of Peptides/Protein 3-30
#> # of Transitions/Peptide 1-1
#>
#> Summary of Samples :
#> 0.125 0.5 0.667 1 Norm
#> # of MS runs 2 2 2 2 2
#> # of Biological Replicates 1 1 1 1 1
#> # of Technical Replicates 2 2 2 2 2
#>
#>
#> Summary of Features :
#> count
#> # of Protein 10
#> # of Peptides/Protein 5-30
#> # of Transitions/Peptide 1-1
#>
#> Summary of Samples :
#> 0.125 0.5 0.667 1 Norm
#> # of MS runs 2 2 2 2 2
#> # of Biological Replicates 1 1 1 1 1
#> # of Technical Replicates 2 2 2 2 2
#>
#>
#> Summary of Features :
#> count
#> # of Protein 10
#> # of Peptides/Protein 3-31
#> # of Transitions/Peptide 1-1
#>
#> Summary of Samples :
#> 0.125 0.5 0.667 1 Norm
#> # of MS runs 2 2 2 2 2
#> # of Biological Replicates 1 1 1 1 1
#> # of Technical Replicates 2 2 2 2 2
#>
#>
#> Summary of Features :
#> count
#> # of Protein 10
#> # of Peptides/Protein 3-31
#> # of Transitions/Peptide 1-1
#>
#> Summary of Samples :
#> 0.125 0.5 0.667 1 Norm
#> # of MS runs 2 2 2 2 2
#> # of Biological Replicates 1 1 1 1 1
#> # of Technical Replicates 2 2 2 2 2
#>
#>
#> Summary of Features :
#> count
#> # of Protein 10
#> # of Peptides/Protein 1-31
#> # of Transitions/Peptide 1-1
#>
#> Summary of Samples :
#> 0.125 0.5 0.667 1 Norm
#> # of MS runs 2 2 2 2 2
#> # of Biological Replicates 1 1 1 1 1
#> # of Technical Replicates 2 2 2 2 2
#>
#> Summary of Features :
#> count
#> # of Protein 10
#> # of Peptides/Protein 3-34
#> # of Transitions/Peptide 1-1
#>
#> Summary of Samples :
#> 0.125 0.5 0.667 1 Norm
#> # of MS runs 2 2 2 2 2
#> # of Biological Replicates 1 1 1 1 1
#> # of Technical Replicates 2 2 2 2 2
#>
#> Summary of Features :
#> count
#> # of Protein 10
#> # of Peptides/Protein 2-30
#> # of Transitions/Peptide 1-1
#>
#> Summary of Samples :
#> 0.125 0.5 0.667 1 Norm
#> # of MS runs 2 2 2 2 2
#> # of Biological Replicates 1 1 1 1 1
#> # of Technical Replicates 2 2 2 2 2
#>
#> Summary of Features :
#> count
#> # of Protein 10
#> # of Peptides/Protein 5-32
#> # of Transitions/Peptide 1-1
#>
#> Summary of Samples :
#> 0.125 0.5 0.667 1 Norm
#> # of MS runs 2 2 2 2 2
#> # of Biological Replicates 1 1 1 1 1
#> # of Technical Replicates 2 2 2 2 2
#>
head(quant.msstats)
#> Run Protein Abundance Channel
#> 1 161117_SILAC_HeLa_UPS1_TMT10_Mixture1_01.raw P04406 16.59812 127C
#> 2 161117_SILAC_HeLa_UPS1_TMT10_Mixture1_01.raw P04406 16.55729 129N
#> 3 161117_SILAC_HeLa_UPS1_TMT10_Mixture1_01.raw P04406 16.71783 128N
#> 4 161117_SILAC_HeLa_UPS1_TMT10_Mixture1_01.raw P04406 16.67190 129C
#> 5 161117_SILAC_HeLa_UPS1_TMT10_Mixture1_01.raw P04406 16.51106 127N
#> 6 161117_SILAC_HeLa_UPS1_TMT10_Mixture1_01.raw P04406 16.49448 130C
#> BioReplicate Condition TechRepMixture Mixture
#> 1 Mixture1_0.125 0.125 1 Mixture1
#> 2 Mixture1_0.125 0.125 1 Mixture1
#> 3 Mixture1_0.5 0.5 1 Mixture1
#> 4 Mixture1_0.5 0.5 1 Mixture1
#> 5 Mixture1_0.667 0.667 1 Mixture1
#> 6 Mixture1_0.667 0.667 1 Mixture1
# use Median for protein summarization
# since median method doesn't impute missing values,
# we need to use the input data without missing values
quant.median <- proteinSummarization(input.pd.no.miss,
                                     method = "Median",
                                     global_norm = TRUE,
                                     reference_norm = TRUE,
                                     remove_norm_channel = TRUE,
                                     remove_empty_channel = TRUE)
head(quant.median)
#> Run Protein Abundance Channel
#> 2 161117_SILAC_HeLa_UPS1_TMT10_Mixture1_01.raw P12277 15.32534 127C
#> 3 161117_SILAC_HeLa_UPS1_TMT10_Mixture1_01.raw P12277 15.55383 127N
#> 4 161117_SILAC_HeLa_UPS1_TMT10_Mixture1_01.raw P12277 15.51731 128C
#> 5 161117_SILAC_HeLa_UPS1_TMT10_Mixture1_01.raw P12277 15.76108 128N
#> 6 161117_SILAC_HeLa_UPS1_TMT10_Mixture1_01.raw P12277 15.52052 129C
#> 7 161117_SILAC_HeLa_UPS1_TMT10_Mixture1_01.raw P12277 15.32590 129N
#> BioReplicate Condition TechRepMixture Mixture
#> 2 Mixture1_0.125 0.125 1 Mixture1
#> 3 Mixture1_0.667 0.667 1 Mixture1
#> 4 Mixture1_1 1 1 Mixture1
#> 5 Mixture1_0.5 0.5 1 Mixture1
#> 6 Mixture1_0.5 0.5 1 Mixture1
#> 7 Mixture1_0.125 0.125 1 Mixture1
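As a rough illustration of what median summarization means, the sketch below takes the median of the log2 intensities of all PSMs that belong to the same protein, run and channel. It uses base R only, the toy data frame psm and its values are hypothetical, and it is a simplified picture rather than the implementation inside proteinSummarization (normalization steps are left out).

```r
# Hypothetical PSM-level intensities for two proteins in one run and channel
psm <- data.frame(
  Protein   = c("P1", "P1", "P1", "P2", "P2"),
  Run       = "run1",
  Channel   = "126",
  Intensity = c(1024, 4096, 2048, 512, 2048)
)

# Work on the log2 scale, as the Abundance column above does
psm$log2Intensity <- log2(psm$Intensity)

# One value per Protein x Run x Channel: the median over its PSMs
toy.median <- aggregate(log2Intensity ~ Protein + Run + Channel,
                        data = psm, FUN = median)
names(toy.median)[names(toy.median) == "log2Intensity"] <- "Abundance"
toy.median
```

Each row of toy.median corresponds to one protein in one channel of one run, which is the same granularity as the rows of quant.median shown above.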
Visualization for exploratory data analysis. To illustrate the quantitative data after data preprocessing and quality control of TMT runs, dataProcessPlotsTMT takes as input the quantitative data from the converter functions (PDtoMSstatsTMTFormat, MaxQtoMSstatsTMTFormat and SpectroMinetoMSstatsTMTFormat) and the summarized data from the function proteinSummarization. It generates two types of figures as pdf files:
profile plot (specify “ProfilePlot” in option type), to identify the potential sources of variation for each protein;
quality control plot (specify “QCPlot” in option type), to evaluate the systematic bias between MS runs.
data.peptide: name of the peptide-level data, which can be the output of the converter functions (PDtoMSstatsTMTFormat, MaxQtoMSstatsTMTFormat and SpectroMinetoMSstatsTMTFormat).
data.summarization: name of the protein-level data, which can be the output of the proteinSummarization function.
type: choice of visualization. “ProfilePlot” draws a profile plot of log intensities across MS runs. “QCPlot” draws a quality control plot of log intensities across MS runs.
ylimUp: upper limit for the y-axis on the log scale. FALSE (default) uses, for both Profile Plot and QC Plot, the rounded maximum of log2(intensities) after normalization, plus 3.
ylimDown: lower limit for the y-axis on the log scale. FALSE (default) sets the lower limit to 0 for both Profile Plot and QC Plot.
x.axis.size: size of the x-axis labels for “Run” and “channel” in Profile Plot and QC Plot.
y.axis.size: size of the y-axis labels. Default is 10.
text.size: size of the labels for each condition at the top of the graph in Profile Plot and QC Plot. Default is 4.
text.angle: angle of the labels for each condition at the top of the graph in Profile Plot and QC Plot. Default is 0.
legend.size: size of the legend above the graph in Profile Plot. Default is 7.
dot.size.profile: size of the dots in Profile Plot. Default is 2.
ncol.guide: number of columns for the legends at the top of the plot. Default is 5.
width: width of the saved file. Default is 10.
height: height of the saved file. Default is 10.
which.Protein: proteins to plot, given either as protein names or as order numbers. Default is “all”, which generates a plot for every protein. For QC Plot, “allonly” generates a single QC plot with all proteins.
originalPlot: TRUE (default) draws the original profile plots, without normalization.
summaryPlot: TRUE (default) draws profile plots with protein summarization for each channel and MS run.
address: name of the folder that will store the results. The default folder is the current working directory. Any other folder must already exist under the current working directory. An output pdf file is created automatically with the default name “ProfilePlot.pdf” or “QCplot.pdf”. The address option specifies where to store the file and can also modify the beginning of the file name. If address=FALSE, the plot is not saved as a pdf file but shown in a window.
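The default y-axis limits described for ylimUp and ylimDown can be sketched in base R as follows. The intensities are hypothetical, and reading “rounded off” as round() is an assumption made for this illustration; the package may round differently.

```r
# Hypothetical intensities for one protein (not taken from the example data)
intensity <- c(1200, 45000, 98000)
log2.intensity <- log2(intensity)

# Assumption: "rounded off maximum ... + 3" is read as round(max) + 3
ylim.up <- round(max(log2.intensity, na.rm = TRUE)) + 3

# Default lower limit described above
ylim.down <- 0

c(ylim.down, ylim.up)
```

Setting ylimUp or ylimDown to a number instead of FALSE replaces the corresponding default, which is useful when plots of several proteins should share one scale.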
## Profile plot without norm channels and empty channels
dataProcessPlotsTMT(data.peptide = input.pd,
                    data.summarization = quant.msstats,
                    type = 'ProfilePlot',
                    width = 21, # adjust the figure width since there are 15 TMT runs.
                    height = 7)
#> Warning: Removed 16 rows containing missing values (geom_point).
#> Drew the Profile plot for P04406 ( 1 of 10 )
#> Warning: Removed 29 rows containing missing values (geom_point).
#> Warning: Removed 1 row(s) containing missing values (geom_path).
#> Drew the Profile plot for P06576 ( 2 of 10 )
#> Warning: Removed 23 rows containing missing values (geom_point).
#> Warning: Removed 1 row(s) containing missing values (geom_path).
#> Drew the Profile plot for P12277 ( 3 of 10 )
#> Warning: Removed 2 rows containing missing values (geom_point).
#> Drew the Profile plot for P23919 ( 4 of 10 )
#> Drew the Profile plot for P31947 ( 5 of 10 )
#> Warning: Removed 52 rows containing missing values (geom_point).
#> Warning: Removed 3 row(s) containing missing values (geom_path).
#> Drew the Profile plot for Q15233 ( 6 of 10 )
#> Warning: Removed 2 rows containing missing values (geom_point).
#> Drew the Profile plot for Q16181 ( 7 of 10 )
#> Warning: Removed 2 rows containing missing values (geom_point).
#> Drew the Profile plot for Q9NSD9 ( 8 of 10 )
#> Warning: Removed 8 rows containing missing values (geom_point).
#> Drew the Profile plot for Q9UGP8 ( 9 of 10 )
#> Warning: Removed 6 rows containing missing values (geom_point).
#> Drew the Profile plot for Q9Y450 ( 10 of 10 )
#> Warning: Removed 16 rows containing missing values (geom_point).
#> Warning: Removed 16 rows containing missing values (geom_point).
#> Drew the Profile plot with summarization for P04406 ( 1 of 10 )
#> Warning: Removed 29 rows containing missing values (geom_point).
#> Warning: Removed 1 row(s) containing missing values (geom_path).
#> Warning: Removed 29 rows containing missing values (geom_point).
#> Drew the Profile plot with summarization for P06576 ( 2 of 10 )
#> Warning: Removed 23 rows containing missing values (geom_point).
#> Warning: Removed 1 row(s) containing missing values (geom_path).
#> Warning: Removed 23 rows containing missing values (geom_point).
#> Drew the Profile plot with summarization for P12277 ( 3 of 10 )
#> Warning: Removed 2 rows containing missing values (geom_point).
#> Warning: Removed 2 rows containing missing values (geom_point).
#> Drew the Profile plot with summarization for P23919 ( 4 of 10 )
#> Drew the Profile plot with summarization for P31947 ( 5 of 10 )
#> Warning: Removed 52 rows containing missing values (geom_point).
#> Warning: Removed 3 row(s) containing missing values (geom_path).
#> Warning: Removed 52 rows containing missing values (geom_point).
#> Drew the Profile plot with summarization for Q15233 ( 6 of 10 )
#> Warning: Removed 2 rows containing missing values (geom_point).
#> Warning: Removed 2 rows containing missing values (geom_point).
#> Drew the Profile plot with summarization for Q16181 ( 7 of 10 )
#> Warning: Removed 2 rows containing missing values (geom_point).
#> Warning: Removed 2 rows containing missing values (geom_point).
#> Drew the Profile plot with summarization for Q9NSD9 ( 8 of 10 )
#> Warning: Removed 8 rows containing missing values (geom_point).
#> Warning: Removed 8 rows containing missing values (geom_point).
#> Drew the Profile plot with summarization for Q9UGP8 ( 9 of 10 )
#> Warning: Removed 6 rows containing missing values (geom_point).
#> Warning: Removed 6 rows containing missing values (geom_point).
#> Drew the Profile plot with summarization for Q9Y450 ( 10 of 10 )
# ## Profile plot with all the channels
# quant.msstats.all <- proteinSummarization(input.pd,
#                                           method = "msstats",
#                                           global_norm = TRUE,
#                                           reference_norm = TRUE,
#                                           remove_norm_channel = FALSE,
#                                           remove_empty_channel = FALSE)
#
# dataProcessPlotsTMT(data.peptide = input.pd,
#                     data.summarization = quant.msstats.all,
#                     type = 'ProfilePlot',
#                     width = 21, # adjust the figure width since there are 15 TMT runs.
#                     height = 7)
## Quality control plot
# dataProcessPlotsTMT(data.peptide = input.pd,
#                     data.summarization = quant.msstats,
#                     type = 'QCPlot',
#                     width = 21, # adjust the figure width since there are 15 TMT runs.
#                     height = 7)
groupComparisonTMT tests for significant changes in protein abundance across conditions, based on a family of linear mixed-effects models for TMT experiments. For a case-control study (patients are not repeatedly measured), the appropriate statistical model is determined automatically from the experimental design.
data: name of the output of the proteinSummarization function. It should have columns named Protein, TechRepMixture, Mixture, Run, Channel, Condition, BioReplicate, Abundance.
contrast.matrix: comparisons between conditions of interest. 1) The default is pairwise, which compares all possible pairs of conditions. 2) Otherwise, users can specify the comparisons of interest: based on the levels of conditions, assign 1 or -1 to the conditions of interest and 0 otherwise. The levels of conditions are sorted alphabetically.
moderated: if moderated = TRUE, a moderated t statistic is calculated; otherwise, the ordinary t statistic is used.
adj.method: adjustment method for multiple comparisons. “BH” is the default.
remove_norm_channel: TRUE (default) removes Norm channels from protein-level data.
remove_empty_channel: TRUE (default) removes Empty channels from protein-level data.
# test for all the possible pairs of conditions
test.pairwise <- groupComparisonTMT(quant.msstats)
head(test.pairwise)
#> Protein Label log2FC SE DF pvalue adj.pvalue issue
#> 1 P04406 0.125-0.5 -0.031373953 0.0214787 102 0.1471712 0.3896576 NA
#> 2 P04406 0.125-0.667 -0.010442843 0.0214787 102 0.6278717 0.8969595 NA
#> 3 P04406 0.125-1 -0.005921016 0.0214787 102 0.7833599 0.9509302 NA
#> 4 P04406 0.5-0.667 0.020931110 0.0214787 102 0.3321112 0.5720904 NA
#> 5 P04406 0.5-1 0.025452937 0.0214787 102 0.2387587 0.5968968 NA
#> 6 P04406 0.667-1 0.004521827 0.0214787 102 0.8336771 0.9324055 NA
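The adj.pvalue column is produced by the adj.method option, which defaults to “BH”. As a minimal sketch of what a Benjamini-Hochberg adjustment does, the example below applies base R's p.adjust to a few hypothetical p-values. The values are not taken from the output above, and the assumption that this matches the adjustment step inside groupComparisonTMT is made for illustration only.

```r
# Hypothetical p-values for four comparisons (not from the output above)
pvalue <- c(0.01, 0.04, 0.03, 0.20)

# Benjamini-Hochberg adjustment with base R
adj.pvalue <- p.adjust(pvalue, method = "BH")

data.frame(pvalue, adj.pvalue)
```

An adjusted p-value is never smaller than the raw p-value, and the largest raw p-value is left unchanged, which matches the pattern visible in the adj.pvalue column.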
# Check the conditions in the protein data
levels(quant.msstats$Condition)
#> [1] "0.125" "0.5" "0.667" "1"
# Only compare condition 0.125 and 1
comparison <- matrix(c(-1, 0, 0, 1), nrow = 1)
# Set the names of each row
row.names(comparison) <- "1-0.125"
# Set the column names
colnames(comparison) <- c("0.125", "0.5", "0.667", "1")
comparison
#> 0.125 0.5 0.667 1
#> 1-0.125 -1 0 0 1
test.contrast <- groupComparisonTMT(data = quant.msstats, contrast.matrix = comparison)
head(test.contrast)
#> Protein Label log2FC SE DF pvalue adj.pvalue issue
#> 1 P04406 1-0.125 0.005921016 0.02147870 102 0.7833599 0.9509302 NA
#> 2 P06576 1-0.125 -0.001284321 0.02081887 102 0.9509302 0.9509302 NA
#> 3 P12277 1-0.125 -0.013004897 0.02641674 102 0.6235669 0.9142682 NA
#> 4 P23919 1-0.125 0.031852508 0.02484815 102 0.2027886 0.6298913 NA
#> 5 P31947 1-0.125 0.034549433 0.03102666 102 0.2680937 0.6298913 NA
#> 6 Q15233 1-0.125 0.010110290 0.02155178 102 0.6399877 0.9142682 NA
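Note that the sign of log2FC follows the direction of the contrast: the pairwise result labelled 0.125-1 for P04406 is -0.005921016, while the contrast 1-0.125 gives 0.005921016. Several comparisons can be requested at once by adding one row per comparison. The sketch below builds such a matrix with base R only; the object name comparison.multi and the choice of comparisons are illustrative.

```r
# Condition levels, in the order reported by levels(quant.msstats$Condition)
conditions <- c("0.125", "0.5", "0.667", "1")

# One row per comparison: 1 and -1 mark the two conditions compared, 0 otherwise
comparison.multi <- rbind(
  "1-0.125"   = c(-1,  0, 0, 1),
  "1-0.5"     = c( 0, -1, 0, 1),
  "0.667-0.5" = c( 0, -1, 1, 0)
)
colnames(comparison.multi) <- conditions

# Each row should sum to zero for a comparison between two conditions
stopifnot(all(rowSums(comparison.multi) == 0))
comparison.multi

# The matrix could then be passed on in the same way as above:
# test.multi <- groupComparisonTMT(data = quant.msstats,
#                                  contrast.matrix = comparison.multi)
```

Keeping the column order identical to the sorted condition levels avoids assigning a weight to the wrong condition.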