Summarizes peptide-level quantification to protein-level abundances. Missing values are assumed to be censored and are imputed. Global median normalization is applied to the peptide-level data before summarization, and normalization between MS runs using reference channels is applied to the protein-level data after summarization.

proteinSummarization(
  data,
  method = "msstats",
  global_norm = TRUE,
  reference_norm = TRUE,
  remove_norm_channel = TRUE,
  remove_empty_channel = TRUE,
  MBimpute = TRUE,
  maxQuantileforCensored = NULL
)

Arguments

data

Output of the PDtoMSstatsTMTFormat function, or peptide-level quantified data from other tools. It should have the columns ProteinName, PeptideSequence, Charge, PSM, Mixture, TechRepMixture, Run, Channel, Condition, BioReplicate, and Intensity.
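A quick way to confirm that a table from another tool carries the columns listed above is a plain set difference. The following is a minimal sketch in base R; `required_cols`, `missing_columns`, and the toy table are hypothetical names introduced for illustration and are not part of the package.

```r
# Hypothetical helper (not part of MSstatsTMT): report which required
# columns are absent from a peptide-level input table.
required_cols <- c("ProteinName", "PeptideSequence", "Charge", "PSM",
                   "Mixture", "TechRepMixture", "Run", "Channel",
                   "Condition", "BioReplicate", "Intensity")

missing_columns <- function(df, required = required_cols) {
  setdiff(required, colnames(df))
}

# Toy one-row table that deliberately lacks the Intensity column.
toy <- as.data.frame(
  matrix(NA, nrow = 1, ncol = 10,
         dimnames = list(NULL, setdiff(required_cols, "Intensity")))
)

missing_columns(toy)
```

An empty result means every expected column is present; anything else names the columns to add before calling proteinSummarization.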

method

Method used to summarize to protein level. Four methods are available: "msstats" (default), "MedianPolish", "Median", and "LogSum".
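To give a feel for how the three simpler methods differ, here is a rough base-R sketch on a toy matrix for one protein in one run. It assumes that "Median" takes the per-channel median of log2 intensities, "LogSum" takes log2 of the summed raw intensities, and "MedianPolish" uses the overall plus column effects of Tukey's median polish; the package's internal code may differ in detail, and all object names are illustrative.

```r
# Toy log2 intensities for one protein in one run:
# rows are peptide features (PSMs), columns are channels.
log2_int <- matrix(
  c(10, 11, 12,
    12, 13, 14,
    11, 12, 16),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("psm", 1:3), c("126", "127N", "127C"))
)

# "Median" (assumed): per-channel median of the log2 intensities.
median_summary <- apply(log2_int, 2, median, na.rm = TRUE)

# "LogSum" (assumed): log2 of the per-channel sum of raw-scale intensities.
logsum_summary <- log2(colSums(2^log2_int, na.rm = TRUE))

# "MedianPolish" (assumed): overall effect plus channel (column) effects
# from Tukey's median polish, via stats::medpolish().
mp <- medpolish(log2_int, na.rm = TRUE, trace.iter = FALSE)
medpolish_summary <- mp$overall + mp$col

rbind(median_summary, logsum_summary, medpolish_summary)
```

The median-based summaries are less sensitive to a single outlying feature (such as the value 16 in the third channel) than the sum-based one.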

global_norm

Global median normalization on peptide level data (equalizing the medians across all the channels and MS runs). Default is TRUE. It will be performed before protein-level summarization.
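The idea of equalizing medians across channels and runs can be sketched in a few lines of base R. This is only an illustration under the assumption that each Run x Channel group is shifted on the log2 scale to a common target (here the median of the group medians); the column names and the choice of target are assumptions, not the package's confirmed implementation.

```r
# Toy peptide-level data: log2 intensities for two runs x two channels.
pep <- data.frame(
  Run     = rep(c("run1", "run2"), each = 6),
  Channel = rep(rep(c("126", "127N"), each = 3), times = 2),
  log2Int = c(10, 11, 12,  12, 13, 14,
               9, 10, 11,  13, 14, 15)
)

# Median of each Run x Channel group, as a Run-by-Channel table.
group_medians <- tapply(pep$log2Int, list(pep$Run, pep$Channel), median)

# Common target (assumed): the median of the group medians.
target <- median(group_medians)

# Per-row group median, aligned with the rows of pep.
row_median <- ave(pep$log2Int, pep$Run, pep$Channel, FUN = median)

# Shift each group so that its median equals the target.
pep$log2IntNorm <- pep$log2Int - row_median + target

# After normalization every Run x Channel group has the same median.
tapply(pep$log2IntNorm, list(pep$Run, pep$Channel), median)
```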

reference_norm

Reference-channel-based normalization between MS runs on protein-level data. TRUE (default) requires at least one reference channel in each MS run, annotated as 'Norm' in the Condition column. It is performed after protein-level summarization. FALSE skips this normalization step. If the data contain only one run, reference_norm should be FALSE.
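As a rough illustration of what reference-channel normalization does for a single protein, the sketch below shifts each run so that its 'Norm' channel abundance lines up across runs. It assumes the per-run reference level is the mean of the 'Norm' channels and the common target is the median of those levels; both choices and all object names are assumptions made for the example.

```r
# Toy protein-level abundances (log2 scale) for one protein in two runs.
prot <- data.frame(
  Run       = rep(c("run1", "run2"), each = 3),
  Condition = rep(c("Norm", "A", "B"), times = 2),
  Abundance = c(20, 21, 22,
                18, 19.5, 20.5),
  stringsAsFactors = FALSE
)

# Reference level per run (assumed): mean abundance of the 'Norm' channel(s).
norm_rows   <- prot$Condition == "Norm"
norm_by_run <- tapply(prot$Abundance[norm_rows], prot$Run[norm_rows], mean)

# Common target (assumed): median of the per-run reference levels.
target <- median(norm_by_run)

# Shift all channels of a run by the same amount, so the run's reference
# level lands on the target.
shift <- target - as.numeric(norm_by_run[prot$Run])
prot$AbundanceNorm <- prot$Abundance + shift

prot
```

Because every channel in a run receives the same shift, within-run differences between conditions are preserved while between-run offsets are removed.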

remove_norm_channel

TRUE(default) removes 'Norm' channels from protein level data.

remove_empty_channel

TRUE(default) removes 'Empty' channels from protein level data.

MBimpute

Only for method="msstats". TRUE (default) imputes missing values with an accelerated failure time model. FALSE imputes each missing value with the minimum value for that peptide precursor ion.
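The simpler MBimpute=FALSE behaviour can be pictured as a per-feature minimum fill. The sketch below is an illustration only: it assumes the minimum is taken over the observed log2 intensities of the same peptide precursor ion, and the `Feature` column and `impute_min` helper are hypothetical names.

```r
# Toy log2 intensities for two peptide precursor ions, with missing values.
feat <- data.frame(
  Feature = rep(c("pepA_2", "pepB_3"), each = 3),
  log2Int = c(10, NA, 12,
              15, 14, NA)
)

# Hypothetical helper: replace NAs with the smallest observed value.
impute_min <- function(x) {
  x[is.na(x)] <- min(x, na.rm = TRUE)
  x
}

# Apply the fill separately within each feature.
feat$log2IntImputed <- ave(feat$log2Int, feat$Feature, FUN = impute_min)

feat
```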

maxQuantileforCensored

Missing values are assumed to be censored. maxQuantileforCensored is the maximum quantile used to decide which missing values are censored, for instance 0.999. Default is NULL.

Value

data.frame with protein-level summarization for each run and channel

Examples

data(input.pd)
quant.pd.msstats <- proteinSummarization(input.pd, method="msstats", global_norm=TRUE, reference_norm=TRUE)
#> Joining, by = c("Run", "Channel")
#> Summarizing for Run : 161117_SILAC_HeLa_UPS1_TMT10_Mixture1_01.raw ( 1 of 15 )
#> ** Use all features that the dataset origianally has.
#>
#> Summary of Features :
#>                          count
#> # of Protein                10
#> # of Peptides/Protein     4-29
#> # of Transitions/Peptide   1-1
#>
#> Summary of Samples :
#>                            0.125 0.5 0.667 1 Norm
#> # of MS runs                   2   2     2 2    2
#> # of Biological Replicates     1   1     1 1    1
#> # of Technical Replicates      2   2     2 2    2
#>
#> Summary of Missingness :
#> # transitions are completely missing in at least one of the conditions : 1
#> -> [K].sTPSGFTLDDVIQTGVDNPGHPYIMTVGcVAGDEESYEVFk.[D]_4_NA_NA ...
#>
#> # run with 75% missing observations: 0
#>
#> == Start the summarization per subplot...
#> |======================================================================| 100%
#>
#> == the summarization per subplot is done.
#> ** Protein-level summarization done by MSstats.
#> Summarizing for Run : 161117_SILAC_HeLa_UPS1_TMT10_Mixture1_02.raw ( 2 of 15 )
#> ** Use all features that the dataset origianally has.
#>
#> Summary of Features :
#>                          count
#> # of Protein                10
#> # of Peptides/Protein     3-33
#> # of Transitions/Peptide   1-1
#>
#> Summary of Samples :
#>                            0.125 0.5 0.667 1 Norm
#> # of MS runs                   2   2     2 2    2
#> # of Biological Replicates     1   1     1 1    1
#> # of Technical Replicates      2   2     2 2    2
#>
#> Summary of Missingness :
#> # transitions are completely missing in at least one of the conditions : 1
#> -> [R].eVLGDAVPDEILIEAVLk.[N]_3_NA_NA ...
#>
#> # run with 75% missing observations: 0
#>
#> == Start the summarization per subplot...
#> |======================================================================| 100%
#>
#> == the summarization per subplot is done.
#> ** Protein-level summarization done by MSstats.
#> Summarizing for Run : 161117_SILAC_HeLa_UPS1_TMT10_Mixture1_03.raw ( 3 of 15 )
#> ** Use all features that the dataset origianally has.
#>
#> Summary of Features :
#>                          count
#> # of Protein                10
#> # of Peptides/Protein     3-29
#> # of Transitions/Peptide   1-1
#>
#> Summary of Samples :
#>                            0.125 0.5 0.667 1 Norm
#> # of MS runs                   2   2     2 2    2
#> # of Biological Replicates     1   1     1 1    1
#> # of Technical Replicates      2   2     2 2    2
#>
#> Summary of Missingness :
#> # transitions are completely missing in at least one of the conditions : 1
#> -> [K].qQQDQVDr.[N]_2_NA_NA ...
#>
#> # run with 75% missing observations: 0
#>
#> == Start the summarization per subplot...
#> |======================================================================| 100%
#>
#> == the summarization per subplot is done.
#> ** Protein-level summarization done by MSstats.
#> Summarizing for Run : 161117_SILAC_HeLa_UPS1_TMT10_Mixture2_01.raw ( 4 of 15 )
#> ** Use all features that the dataset origianally has.
#>
#> Summary of Features :
#>                          count
#> # of Protein                10
#> # of Peptides/Protein     1-28
#> # of Transitions/Peptide   1-1
#>
#> ** 1 Proteins have only single transition : Consider excluding this protein from the dataset. (Q9Y450)
#>
#> Summary of Samples :
#>                            0.125 0.5 0.667 1 Norm
#> # of MS runs                   2   2     2 2    2
#> # of Biological Replicates     1   1     1 1    1
#> # of Technical Replicates      2   2     2 2    2
#>
#> Summary of Missingness :
#> # transitions are completely missing in at least one of the conditions : 1
#> -> [K].dYEFMWNPHLGYILTcPSNLGTGLr.[A]_3_NA_NA ...
#>
#> # run with 75% missing observations: 0
#>
#> == Start the summarization per subplot...
#> |======================================================================| 100%
#>
#> == the summarization per subplot is done.
#> ** Protein-level summarization done by MSstats.
#> Summarizing for Run : 161117_SILAC_HeLa_UPS1_TMT10_Mixture2_02.raw ( 5 of 15 )
#> ** Use all features that the dataset origianally has.
#>
#> Summary of Features :
#>                          count
#> # of Protein                10
#> # of Peptides/Protein     1-30
#> # of Transitions/Peptide   1-1
#>
#> ** 1 Proteins have only single transition : Consider excluding this protein from the dataset. (Q9Y450)
#>
#> Summary of Samples :
#>                            0.125 0.5 0.667 1 Norm
#> # of MS runs                   2   2     2 2    2
#> # of Biological Replicates     1   1     1 1    1
#> # of Technical Replicates      2   2     2 2    2
#>
#> Summary of Missingness :
#> # transitions are completely missing in at least one of the conditions : 2
#> -> [K].qQQDQVDr.[N]_2_NA_NA, [R].nLPQYVSNELLEEAFSVFGQVEr.[A]_3_NA_NA ...
#>
#> # run with 75% missing observations: 0
#>
#> == Start the summarization per subplot...
#> |======================================================================| 100%
#>
#> == the summarization per subplot is done.
#> ** Protein-level summarization done by MSstats.
#> Summarizing for Run : 161117_SILAC_HeLa_UPS1_TMT10_Mixture2_03.raw ( 6 of 15 )
#> ** Use all features that the dataset origianally has.
#>
#> Summary of Features :
#>                          count
#> # of Protein                10
#> # of Peptides/Protein     2-30
#> # of Transitions/Peptide   1-1
#>
#> Summary of Samples :
#>                            0.125 0.5 0.667 1 Norm
#> # of MS runs                   2   2     2 2    2
#> # of Biological Replicates     1   1     1 1    1
#> # of Technical Replicates      2   2     2 2    2
#>
#> Summary of Missingness :
#> # transitions are completely missing in at least one of the conditions : 0
#>
#> # run with 75% missing observations: 0
#>
#> == Start the summarization per subplot...
#> |======================================================================| 100%
#>
#> == the summarization per subplot is done.
#> ** Protein-level summarization done by MSstats.
#> Summarizing for Run : 161117_SILAC_HeLa_UPS1_TMT10_Mixture3_01.raw ( 7 of 15 )
#> ** Use all features that the dataset origianally has.
#>
#> Summary of Features :
#>                          count
#> # of Protein                10
#> # of Peptides/Protein     4-31
#> # of Transitions/Peptide   1-1
#>
#> Summary of Samples :
#>                            0.125 0.5 0.667 1 Norm
#> # of MS runs                   2   2     2 2    2
#> # of Biological Replicates     1   1     1 1    1
#> # of Technical Replicates      2   2     2 2    2
#>
#> Summary of Missingness :
#> # transitions are completely missing in at least one of the conditions : 0
#>
#> # run with 75% missing observations: 0
#>
#> == Start the summarization per subplot...
#> |======================================================================| 100%
#>
#> == the summarization per subplot is done.
#> ** Protein-level summarization done by MSstats.
#> Summarizing for Run : 161117_SILAC_HeLa_UPS1_TMT10_Mixture3_02.raw ( 8 of 15 )
#> ** Use all features that the dataset origianally has.
#>
#> Summary of Features :
#>                          count
#> # of Protein                10
#> # of Peptides/Protein     3-30
#> # of Transitions/Peptide   1-1
#>
#> Summary of Samples :
#>                            0.125 0.5 0.667 1 Norm
#> # of MS runs                   2   2     2 2    2
#> # of Biological Replicates     1   1     1 1    1
#> # of Technical Replicates      2   2     2 2    2
#>
#> Summary of Missingness :
#> # transitions are completely missing in at least one of the conditions : 1
#> -> [K].vDIVAINDPFIDLNYMVYMFQYDSTHGk.[F]_3_NA_NA ...
#>
#> # run with 75% missing observations: 0
#>
#> == Start the summarization per subplot...
#> |======================================================================| 100%
#>
#> == the summarization per subplot is done.
#> ** Protein-level summarization done by MSstats.
#> Summarizing for Run : 161117_SILAC_HeLa_UPS1_TMT10_Mixture3_03.raw ( 9 of 15 )
#> ** Use all features that the dataset origianally has.
#>
#> Summary of Features :
#>                          count
#> # of Protein                10
#> # of Peptides/Protein     5-30
#> # of Transitions/Peptide   1-1
#>
#> Summary of Samples :
#>                            0.125 0.5 0.667 1 Norm
#> # of MS runs                   2   2     2 2    2
#> # of Biological Replicates     1   1     1 1    1
#> # of Technical Replicates      2   2     2 2    2
#>
#> Summary of Missingness :
#> # transitions are completely missing in at least one of the conditions : 3
#> -> [K].vDIVAINDPFIDLNYMVYMFQYDSTHGk.[F]_3_NA_NA, [R].iPSAVGYQPTLATDMGTMQEr.[I]_2_NA_NA, [R].gAMPPAPVPAGTPAPPGPATMMPDGTLGLTPPTTEr.[F]_4_NA_NA ...
#>
#> # run with 75% missing observations: 0
#>
#> == Start the summarization per subplot...
#> |======================================================================| 100%
#>
#> == the summarization per subplot is done.
#> ** Protein-level summarization done by MSstats.
#> Summarizing for Run : 161117_SILAC_HeLa_UPS1_TMT10_Mixture4_01.raw ( 10 of 15 )
#> ** Use all features that the dataset origianally has.
#>
#> Summary of Features :
#>                          count
#> # of Protein                10
#> # of Peptides/Protein     3-31
#> # of Transitions/Peptide   1-1
#>
#> Summary of Samples :
#>                            0.125 0.5 0.667 1 Norm
#> # of MS runs                   2   2     2 2    2
#> # of Biological Replicates     1   1     1 1    1
#> # of Technical Replicates      2   2     2 2    2
#>
#> Summary of Missingness :
#> # transitions are completely missing in at least one of the conditions : 2
#> -> [K].sTPSGFTLDDVIQTGVDNPGHPYIMTVGcVAGDEESYEVFk.[D]_4_NA_NA, [K].qQQDQVDr.[N]_2_NA_NA ...
#>
#> # run with 75% missing observations: 0
#>
#> == Start the summarization per subplot...
#> |======================================================================| 100%
#>
#> == the summarization per subplot is done.
#> ** Protein-level summarization done by MSstats.
#> Summarizing for Run : 161117_SILAC_HeLa_UPS1_TMT10_Mixture4_02.raw ( 11 of 15 )
#> ** Use all features that the dataset origianally has.
#>
#> Summary of Features :
#>                          count
#> # of Protein                10
#> # of Peptides/Protein     3-31
#> # of Transitions/Peptide   1-1
#>
#> Summary of Samples :
#>                            0.125 0.5 0.667 1 Norm
#> # of MS runs                   2   2     2 2    2
#> # of Biological Replicates     1   1     1 1    1
#> # of Technical Replicates      2   2     2 2    2
#>
#> Summary of Missingness :
#> # transitions are completely missing in at least one of the conditions : 1
#> -> [R].fcTGLTQIETLFk.[S]_2_NA_NA ...
#>
#> # run with 75% missing observations: 0
#>
#> == Start the summarization per subplot...
#> |======================================================================| 100%
#>
#> == the summarization per subplot is done.
#> ** Protein-level summarization done by MSstats.
#> Summarizing for Run : 161117_SILAC_HeLa_UPS1_TMT10_Mixture4_03.raw ( 12 of 15 )
#> ** Use all features that the dataset origianally has.
#>
#> Summary of Features :
#>                          count
#> # of Protein                10
#> # of Peptides/Protein     1-31
#> # of Transitions/Peptide   1-1
#>
#> ** 1 Proteins have only single transition : Consider excluding this protein from the dataset. (Q9Y450)
#>
#> Summary of Samples :
#>                            0.125 0.5 0.667 1 Norm
#> # of MS runs                   2   2     2 2    2
#> # of Biological Replicates     1   1     1 1    1
#> # of Technical Replicates      2   2     2 2    2
#>
#> Summary of Missingness :
#> # transitions are completely missing in at least one of the conditions : 0
#>
#> # run with 75% missing observations: 0
#>
#> == Start the summarization per subplot...
#> |======================================================================| 100%
#>
#> == the summarization per subplot is done.
#> ** Protein-level summarization done by MSstats.
#> Summarizing for Run : 161117_SILAC_HeLa_UPS1_TMT10_Mixture5_01.raw ( 13 of 15 )
#> ** Use all features that the dataset origianally has.
#>
#> Summary of Features :
#>                          count
#> # of Protein                10
#> # of Peptides/Protein     3-34
#> # of Transitions/Peptide   1-1
#>
#> Summary of Samples :
#>                            0.125 0.5 0.667 1 Norm
#> # of MS runs                   2   2     2 2    2
#> # of Biological Replicates     1   1     1 1    1
#> # of Technical Replicates      2   2     2 2    2
#>
#> Summary of Missingness :
#> # transitions are completely missing in at least one of the conditions : 0
#>
#> # run with 75% missing observations: 0
#>
#> == Start the summarization per subplot...
#> |======================================================================| 100%
#>
#> == the summarization per subplot is done.
#> ** Protein-level summarization done by MSstats.
#> Summarizing for Run : 161117_SILAC_HeLa_UPS1_TMT10_Mixture5_02.raw ( 14 of 15 )
#> ** Use all features that the dataset origianally has.
#>
#> Summary of Features :
#>                          count
#> # of Protein                10
#> # of Peptides/Protein     2-30
#> # of Transitions/Peptide   1-1
#>
#> Summary of Samples :
#>                            0.125 0.5 0.667 1 Norm
#> # of MS runs                   2   2     2 2    2
#> # of Biological Replicates     1   1     1 1    1
#> # of Technical Replicates      2   2     2 2    2
#>
#> Summary of Missingness :
#> # transitions are completely missing in at least one of the conditions : 3
#> -> [R].mGQMAMGGAmGINNr.[G]_2_NA_NA, [R].nLPQYVSNELLEEAFSVFGQVER.[A]_3_NA_NA, [R].dQNAEQIr.[L]_2_NA_NA ...
#>
#> # run with 75% missing observations: 0
#>
#> == Start the summarization per subplot...
#> |======================================================================| 100%
#>
#> == the summarization per subplot is done.
#> ** Protein-level summarization done by MSstats.
#> Summarizing for Run : 161117_SILAC_HeLa_UPS1_TMT10_Mixture5_03.raw ( 15 of 15 )
#> ** Use all features that the dataset origianally has.
#>
#> Summary of Features :
#>                          count
#> # of Protein                10
#> # of Peptides/Protein     5-32
#> # of Transitions/Peptide   1-1
#>
#> Summary of Samples :
#>                            0.125 0.5 0.667 1 Norm
#> # of MS runs                   2   2     2 2    2
#> # of Biological Replicates     1   1     1 1    1
#> # of Technical Replicates      2   2     2 2    2
#>
#> Summary of Missingness :
#> # transitions are completely missing in at least one of the conditions : 4
#> -> [K].gFQQILAGEYDHLPEQAFYmVGPIEEAVAk.[A]_3_NA_NA, [K].qFAPIHAEAPEFMEMSVEQEILVTGIk.[V]_4_NA_NA, [K].qQQDQVDr.[N]_2_NA_NA, [R].dQNAEQIr.[L]_2_NA_NA ...
#>
#> # run with 75% missing observations: 0
#>
#> == Start the summarization per subplot...
#> |======================================================================| 100%
#>
#> == the summarization per subplot is done.
#> ** Protein-level summarization done by MSstats.
#> Normalization between MS runs for Protein : P04406 ( 1 of 10 )
#> Normalization between MS runs for Protein : P06576 ( 2 of 10 )
#> Normalization between MS runs for Protein : P12277 ( 3 of 10 )
#> Normalization between MS runs for Protein : P23919 ( 4 of 10 )
#> Normalization between MS runs for Protein : P31947 ( 5 of 10 )
#> Normalization between MS runs for Protein : Q15233 ( 6 of 10 )
#> Normalization between MS runs for Protein : Q16181 ( 7 of 10 )
#> Normalization between MS runs for Protein : Q9NSD9 ( 8 of 10 )
#> Normalization between MS runs for Protein : Q9UGP8 ( 9 of 10 )
#> Normalization between MS runs for Protein : Q9Y450 ( 10 of 10 )
head(quant.pd.msstats)
#>                                            Run Protein Abundance Channel
#> 1 161117_SILAC_HeLa_UPS1_TMT10_Mixture1_01.raw  P04406  16.59812    127C
#> 2 161117_SILAC_HeLa_UPS1_TMT10_Mixture1_01.raw  P04406  16.55729    129N
#> 3 161117_SILAC_HeLa_UPS1_TMT10_Mixture1_01.raw  P04406  16.71783    128N
#> 4 161117_SILAC_HeLa_UPS1_TMT10_Mixture1_01.raw  P04406  16.67190    129C
#> 5 161117_SILAC_HeLa_UPS1_TMT10_Mixture1_01.raw  P04406  16.51106    127N
#> 6 161117_SILAC_HeLa_UPS1_TMT10_Mixture1_01.raw  P04406  16.49448    130C
#>     BioReplicate Condition TechRepMixture  Mixture
#> 1 Mixture1_0.125     0.125              1 Mixture1
#> 2 Mixture1_0.125     0.125              1 Mixture1
#> 3   Mixture1_0.5       0.5              1 Mixture1
#> 4   Mixture1_0.5       0.5              1 Mixture1
#> 5 Mixture1_0.667     0.667              1 Mixture1
#> 6 Mixture1_0.667     0.667              1 Mixture1