Converts TMT experiments processed with MaxQuant into the format required by MSstatsPTM. For the PTM data, only the modified-sites file from MaxQuant (for example Phospho(STY)Sites) and an annotation file are required. To adjust modified peptides for changes in global protein level, the unmodified (global protein) TMT data must also be provided.

MaxQtoMSstatsPTMFormat(
  sites.data,
  annotation,
  evidence = NULL,
  proteinGroups = NULL,
  mod.num = "Single",
  keyword = "phos",
  which.proteinid.ptm = "Protein",
  which.proteinid.protein = "Leading.razor.protein",
  removeMpeptides = FALSE
)

Arguments

sites.data

modified peptide output from MaxQuant. For example, a phosphorylation experiment would require the Phospho(STY)Sites.txt file

annotation

data frame containing the columns Run, Fraction, TechRepMixture, Mixture, Channel, BioReplicate, and Condition.
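As a rough illustration of the layout described above, the following sketch builds a small annotation data frame in base R and checks that the seven required columns are present. The run names, channel labels, and condition labels are placeholders invented for this example; in a real analysis they must match the values in your own MaxQuant output.

```r
# Hypothetical design: 2 runs (one per mixture), 4 TMT channels each.
# All values below are placeholders, not taken from a real experiment.
channels <- c("126C", "127N", "127C", "128N")
runs <- c("run_mix1", "run_mix2")

# One row per Run x Channel combination (Channel varies fastest).
annotation <- expand.grid(
  Channel = channels,
  Run = runs,
  stringsAsFactors = FALSE
)
annotation$Fraction <- 1
annotation$TechRepMixture <- 1
annotation$Mixture <- ifelse(annotation$Run == "run_mix1", 1, 2)
annotation$Condition <- rep(
  c("Condition_1", "Condition_1", "Condition_2", "Condition_2"),
  times = length(runs)
)
annotation$BioReplicate <- paste0("Sample_", seq_len(nrow(annotation)))

# Confirm that every column listed for the annotation argument exists.
required <- c("Run", "Fraction", "TechRepMixture", "Mixture",
              "Channel", "BioReplicate", "Condition")
missing_cols <- setdiff(required, colnames(annotation))
stopifnot(length(missing_cols) == 0)
```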

evidence

for the global protein dataset: the data read from 'evidence.txt', which contains feature-level data.

proteinGroups

for the global protein dataset: the data read from 'proteinGroups.txt'.

mod.num

For the modified peptide dataset. The number of modifications per peptide to be used. If "Single" (the default), only peptides with one modification are used. Alternatively, "Total" places no cap on the number of modifications per peptide. Selecting "Total" may confound the effects of different modifications.

keyword

the sub-name of columns in the sites.data file. For phosphorylation data, this value should be "phos". The default is "phos".

which.proteinid.ptm

For the PTM dataset, the column to use for the protein name. The default is 'Protein', as shown in the usage above. 'Leading.proteins', 'Leading.razor.protein', or 'Gene.names' can be used instead to get an ID that refers to a single protein. However, those columns can still include shared peptides.

which.proteinid.protein

For the global protein dataset, the column to use for the protein name. Same options as for which.proteinid.ptm. The default is 'Leading.razor.protein'.

removeMpeptides

Whether Oxidation (M) modifications should be removed. Default is FALSE.

Value

a list of two data.tables named 'PTM' and 'PROTEIN' in the format required by MSstatsPTM.
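A quick structural check can catch problems before the result is passed on. The sketch below defines `check_ptm_input()`, a hypothetical helper that is not part of MSstatsPTM. It assumes the column set shown in the example output on this page, and it only requires the 'PTM' element, since this page does not state what 'PROTEIN' contains when no global protein files are supplied.

```r
# Hypothetical helper (not an MSstatsPTM function): verifies that a
# converter result has the expected elements and columns.
check_ptm_input <- function(x) {
  # Column names taken from the example output on this page.
  expected_cols <- c("ProteinName", "PeptideSequence", "Charge", "PSM",
                     "Mixture", "TechRepMixture", "Run", "Channel",
                     "Condition", "BioReplicate", "Intensity")
  stopifnot(is.list(x), "PTM" %in% names(x))

  # Check PTM, and PROTEIN only when it is present and non-NULL.
  for (nm in intersect(c("PTM", "PROTEIN"), names(x))) {
    if (is.null(x[[nm]])) next
    missing_cols <- setdiff(expected_cols, colnames(x[[nm]]))
    if (length(missing_cols) > 0) {
      stop(nm, " is missing columns: ",
           paste(missing_cols, collapse = ", "))
    }
  }
  invisible(TRUE)
}

# Toy one-row input mirroring the first row of the PTM example output.
toy <- list(PTM = data.frame(
  ProteinName = "Protein_12_S703", PeptideSequence = "Peptide491",
  Charge = 3, PSM = "Peptide_491_3", Mixture = 1, TechRepMixture = 1,
  Run = "1_1", Channel = "128N", Condition = "Condition_2",
  BioReplicate = "Condition_2_1", Intensity = 48030.0
))
check_ptm_input(toy)
```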

Examples

head(raw.input.tmt$PTM)
#>       ProteinName PeptideSequence Charge           PSM Mixture TechRepMixture
#> 1 Protein_12_S703      Peptide491      3 Peptide_491_3       1              1
#> 2 Protein_12_S703      Peptide491      3 Peptide_491_3       1              1
#> 3 Protein_12_S703      Peptide491      3 Peptide_491_3       1              1
#> 4 Protein_12_S703      Peptide491      3 Peptide_491_3       1              1
#> 5 Protein_12_S703      Peptide491      3 Peptide_491_3       1              1
#> 6 Protein_12_S703      Peptide491      3 Peptide_491_3       1              1
#>   Run Channel   Condition  BioReplicate Intensity
#> 1 1_1    128N Condition_2 Condition_2_1   48030.0
#> 2 1_1    129C Condition_4 Condition_4_2  100224.4
#> 3 1_1    131C Condition_3 Condition_3_2   66804.6
#> 4 1_1    130N Condition_1 Condition_1_2   46779.8
#> 5 1_1    128C Condition_6 Condition_6_1   77497.9
#> 6 1_1    126C Condition_4 Condition_4_1   81559.7
head(raw.input.tmt$PROTEIN)
#>   ProteinName PeptideSequence Charge             PSM Mixture TechRepMixture Run
#> 1  Protein_12     Peptide9121      3  Peptide_9121_3       1              1 1_1
#> 2  Protein_12    Peptide27963      5 Peptide_27963_5       1              1 1_1
#> 3  Protein_12    Peptide28482      4 Peptide_28482_4       1              1 1_1
#> 4  Protein_12    Peptide10940      2 Peptide_10940_2       2              1 2_1
#> 5  Protein_12     Peptide4900      2  Peptide_4900_2       2              1 2_1
#> 6  Protein_12     Peptide4900      3  Peptide_4900_3       2              1 2_1
#>   Channel   Condition  BioReplicate  Intensity
#> 1    126C Condition_4 Condition_4_1 10996116.9
#> 2    127C Condition_5 Condition_5_1    56965.1
#> 3    131N Condition_2 Condition_2_2   286121.7
#> 4    131N Condition_2 Condition_2_4   534806.0
#> 5    126C Condition_4 Condition_4_3  1134908.7
#> 6    126C Condition_4 Condition_4_3  1605773.2
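
The examples above only print the bundled raw.input.tmt object. A call to the converter itself might look like the sketch below. All file paths are placeholders, and reading the MaxQuant tables with read.delim() and the annotation with read.csv() is an assumption about how the inputs are stored. The call is guarded so that it only runs when MSstatsPTM is installed and every file exists.

```r
# Placeholder paths; replace with the locations of your own files.
sites_path <- "Phospho (STY)Sites.txt"
evidence_path <- "evidence.txt"
groups_path <- "proteinGroups.txt"
annotation_path <- "annotation.csv"

inputs_ready <- requireNamespace("MSstatsPTM", quietly = TRUE) &&
  all(file.exists(c(sites_path, evidence_path,
                    groups_path, annotation_path)))

if (inputs_ready) {
  # MaxQuant writes tab-delimited text; the CSV annotation is an assumption.
  sites <- read.delim(sites_path, stringsAsFactors = FALSE)
  evidence <- read.delim(evidence_path, stringsAsFactors = FALSE)
  groups <- read.delim(groups_path, stringsAsFactors = FALSE)
  annot <- read.csv(annotation_path, stringsAsFactors = FALSE)

  converted <- MSstatsPTM::MaxQtoMSstatsPTMFormat(
    sites.data = sites,
    annotation = annot,
    evidence = evidence,        # omit both global files to convert PTM only
    proteinGroups = groups,
    mod.num = "Single",
    keyword = "phos"
  )
  str(converted, max.level = 1)
} else {
  message("MSstatsPTM or the input files are not available; skipping.")
}
```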