Converts non-TMT Progenesis output into the format needed for MSstatsPTM

ProgenesistoMSstatsPTMFormat(
  ptm_input,
  annotation,
  global_protein_input = FALSE,
  fasta_path = FALSE,
  useUniquePeptide = TRUE,
  summaryforMultipleRows = max,
  removeFewMeasurements = TRUE,
  removeOxidationMpeptides = FALSE,
  removeProtein_with1Peptide = FALSE,
  mod.num = "Single"
)

Arguments

ptm_input

name of the Progenesis output with modified peptides, in wide format. The columns 'Accession', 'Sequence', 'Modification', 'Charge', and one column for each run are required.

annotation

name of the 'annotation.txt' or 'annotation.csv' data, which includes Condition, BioReplicate, Run, and Type (PTM or Protein) information. The Run values are matched against the MS run column names of the input. Note that PTM and global Protein run names often differ, which is why the additional Type column indicating Protein or PTM is required.
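The matching between Run values and input column names can be checked before conversion. The following is a minimal base R sketch, not part of MSstatsPTM; the table `ptm_wide`, its run names, and the modification strings are made up for illustration.

```r
# Hypothetical wide-format PTM table: one intensity column per run.
# The Modification strings are placeholders, not a guaranteed Progenesis format.
ptm_wide <- data.frame(
  Accession = c("P1", "P2"),
  Sequence = c("PEPTIDEK", "ANOTHERK"),
  Modification = c("Phospho", "Phospho"),
  Charge = c(2, 3),
  phos_run_1 = c(1000, 2000),
  phos_run_2 = c(1100, NA),
  phos_run_3 = c(900, 2100)
)

# Annotation rows for the PTM runs only (illustrative values).
annotation <- data.frame(
  Condition = c("Control", "Control", "Treatment"),
  BioReplicate = c(1, 2, 3),
  Run = c("phos_run_1", "phos_run_2", "phos_run_3"),
  Type = c("PTM", "PTM", "PTM")
)

# Runs annotated as PTM that have no matching column in the PTM table.
ptm_runs <- annotation$Run[annotation$Type == "PTM"]
missing_runs <- setdiff(ptm_runs, colnames(ptm_wide))

if (length(missing_runs) > 0) {
  warning("Runs missing from ptm_input: ", paste(missing_runs, collapse = ", "))
}
```

An empty `missing_runs` suggests that every annotated PTM run has a column in the input; the same check could be repeated for the Protein rows against the global protein table.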

global_protein_input

name of the Progenesis output with unmodified peptides, in wide format. The columns 'Accession', 'Sequence', 'Modification', 'Charge', and one column for each run are required.

fasta_path

string containing path to the corresponding fasta file for the modified peptide dataset.

useUniquePeptide

TRUE (default) removes peptides that are assigned to more than one protein. The analysis assumes that each peptide is unique to a single protein.
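To illustrate what removing shared peptides means, here is a small base R sketch on made-up data. It shows the idea only and is not the converter's internal implementation.

```r
# Hypothetical peptide-to-protein assignments; "AAAK" maps to two proteins.
peptides <- data.frame(
  Accession = c("P1", "P2", "P2", "P3"),
  Sequence = c("AAAK", "AAAK", "BBBK", "CCCK"),
  stringsAsFactors = FALSE
)

# Count the distinct proteins each peptide sequence is assigned to.
pairs <- unique(peptides[, c("Accession", "Sequence")])
n_prot <- table(pairs$Sequence)

# Peptides assigned to more than one protein are treated as shared.
shared <- names(n_prot)[n_prot > 1]
unique_peptides <- peptides[!peptides$Sequence %in% shared, ]
```

Here both rows for "AAAK" are dropped, because the peptide cannot be attributed to P1 or P2 alone.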

summaryforMultipleRows

max (default) or sum. When a feature has multiple measurements in the same run, use either the highest intensity (max) or the sum of the intensities (sum).
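The difference between the two choices can be seen on a toy long-format table. This is a base R sketch with made-up feature and run names, not the converter's own code.

```r
# Hypothetical long-format data: feature "f1" is measured twice in "run_1".
long <- data.frame(
  Feature = c("f1", "f1", "f2"),
  Run = c("run_1", "run_1", "run_1"),
  Intensity = c(100, 250, 80)
)

# Collapse duplicated feature/run rows with max (the default) or sum.
by_max <- aggregate(Intensity ~ Feature + Run, data = long, FUN = max)
by_sum <- aggregate(Intensity ~ Feature + Run, data = long, FUN = sum)
```

With max, "f1" keeps the larger of its two intensities; with sum, the two intensities are added. Features with a single measurement per run, such as "f2", are unchanged either way.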

removeFewMeasurements

TRUE (default) will remove the features that have 1 or 2 measurements across runs.
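A rough base R sketch of this kind of filter follows. It assumes that a missing measurement is encoded as NA, which may differ from how the converter treats zeros or other placeholders; the data are made up.

```r
# Hypothetical long-format data: "f1" has 3 measurements, "f2" has only 1.
long <- data.frame(
  Feature = rep(c("f1", "f2"), each = 4),
  Run = rep(paste0("run_", 1:4), times = 2),
  Intensity = c(10, 12, 11, NA, 9, NA, NA, NA)
)

# Number of non-missing intensities per feature across all runs.
n_obs <- tapply(!is.na(long$Intensity), long$Feature, sum)

# Keep features with more than 2 measurements (i.e. drop those with 1 or 2).
keep <- names(n_obs)[n_obs > 2]
filtered <- long[long$Feature %in% keep, ]
```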

removeOxidationMpeptides

TRUE removes modified peptides that include the 'Oxidation (M)' modification. FALSE is the default.

removeProtein_with1Peptide

TRUE removes proteins that have only 1 peptide and charge. FALSE is the default.
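One reading of "only 1 peptide and charge" is a protein with a single peptide/charge combination. Under that assumption, which is an interpretation rather than something the documentation states outright, the filter could be sketched in base R as follows (made-up data):

```r
# Hypothetical features: P1 has one peptide at two charges, P2 has one feature.
features <- data.frame(
  Accession = c("P1", "P1", "P2"),
  Sequence = c("AAAK", "AAAK", "BBBK"),
  Charge = c(2, 3, 2)
)

# Count distinct peptide/charge combinations per protein.
feat <- unique(features[, c("Accession", "Sequence", "Charge")])
n_feat <- table(feat$Accession)

# Assumed rule: keep proteins with more than one peptide/charge combination.
keep_proteins <- names(n_feat)[n_feat > 1]
kept <- features[features$Accession %in% keep_proteins, ]
```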

mod.num

For the modified peptide dataset, the number of modifications per peptide to use. Must be one of "Single" or "Total"; the default is "Single". If "Single", only peptides with one modification are used. "Total" also includes peptides with more than one modification. Selecting "Total" may confound the effects of different modifications.

Value

a list of two data.tables named 'PTM' and 'PROTEIN' in the format required by MSstatsPTM.

Examples

# Example annotation file
annotation <- data.frame('Condition' = c('Control', 'Control', 'Control',
                                         'Treatment', 'Treatment', 'Treatment'),
                         'BioReplicate' = c(1,2,3,4,5,6),
                         'Run' = c('prot_run_1', 'prot_run_2', 'prot_run_3',
                                   'phos_run_1', 'phos_run_2', 'phos_run_3'),
                         'Type' = c("Protein", "Protein", "Protein", "PTM", "PTM", "PTM"))

# The output should be in the following format.
head(raw.input$PTM)
#> # A tibble: 6 x 10
#>   ProteinName PeptideSequence Condition BioReplicate Run        Intensity
#>   <chr>       <chr>           <chr>     <chr>        <chr>          <dbl>
#> 1 Q9UHD8_K262 DAGLK*QAPASR    CCCP      BCH1         CCCP-B1T1   1423906.
#> 2 Q9UHD8_K262 DAGLK*QAPASR    CCCP      BCH1         CCCP-B1T2    877045.
#> 3 Q9UHD8_K262 DAGLK*QAPASR    CCCP      BCH2         CCCP-B2T1    384418.
#> 4 Q9UHD8_K262 DAGLK*QAPASR    CCCP      BCH2         CCCP-B2T2    454858.
#> 5 Q9UHD8_K262 DAGLK*QAPASR    Combo     BCH1         Combo-B1T1  1603377.
#> 6 Q9UHD8_K262 DAGLK*QAPASR    Combo     BCH1         Combo-B1T2   676555.
#> # ... with 4 more variables: PrecursorCharge <chr>, FragmentIon <lgl>,
#> #   ProductCharge <lgl>, IsotopeLabelType <chr>
head(raw.input$PROTEIN)
#> # A tibble: 6 x 10
#>   ProteinName PeptideSequence Condition BioReplicate Run           Intensity
#>   <chr>       <chr>           <chr>     <chr>        <chr>             <dbl>
#> 1 Q9UHD8      STLINTLFK       CCCP      BCH2         CCCP-B2T1       367944.
#> 2 Q9UHD8      STLINTLFK       CCCP      BCH2         CCCP-B2T2       341207.
#> 3 Q9UHD8      STLINTLFK       Combo     BCH2         Combo-B2T1      185843.
#> 4 Q9UHD8      STLINTLFK       Ctrl      BCH2         Ctrl-B2T1       529224.
#> 5 Q9UHD8      STLINTLFK       Ctrl      BCH2         Ctrl-B2T2       483355.
#> 6 Q9UHD8      STLINTLFK       USP30_OE  BCH2         USP30_OE-B2T1   447795.
#> # ... with 4 more variables: PrecursorCharge <chr>, FragmentIon <lgl>,
#> #   ProductCharge <lgl>, IsotopeLabelType <chr>