R/ProgenesistoMSstatsPTMFormat.R
ProgenesistoMSstatsPTMFormat.Rd
Converts non-TMT Progenesis output into the format needed for MSstatsPTM
ProgenesistoMSstatsPTMFormat(
  ptm_input,
  annotation,
  global_protein_input = FALSE,
  fasta_path = FALSE,
  useUniquePeptide = TRUE,
  summaryforMultipleRows = max,
  removeFewMeasurements = TRUE,
  removeOxidationMpeptides = FALSE,
  removeProtein_with1Peptide = FALSE,
  mod.num = "Single"
)
Argument | Description |
---|---|
ptm_input | name of Progenesis output with modified peptides, in wide format. 'Accession', 'Sequence', 'Modification', 'Charge' and one column for each run are required. |
annotation | name of 'annotation.txt' or 'annotation.csv' data which includes Condition, BioReplicate, Run, and Type (PTM or Protein) information. The Run values are matched against the MS run column names of the input. Note that PTM and global Protein run names often differ, which is why an additional Type column indicating Protein or PTM is required. |
global_protein_input | name of Progenesis output with unmodified peptides, in wide format. 'Accession', 'Sequence', 'Modification', 'Charge' and one column for each run are required. |
fasta_path | string containing path to the corresponding fasta file for the modified peptide dataset. |
useUniquePeptide | TRUE (default) removes peptides that are assigned to more than one protein. The conversion assumes that only peptides unique to a single protein are used. |
summaryforMultipleRows | max (default) or sum. When a feature has multiple measurements in the same run, use the highest intensity (max) or the sum of the intensities (sum). |
removeFewMeasurements | TRUE (default) will remove the features that have 1 or 2 measurements across runs. |
removeOxidationMpeptides | TRUE will remove the modified peptides that include 'Oxidation (M)'. FALSE is the default. |
removeProtein_with1Peptide | TRUE will remove the proteins which have only 1 peptide and charge. FALSE is the default. |
mod.num | For modified peptide dataset, must be one of |
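
The annotation requirements in the table above (the Condition, BioReplicate, Run, and Type columns, with Run names matching the run columns of the input) can be checked before conversion. The following is a minimal sketch in base R; `check_annotation` and its arguments are hypothetical names introduced for illustration and are not part of MSstatsPTM.

```r
# Hypothetical helper (not part of MSstatsPTM): basic sanity checks on an
# annotation table before it is passed to the converter.
check_annotation <- function(annotation, ptm_runs, protein_runs = NULL) {
  required <- c("Condition", "BioReplicate", "Run", "Type")
  missing_cols <- setdiff(required, colnames(annotation))
  if (length(missing_cols) > 0) {
    stop("annotation is missing column(s): ",
         paste(missing_cols, collapse = ", "))
  }

  # Type must distinguish PTM runs from global Protein runs
  bad_type <- setdiff(unique(annotation$Type), c("PTM", "Protein"))
  if (length(bad_type) > 0) {
    stop("unexpected Type value(s): ", paste(bad_type, collapse = ", "))
  }

  # Annotated run names that have no matching run column in the input
  ptm_unmatched <- setdiff(annotation$Run[annotation$Type == "PTM"], ptm_runs)
  protein_unmatched <- character(0)
  if (!is.null(protein_runs)) {
    protein_unmatched <- setdiff(
      annotation$Run[annotation$Type == "Protein"], protein_runs
    )
  }
  list(ptm_unmatched = ptm_unmatched, protein_unmatched = protein_unmatched)
}

# Toy annotation; the run names are made up for this example
annotation <- data.frame(
  Condition = c("Control", "Control", "Treatment", "Treatment"),
  BioReplicate = c(1, 2, 3, 4),
  Run = c("prot_run_1", "phos_run_1", "prot_run_2", "phos_run_2"),
  Type = c("Protein", "PTM", "Protein", "PTM"),
  stringsAsFactors = FALSE
)

# ptm_runs / protein_runs stand in for the run column names of the
# wide-format inputs
result <- check_annotation(
  annotation,
  ptm_runs = c("phos_run_1", "phos_run_2"),
  protein_runs = c("prot_run_1", "prot_run_3")
)
print(result)
```

Here the Protein run `prot_run_2` is reported as unmatched because the assumed protein input has no column with that name.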
A list of two data.tables named 'PTM' and 'PROTEIN' in the format required by MSstatsPTM.
# Example annotation file
annotation <- data.frame('Condition' = c('Control', 'Control', 'Control',
                                         'Treatment', 'Treatment', 'Treatment'),
                         'BioReplicate' = c(1,2,3,4,5,6),
                         'Run' = c('prot_run_1', 'prot_run_2', 'prot_run_3',
                                   'phos_run_1', 'phos_run_2', 'phos_run_3'),
                         'Type' = c("Protein", "Protein", "Protein",
                                    "PTM", "PTM", "PTM"))

# The output should be in the following format.
head(raw.input$PTM)
#> # A tibble: 6 x 10
#>   ProteinName PeptideSequence Condition BioReplicate Run        Intensity
#>   <chr>       <chr>           <chr>     <chr>        <chr>          <dbl>
#> 1 Q9UHD8_K262 DAGLK*QAPASR    CCCP      BCH1         CCCP-B1T1   1423906.
#> 2 Q9UHD8_K262 DAGLK*QAPASR    CCCP      BCH1         CCCP-B1T2    877045.
#> 3 Q9UHD8_K262 DAGLK*QAPASR    CCCP      BCH2         CCCP-B2T1    384418.
#> 4 Q9UHD8_K262 DAGLK*QAPASR    CCCP      BCH2         CCCP-B2T2    454858.
#> 5 Q9UHD8_K262 DAGLK*QAPASR    Combo     BCH1         Combo-B1T1  1603377.
#> 6 Q9UHD8_K262 DAGLK*QAPASR    Combo     BCH1         Combo-B1T2   676555.
#> # ... with 4 more variables: PrecursorCharge <chr>, FragmentIon <lgl>,
#> #   ProductCharge <lgl>, IsotopeLabelType <chr>
head(raw.input$PROTEIN)
#> # A tibble: 6 x 10
#>   ProteinName PeptideSequence Condition BioReplicate Run           Intensity
#>   <chr>       <chr>           <chr>     <chr>        <chr>             <dbl>
#> 1 Q9UHD8      STLINTLFK       CCCP      BCH2         CCCP-B2T1       367944.
#> 2 Q9UHD8      STLINTLFK       CCCP      BCH2         CCCP-B2T2       341207.
#> 3 Q9UHD8      STLINTLFK       Combo     BCH2         Combo-B2T1      185843.
#> 4 Q9UHD8      STLINTLFK       Ctrl      BCH2         Ctrl-B2T1       529224.
#> 5 Q9UHD8      STLINTLFK       Ctrl      BCH2         Ctrl-B2T2       483355.
#> 6 Q9UHD8      STLINTLFK       USP30_OE  BCH2         USP30_OE-B2T1   447795.
#> # ... with 4 more variables: PrecursorCharge <chr>, FragmentIon <lgl>,
#> #   ProductCharge <lgl>, IsotopeLabelType <chr>
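
The effect of the summaryforMultipleRows and removeFewMeasurements arguments can be illustrated on a toy long-format table. This is a sketch of the behaviour described in the argument table, written in base R; it is not the package's internal implementation, and the column names `Feature`, `Run`, and `Intensity` as well as the data are assumptions made up for the example.

```r
# Toy long-format table: feature "pepA_2" was measured twice in run1.
long <- data.frame(
  Feature = c("pepA_2", "pepA_2", "pepA_2", "pepA_2", "pepB_3"),
  Run = c("run1", "run1", "run2", "run3", "run1"),
  Intensity = c(100, 250, 300, 50, 80),
  stringsAsFactors = FALSE
)

# Collapse multiple measurements of one feature in one run, either by
# keeping the highest intensity (max) or by adding them up (sum).
by_max <- aggregate(Intensity ~ Feature + Run, data = long, FUN = max)
by_sum <- aggregate(Intensity ~ Feature + Run, data = long, FUN = sum)

# Count measurements per feature across runs and drop features that
# have only 1 or 2 of them.
n_meas <- table(by_max$Feature)
kept <- by_max[by_max$Feature %in% names(n_meas[n_meas > 2]), ]

print(by_max)
print(by_sum)
print(kept)
```

With max, the duplicated run1 measurement of `pepA_2` collapses to 250; with sum it becomes 350. `pepB_3` has a single measurement, so it is dropped by the filter, while `pepA_2` (three runs) is kept.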