Converts non-TMT Progenesis output into the format needed for MSstatsPTM

ProgenesistoMSstatsPTMFormat(
  ptm_input,
  annotation,
  global_protein_input = FALSE,
  fasta_path = FALSE,
  useUniquePeptide = TRUE,
  summaryforMultipleRows = max,
  removeFewMeasurements = TRUE,
  removeOxidationMpeptides = FALSE,
  removeProtein_with1Peptide = FALSE,
  mod.num = "Single"
)

Arguments

ptm_input	name of the Progenesis output with modified peptides, in wide format. The columns 'Accession', 'Sequence', 'Modification', and 'Charge' are required, plus one column for each run.
annotation	name of the 'annotation.txt' or 'annotation.csv' data, which includes Condition, BioReplicate, Run, and Type (PTM or Protein) information. The Run values are matched against the column names of the input for MS runs. Note that PTM and global Protein run names often differ, which is why an additional Type column indicating Protein or PTM is required.
global_protein_input	name of the Progenesis output with unmodified peptides, in wide format. The columns 'Accession', 'Sequence', 'Modification', and 'Charge' are required, plus one column for each run.
fasta_path	string containing path to the corresponding fasta file for the modified peptide dataset.
useUniquePeptide	TRUE (default) removes peptides that are assigned to more than one protein. We assume that unique peptides are used for each protein.
summaryforMultipleRows	max (default) or sum. When a feature has multiple measurements in the same run, use either the highest intensity (max) or the sum of the intensities (sum).
removeFewMeasurements	TRUE (default) removes features that have only 1 or 2 measurements across runs.
removeOxidationMpeptides	TRUE removes modified peptides that include 'Oxidation (M)'. FALSE is the default.
removeProtein_with1Peptide	TRUE removes proteins that have only 1 peptide and charge. FALSE is the default.
mod.num	For the modified peptide dataset, the number of modifications per peptide to be used. Must be one of "Single" or "Total"; the default is "Single". If "Single", only peptides with one modification are used. "Total" also includes peptides with more than one modification. Selecting "Total" may confound the effects of different modifications.
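
To make the effect of summaryforMultipleRows and removeFewMeasurements concrete, the following base R sketch mimics the behaviour described above on a toy long-format table. It does not call the converter: the data frame `toy` and its values are invented for illustration, and the actual implementation may differ in detail (for example, in when and how measurements are counted).

```r
# Toy long-format data (invented): feature "pepA_2" is measured twice in run1.
toy <- data.frame(
  Feature   = c("pepA_2", "pepA_2", "pepA_2", "pepA_2", "pepB_3"),
  Run       = c("run1",   "run1",   "run2",   "run3",   "run1"),
  Intensity = c(100,      250,      300,      400,      50)
)

# summaryforMultipleRows = max: keep the highest intensity per feature and run.
by_max <- aggregate(Intensity ~ Feature + Run, data = toy, FUN = max)

# summaryforMultipleRows = sum: add up the duplicated intensities instead.
by_sum <- aggregate(Intensity ~ Feature + Run, data = toy, FUN = sum)

# removeFewMeasurements = TRUE: drop features with only 1 or 2 measurements
# across runs. Counting after duplicates are collapsed is an assumption here.
n_runs <- table(by_max$Feature)
kept   <- by_max[by_max$Feature %in% names(n_runs[n_runs > 2]), ]

print(by_max)
print(by_sum)
print(kept)
```

In this toy table, "pepB_3" is observed in a single run, so it is the only feature dropped by the last step.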

Value

A list of two data.tables, named 'PTM' and 'PROTEIN', in the format required by MSstatsPTM.
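
As a rough illustration of that return shape, the sketch below builds a stand-in list and checks it. The helper `check_converter_output` is hypothetical and not part of MSstatsPTM. The expected column names are taken from the example output in the Examples section, and plain data frames with placeholder values stand in for the data.tables.

```r
# Column names taken from the example output shown under Examples.
expected_cols <- c("ProteinName", "PeptideSequence", "Condition",
                   "BioReplicate", "Run", "Intensity", "PrecursorCharge",
                   "FragmentIon", "ProductCharge", "IsotopeLabelType")

# Hypothetical helper (not part of MSstatsPTM): TRUE if `x` is a list with
# 'PTM' and 'PROTEIN' tables that both carry the expected columns.
check_converter_output <- function(x) {
  is.list(x) &&
    all(c("PTM", "PROTEIN") %in% names(x)) &&
    all(expected_cols %in% names(x$PTM)) &&
    all(expected_cols %in% names(x$PROTEIN))
}

# Stand-in for a converter result: one-row data frames, placeholder values.
one_row <- function(protein, peptide) {
  data.frame(ProteinName = protein, PeptideSequence = peptide,
             Condition = "Control", BioReplicate = "1", Run = "run_1",
             Intensity = 1000, PrecursorCharge = "2", FragmentIon = NA,
             ProductCharge = NA, IsotopeLabelType = "L")
}
converted <- list(PTM     = one_row("Q9UHD8_K262", "DAGLK*QAPASR"),
                  PROTEIN = one_row("Q9UHD8", "STLINTLFK"))

ok <- check_converter_output(converted)
print(ok)
```

Each element is accessed by name, for example `converted$PTM` or `converted$PROTEIN`.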

Examples


# Example annotation file
annotation <- data.frame('Condition' = c('Control', 'Control', 'Control',
                         'Treatment', 'Treatment', 'Treatment'),
                         'BioReplicate' = c(1,2,3,4,5,6),
                         'Run' = c('prot_run_1', 'prot_run_2', 'prot_run_3',
                                  'phos_run_1', 'phos_run_2', 'phos_run_3'),
                         'Type' = c("Protein", "Protein", "Protein", "PTM", 
                                    "PTM", "PTM"))
                                    
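Because the annotation is matched to the input by run column name, it can help to compare the two before converting. The following base R sketch rebuilds the annotation above in compact form and checks it against an invented wide-format PTM table. `ptm_wide` and all of its values are placeholders, and real Progenesis exports may carry additional columns that this simple check would flag.

```r
# Same annotation layout as in the example above, written compactly.
annotation <- data.frame(
  Condition    = rep(c("Control", "Treatment"), each = 3),
  BioReplicate = 1:6,
  Run          = c(paste0("prot_run_", 1:3), paste0("phos_run_", 1:3)),
  Type         = rep(c("Protein", "PTM"), each = 3)
)

# Invented wide-format PTM table: the four required columns plus one
# intensity column per PTM run. All values are placeholders.
ptm_wide <- data.frame(
  Accession    = "Q9UHD8",
  Sequence     = "DAGLKQAPASR",
  Modification = "placeholder modification",
  Charge       = 2,
  phos_run_1   = 1423906,
  phos_run_2   = 877045,
  phos_run_3   = 384418
)

# Required columns that are absent from the input.
required     <- c("Accession", "Sequence", "Modification", "Charge")
missing_cols <- setdiff(required, names(ptm_wide))

# Compare annotated PTM runs with the run columns present in the data.
ptm_runs       <- annotation$Run[annotation$Type == "PTM"]
run_cols       <- setdiff(names(ptm_wide), required)
unmatched_runs <- setdiff(ptm_runs, run_cols)  # annotated but not in the data
unannotated    <- setdiff(run_cols, ptm_runs)  # in the data but not annotated

print(missing_cols)
print(unmatched_runs)
print(unannotated)
```

The same comparison can be repeated for the global protein input using the rows with Type equal to "Protein".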
# The output should be in the following format
# (raw.input is assumed here to be the example dataset shipped with MSstatsPTM).
head(raw.input$PTM)
#> # A tibble: 6 x 10
#>   ProteinName PeptideSequence Condition BioReplicate Run        Intensity
#>   <chr>       <chr>           <chr>     <chr>        <chr>          <dbl>
#> 1 Q9UHD8_K262 DAGLK*QAPASR    CCCP      BCH1         CCCP-B1T1   1423906.
#> 2 Q9UHD8_K262 DAGLK*QAPASR    CCCP      BCH1         CCCP-B1T2    877045.
#> 3 Q9UHD8_K262 DAGLK*QAPASR    CCCP      BCH2         CCCP-B2T1    384418.
#> 4 Q9UHD8_K262 DAGLK*QAPASR    CCCP      BCH2         CCCP-B2T2    454858.
#> 5 Q9UHD8_K262 DAGLK*QAPASR    Combo     BCH1         Combo-B1T1  1603377.
#> 6 Q9UHD8_K262 DAGLK*QAPASR    Combo     BCH1         Combo-B1T2   676555.
#> # ... with 4 more variables: PrecursorCharge <chr>, FragmentIon <lgl>,
#> #   ProductCharge <lgl>, IsotopeLabelType <chr>
head(raw.input$PROTEIN)
#> # A tibble: 6 x 10
#>   ProteinName PeptideSequence Condition BioReplicate Run           Intensity
#>   <chr>       <chr>           <chr>     <chr>        <chr>             <dbl>
#> 1 Q9UHD8      STLINTLFK       CCCP      BCH2         CCCP-B2T1       367944.
#> 2 Q9UHD8      STLINTLFK       CCCP      BCH2         CCCP-B2T2       341207.
#> 3 Q9UHD8      STLINTLFK       Combo     BCH2         Combo-B2T1      185843.
#> 4 Q9UHD8      STLINTLFK       Ctrl      BCH2         Ctrl-B2T1       529224.
#> 5 Q9UHD8      STLINTLFK       Ctrl      BCH2         Ctrl-B2T2       483355.
#> 6 Q9UHD8      STLINTLFK       USP30_OE  BCH2         USP30_OE-B2T1   447795.
#> # ... with 4 more variables: PrecursorCharge <chr>, FragmentIon <lgl>,
#> #   ProductCharge <lgl>, IsotopeLabelType <chr>