Import Proteome Discoverer files

PDtoMSstatsFormat(
  input,
  annotation,
  useNumProteinsColumn = FALSE,
  useUniquePeptide = TRUE,
  summaryforMultipleRows = max,
  removeFewMeasurements = TRUE,
  removeOxidationMpeptides = FALSE,
  removeProtein_with1Peptide = FALSE,
  which.quantification = "Precursor.Area",
  which.proteinid = "Protein.Group.Accessions",
  which.sequence = "Sequence",
  use_log_file = TRUE,
  append = FALSE,
  verbose = TRUE,
  log_file_path = NULL,
  ...
)

Arguments

input

Proteome Discoverer (PD) report, or a path to it.

annotation

Name of the 'annotation.txt' or 'annotation.csv' data that includes Condition, BioReplicate and Run information. 'Run' is matched against 'Spectrum.File'.
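As a minimal sketch of the annotation layout described above, the table can be built by hand in base R. The run names below are hypothetical placeholders; in practice they have to correspond to the values in the 'Spectrum.File' column of the PD report.

```r
# Hypothetical run names, used only for illustration.
annotation <- data.frame(
  Run = c("sample_01.raw", "sample_02.raw", "sample_03.raw", "sample_04.raw"),
  Condition = c("Condition1", "Condition1", "Condition2", "Condition2"),
  BioReplicate = c(1, 1, 2, 2),
  stringsAsFactors = FALSE
)

# Check that the three columns named in the argument description are present.
required_columns <- c("Run", "Condition", "BioReplicate")
missing_columns <- setdiff(required_columns, colnames(annotation))
stopifnot(length(missing_columns) == 0)
```

A check like this before calling the converter makes a misspelled column name fail early instead of during the merge with the quantification data.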

useNumProteinsColumn

If TRUE, removes peptides with a value greater than 1 in the '# Proteins' column of the PD output.

useUniquePeptide

If TRUE (default), removes peptides that are assigned to more than one protein. Unique peptides are assumed to be used for each protein.
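The idea of shared-peptide removal can be sketched in base R as follows. This is an illustration on a toy table with made-up sequences and protein names, not the package's internal code.

```r
# Toy data: peptide AAAK maps to two proteins, the others to one each.
peptides <- data.frame(
  PeptideSequence = c("AAAK", "AAAK", "CCCR", "DDDK"),
  ProteinName = c("P1", "P2", "P1", "P2"),
  stringsAsFactors = FALSE
)

# Count the distinct proteins each peptide is assigned to.
proteins_per_peptide <- tapply(
  peptides$ProteinName,
  peptides$PeptideSequence,
  function(x) length(unique(x))
)

# Keep only peptides assigned to exactly one protein.
unique_sequences <- names(proteins_per_peptide)[proteins_per_peptide == 1]
unique_only <- peptides[peptides$PeptideSequence %in% unique_sequences, ]
```

Here `unique_only` retains CCCR and DDDK, while both rows for the shared peptide AAAK are dropped.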

summaryforMultipleRows

max (default) or sum. When a feature has multiple measurements in a run, use either the highest intensity or the sum of the intensities.
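The difference between the two summaries can be sketched with base R's `aggregate`. The feature names and intensities are made up, and this is only an illustration of the idea, not how the package computes it internally.

```r
# Toy data: feature PEPTIDEA_2 was measured twice in run1.
measurements <- data.frame(
  Feature = c("PEPTIDEA_2", "PEPTIDEA_2", "PEPTIDEB_3"),
  Run = c("run1", "run1", "run1"),
  Intensity = c(100, 250, 80),
  stringsAsFactors = FALSE
)

# One row per feature and run, keeping the highest intensity.
by_max <- aggregate(Intensity ~ Feature + Run, data = measurements, FUN = max)

# One row per feature and run, adding the intensities together.
by_sum <- aggregate(Intensity ~ Feature + Run, data = measurements, FUN = sum)
```

For PEPTIDEA_2 the `max` summary gives 250 and the `sum` summary gives 350; the singly measured PEPTIDEB_3 is 80 in both.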

removeFewMeasurements

If TRUE (default), removes features that have only 1 or 2 measurements across runs.
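A rough base R sketch of this filter is shown below, on made-up data. It assumes that a "measurement" means a non-missing intensity, which is an assumption of this example rather than something stated in this page.

```r
# Toy data: PEPTIDEA_2 has 3 non-missing intensities, PEPTIDEB_3 has 1.
quantified <- data.frame(
  Feature = c(rep("PEPTIDEA_2", 4), rep("PEPTIDEB_3", 4)),
  Run = rep(paste0("run", 1:4), times = 2),
  Intensity = c(10, 12, NA, 11, 5, NA, NA, NA),
  stringsAsFactors = FALSE
)

# Count non-missing intensities per feature across all runs.
n_measured <- tapply(!is.na(quantified$Intensity), quantified$Feature, sum)

# Keep features with at least 3 measurements.
kept_features <- names(n_measured)[n_measured >= 3]
enough_measurements <- quantified[quantified$Feature %in% kept_features, ]
```

Only the four rows of PEPTIDEA_2 remain in `enough_measurements`.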

removeOxidationMpeptides

If TRUE, removes peptides with 'oxidation (M)' among their modifications. FALSE is the default.

removeProtein_with1Peptide

If TRUE, removes proteins that have only one feature, i.e. a single peptide and charge combination. FALSE is the default.

which.quantification

Use the 'Precursor.Area' (default) column for quantified intensities. 'Intensity' or 'Area' can be used instead.

which.proteinid

Column to use for protein names. The default in the usage above is 'Protein.Group.Accessions'; 'Protein.Accessions' or 'Master.Protein.Accessions' can be used instead.

which.sequence

Use the 'Sequence' (default) column for peptide sequence. 'Annotated.Sequence' can be used instead.

use_log_file

logical. If TRUE, information about data processing will be saved to a file.

append

logical. If TRUE, information about data processing will be added to an existing log file.

verbose

logical. If TRUE, information about data processing will be printed to the console.

log_file_path

character. Path to a file to which information about data processing will be saved. If not provided, such a file will be created automatically. If `append = TRUE`, it has to be a valid path to a file.

...

additional parameters to `data.table::fread`.

Value

data.frame in the MSstats required format.

Examples

pd_raw = system.file("tinytest/raw_data/PD/pd_input.csv",
                     package = "MSstatsConvert")
annot = system.file("tinytest/annotations/annot_pd.csv",
                    package = "MSstats")
pd_raw = data.table::fread(pd_raw)
annot = data.table::fread(annot)
pd_imported = PDtoMSstatsFormat(pd_raw, annot, use_log_file = FALSE)
#> INFO [2021-07-05 20:05:32] ** Raw data from ProteomeDiscoverer imported successfully.
#> INFO [2021-07-05 20:05:32] ** Raw data from ProteomeDiscoverer cleaned successfully.
#> INFO [2021-07-05 20:05:32] ** Using provided annotation.
#> INFO [2021-07-05 20:05:32] ** Run labels were standardized to remove symbols such as '.' or '%'.
#> INFO [2021-07-05 20:05:32] ** The following options are used:
#>   - Features will be defined by the columns: PeptideSequence, PrecursorCharge
#>   - Shared peptides will be removed.
#>   - Proteins with single feature will not be removed.
#>   - Features with less than 3 measurements across runs will be removed.
#> INFO [2021-07-05 20:05:32] ** Features with all missing measurements across runs are removed.
#> INFO [2021-07-05 20:05:32] ** Shared peptides are removed.
#> Warning: no non-missing arguments to max; returning -Inf
#> INFO [2021-07-05 20:05:32] ** Multiple measurements in a feature and a run are summarized by summaryforMultipleRows: max
#> INFO [2021-07-05 20:05:32] ** Features with one or two measurements across runs are removed.
#> INFO [2021-07-05 20:05:32] ** Run annotation merged with quantification data.
#> INFO [2021-07-05 20:05:32] ** Features with one or two measurements across runs are removed.
#> INFO [2021-07-05 20:05:32] ** Fractionation handled.
#> INFO [2021-07-05 20:05:32] ** Updated quantification data to make balanced design. Missing values are marked by NA
#> INFO [2021-07-05 20:05:32] ** Finished preprocessing. The dataset is ready to be processed by the dataProcess function.
head(pd_imported)
#>   ProteinName PeptideModifiedSequence PrecursorCharge FragmentIon ProductCharge
#> 1      P0ABU9        ANSHAPEAVVEGASR_               2          NA            NA
#> 2      P0ABU9        ANSHAPEAVVEGASR_               2          NA            NA
#> 3      P0ABU9        ANSHAPEAVVEGASR_               2          NA            NA
#> 4      P0ABU9        ANSHAPEAVVEGASR_               2          NA            NA
#> 5      P0ABU9        ANSHAPEAVVEGASR_               2          NA            NA
#> 6      P0ABU9        ANSHAPEAVVEGASR_               2          NA            NA
#>   IsotopeLabelType  Condition BioReplicate
#> 1                L Condition1            1
#> 2                L Condition1            1
#> 3                L Condition1            1
#> 4                L Condition2            2
#> 5                L Condition2            2
#> 6                L Condition2            2
#>                                              Run Fraction Intensity
#> 1 121219_S_CCES_01_01_LysC_Try_1to10_Mixt_1_1raw        1  21400000
#> 2 121219_S_CCES_01_02_LysC_Try_1to10_Mixt_1_2raw        1  17500000
#> 3 121219_S_CCES_01_03_LysC_Try_1to10_Mixt_1_3raw        1        NA
#> 4 121219_S_CCES_01_04_LysC_Try_1to10_Mixt_2_1raw        1  11600000
#> 5 121219_S_CCES_01_05_LysC_Try_1to10_Mixt_2_2raw        1  12000000
#> 6 121219_S_CCES_01_06_LysC_Try_1to10_Mixt_2_3raw        1  16200000
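
The warning from `max` in the output above is base R behaviour: calling `max` on a vector with no values returns -Inf and raises a warning. A plausible cause here, stated as an assumption rather than a confirmed diagnosis, is a feature and run group in which every intensity is missing. The sketch below reproduces the behaviour in isolation; `safe_max` is a hypothetical helper written for this example and is not part of MSstats.

```r
# max() on an empty vector returns -Inf and signals a warning.
empty_group <- numeric(0)
result <- suppressWarnings(max(empty_group))

# Capture the warning text instead of printing it (the text depends on locale).
caught <- tryCatch(max(empty_group), warning = function(w) conditionMessage(w))

# Hypothetical helper: return NA instead of -Inf when nothing was measured.
safe_max <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) NA_real_ else max(x)
}

all_missing <- safe_max(c(NA, NA))
some_measured <- safe_max(c(NA, 3, 7))
```

With this helper an all-missing group yields NA, while a group with measurements yields its highest value (7 in the last call).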