PDtoMSstatsFormat.Rd
Import Proteome Discoverer files
PDtoMSstatsFormat(
  input,
  annotation,
  useNumProteinsColumn = FALSE,
  useUniquePeptide = TRUE,
  summaryforMultipleRows = max,
  removeFewMeasurements = TRUE,
  removeOxidationMpeptides = FALSE,
  removeProtein_with1Peptide = FALSE,
  which.quantification = "Precursor.Area",
  which.proteinid = "Protein.Group.Accessions",
  which.sequence = "Sequence",
  use_log_file = TRUE,
  append = FALSE,
  verbose = TRUE,
  log_file_path = NULL,
  ...
)
input | PD report or a path to it. |
---|---|
annotation | name of 'annotation.txt' or 'annotation.csv' data which includes Condition, BioReplicate, Run information. 'Run' will be matched with 'Spectrum.File'. |
useNumProteinsColumn | TRUE removes peptides with a value greater than 1 in the '# Proteins' column of the PD output. |
useUniquePeptide | TRUE (default) removes peptides that are assigned to more than one protein. We assume that each protein is quantified with unique peptides. |
summaryforMultipleRows | max (default) or sum - when there are multiple measurements for a given feature in a given run, use the highest or the sum of the multiple intensities. |
removeFewMeasurements | TRUE (default) will remove the features that have 1 or 2 measurements across runs. |
removeOxidationMpeptides | TRUE will remove the peptides including 'oxidation (M)' in modification. FALSE is default. |
removeProtein_with1Peptide | TRUE will remove proteins that have only one feature (a single peptide and charge combination). FALSE is default. |
which.quantification | Use the 'Precursor.Area' (default) column for quantified intensities. 'Intensity' or 'Area' can be used instead. |
which.proteinid | Use the 'Protein.Group.Accessions' (default) column for protein names. 'Master.Protein.Accessions' can be used instead. |
which.sequence | Use 'Sequence'(default) column for peptide sequence. 'Annotated.Sequence' can be used instead. |
use_log_file | logical. If TRUE, information about data processing will be saved to a file. |
append | logical. If TRUE, information about data processing will be added to an existing log file. |
verbose | logical. If TRUE, information about data processing will be printed to the console. |
log_file_path | character. Path to a file to which information about data processing will be saved. If not provided, such a file will be created automatically. If `append = TRUE`, has to be a valid path to a file. |
... | additional parameters to `data.table::fread`. |
data.frame in the MSstats required format.
pd_raw = system.file("tinytest/raw_data/PD/pd_input.csv", package = "MSstatsConvert")
annot = system.file("tinytest/annotations/annot_pd.csv", package = "MSstats")
pd_raw = data.table::fread(pd_raw)
annot = data.table::fread(annot)
pd_imported = PDtoMSstatsFormat(pd_raw, annot, use_log_file = FALSE)
#> INFO [2021-07-05 20:05:32] ** Raw data from ProteomeDiscoverer imported successfully.
#> INFO [2021-07-05 20:05:32] ** Raw data from ProteomeDiscoverer cleaned successfully.
#> INFO [2021-07-05 20:05:32] ** Using provided annotation.
#> INFO [2021-07-05 20:05:32] ** Run labels were standardized to remove symbols such as '.' or '%'.
#> INFO [2021-07-05 20:05:32] ** The following options are used:
#> - Features will be defined by the columns: PeptideSequence, PrecursorCharge
#> - Shared peptides will be removed.
#> - Proteins with single feature will not be removed.
#> - Features with less than 3 measurements across runs will be removed.
#> INFO [2021-07-05 20:05:32] ** Features with all missing measurements across runs are removed.
#> INFO [2021-07-05 20:05:32] ** Shared peptides are removed.
#> Warning: no non-missing arguments to max; returning -Inf
#> INFO [2021-07-05 20:05:32] ** Multiple measurements in a feature and a run are summarized by summaryforMultipleRows: max
#> INFO [2021-07-05 20:05:32] ** Features with one or two measurements across runs are removed.
#> INFO [2021-07-05 20:05:32] ** Run annotation merged with quantification data.
#> INFO [2021-07-05 20:05:32] ** Features with one or two measurements across runs are removed.
#> INFO [2021-07-05 20:05:32] ** Fractionation handled.
#> INFO [2021-07-05 20:05:32] ** Updated quantification data to make balanced design. Missing values are marked by NA
#> INFO [2021-07-05 20:05:32] ** Finished preprocessing. The dataset is ready to be processed by the dataProcess function.
head(pd_imported)
#>   ProteinName PeptideModifiedSequence PrecursorCharge FragmentIon ProductCharge
#> 1      P0ABU9        ANSHAPEAVVEGASR_               2          NA            NA
#> 2      P0ABU9        ANSHAPEAVVEGASR_               2          NA            NA
#> 3      P0ABU9        ANSHAPEAVVEGASR_               2          NA            NA
#> 4      P0ABU9        ANSHAPEAVVEGASR_               2          NA            NA
#> 5      P0ABU9        ANSHAPEAVVEGASR_               2          NA            NA
#> 6      P0ABU9        ANSHAPEAVVEGASR_               2          NA            NA
#>   IsotopeLabelType  Condition BioReplicate
#> 1                L Condition1            1
#> 2                L Condition1            1
#> 3                L Condition1            1
#> 4                L Condition2            2
#> 5                L Condition2            2
#> 6                L Condition2            2
#>                                              Run Fraction Intensity
#> 1 121219_S_CCES_01_01_LysC_Try_1to10_Mixt_1_1raw        1  21400000
#> 2 121219_S_CCES_01_02_LysC_Try_1to10_Mixt_1_2raw        1  17500000
#> 3 121219_S_CCES_01_03_LysC_Try_1to10_Mixt_1_3raw        1        NA
#> 4 121219_S_CCES_01_04_LysC_Try_1to10_Mixt_2_1raw        1  11600000
#> 5 121219_S_CCES_01_05_LysC_Try_1to10_Mixt_2_2raw        1  12000000
#> 6 121219_S_CCES_01_06_LysC_Try_1to10_Mixt_2_3raw        1  16200000
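
To make the effect of `summaryforMultipleRows = max` and `removeFewMeasurements = TRUE` more concrete, the following base R sketch mimics the two steps on a small made-up table. It is only an illustration, not the package's internal implementation: the data frame `toy`, its values, and the helper names `collapsed`, `counts`, and `kept` are hypothetical, and a feature is assumed to be a peptide sequence and charge pair, as reported in the log above.

```r
# Hypothetical toy data: PEPTIDEA/2 is measured twice in run "r1".
toy <- data.frame(
  PeptideSequence = c("PEPTIDEA", "PEPTIDEA", "PEPTIDEA", "PEPTIDEA",
                      "PEPTIDEB", "PEPTIDEB"),
  PrecursorCharge = c(2, 2, 2, 2, 3, 3),
  Run             = c("r1", "r1", "r2", "r3", "r1", "r2"),
  Intensity       = c(100, 250, 80, 90, 40, 55)
)

# Step 1: collapse duplicate (feature, run) rows by taking the maximum,
# which is what summaryforMultipleRows = max is described as doing.
collapsed <- aggregate(
  Intensity ~ PeptideSequence + PrecursorCharge + Run,
  data = toy, FUN = max
)

# Step 2: count measurements per feature and keep features with at
# least 3, mirroring the description of removeFewMeasurements = TRUE.
counts <- aggregate(
  Intensity ~ PeptideSequence + PrecursorCharge,
  data = collapsed, FUN = length
)
names(counts)[3] <- "n_runs"
kept <- merge(
  collapsed,
  counts[counts$n_runs >= 3, c("PeptideSequence", "PrecursorCharge")]
)

print(kept)
```

In this toy table, PEPTIDEB/3 has only two measurements and is dropped, while the two "r1" rows of PEPTIDEA/2 are reduced to the larger intensity. Replacing `FUN = max` with `FUN = sum` in step 1 corresponds to the `sum` alternative.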