MaxQtoMSstatsFormat.Rd
Import MaxQuant files
MaxQtoMSstatsFormat(
  evidence,
  annotation,
  proteinGroups,
  proteinID = "Proteins",
  useUniquePeptide = TRUE,
  summaryforMultipleRows = max,
  removeFewMeasurements = TRUE,
  removeMpeptides = FALSE,
  removeOxidationMpeptides = FALSE,
  removeProtein_with1Peptide = FALSE,
  use_log_file = TRUE,
  append = FALSE,
  verbose = TRUE,
  log_file_path = NULL,
  ...
)
evidence | name of 'evidence.txt' data, which includes feature-level data. |
---|---|
annotation | name of 'annotation.txt' data which includes Raw.file, Condition, BioReplicate, Run, IsotopeLabelType information. |
proteinGroups | name of 'proteinGroups.txt' data. It is needed to match protein group IDs. If proteinGroups=NULL, the 'Proteins' column in 'evidence.txt' is used. |
proteinID | 'Proteins'(default) or 'Leading.razor.protein' for Protein ID. |
useUniquePeptide | TRUE (default) removes peptides that are assigned to more than one protein. We assume that each protein is quantified by unique peptides. |
summaryforMultipleRows | max (default) or sum - when there are multiple measurements for a given feature in a given run, use the highest intensity or the sum of the intensities. |
removeFewMeasurements | TRUE (default) will remove the features that have 1 or 2 measurements across runs. |
removeMpeptides | TRUE will remove the peptides whose sequence includes 'M'. FALSE is default. |
removeOxidationMpeptides | TRUE will remove the peptides including 'oxidation (M)' in modification. FALSE is default. |
removeProtein_with1Peptide | TRUE will remove the proteins that have only one peptide and charge combination (a single feature). FALSE is default. |
use_log_file | logical. If TRUE, information about data processing will be saved to a file. |
append | logical. If TRUE, information about data processing will be added to an existing log file. |
verbose | logical. If TRUE, information about data processing will be printed to the console. |
log_file_path | character. Path to a file to which information about data processing will be saved. If not provided, such a file will be created automatically. If `append = TRUE`, has to be a valid path to a file. |
... | additional parameters to `data.table::fread`. |
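The effect of `summaryforMultipleRows` and `removeFewMeasurements` can be illustrated on toy data. The following is a minimal base R sketch of the behavior described in the table above, not the package's internal implementation; the data frame `toy` and its values are hypothetical.

```r
# Hypothetical feature-level data: feature "A" is measured twice in run "r1".
toy <- data.frame(
  Feature   = c("A", "A", "A", "A", "B"),
  Run       = c("r1", "r1", "r2", "r3", "r1"),
  Intensity = c(100, 250, 300, 400, 50)
)

# summaryforMultipleRows = max: keep the highest intensity per feature and run.
by_max <- aggregate(Intensity ~ Feature + Run, data = toy, FUN = max)

# summaryforMultipleRows = sum: add up the intensities per feature and run.
by_sum <- aggregate(Intensity ~ Feature + Run, data = toy, FUN = sum)

# removeFewMeasurements = TRUE: drop features with 1 or 2 measurements across runs.
n_meas <- table(by_max$Feature)
kept   <- by_max[by_max$Feature %in% names(n_meas)[n_meas >= 3], ]
```

Here feature "A" in run "r1" is summarized to 250 with `max` and to 350 with `sum`. Feature "B" has a single measurement, so it is dropped by the filter, while "A" (three runs) is kept.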
data.frame in the MSstats required format.
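A quick way to confirm that a result has the expected shape is to compare its column names against the columns printed by `head(maxq_imported)` in the example on this page. The sketch below assumes that this column set is what "MSstats required format" means for MaxQuant input; `missing_msstats_columns` is a hypothetical helper, not a function of MSstatsConvert.

```r
# Column names taken from the head(maxq_imported) output shown on this page.
msstats_columns <- c(
  "ProteinName", "PeptideSequence", "PrecursorCharge", "FragmentIon",
  "ProductCharge", "IsotopeLabelType", "Condition", "BioReplicate",
  "Run", "Fraction", "Intensity"
)

# Hypothetical helper: returns the expected columns that are absent from `x`.
missing_msstats_columns <- function(x) {
  setdiff(msstats_columns, colnames(x))
}

# One-row toy data frame mirroring the first row of the example output.
toy_output <- data.frame(
  ProteinName = "P06959", PeptideSequence = "AEAPAAAPAAK",
  PrecursorCharge = 2, FragmentIon = NA, ProductCharge = NA,
  IsotopeLabelType = "L", Condition = 1, BioReplicate = 1,
  Run = "121219_S_CCES_01_01_LysC_Try_1to10_Mixt_1_1",
  Fraction = 1, Intensity = 4023100
)

complete   <- missing_msstats_columns(toy_output)
incomplete <- missing_msstats_columns(toy_output[, colnames(toy_output) != "Intensity"])
```

`complete` is an empty character vector, and `incomplete` contains "Intensity".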
Warning: MSstats does not support metabolic labeling or iTRAQ experiments.
mq_ev = data.table::fread(system.file("tinytest/raw_data/MaxQuant/mq_ev.csv",
                                      package = "MSstatsConvert"))
mq_pg = data.table::fread(system.file("tinytest/raw_data/MaxQuant/mq_pg.csv",
                                      package = "MSstatsConvert"))
annot = data.table::fread(system.file("tinytest/raw_data/MaxQuant/annotation.csv",
                                      package = "MSstatsConvert"))
maxq_imported = MaxQtoMSstatsFormat(mq_ev, annot, mq_pg, use_log_file = FALSE)
#> INFO [2021-07-05 20:05:31] ** Raw data from MaxQuant imported successfully.
#> INFO [2021-07-05 20:05:31] ** Rows with values of Potentialcontaminant equal to + are removed
#> INFO [2021-07-05 20:05:31] ** Rows with values of Reverse equal to + are removed
#> INFO [2021-07-05 20:05:31] ** Rows with values of Potentialcontaminant equal to + are removed
#> INFO [2021-07-05 20:05:31] ** Rows with values of Reverse equal to + are removed
#> INFO [2021-07-05 20:05:31] ** Rows with values of Onlyidentifiedbysite equal to + are removed
#> INFO [2021-07-05 20:05:31] ** + Contaminant, + Reverse, + Potential.contaminant, + Only.identified.by.site proteins are removed.
#> INFO [2021-07-05 20:05:31] ** Raw data from MaxQuant cleaned successfully.
#> INFO [2021-07-05 20:05:31] ** Using provided annotation.
#> INFO [2021-07-05 20:05:31] ** Run labels were standardized to remove symbols such as '.' or '%'.
#> INFO [2021-07-05 20:05:31] ** The following options are used:
#> - Features will be defined by the columns: PeptideSequence, PrecursorCharge
#> - Shared peptides will be removed.
#> - Proteins with single feature will not be removed.
#> - Features with less than 3 measurements across runs will be removed.
#> INFO [2021-07-05 20:05:31] ** Features with all missing measurements across runs are removed.
#> INFO [2021-07-05 20:05:31] ** Shared peptides are removed.
#> INFO [2021-07-05 20:05:31] ** Multiple measurements in a feature and a run are summarized by summaryforMultipleRows: max
#> INFO [2021-07-05 20:05:31] ** Features with one or two measurements across runs are removed.
#> INFO [2021-07-05 20:05:31] ** Run annotation merged with quantification data.
#> INFO [2021-07-05 20:05:31] ** Features with one or two measurements across runs are removed.
#> INFO [2021-07-05 20:05:31] ** Fractionation handled.
#> INFO [2021-07-05 20:05:31] ** Updated quantification data to make balanced design. Missing values are marked by NA
#> INFO [2021-07-05 20:05:31] ** Finished preprocessing. The dataset is ready to be processed by the dataProcess function.
head(maxq_imported)
#>   ProteinName PeptideSequence PrecursorCharge FragmentIon ProductCharge
#> 1      P06959     AEAPAAAPAAK               2          NA            NA
#> 2      P06959     AEAPAAAPAAK               2          NA            NA
#> 3      P06959     AEAPAAAPAAK               2          NA            NA
#> 4      P06959     AEAPAAAPAAK               2          NA            NA
#> 5      P06959     AEAPAAAPAAK               2          NA            NA
#> 6      P06959     AEAPAAAPAAK               2          NA            NA
#>   IsotopeLabelType Condition BioReplicate
#> 1                L         1            1
#> 2                L         1            1
#> 3                L         1            1
#> 4                L         2            2
#> 5                L         2            2
#> 6                L         2            2
#>                                           Run Fraction Intensity
#> 1 121219_S_CCES_01_01_LysC_Try_1to10_Mixt_1_1        1   4023100
#> 2 121219_S_CCES_01_02_LysC_Try_1to10_Mixt_1_2        1   5132500
#> 3 121219_S_CCES_01_03_LysC_Try_1to10_Mixt_1_3        1   2761600
#> 4 121219_S_CCES_01_04_LysC_Try_1to10_Mixt_2_1        1   2932900
#> 5 121219_S_CCES_01_05_LysC_Try_1to10_Mixt_2_2        1   4091800
#> 6 121219_S_CCES_01_06_LysC_Try_1to10_Mixt_2_3        1   4727000
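The same test files can be imported with non-default options. The sketch below uses only arguments from the usage section above (`summaryforMultipleRows = sum`, `removeFewMeasurements = FALSE`, `verbose = FALSE`) and is guarded so that it does nothing when MSstatsConvert is not installed. The helper `example_file` is hypothetical, and no output is shown because the result was not verified here.

```r
# Hypothetical helper: resolve a MaxQuant test file shipped with MSstatsConvert.
example_file <- function(name) {
  system.file(file.path("tinytest/raw_data/MaxQuant", name),
              package = "MSstatsConvert")
}

maxq_sum <- NULL
if (requireNamespace("MSstatsConvert", quietly = TRUE)) {
  mq_ev <- data.table::fread(example_file("mq_ev.csv"))
  mq_pg <- data.table::fread(example_file("mq_pg.csv"))
  annot <- data.table::fread(example_file("annotation.csv"))

  # Sum duplicated measurements instead of taking the maximum, keep features
  # with few measurements, and silence both the log file and console messages.
  maxq_sum <- MSstatsConvert::MaxQtoMSstatsFormat(
    mq_ev, annot, mq_pg,
    summaryforMultipleRows = sum,
    removeFewMeasurements = FALSE,
    use_log_file = FALSE,
    verbose = FALSE
  )
}
```

Comparing `nrow(maxq_sum)` with `nrow(maxq_imported)` from the example above is one way to see how much data the default `removeFewMeasurements = TRUE` filter removes.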