SpectronauttoMSstatsFormat.Rd
Import Spectronaut files
SpectronauttoMSstatsFormat(
  input,
  annotation = NULL,
  intensity = "PeakArea",
  filter_with_Qvalue = TRUE,
  qvalue_cutoff = 0.01,
  useUniquePeptide = TRUE,
  removeFewMeasurements = TRUE,
  removeProtein_with1Feature = FALSE,
  summaryforMultipleRows = max,
  use_log_file = TRUE,
  append = FALSE,
  verbose = TRUE,
  log_file_path = NULL,
  ...
)
Argument | Description |
---|---|
input | name of the Spectronaut output, which must be in long format. ProteinName, PeptideSequence, PrecursorCharge, FragmentIon, ProductCharge, IsotopeLabelType, Condition, BioReplicate, Run, Intensity, F.ExcludedFromQuantification are required. Rows with F.ExcludedFromQuantification=True will be removed. |
annotation | name of 'annotation.txt' data which includes Condition, BioReplicate, Run. If the annotation is already complete in Spectronaut, use annotation=NULL (default); the annotation information is then taken from input. |
intensity | 'PeakArea' (default) uses the non-normalized peak area. 'NormalizedPeakArea' uses the peak area normalized by Spectronaut. |
filter_with_Qvalue | TRUE (default) filters out intensities whose value in the EG.Qvalue column is greater than qvalue_cutoff. Those intensities are replaced with zero and treated as censored missing values for imputation purposes. |
qvalue_cutoff | Cutoff for EG.Qvalue. Default is 0.01. |
useUniquePeptide | TRUE (default) removes peptides that are assigned to more than one protein. Each peptide is assumed to be unique to a single protein. |
removeFewMeasurements | TRUE (default) will remove the features that have 1 or 2 measurements across runs. |
removeProtein_with1Feature | TRUE will remove proteins that have only 1 feature, where a feature is the combination of peptide, precursor charge, fragment and product charge. FALSE is the default. |
summaryforMultipleRows | max (default) or sum. When there are multiple measurements for a feature in a run, use the highest intensity or the sum of the intensities. |
use_log_file | logical. If TRUE, information about data processing will be saved to a file. |
append | logical. If TRUE, information about data processing will be added to an existing log file. |
verbose | logical. If TRUE, information about data processing will be printed to the console. |
log_file_path | character. Path to a file to which information about data processing will be saved. If not provided, such a file will be created automatically. If `append = TRUE`, has to be a valid path to a file. |
... | additional parameters to `data.table::fread`. |
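The effect of filter_with_Qvalue, summaryforMultipleRows and removeFewMeasurements can be illustrated on a toy table. This is a minimal base R sketch, not the package's implementation: the `toy` data, the single `Feature` column and the choice to count only positive intensities as measurements are assumptions made for illustration.

```r
# Toy long-format data; all values are made up for illustration.
toy <- data.frame(
  Feature   = c("f1", "f1", "f1", "f1", "f1", "f2", "f2"),
  Run       = c("r1", "r1", "r2", "r3", "r4", "r1", "r2"),
  Intensity = c(10, 14, 20, 30, 40, 5, 7),
  EG.Qvalue = c(0.001, 0.002, 0.003, 0.004, 0.05, 0.001, 0.004)
)
qvalue_cutoff <- 0.01

# filter_with_Qvalue = TRUE: intensities whose EG.Qvalue exceeds the
# cutoff are replaced with 0 (treated as censored).
toy$Intensity[toy$EG.Qvalue > qvalue_cutoff] <- 0

# summaryforMultipleRows = max: keep one value per feature and run.
summarized <- aggregate(Intensity ~ Feature + Run, data = toy, FUN = max)

# removeFewMeasurements = TRUE: drop features with 1 or 2 measurements.
# Assumption: only positive intensities count as measurements here.
n_positive <- tapply(summarized$Intensity > 0, summarized$Feature, sum)
kept_features <- names(n_positive)[n_positive > 2]
kept <- summarized[summarized$Feature %in% kept_features, ]

print(kept)
```

In this sketch `f2` is dropped because it was measured in only two runs, while `f1` is kept together with its censored zero in run `r4`.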
data.frame in the MSstats required format.
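A quick way to sanity-check a converted table is to compare its column names with the columns that appear in the example output on this page. The sketch below assumes that this column list is the expected set; `missing_msstats_columns` is a hypothetical helper, not part of MSstatsConvert.

```r
# Columns as they appear in the example output on this page.
expected_cols <- c(
  "ProteinName", "PeptideSequence", "PrecursorCharge", "FragmentIon",
  "ProductCharge", "IsotopeLabelType", "Condition", "BioReplicate",
  "Run", "Fraction", "Intensity"
)

# Hypothetical helper: names of expected columns that are absent from x.
missing_msstats_columns <- function(x) {
  setdiff(expected_cols, colnames(x))
}

# One row copied from the example output on this page.
example_row <- data.frame(
  ProteinName = "1/sp|O75475|PSIP1_HUMAN",
  PeptideSequence = "_ASNEDVTK_",
  PrecursorCharge = 2,
  FragmentIon = "y4",
  ProductCharge = 1,
  IsotopeLabelType = "L",
  Condition = "A",
  BioReplicate = 1,
  Run = "lgillet_I150211_008_A",
  Fraction = 1,
  Intensity = 21.358784
)

complete_check <- missing_msstats_columns(example_row)
incomplete <- example_row[, setdiff(names(example_row), "Intensity")]
incomplete_check <- missing_msstats_columns(incomplete)
```

An empty result means every expected column is present; otherwise the result names the columns to look for.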
spectronaut_raw = system.file("tinytest/raw_data/Spectronaut/spectronaut_input.csv",
                              package = "MSstatsConvert")
spectronaut_raw = data.table::fread(spectronaut_raw)
spectronaut_imported = SpectronauttoMSstatsFormat(spectronaut_raw, use_log_file = FALSE)
#> INFO [2021-07-05 20:05:32] ** Raw data from Spectronaut imported successfully.
#> INFO [2021-07-05 20:05:32] ** Raw data from Spectronaut cleaned successfully.
#> INFO [2021-07-05 20:05:32] ** Using annotation extracted from quantification data.
#> INFO [2021-07-05 20:05:32] ** Run labels were standardized to remove symbols such as '.' or '%'.
#> INFO [2021-07-05 20:05:32] ** The following options are used:
#> - Features will be defined by the columns: PeptideSequence, PrecursorCharge, FragmentIon, ProductCharge
#> - Shared peptides will be removed.
#> - Proteins with single feature will not be removed.
#> - Features with less than 3 measurements across runs will be removed.
#> INFO [2021-07-05 20:05:33] ** Intensities with values smaller than 0.01 in PGQvalue are replaced with NA
#> INFO [2021-07-05 20:05:33] ** Intensities with values smaller than 0.01 in EGQvalue are replaced with 0
#> INFO [2021-07-05 20:05:33] ** Features with all missing measurements across runs are removed.
#> INFO [2021-07-05 20:05:33] ** Shared peptides are removed.
#> INFO [2021-07-05 20:05:33] ** Multiple measurements in a feature and a run are summarized by summaryforMultipleRows: max
#> INFO [2021-07-05 20:05:33] ** Features with one or two measurements across runs are removed.
#> INFO [2021-07-05 20:05:33] ** Run annotation merged with quantification data.
#> INFO [2021-07-05 20:05:33] ** Features with one or two measurements across runs are removed.
#> INFO [2021-07-05 20:05:33] ** Fractionation handled.
#> INFO [2021-07-05 20:05:33] ** Updated quantification data to make balanced design. Missing values are marked by NA
#> INFO [2021-07-05 20:05:33] ** Finished preprocessing. The dataset is ready to be processed by the dataProcess function.
head(spectronaut_imported)
#>               ProteinName PeptideSequence PrecursorCharge FragmentIon
#> 1 1/sp|O75475|PSIP1_HUMAN      _ASNEDVTK_               2          y4
#> 2 1/sp|O75475|PSIP1_HUMAN      _ASNEDVTK_               2          y4
#> 3 1/sp|O75475|PSIP1_HUMAN      _ASNEDVTK_               2          y4
#> 4 1/sp|O75475|PSIP1_HUMAN      _ASNEDVTK_               2          y4
#> 5 1/sp|O75475|PSIP1_HUMAN      _ASNEDVTK_               2          y4
#> 6 1/sp|O75475|PSIP1_HUMAN      _ASNEDVTK_               2          y4
#>   ProductCharge IsotopeLabelType Condition BioReplicate                   Run
#> 1             1                L         A            1 lgillet_I150211_008_A
#> 2             1                L         B            1 lgillet_I150211_009_B
#> 3             1                L         A            2 lgillet_I150211_010_A
#> 4             1                L         B            2 lgillet_I150211_011_B
#> 5             1                L         A            3 lgillet_I150211_012_A
#> 6             1                L         B            3 lgillet_I150211_013_B
#>   Fraction Intensity
#> 1        1 21.358784
#> 2        1 15.226856
#> 3        1  7.240201
#> 4        1 20.564655
#> 5        1 24.972715
#> 6        1 24.549818
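The annotation argument expects a table with Condition, BioReplicate and Run. As a rough illustration of that layout, the sketch below rebuilds such a table from the six rows printed by head() above. It uses base R only; whether the result matches the run names in a given raw Spectronaut file is an assumption to verify against your own data.

```r
# Run, Condition and BioReplicate copied from the head() output above.
shown <- data.frame(
  Run = c("lgillet_I150211_008_A", "lgillet_I150211_009_B",
          "lgillet_I150211_010_A", "lgillet_I150211_011_B",
          "lgillet_I150211_012_A", "lgillet_I150211_013_B"),
  Condition = c("A", "B", "A", "B", "A", "B"),
  BioReplicate = c(1, 1, 2, 2, 3, 3)
)

# One row per run, with the three columns the annotation argument describes.
annotation <- unique(shown[, c("Condition", "BioReplicate", "Run")])

# Each run should map to exactly one Condition/BioReplicate pair.
stopifnot(!anyDuplicated(annotation$Run))

# Number of runs per condition.
runs_per_condition <- table(annotation$Condition)
print(runs_per_condition)
```

A table like this could be saved as 'annotation.txt' and passed via annotation when the Spectronaut export does not already carry complete Condition and BioReplicate information.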