Feature-level data summarization

MSstatsSummarize(
  proteins_list,
  method,
  impute,
  censored_symbol,
  remove50missing,
  equal_variance
)

Arguments

proteins_list

list of processed feature-level data

method

summarization method: "linear" or "TMP"
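
As a rough illustration of what "TMP" refers to, Tukey's median polish can be run with base R's stats::medpolish on a toy runs-by-features matrix. The matrix, its values, and the variable names below are made up for illustration. This is a sketch of the technique, not the package's internal code, and taking the overall effect plus the row (run) effect as the run-level summary is an assumption about how the fit is used.

```r
# Toy log2 intensities: rows are runs, columns are features (made-up values).
run_effect <- c(20, 21, 22)
feature_effect <- c(-1, 0, 1)
toy <- outer(run_effect, feature_effect, "+")
dimnames(toy) <- list(paste0("run", 1:3), paste0("feature", 1:3))

# Tukey's median polish: alternately sweep out row and column medians.
fit <- stats::medpolish(toy, na.rm = TRUE, trace.iter = FALSE)

# Assumed run-level summary: overall effect plus the run (row) effect.
run_summary <- fit$overall + fit$row
print(run_summary)
```

Because the toy matrix is exactly additive, the polish leaves no residuals and the summary recovers the run effects used to build it.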

impute

only for method = "TMP" and censored_symbol = 'NA' or '0'. TRUE (default) imputes 'NA' or '0' (depending on the censored_symbol option) using an accelerated failure time model. FALSE uses the values assigned by cutoffCensored.

censored_symbol

Whether missing values are censored or missing at random. 'NA' (default) assumes that all NAs in the 'Intensity' column are censored. '0' treats zero intensities as censored; in this case, NA intensities are missing at random. Output from Skyline should use '0'. NULL assumes that all NA intensities are missing at random.
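
The three settings described above can be sketched with a small base R helper. The function flag_censored and the toy intensity vector are hypothetical and not part of MSstats; the sketch only shows which values each setting would treat as censored.

```r
# Hypothetical helper (not part of MSstats): flag which values count as censored.
flag_censored <- function(intensity, censored_symbol) {
  if (is.null(censored_symbol)) {
    # NULL: nothing is censored; all NAs are missing at random.
    return(rep(FALSE, length(intensity)))
  }
  if (censored_symbol == "NA") {
    # 'NA': every NA intensity is treated as censored.
    return(is.na(intensity))
  }
  if (censored_symbol == "0") {
    # '0': zeros are censored; NAs are missing at random.
    return(!is.na(intensity) & intensity == 0)
  }
  stop("censored_symbol must be 'NA', '0', or NULL")
}

# Toy raw intensities (made-up values).
intensity <- c(1500, NA, 0, 820)
print(flag_censored(intensity, "NA"))
print(flag_censored(intensity, "0"))
print(flag_censored(intensity, NULL))
```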

remove50missing

only for method = "TMP". TRUE removes the runs which have more than 50% missing values. FALSE is the default.
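
The 50% rule can be sketched on a toy runs-by-features matrix with base R. The matrix and variable names are made up, and computing the missing fraction per run over the features of a single protein is an assumption about how the rule is applied.

```r
# Toy log2 intensities for one protein: rows are runs, columns are features.
toy <- matrix(
  c(20, NA, NA,
    21, 22, NA,
    19, 20, 21),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("run", 1:3), paste0("feature", 1:3))
)

# Fraction of missing feature intensities in each run.
missing_fraction <- rowMeans(is.na(toy))

# Keep only runs where at most 50% of the values are missing.
kept <- toy[missing_fraction <= 0.5, , drop = FALSE]
print(rownames(kept))
```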

equal_variance

only for method = "linear". Logical variable for whether the model should assume equal variance among intensities from different features. TRUE (default) assumes equal variance among intensities from features. FALSE does not assume equal variance and accounts for heterogeneous variation among features.

Value

list of length one with run-level data.

Examples

raw = DDARawData
method = "TMP"
cens = "NA"
impute = TRUE
MSstatsConvert::MSstatsLogsSettings(FALSE)
input = MSstatsPrepareForDataProcess(raw, 2, NULL)
#> INFO [2021-07-05 20:05:29] ** Features with one or two measurements across runs are removed.
#> INFO [2021-07-05 20:05:29] ** Fractionation handled.
#> INFO [2021-07-05 20:05:29] ** Updated quantification data to make balanced design. Missing values are marked by NA
input = MSstatsNormalize(input, "EQUALIZEMEDIANS")
input = MSstatsMergeFractions(input)
input = MSstatsHandleMissing(input, "TMP", TRUE, "NA", 0.999)
#> INFO [2021-07-05 20:05:30] ** Log2 intensities under cutoff = 13.456 were considered as censored missing values.
#> INFO [2021-07-05 20:05:30] ** Log2 intensities = NA were considered as censored missing values.
input = MSstatsSelectFeatures(input, "all")
#> INFO [2021-07-05 20:05:30] ** Use all features that the dataset originally has.
processed = getProcessed(input)
input = MSstatsPrepareForSummarization(input, method, impute, cens, FALSE)
input_split = split(input, input$PROTEIN)
summarized = MSstatsSummarize(input_split, method, impute, cens, FALSE, TRUE)
#>   |======================================================================| 100%
length(summarized) # list of summarization outputs for each protein
#> [1] 6
head(summarized[[1]][[1]]) # run-level summary
#>    RUN LogIntensities Protein
#> 1:   1       21.28437  bovine
#> 2:   2       20.85653  bovine
#> 3:   3       20.67521  bovine
#> 4:   4       21.60443  bovine
#> 5:   5       21.82186  bovine
#> 6:   6       21.20445  bovine
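
To work with all proteins at once, the per-protein run-level tables can be stacked into a single table. The sketch below uses a mock object that mimics the structure seen in the example (one list element per protein, with the run-level summary as its first element). The mock, its toy values, and the protein name "chicken" are made up, and plain data frames stand in for the tables returned by the package.

```r
# Mock object mimicking the structure shown in the example: one element per
# protein, whose first element is the run-level summary (toy values).
summarized_mock <- list(
  list(data.frame(RUN = 1:2, LogIntensities = c(21.3, 20.9), Protein = "bovine")),
  list(data.frame(RUN = 1:2, LogIntensities = c(18.4, 18.7), Protein = "chicken"))
)

# Extract the first element of each per-protein result and stack the rows.
run_level <- do.call(rbind, lapply(summarized_mock, function(x) x[[1]]))
print(run_level)
```

The printed output in the example suggests that the real tables are data.table objects, in which case data.table::rbindlist would be a natural alternative to do.call(rbind, ...).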