
Get robust protein-level summary based on unique and shared peptides

Usage

getWeightedProteinSummary(
  feature_data,
  norm = "p_norm",
  norm_parameter = 1,
  weights_mode = "contributions",
  tolerance = 0.1,
  max_iter = 10,
  initial_summary = "unique",
  weights_penalty = FALSE,
  weights_penalty_param = 0.1,
  save_weights_history = FALSE,
  save_convergence_history = FALSE
)

Arguments

feature_data

data.table in MSstatsTMT format. See also the Details section

norm

"p_norm" or "Huber"

norm_parameter

p for norm=="p_norm", M for norm=="Huber"

weights_mode

"contributions" for "sum to one" and "non-negative" conditions, "probabilities" for only "non-negative" condition.

tolerance

tolerance used to decide that the weights have converged

max_iter

maximum number of iterations of the procedure

initial_summary

"unique", "flat" or "flat unique"

weights_penalty

if TRUE, weights are penalized for deviating from equal values across all proteins matching a given PSM

weights_penalty_param

penalty parameter

save_weights_history

logical, if TRUE, weights from all iterations will be returned

save_convergence_history

logical, if TRUE, the differences between consecutive weight estimates from all iterations will be returned
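The norm and weights_mode options above can be illustrated with a small base R sketch. It assumes the textbook forms of the two losses (a sum of absolute residuals raised to the power p, and the standard Huber loss with threshold M) and assumes that the "sum to one" condition applies across the proteins matching one PSM. The helper names p_norm_loss and huber_loss are hypothetical, and the package's exact definitions and scaling may differ.

```r
# Textbook loss definitions (assumption: not confirmed against the package source).
p_norm_loss <- function(residuals, p = 1) {
  sum(abs(residuals)^p)
}

huber_loss <- function(residuals, M = 1) {
  a <- abs(residuals)
  # Quadratic for small residuals, linear beyond the threshold M.
  sum(ifelse(a <= M, 0.5 * a^2, M * (a - 0.5 * M)))
}

residuals <- c(0.1, -0.2, 3)          # made-up residuals with one outlier
loss_p1 <- p_norm_loss(residuals, p = 1)
loss_p2 <- p_norm_loss(residuals, p = 2)
loss_huber <- huber_loss(residuals, M = 1)

# Hypothetical non-negative weights of one shared PSM across two proteins.
raw_weights <- c(ProtA = 3, ProtB = 1)
# "contributions": non-negative and sum to one.
contributions <- raw_weights / sum(raw_weights)
# "probabilities": only non-negativity is required.
probabilities_ok <- all(raw_weights >= 0)
```

With these definitions the outlier dominates the squared loss far more than the Huber or absolute loss, which is the usual motivation for robust norms.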

Value

object containing data frames with the protein-level summary and related information. See item 2 of the Details section for its structure

Details

1. Input format: this function takes data in MSstatsTMT format, which is a data frame with the columns ProteinName, PeptideSequence, Charge, PSM (PeptideSequence and Charge separated by an underscore), Channel, Intensity, and Run, plus the annotation columns BioReplicate, Condition, Mixture, and TechRepMixture. Two additional columns are used: log2IntensityNormalized and Cluster. log2IntensityNormalized stores log-transformed normalized intensities, which can be obtained with the normalizeSharedPeptides function; if this column is missing, the data are normalized before summarization. Cluster identifies the connected sub-graphs of the peptide-protein graph; it can be added with the addClusterMembership function, and if it is missing, it is computed before summarization.
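As a rough illustration of the input layout, the sketch below builds a toy table in base R with the required columns. All values are made up, and encoding a shared peptide as one row per matching protein is an assumption. A plain data.frame is used to avoid dependencies; it could be converted with data.table::as.data.table before being passed on.

```r
# Toy feature-level table (all values are invented for illustration).
feature_data <- data.frame(
  ProteinName     = c("ProtA", "ProtA", "ProtA", "ProtB"),
  PeptideSequence = c("PEPTIDEK", "PEPTIDEK", "SHAREDR", "SHAREDR"),
  Charge          = c(2L, 2L, 3L, 3L),
  Channel         = c("126", "127", "126", "126"),
  Intensity       = c(1024, 2048, 512, 512),
  Run             = "Run1",
  BioReplicate    = c("S1", "S2", "S1", "S1"),
  Condition       = c("Control", "Treated", "Control", "Control"),
  Mixture         = "Mix1",
  TechRepMixture  = "Mix1_1",
  stringsAsFactors = FALSE
)

# PSM is PeptideSequence and Charge joined by an underscore.
feature_data$PSM <- paste(feature_data$PeptideSequence,
                          feature_data$Charge, sep = "_")

# Check that every required column is present.
required <- c("ProteinName", "PeptideSequence", "Charge", "PSM", "Channel",
              "Intensity", "Run", "BioReplicate", "Condition", "Mixture",
              "TechRepMixture")
missing_cols <- setdiff(required, names(feature_data))

# PSMs that match more than one protein are the shared ones.
n_proteins <- tapply(feature_data$ProteinName, feature_data$PSM,
                     function(p) length(unique(p)))
shared_psms <- names(n_proteins)[n_proteins > 1]
```

The optional log2IntensityNormalized and Cluster columns are left out here, since the text above says they are added before summarization when absent.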

2. Output format: an S4 object of class "MSstatsWeightedSummary" which consists of the following items:

  • FeatureLevelData: feature-level (input) data

  • ProteinLevelData: protein-level (summarized) output data

  • Weights: a table of final peptide-protein weights

  • ConvergenceSummary: a table with information about convergence for each Cluster and Run

  • WeightsHistory: optional data.table of weights from all iterations of the fitting algorithm

  • ConvergenceHistory: optional data.table with sums of absolute differences between weights from consecutive iterations

Elements of this object can be accessed with the functions featureData, proteinData, featureWeights, convergenceSummary, weightsHistory, and convergenceHistory.
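A minimal sketch of calling the function and reading the result through the accessors is given below. It assumes the function and accessors are exported by a package named MSstatsWeightedSummary (the package name is not stated on this page), and run_weighted_summary is a hypothetical wrapper. The call is skipped when that package is not installed, and whether a particular input table is accepted depends on the package itself.

```r
# Assumption: the exporting package is called "MSstatsWeightedSummary".
pkg <- "MSstatsWeightedSummary"

# Hypothetical wrapper: summarize and collect a few result tables.
run_weighted_summary <- function(feature_data) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    message("Package ", pkg, " is not installed; skipping.")
    return(invisible(NULL))
  }
  summarize <- getExportedValue(pkg, "getWeightedProteinSummary")
  result <- summarize(
    feature_data,
    norm = "Huber",                 # robust loss; "p_norm" is the default
    norm_parameter = 1,
    weights_mode = "contributions",
    tolerance = 0.1,
    max_iter = 10,
    initial_summary = "unique",
    save_weights_history = TRUE     # needed for weightsHistory below
  )
  # Accessor names are taken from the sentence above.
  list(
    proteins    = getExportedValue(pkg, "proteinData")(result),
    convergence = getExportedValue(pkg, "convergenceSummary")(result),
    history     = getExportedValue(pkg, "weightsHistory")(result)
  )
}
```

Calling run_weighted_summary(feature_data) on a table in the format from item 1 would then return the protein-level table together with the convergence information.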

For statistical details about the method, please consult the vignette.