Visualization for explanatory data analysis

To illustrate the quantitative data after data preprocessing and quality control of MS runs, dataProcessPlots takes the quantitative data from the dataProcess function as input and automatically generates three types of figures as pdf files: (1) profile plot (specify "ProfilePlot" in option type), to identify potential sources of variation for each protein; (2) quality control plot (specify "QCPlot" in option type), to evaluate systematic bias between MS runs; (3) mean plot for conditions (specify "ConditionPlot" in option type), to illustrate the mean and variability of each condition per protein.

dataProcessPlots(
  data,
  type,
  featureName = "Transition",
  ylimUp = FALSE,
  ylimDown = FALSE,
  scale = FALSE,
  interval = "CI",
  x.axis.size = 10,
  y.axis.size = 10,
  text.size = 4,
  text.angle = 0,
  legend.size = 7,
  dot.size.profile = 2,
  dot.size.condition = 3,
  width = 10,
  height = 10,
  which.Protein = "all",
  originalPlot = TRUE,
  summaryPlot = TRUE,
  save_condition_plot_result = FALSE,
  remove_uninformative_feature_outlier = FALSE,
  address = ""
)

Arguments

data	name of the data set (the output of the dataProcess function).
type	choice of visualization. "ProfilePlot" represents profile plot of log intensities across MS runs. "QCPlot" represents quality control plot of log intensities across MS runs. "ConditionPlot" represents mean plot of log ratios (Light/Heavy) across conditions.
featureName	for "ProfilePlot" only. "Transition" (default) prints the feature legend at the transition level; "Peptide" prints it at the peptide level; "NA" prints no feature legend.
ylimUp	upper limit for the y-axis in the log scale. FALSE (default) for Profile Plot and QC Plot uses the rounded maximum of log2(intensities) after normalization + 3 as the upper limit. FALSE (default) for Condition Plot uses the maximum of log ratio + SD or CI.
ylimDown	lower limit for the y-axis in the log scale. FALSE (default) for Profile Plot and QC Plot uses 0. FALSE (default) for Condition Plot uses the minimum of log ratio - SD or CI.
scale	for "ConditionPlot" only. FALSE (default) spaces the condition levels equally on the x-axis, regardless of their actual values. TRUE positions each condition level on the x-axis according to its actual value (unequal spacing).
interval	for "ConditionPlot" only. "CI" (default) uses the 95% confidence interval for the width of the error bars. "SD" uses the standard deviation for the width of the error bars.
x.axis.size	size of x-axis labeling for "Run" in Profile Plot and QC Plot, and "Condition" in Condition Plot. Default is 10.
y.axis.size	size of y-axis labels. Default is 10.
text.size	size of the labels representing each condition at the top of the graph in Profile Plot and QC Plot. Default is 4.
text.angle	angle of the labels representing each condition at the top of the graph in Profile Plot and QC Plot, or of the x-axis labels in Condition Plot. Default is 0.
legend.size	size of feature legend (transition-level or peptide-level) above graph in Profile Plot. Default is 7.
dot.size.profile	size of dots in profile plot. Default is 2.
dot.size.condition	size of dots in condition plot. Default is 3.
width	width of the saved file. Default is 10.
height	height of the saved file. Default is 10.
which.Protein	proteins to plot. The list can contain protein names or order numbers of proteins from levels(data$FeatureLevelData$PROTEIN). Default is "all", which generates plots for every protein. For QC plot, "allonly" generates a single QC plot with all proteins.
originalPlot	TRUE(default) draws original profile plots.
summaryPlot	TRUE(default) draws profile plots with summarization for run levels.
save_condition_plot_result	TRUE saves the table of values used for the condition plots. Default is FALSE.
remove_uninformative_feature_outlier	only takes effect if featureSubset="highQuality" was used in dataProcess. TRUE removes from profile plots (1) features flagged as feature_quality="Uninformative", which are features of bad quality, and (2) outliers flagged as is_outlier=TRUE. FALSE (default) shows all features and intensities in profile plots.
address	the name of the folder that will store the results. The default folder is the current working directory. Any other folder must already exist under the current working directory. Output files are created automatically with the default names "ProfilePlot.pdf", "QCplot.pdf", "ConditionPlot.pdf", or "ConditionPlot_value.csv". The address argument specifies where to store the files and can also modify the beginning of the file names. If address=FALSE, plots are not saved as pdf files but shown in a window.
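
The two selector forms that which.Protein accepts can be illustrated in base R. The sketch below is not the package's code: the toy factor stands in for data$FeatureLevelData$PROTEIN, and resolve_proteins is a hypothetical helper that shows how a name or an order number can be mapped onto levels().

```r
# Toy stand-in for data$FeatureLevelData$PROTEIN (hypothetical values).
protein <- factor(c("IDHC", "IDHC", "PMG2", "PMG2"))

# Hypothetical helper: map a which.Protein-style selector to protein names.
resolve_proteins <- function(selector, protein) {
  all_names <- levels(protein)
  if (identical(selector, "all")) {
    return(all_names)
  }
  if (is.numeric(selector)) {
    # Order numbers index into levels(protein).
    stopifnot(all(selector >= 1), all(selector <= length(all_names)))
    return(all_names[selector])
  }
  # Otherwise treat the selector as protein names and check that they exist.
  missing_names <- setdiff(selector, all_names)
  if (length(missing_names) > 0) {
    stop("Unknown protein(s): ", paste(missing_names, collapse = ", "))
  }
  as.character(selector)
}

resolve_proteins("all", protein)   # every protein
resolve_proteins(2, protein)       # second level of the factor
resolve_proteins("IDHC", protein)  # selection by name
```

Order numbers follow the order of levels(), which for a default factor is alphabetical, so checking levels() first avoids plotting the wrong protein.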

Details

Profile Plot: identify the potential sources of variation for each protein. QuantData$FeatureLevelData is used for plots. X-axis is run. Y-axis is log-intensities of transitions. Reference/endogenous signals are in the left/right panel. Line colors indicate peptides and line types indicate transitions. In summarization plots, gray dots and lines are the same as in the original profile plots with QuantData$FeatureLevelData. Dark dots and lines show summarized intensities from QuantData$ProteinLevelData.
QC Plot: illustrate the systematic bias between MS runs. After normalization, the reference signals for all proteins should be stable across MS runs. QuantData$FeatureLevelData is used for plots. X-axis is run. Y-axis is log-intensities of transitions. Reference/endogenous signals are in the left/right panel. The pdf file contains (1) a QC plot for all proteins and (2) QC plots for each protein separately.
Condition Plot: illustrate the systematic difference between conditions. Summarized intensities from QuantData$ProteinLevelData are used for plots. X-axis is condition. Y-axis is summarized log-transformed intensity. If scale is TRUE, the condition levels are positioned on the x-axis according to their actual values. Red points indicate the mean for each condition. If interval is "CI", blue error bars indicate the 95% confidence interval for each condition. If interval is "SD", blue error bars indicate the standard deviation for each condition. The interval is not related to the model-based analysis.
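
As a rough illustration of the two interval options in the Condition Plot, the following base R sketch computes a per-condition mean with either SD or CI error-bar limits. The data values, the helper condition_summary, and the use of a t-based interval with n - 1 degrees of freedom are assumptions for illustration; the package's exact computation may differ.

```r
# Hypothetical summarized log2 intensities: two conditions, three runs each.
abundance <- data.frame(
  condition = rep(c("T1", "T7"), each = 3),
  log_intensity = c(10.1, 10.4, 9.9, 12.0, 12.3, 11.8)
)

# Hypothetical helper: per-condition mean with SD or CI error-bar limits.
condition_summary <- function(dat, interval = c("CI", "SD"), level = 0.95) {
  interval <- match.arg(interval)
  groups <- split(dat$log_intensity, dat$condition)
  rows <- lapply(names(groups), function(g) {
    x <- groups[[g]]
    n <- length(x)
    half_width <- if (interval == "SD") {
      sd(x)
    } else {
      # Assumption: t-based interval with n - 1 degrees of freedom.
      qt(1 - (1 - level) / 2, df = n - 1) * sd(x) / sqrt(n)
    }
    data.frame(
      condition = g,
      mean = mean(x),
      lower = mean(x) - half_width,
      upper = mean(x) + half_width
    )
  })
  do.call(rbind, rows)
}

ci_bars <- condition_summary(abundance, interval = "CI")
sd_bars <- condition_summary(abundance, interval = "SD")
print(ci_bars)
print(sd_bars)
```

With only three runs per condition, the t-based 95% interval in this toy example is wider than one standard deviation, so the choice of interval visibly changes the length of the error bars.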

The input of this function is the quantitative data from function dataProcess.
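
Because the plots read QuantData$FeatureLevelData and QuantData$ProteinLevelData, a quick structural check before plotting can catch a wrong input early. The helper check_quant_data below is hypothetical and only tests for the two table names mentioned on this page; it does not validate their columns.

```r
# Hypothetical helper: verify that an object has the two tables the plots read.
check_quant_data <- function(x) {
  required <- c("FeatureLevelData", "ProteinLevelData")
  missing_parts <- setdiff(required, names(x))
  if (length(missing_parts) > 0) {
    stop(
      "Input does not look like dataProcess output; missing: ",
      paste(missing_parts, collapse = ", ")
    )
  }
  invisible(TRUE)
}

# Toy list standing in for dataProcess output (contents are placeholders).
toy <- list(FeatureLevelData = data.frame(), ProteinLevelData = data.frame())
check_quant_data(toy)
```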

Examples

# Consider quantitative data (i.e. QuantData) from a yeast study with ten time points of interest,
# three biological replicates, and no technical replicates, which is a time-course experiment.
# The goal is to provide pre-analysis visualization by automatically generating three types of figures
# in separate pdf files.
# Protein IDHC (gene name IDP2) is differentially expressed in time point 1 and time point 7,
# whereas protein PMG2 (gene name GPM2) is not.

QuantData<-dataProcess(SRMRawData, use_log_file = FALSE)
#> INFO  [2021-07-05 20:05:34] ** Features with one or two measurements across runs are removed.
#> INFO  [2021-07-05 20:05:34] ** Fractionation handled.
#> INFO  [2021-07-05 20:05:34] ** Updated quantification data to make balanced design. Missing values are marked by NA
#> INFO  [2021-07-05 20:05:34] ** Log2 intensities under cutoff = 3.776  were considered as censored missing values.
#> INFO  [2021-07-05 20:05:34] ** Log2 intensities = NA were considered as censored missing values.
#> INFO  [2021-07-05 20:05:34] ** Use all features that the dataset originally has.
#> INFO  [2021-07-05 20:05:34] 
#>  # proteins: 2
#>  # peptides per protein: 2-2
#>  # features per peptide: 3-3
#> INFO  [2021-07-05 20:05:34] 
#>                     1 2 3 4 5 6 7 8 9 10
#>              # runs 3 3 3 3 3 3 3 3 3  3
#>     # bioreplicates 3 3 3 3 3 3 3 3 3  3
#>  # tech. replicates 1 1 1 1 1 1 1 1 1  1
#> INFO  [2021-07-05 20:05:34]  == Start the summarization per subplot...
#> INFO  [2021-07-05 20:05:34]  == Summarization is done.
head(QuantData$FeatureLevelData)
#>   PROTEIN         PEPTIDE TRANSITION               FEATURE LABEL GROUP RUN
#> 1    IDHC ATDVIVPEEGELR_2      y7_NA ATDVIVPEEGELR_2_y7_NA     H     0   1
#> 2    IDHC ATDVIVPEEGELR_2      y7_NA ATDVIVPEEGELR_2_y7_NA     L     1   1
#> 3    IDHC ATDVIVPEEGELR_2      y7_NA ATDVIVPEEGELR_2_y7_NA     H     0   2
#> 4    IDHC ATDVIVPEEGELR_2      y7_NA ATDVIVPEEGELR_2_y7_NA     L     1   2
#> 5    IDHC ATDVIVPEEGELR_2      y7_NA ATDVIVPEEGELR_2_y7_NA     H     0   3
#> 6    IDHC ATDVIVPEEGELR_2      y7_NA ATDVIVPEEGELR_2_y7_NA     L     1   3
#>   SUBJECT FRACTION originalRUN censored  INTENSITY ABUNDANCE newABUNDANCE
#> 1       0        1           1    FALSE 84361.0835 15.855859    15.855859
#> 2       1        1           1    FALSE   215.1353  7.240669     7.240669
#> 3       0        1           2    FALSE 62109.5876 15.801179    15.801179
#> 4       2        1           2    FALSE  1205.2252 10.113738    10.113738
#> 5       0        1           3    FALSE 65114.3646 15.755022    15.755022
#> 6       3        1           3    FALSE  1476.3046 10.292109    10.292109
#>   predicted
#> 1        NA
#> 2        NA
#> 3        NA
#> 4        NA
#> 5        NA
#> 6        NA
# Profile plot
dataProcessPlots(data=QuantData,type="ProfilePlot")
# Quality control plot 
dataProcessPlots(data=QuantData,type="QCPlot")
# Quantification plot for conditions
dataProcessPlots(data=QuantData,type="ConditionPlot")
#> Warning: Removed 1 rows containing missing values (geom_hline).
#> Warning: Removed 1 rows containing missing values (geom_hline).
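
The calls above rely on the defaults. As a hedged sketch of the documented options, the following restricts plotting to one protein, shows the summarized profile plot on screen instead of saving it, and uses SD error bars for the condition plot. It assumes the MSstats package is installed (the block is skipped otherwise); the object name quant and the "IDHC_" prefix are arbitrary choices for illustration.

```r
# Run only if MSstats is available; nothing here is executed otherwise.
has_msstats <- requireNamespace("MSstats", quietly = TRUE)

if (has_msstats) {
  library(MSstats)
  quant <- dataProcess(SRMRawData, use_log_file = FALSE)

  # Summarized profile plot for one protein, with a peptide-level legend,
  # shown in a window rather than saved (address = FALSE).
  dataProcessPlots(
    data = quant,
    type = "ProfilePlot",
    which.Protein = "IDHC",
    featureName = "Peptide",
    originalPlot = FALSE,
    summaryPlot = TRUE,
    address = FALSE
  )

  # Condition plot with SD error bars; per the address description, the
  # string is used as the beginning of the output file name.
  dataProcessPlots(
    data = quant,
    type = "ConditionPlot",
    which.Protein = "IDHC",
    interval = "SD",
    address = "IDHC_"
  )
}
```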
