Calculate sample size for future experiments of a Selected Reaction Monitoring (SRM), Data-Dependent Acquisition (DDA or shotgun), and Data-Independent Acquisition (DIA or SWATH-MS) experiment based on intensity-based linear model. Two options of the calculation: (1) number of biological replicates per condition, (2) power.

designSampleSize(
  data,
  desiredFC,
  FDR = 0.05,
  numSample = TRUE,
  power = 0.9,
  use_log_file = TRUE,
  append = FALSE,
  verbose = TRUE,
  log_file_path = NULL
)

Arguments

data

'FittedModel' in testing output from function groupComparison.

desiredFC

the range of a desired fold change which includes the lower and upper values of the desired fold change.

FDR

a pre-specified false discovery ratio (FDR) to control the overall false positive rate. Default is 0.05

numSample

minimal number of biological replicates per condition. TRUE represents you require to calculate the sample size for this category, else you should input the exact number of biological replicates.

power

a pre-specified statistical power which defined as the probability of detecting a true fold change. TRUE represent you require to calculate the power for this category, else you should input the average of power you expect. Default is 0.9

use_log_file

logical. If TRUE, information about data processing will be saved to a file.

append

logical. If TRUE, information about data processing will be added to an existing log file.

verbose

logical. If TRUE, information about data processing wil be printed to the console.

log_file_path

character. Path to a file to which information about data processing will be saved. If not provided, such a file will be created automatically. If `append = TRUE`, has to be a valid path to a file.

Value

data.frame - sample size calculation results including varibles: desiredFC, numSample, FDR, and power.

Details

The function fits the model and uses variance components to calculate sample size. The underlying model fitting with intensity-based linear model with technical MS run replication. Estimated sample size is rounded to 0 decimal. The function can only obtain either one of the categories of the sample size calculation (numSample, numPep, numTran, power) at the same time.

Examples

# Consider quantitative data (i.e. QuantData) from yeast study. # A time course study with ten time points of interests and three biological replicates. QuantData <- dataProcess(SRMRawData)
#> INFO [2021-07-05 20:05:40] ** Features with one or two measurements across runs are removed. #> INFO [2021-07-05 20:05:40] ** Fractionation handled. #> INFO [2021-07-05 20:05:40] ** Updated quantification data to make balanced design. Missing values are marked by NA #> INFO [2021-07-05 20:05:40] ** Log2 intensities under cutoff = 3.776 were considered as censored missing values. #> INFO [2021-07-05 20:05:40] ** Log2 intensities = NA were considered as censored missing values. #> INFO [2021-07-05 20:05:40] ** Use all features that the dataset originally has. #> INFO [2021-07-05 20:05:40] #> # proteins: 2 #> # peptides per protein: 2-2 #> # features per peptide: 3-3 #> INFO [2021-07-05 20:05:40] #> 1 2 3 4 5 6 7 8 9 10 #> # runs 3 3 3 3 3 3 3 3 3 3 #> # bioreplicates 3 3 3 3 3 3 3 3 3 3 #> # tech. replicates 1 1 1 1 1 1 1 1 1 1 #> INFO [2021-07-05 20:05:40] == Start the summarization per subplot... #> | | | 0% | |=================================== | 50% | |======================================================================| 100% #> INFO [2021-07-05 20:05:40] == Summarization is done.
head(QuantData$FeatureLevelData)
#> PROTEIN PEPTIDE TRANSITION FEATURE LABEL GROUP RUN #> 1 IDHC ATDVIVPEEGELR_2 y7_NA ATDVIVPEEGELR_2_y7_NA H 0 1 #> 2 IDHC ATDVIVPEEGELR_2 y7_NA ATDVIVPEEGELR_2_y7_NA L 1 1 #> 3 IDHC ATDVIVPEEGELR_2 y7_NA ATDVIVPEEGELR_2_y7_NA H 0 2 #> 4 IDHC ATDVIVPEEGELR_2 y7_NA ATDVIVPEEGELR_2_y7_NA L 1 2 #> 5 IDHC ATDVIVPEEGELR_2 y7_NA ATDVIVPEEGELR_2_y7_NA H 0 3 #> 6 IDHC ATDVIVPEEGELR_2 y7_NA ATDVIVPEEGELR_2_y7_NA L 1 3 #> SUBJECT FRACTION originalRUN censored INTENSITY ABUNDANCE newABUNDANCE #> 1 0 1 1 FALSE 84361.0835 15.855859 15.855859 #> 2 1 1 1 FALSE 215.1353 7.240669 7.240669 #> 3 0 1 2 FALSE 62109.5876 15.801179 15.801179 #> 4 2 1 2 FALSE 1205.2252 10.113738 10.113738 #> 5 0 1 3 FALSE 65114.3646 15.755022 15.755022 #> 6 3 1 3 FALSE 1476.3046 10.292109 10.292109 #> predicted #> 1 NA #> 2 NA #> 3 NA #> 4 NA #> 5 NA #> 6 NA
## based on multiple comparisons (T1 vs T3; T1 vs T7; T1 vs T9) comparison1<-matrix(c(-1,0,1,0,0,0,0,0,0,0),nrow=1) comparison2<-matrix(c(-1,0,0,0,0,0,1,0,0,0),nrow=1) comparison3<-matrix(c(-1,0,0,0,0,0,0,0,1,0),nrow=1) comparison<-rbind(comparison1,comparison2, comparison3) row.names(comparison)<-c("T3-T1","T7-T1","T9-T1") colnames(comparison)<-unique(QuantData$ProteinLevelData$GROUP) testResultMultiComparisons<-groupComparison(contrast.matrix=comparison,data=QuantData)
#> INFO [2021-07-05 20:05:40] == Start to test and get inference in whole plot ... #> | | | 0% | |=================================== | 50% | |======================================================================| 100% #> INFO [2021-07-05 20:05:41] == Comparisons for all proteins are done.
## Calculate sample size for future experiments: #(1) Minimal number of biological replicates per condition designSampleSize(data=testResultMultiComparisons$FittedModel, numSample=TRUE, desiredFC=c(1.25,1.75), FDR=0.05, power=0.8)
#> desiredFC numSample FDR power CV #> 1 1.250 31 0.05 0.8 0.004 #> 2 1.275 26 0.05 0.8 0.005 #> 3 1.300 23 0.05 0.8 0.006 #> 4 1.325 20 0.05 0.8 0.006 #> 5 1.350 17 0.05 0.8 0.007 #> 6 1.375 15 0.05 0.8 0.008 #> 7 1.400 14 0.05 0.8 0.009 #> 8 1.425 12 0.05 0.8 0.010 #> 9 1.450 11 0.05 0.8 0.011 #> 10 1.475 10 0.05 0.8 0.012 #> 11 1.500 9 0.05 0.8 0.013 #> 12 1.525 9 0.05 0.8 0.012 #> 13 1.550 8 0.05 0.8 0.014 #> 14 1.575 8 0.05 0.8 0.013 #> 15 1.600 7 0.05 0.8 0.015 #> 16 1.625 7 0.05 0.8 0.015 #> 17 1.650 6 0.05 0.8 0.017 #> 18 1.675 6 0.05 0.8 0.017 #> 19 1.700 6 0.05 0.8 0.017 #> 20 1.725 5 0.05 0.8 0.020 #> 21 1.750 5 0.05 0.8 0.019
#(2) Power calculation designSampleSize(data=testResultMultiComparisons$FittedModel, numSample=2, desiredFC=c(1.25,1.75), FDR=0.05, power=TRUE)
#> desiredFC numSample FDR power CV #> 1 1.250 2 0.05 0.01 0.068 #> 2 1.275 2 0.05 0.01 0.067 #> 3 1.300 2 0.05 0.01 0.065 #> 4 1.325 2 0.05 0.01 0.064 #> 5 1.350 2 0.05 0.01 0.063 #> 6 1.375 2 0.05 0.01 0.062 #> 7 1.400 2 0.05 0.01 0.061 #> 8 1.425 2 0.05 0.01 0.060 #> 9 1.450 2 0.05 0.01 0.059 #> 10 1.475 2 0.05 0.01 0.058 #> 11 1.500 2 0.05 0.01 0.057 #> 12 1.525 2 0.05 0.01 0.056 #> 13 1.550 2 0.05 0.01 0.055 #> 14 1.575 2 0.05 0.01 0.054 #> 15 1.600 2 0.05 0.02 0.053 #> 16 1.625 2 0.05 0.03 0.052 #> 17 1.650 2 0.05 0.04 0.051 #> 18 1.675 2 0.05 0.05 0.051 #> 19 1.700 2 0.05 0.06 0.050 #> 20 1.725 2 0.05 0.08 0.049 #> 21 1.750 2 0.05 0.10 0.048