The scores are sampled from a two-component mixture distribution of Gaussians.
sample_dataset(H1_n = 160, H0_n = 40, decoy_n = H0_n, decoy_large_n = 2000, H0_mean = 2.75, H1_mean = 3.31, H0_sd = 0.13, H1_sd = 0.28, decoy_mean = H0_mean, decoy_sd = H0_sd, decoy_large_mean = H0_mean, decoy_large_sd = H0_sd)
H1_n | number of correct subset target PSMs. |
---|---|
H0_n | number of incorrect subset target PSMs. |
decoy_n | number of subset decoy PSMs. |
decoy_large_n | number of non subset decoy PSMs. |
H0_mean | mean of the incorrect subset target PSM distribution. |
H1_mean | mean of the correct subset target PSM distribution. |
H0_sd | sd of the incorrect subset target PSM distribution. |
H1_sd | sd of the correct subset target PSM distribution. |
decoy_mean | mean of the subset decoy PSM distribution. |
decoy_sd | sd of the subset decoy PSM distribution. |
decoy_large_mean | mean of the non subset decoy PSM distribution. |
decoy_large_sd | sd of the non subset decoy PSM distribution. |
dataframe with 4 columns:
score assigned to the peptide to spectrum match (PSM).
TRUE if decoy PSM, FALSE otherwise.
TRUE if incorrect target PSM, FALSE otherwise.
TRUE if PSM belongs to the subset in interest, FALSE or otherwise.
## Simulate a dataset with 140 correct target subset PSMs, 60 incorrect target subset PSMS, ##60 decoy subset PSMs and 2000 additional decoy PSMs. ## In this first example, incorrect subset target and decoy PSMs have a similar distribution set.seed(10) d = sample_dataset(H1_n = 140,H0_n = 60, decoy_n = 60 ,decoy_large_n = 2000, H0_mean = 2.7, H1_mean = 3, decoy_mean = 2.7, decoy_large_mean = 2.7) # visualize the decoy distribution and the correct and incorrect target distribution plot(density(d$score[d$decoy]), xlim = c(2,4), xlab = 'score', ylab = 'density')lines(density(d$score[!d$decoy & d$H0]), col = 'red')lines(density(d$score[!d$decoy & !d$H0]), col = 'green')legend('topright', c('decoys', 'incorrect targets', 'correct targets'), text.col = c('black', 'red', 'green'))## In this first example, incorrect subset target and decoy PSMs have not a similar distribution ## because the additional set of decoy PSMs are not representative for the incorrect subset PSMs. set.seed(10) d = sample_dataset(H1_n = 140,H0_n = 60, decoy_n = 60 ,decoy_large_n = 2000, H0_mean = 2.7, H1_mean = 3, decoy_mean = 2.7, decoy_large_mean = 2.6) # visualize the decoy distribution and the correct and incorrect target distribution plot(density(d$score[d$decoy]), xlim = c(2,4), xlab = 'score', ylab = 'density')lines(density(d$score[!d$decoy & d$H0]), col = 'red')lines(density(d$score[!d$decoy & !d$H0]), col = 'green')legend('topright', c('decoys', 'incorrect targets', 'correct targets'), text.col = c('black', 'red', 'green'))