library(emcAdr)
#> Welcome to this Emc algorithm package ! If you notice any unexpected behavior of a method, please bring it to my attention (<jules.bangard@etu.unistra.fr>).
library(gridExtra)
library(ggplot2)
The general purpose of this package is to detect drug and drug interactions that could be linked to an Adverse Event (AE). The expected data input for the function of this package are the following.
data("ATC_Tree_UpperBound_2024") ## The ATC tree containing the upper bound
head(ATC_Tree_UpperBound_2024)
#> ATCCode Name ATC_length upperBound
#> 1 A alimentary tract and metabolism 1 858
#> 2 A01 stomatological preparations 3 49
#> 3 A01A stomatological preparations 4 49
#> 4 A01AA caries prophylactic agents 5 10
#> 5 A01AA01 sodium fluoride 7 5
#> 6 A01AA02 sodium monofluorophosphate 7 6
data("FAERS_myopathy") ## The Individual Case Safety Reports in the following format
head(FAERS_myopathy) ## First column is "patientATC" containing vector of index of drug intake for patient at row i in the ATC tree.
#> patientATC patientADR
#> 1 4533 FALSE
#> 2 4329 FALSE
#> 3 4784 FALSE
#> 4 2333, 2997, 4666, 4692, 4724, 4732, 4772, 5092, 5249, 5840, 5841 FALSE
#> 5 1691, 1806, 1854, 1907, 3003, 3152, 3166 FALSE
#> 6 903, 1417 FALSE
## Second column is "patientADR", value is true if the patient at row i experience the considered AE and false otherwise.
This data enables us to run the MCMC algorithm in order to estimate the score distribution among this dataset of patients.
estimated_distribution_size1_300ksteps_t500 <- DistributionApproximation(10,
ATC_Tree_UpperBound_2024, FAERS_myopathy,
temperature = 500, nbResults = 200,
Smax = 1,num_thread = 8)
Here we estimate the distribution of the score among cocktails of
size 1 (Smax=1)
.
If desired, we can compute the true distribution of score for size 1 and size 2 cocktails. Let’s do it for size one cocktails in order to compare our estimation with the truth.
Here is the quantile-quantile plot to compare the estimation vs the true distribution. We can also plot both distribution
plot_estimated_distribution <- plot_frequency(
estimated_distribution_size1_300ksteps_t500$Filtered_score_distribution[2:length(estimated_distribution_size1_300ksteps_t500$Filtered_score_distribution)], binwidth = .3, sqrt = F, xlab = "H(C)") + labs(title = "Estimated distribution of risks among size 1 cocktails") + theme(plot.title = element_text(size=20)) + ylim(c(0,0.35))
plot_true_distribution <- plot_frequency(
true_distribution_size1$Filtered_score_distribution[2:length(true_distribution_size1$Filtered_score_distribution)], binwidth = .3, xlab = "H(C)") + labs(title = "True Distribution of Risks among size 1 cocktails") + theme(plot.title = element_text(size=20)) + ylim(c(0,0.35))
grid.arrange(plot_estimated_distribution, plot_true_distribution ,nrow = 2)
#> Warning: Removed 2 rows containing missing values or values outside the scale range
#> (`geom_bar()`).
We can apply the genetic algorithm in order to find riskiest drug cocktails.
genetic_algorithm_results <- GeneticAlgorithm(epochs = 20, nbIndividuals = 100,
ATC_Tree_UpperBound_2024, FAERS_myopathy,
num_thread = 2, diversity = T,
p_mutation = 0.2, alpha = 1.5)
#> epoch : 0 | mean : 0 | best score : -0 | best cocktail : 2404 5265 1521 2107
#> epoch : 1 | mean : 0 | best score : -0 | best cocktail : 842 1919
#> epoch : 2 | mean : 0 | best score : -0 | best cocktail : 3497 6031 4567
#> epoch : 3 | mean : 0.00444657 | best score : 0.444657 | best cocktail : 5159
#> epoch : 4 | mean : 0.0113691 | best score : 0.378969 | best cocktail : 5159 3021
#> epoch : 5 | mean : 0.00340012 | best score : 0.340012 | best cocktail : 5159
#> epoch : 6 | mean : 0.00588999 | best score : 0.294499 | best cocktail : 5159
#> epoch : 7 | mean : 0.0060689 | best score : 0.303445 | best cocktail : 5159
#> epoch : 8 | mean : 0.0123507 | best score : 0.308768 | best cocktail : 5159
#> epoch : 9 | mean : 0.0112511 | best score : 0.281277 | best cocktail : 5159 4834
#> epoch : 10 | mean : 0.0173001 | best score : 0.288335 | best cocktail : 5159
#> epoch : 11 | mean : 0.0205583 | best score : 0.256979 | best cocktail : 5159
#> epoch : 12 | mean : 0.0410071 | best score : 0.215827 | best cocktail : 5159
#> epoch : 13 | mean : 0.0517082 | best score : 0.142208 | best cocktail : 5159
#> epoch : 14 | mean : 0.0471221 | best score : 0.0945202 | best cocktail : 5159
#> epoch : 15 | mean : 0.0583386 | best score : 0.0735699 | best cocktail : 5159
#> epoch : 16 | mean : 0.0476845 | best score : 0.0762016 | best cocktail : 5168 3393
#> epoch : 17 | mean : 0.0450584 | best score : 0.0474634 | best cocktail : 5159
#> epoch : 18 | mean : 0.0434212 | best score : 0.0447641 | best cocktail : 5161
#> epoch : 19 | mean : 0.0442597 | best score : 0.045163 | best cocktail : 5159
It is now possible to compute the p-value of each found solutions
## We put the estimation of risk distribution of each cocktails size in a list
distribution_list <- list(estimated_distribution_size1_300ksteps_t500)
p_values <- p_value_genetic_results(distribution_list, genetic_algorithm_results)
p_values
#> [1] NaN Inf NaN NaN NaN NaN NaN Inf NaN NaN NaN NaN NaN NaN Inf NaN NaN NaN
#> [19] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Inf NaN NaN NaN NaN Inf
#> [37] NaN NaN NaN NaN Inf Inf NaN Inf NaN NaN Inf NaN NaN NaN NaN NaN NaN NaN
#> [55] NaN NaN NaN NaN NaN Inf NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
#> [73] Inf NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
#> [91] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN