emcAdr : Evolutionary Markov Chain for Adverse Drug Reaction

library(emcAdr)
#> Welcome to this Emc algorithm package ! If you notice any unexpected behavior of a method, please bring it to my attention (<jules.bangard@etu.unistra.fr>).
library(gridExtra)
library(ggplot2)

Aim

The general purpose of this package is to detect drug and drug interactions that could be linked to an Adverse Event (AE). The expected data input for the function of this package are the following.

Data

data("ATC_Tree_UpperBound_2024") ## The ATC tree containing the upper bound
head(ATC_Tree_UpperBound_2024)
#>   ATCCode                            Name ATC_length upperBound
#> 1       A alimentary tract and metabolism          1        858
#> 2     A01     stomatological preparations          3         49
#> 3    A01A     stomatological preparations          4         49
#> 4   A01AA      caries prophylactic agents          5         10
#> 5 A01AA01                 sodium fluoride          7          5
#> 6 A01AA02      sodium monofluorophosphate          7          6
data("FAERS_myopathy") ## The Individual Case Safety Reports in the following format
head(FAERS_myopathy) ## First column is "patientATC" containing vector of index of drug intake for patient at row i in the ATC tree.
#>                                                         patientATC patientADR
#> 1                                                             4533      FALSE
#> 2                                                             4329      FALSE
#> 3                                                             4784      FALSE
#> 4 2333, 2997, 4666, 4692, 4724, 4732, 4772, 5092, 5249, 5840, 5841      FALSE
#> 5                         1691, 1806, 1854, 1907, 3003, 3152, 3166      FALSE
#> 6                                                        903, 1417      FALSE
## Second column is "patientADR", value is true if the patient at row i experience the considered AE and false otherwise.

Score distribution estimation

This data enables us to run the MCMC algorithm in order to estimate the score distribution among this dataset of patients.

estimated_distribution_size1_300ksteps_t500 <- DistributionApproximation(10,
                        ATC_Tree_UpperBound_2024, FAERS_myopathy,
                        temperature = 500, nbResults = 200,
                        Smax = 1,num_thread = 8)

Here we estimate the distribution of the score among cocktails of size 1 (Smax=1).

If desired, we can compute the true distribution of score for size 1 and size 2 cocktails. Let’s do it for size one cocktails in order to compare our estimation with the truth.

true_distribution_size1 <- trueDistributionDrugs(ATC_Tree_UpperBound_2024, FAERS_myopathy,
                                beta = 4, num_thread = 8)

Visualize and compare the results

qq_plot_output(estimated_distribution_size1_300ksteps_t500,
               true_distribution_size1, filtered = T)

Here is the quantile-quantile plot to compare the estimation vs the true distribution. We can also plot both distribution

plot_estimated_distribution <- plot_frequency(
    estimated_distribution_size1_300ksteps_t500$Filtered_score_distribution[2:length(estimated_distribution_size1_300ksteps_t500$Filtered_score_distribution)], binwidth = .3, sqrt = F, xlab = "H(C)") + labs(title = "Estimated distribution of risks among size 1 cocktails") + theme(plot.title = element_text(size=20)) + ylim(c(0,0.35))

plot_true_distribution <- plot_frequency(
    true_distribution_size1$Filtered_score_distribution[2:length(true_distribution_size1$Filtered_score_distribution)], binwidth = .3, xlab = "H(C)") + labs(title = "True Distribution of Risks among size 1 cocktails") + theme(plot.title = element_text(size=20)) + ylim(c(0,0.35))

grid.arrange(plot_estimated_distribution, plot_true_distribution ,nrow = 2)
#> Warning: Removed 2 rows containing missing values or values outside the scale range
#> (`geom_bar()`).

Find cocktails with higher score

We can apply the genetic algorithm in order to find riskiest drug cocktails.

genetic_algorithm_results <- GeneticAlgorithm(epochs = 20, nbIndividuals = 100,
                                              ATC_Tree_UpperBound_2024, FAERS_myopathy,
                                              num_thread = 2, diversity = T,
                                              p_mutation = 0.2, alpha = 1.5)
#> epoch : 0 | mean : 0 | best score : -0 | best cocktail : 2404 5265 1521 2107 
#> epoch : 1 | mean : 0 | best score : -0 | best cocktail : 842 1919 
#> epoch : 2 | mean : 0 | best score : -0 | best cocktail : 3497 6031 4567 
#> epoch : 3 | mean : 0.00444657 | best score : 0.444657 | best cocktail : 5159 
#> epoch : 4 | mean : 0.0113691 | best score : 0.378969 | best cocktail : 5159 3021 
#> epoch : 5 | mean : 0.00340012 | best score : 0.340012 | best cocktail : 5159 
#> epoch : 6 | mean : 0.00588999 | best score : 0.294499 | best cocktail : 5159 
#> epoch : 7 | mean : 0.0060689 | best score : 0.303445 | best cocktail : 5159 
#> epoch : 8 | mean : 0.0123507 | best score : 0.308768 | best cocktail : 5159 
#> epoch : 9 | mean : 0.0112511 | best score : 0.281277 | best cocktail : 5159 4834 
#> epoch : 10 | mean : 0.0173001 | best score : 0.288335 | best cocktail : 5159 
#> epoch : 11 | mean : 0.0205583 | best score : 0.256979 | best cocktail : 5159 
#> epoch : 12 | mean : 0.0410071 | best score : 0.215827 | best cocktail : 5159 
#> epoch : 13 | mean : 0.0517082 | best score : 0.142208 | best cocktail : 5159 
#> epoch : 14 | mean : 0.0471221 | best score : 0.0945202 | best cocktail : 5159 
#> epoch : 15 | mean : 0.0583386 | best score : 0.0735699 | best cocktail : 5159 
#> epoch : 16 | mean : 0.0476845 | best score : 0.0762016 | best cocktail : 5168 3393 
#> epoch : 17 | mean : 0.0450584 | best score : 0.0474634 | best cocktail : 5159 
#> epoch : 18 | mean : 0.0434212 | best score : 0.0447641 | best cocktail : 5161 
#> epoch : 19 | mean : 0.0442597 | best score : 0.045163 | best cocktail : 5159

It is now possible to compute the p-value of each found solutions

## We put the estimation of risk distribution of each cocktails size in a list
distribution_list <- list(estimated_distribution_size1_300ksteps_t500)

p_values <- p_value_genetic_results(distribution_list, genetic_algorithm_results)
p_values
#>   [1] NaN Inf NaN NaN NaN NaN NaN Inf NaN NaN NaN NaN NaN NaN Inf NaN NaN NaN
#>  [19] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Inf NaN NaN NaN NaN Inf
#>  [37] NaN NaN NaN NaN Inf Inf NaN Inf NaN NaN Inf NaN NaN NaN NaN NaN NaN NaN
#>  [55] NaN NaN NaN NaN NaN Inf NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
#>  [73] Inf NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
#>  [91] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN