Probabilistic Risk Assessment – The Keystone for the Future of Toxicology

Safety sciences must cope with uncertainty of models and results as well as information gaps. Acknowledging this uncertainty necessitates embracing probabilities and accepting the remaining risk. Every toxicological tool delivers only probable results. Traditionally, this is taken into account by using uncertainty / assessment factors and worst-case / precautionary approaches and thresholds. Probabilistic methods and Bayesian approaches seek to characterize these uncertainties and promise to support better risk assessment and, thereby, improve risk management decisions. Actual assessments of uncertainty can be more realistic than worst-case scenarios and may allow less conservative safety margins. Most importantly, as soon as we agree on uncertainty, this defines room for improvement and allows a transition from traditional to new approach methods as an engineering exercise. The objective nature of these mathematical tools allows each methodology to be assigned its fair place in evidence integration, whether in the context of risk assessment, systematic reviews, or in the definition of an integrated testing strategy (ITS) / defined approach (DA) / integrated approach to testing and assessment (IATA). This article gives an overview of methods for probabilistic risk assessment and their application for exposure assessment, physiologically-based kinetic modelling, probability of hazard assessment (based on quantitative and read-across based structure-activity relationships, and mechanistic alerts from in vitro studies), individual susceptibility assessment, and evidence integration. Additional aspects are opportunities for uncertainty analysis of adverse outcome pathways and their relation to thresholds of toxicological concern. In conclusion, probabilistic risk assessment will be key for constructing a new toxicology paradigm – probably!


Introduction
Nothing is as certain as death and taxes 1. Toxicology (like all of medicine) does not reach this level of certainty. As the Johns Hopkins scholar William Osler (1849-1919) rightly stated, "Medicine is a science of uncertainty and an art of probability", and in this sense toxicology is a very medical discipline. However, our expectation as to the outcome of safety sciences is certainty – a product coming to the market must be safe. This article aims to make the case that we are actually working with an astonishing level of uncertainty in our assessments, which we hide by using apparently deterministic expressions of results (classifications, labels, thresholds, etc.). It is not that we cannot know, but that our predictions have only a certain probability of being correct – not very comforting when the safety of sometimes millions of patients and consumers is at stake.
The 2017 book The Illusion of Risk Control – What Does it Take to Live with Uncertainty?, edited by Gilles Motet and Corinne Bieder, makes the important point of acknowledging that there is always a risk and that we can only assess and manage its probability. Consequently, safety is defined by the absence of unacceptable risk, not as the absence of all risk. Giving up on the illusion of safety and acknowledging uncertainty does give a new perspective on risk assessment and management as we will discuss here, applying it to toxicology. Dupuy (1982) described the problem as "The fundamental incapacity of Industrial Man to control his destiny increasingly appears as the paradoxical and tragic result of a desire for total control – either by reason or by force". As we will see, embracing uncertainty can free us to adopt a new toxicity testing paradigm.
Uncertainty and probability are two sides of the same coin. Risk assessment under uncertainty, therefore, logically leads us to probabilistic risk assessment (ProbRA). We will go light on mathematics here. This article is primarily about why to use ProbRA and not about how to do it. In recent years, the importance of having a firm understanding of probability has become apparent, and as a result there are several books that we recommend the reader consult.

Some defining characteristics of (un)certainty versus probability versus risk

Uncertainty
"We know accurately only when we know little; with knowledge, doubt increases" (Johann Wolfgang von Goethe in Maxims and Reflections).
Uncertainty in toxicology is, at its base, the lack of knowledge of the true value of a quantity or of the relationships among quantities. Figure 1 illustrates the path from ignorance toward certainty, with some irreducible uncertainty remaining. Walker et al. (2003) note that uncertainty is not simply the absence of knowledge, but a situation of inadequate information (inexactness, unreliability, and sometimes ignorance). "However, uncertainty can prevail in situations where a lot of information is available … Furthermore, new information can either decrease or increase uncertainty. New knowledge on complex processes may reveal the presence of uncertainties that were previously unknown or were understated. In this way, more knowledge illuminates that our understanding is more limited or that the processes are more complex than thought before". Cullen and Frey (1999) address uncertainties that arise during risk analyses:

1. Scenario uncertainty – typically of omission, resulting from incorrect or incomplete specification of the risk scenario to be evaluated. In toxicology, for example, risk is often assessed before the actual use of a substance is clear.

2. Model uncertainty – limitations in the mathematical models or techniques, often due to (a) simplifying assumptions; (b) exclusion of relevant processes; (c) misspecification of model boundary conditions (e.g., the range of input parameters); or (d) misapplication of a model developed for other purposes. In toxicology, this obviously resonates with many aspects of the risk assessment process.

3. Input or parameter uncertainty – particular attention must be paid to measurement error, which can be either systematic (when there is a bias in the data) or random (noise in the data). Toxicology obviously faces both, but these are rarely explicitly addressed when risk assessments are made.
Today, additional aspects such as inconsistency, bias, and methodological choices are considered as sources of uncertainty. Recent European Food Safety Authority (EFSA) guidance (EFSA, 2018) details uncertainty very comprehensively for the safety sciences.
The Grading of Recommendations, Assessment, Development and Evaluation (GRADE) working group has issued a guideline (Brozek et al., 2021) on assessing the certainty in modelled evidence, which includes the three types of uncertainty mentioned above and provides a flowchart for finding, selecting, and assessing certainty in a model. The certainty of modelled outputs is recommended to be assessed on the following domains:

1. Risk of bias
a. credibility of the model itself

"… of quantitative results. This phenomenon is manifest in the discussion sections of research articles and ultimately can affect the reliability of conclusions. The standard statistical approach has created this situation by promoting the illusion that conclusions can be produced with certain 'error rates,' without consideration of information from outside the experiment. This statistical approach, the key components of which are P values and hypothesis tests, is widely perceived as a mathematically coherent approach to inference." The articles discuss the resulting "p value fallacy". In simple terms, the p value fallacy means "while most physicians and many biomedical researchers think that a 'P' of 0.05 for a clinical trial means that there is only a 5% chance that the null hypothesis is true, that is not the case. Here is what 'P = 0.05' actually means: if many similar trials are performed testing the same novel hypothesis, and if the null hypothesis is true, then it (the null) will be falsely rejected in 5% of those trials. For any single trial, it doesn't tell us much". Ioannidis (2008) shows the problem for a large number of observational epidemiological studies. Given the comparatively high standard of statistics in clinical trials and epidemiology, we are, for larger parts of science, reminded of Nassim Taleb (2007), "They only knew enough math to be blinded by it".
It should be noted that an understanding of probability developed only slowly in science; Pierre-Simon Laplace classically defined the probability of an event as the number of outcomes favorable to the event divided by the total number of possible outcomes. So, the probability of throwing a six with a perfect die is 1 in 6. In the 19th century, Laplace finalized the classical probability theory, which had started as early as the 16th century (especially with Pierre de Fermat and Blaise Pascal in the 17th century), mainly from the analysis of games. Jacob Bernoulli expanded to the principle of indifference, taking into account that not all outcomes need to have the same probability, and others expanded it to continuous variables. In 1933, the Russian mathematician A. Kolmogorov (1903-1987) outlined an axiomatic approach that forms the basis for the modern theory, defining probability based on three axioms. In the 20th century, frequentist statistics was developed and became the dominant statistical paradigm. It continues to be most popular in scientific articles (with p-values, confidence intervals, etc.). Frequentist statistics is about repeatability and gathering more data, and probability is the long-run frequency of repeatable experiments.
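The difference between the classical and the frequentist reading of the die example can be made concrete with a minimal sketch. It assumes a simulated fair die; the seed and the numbers of throws are arbitrary choices for illustration and do not come from any cited source.

```python
import random
from fractions import Fraction

faces = [1, 2, 3, 4, 5, 6]

# Classical (Laplace) probability: favourable outcomes / possible outcomes.
classical = Fraction(sum(1 for f in faces if f == 6), len(faces))

# Frequentist probability: relative frequency over many repeated throws.
rng = random.Random(1)  # fixed seed (arbitrary) so the sketch is repeatable


def relative_frequency(n_throws):
    """Share of sixes observed in n_throws simulated throws of a fair die."""
    hits = sum(1 for _ in range(n_throws) if rng.choice(faces) == 6)
    return hits / n_throws


print("classical:", classical)
for n in (60, 6_000, 60_000):
    print(n, "throws ->", round(relative_frequency(n), 4))
```

With few throws the observed frequency can deviate noticeably from 1/6; it only settles near the classical value as the number of repetitions grows, which is the sense in which frequentist probability is a long-run quantity.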
An alternative approach is "Bayesian inference" based on Bayes' theorem, named after Thomas Bayes, an English statistician of the 18th century. Here, probability essentially represents the degree of belief in something, probably closer to most people's intuitive idea of probability.
We can thus distinguish three major forms of probability:

1. The classical or axiomatic (based on Kolmogorov's axioms) probability

2. The experimental / empirical probability of an event is equal to the long-term frequency of the event's occurrence when the same process is repeated many times (also termed frequentist statistics or frequentist inference)

3. Subjective probability as the degree of belief or logical support (updated using Bayes' theorem)

One drawback of the frequentist approach that is addressed by Bayesian inference is the issue of false-positives, especially for rare events (Szucs and Ioannidis, 2017). We have repeatedly stressed this problem for toxicology, where most hazards occur at low frequencies (Hoffmann and Hartung, 2005). The other way around, "big data" is bringing the reverse challenge of overpowered studies, i.e., "massive data sets expand the number of analyses that can be performed, and the multiplicity of possible analyses combines with lenient P value thresholds like 0.05 to generate vast potential for false positives" (Ioannidis, 2019). Another drawback is that frequentists neglect that opinion plays a major role in both preclinical and clinical research; Bayesian statistics forces the contribution of opinion out into the open where it belongs.
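The false-positive problem for rare events follows directly from Bayes' theorem. The sketch below is illustrative only: the 90% sensitivity and specificity and the three prevalence values are hypothetical numbers chosen for the example, not properties of any real test.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """P(truly toxic | test positive), computed with Bayes' theorem."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# Same hypothetical test, applied where the hazard is common, uncommon, or rare.
for prevalence in (0.5, 0.1, 0.01):
    ppv = positive_predictive_value(prevalence, sensitivity=0.9, specificity=0.9)
    print(f"prevalence {prevalence:<4}: share of positives that are true = {ppv:.2f}")
```

Under these assumptions, a positive result is correct 90% of the time when half of all substances carry the hazard, but only about 8% of the time when 1% do. The test itself has not changed; only the prior probability has.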

Likelihood
The distinction between probability and likelihood, a.k.a. reverse probability, is fundamentally important 3: "Probability attaches to possible results; likelihood attaches to hypotheses." This brings us to Bayesian statistics, which consider our beliefs. "Hypotheses, unlike results, are neither mutually exclusive nor exhaustive. … In data analysis, the 'hypotheses' are most often a possible value or a range of possible values for the mean of a distribution. … The set of hypotheses to which we attach likelihoods is limited by our capacity to dream them up. In practice, we can rarely be confident that we have imagined all the possible hypotheses. Our concern is to estimate the extent to which the experimental results affect the relative likelihood of the hypotheses we and others currently entertain. Because we generally do not entertain the full set of alternative hypotheses and because some are nested within others, the likelihoods that we attach to our hypotheses do not have any meaning in and of themselves; only the relative likelihoods – that is, the ratios of two likelihoods – have meaning. … This ratio, the relative likelihood ratio, is called the 'Bayes Factor'." 3 In toxicology, our hypothesis is usually not articulated, but fundamentally we assume that a substance is toxic or, alternatively, that it is non-toxic. This set of hypotheses is neither complete nor mutually exclusive: The substance could be beneficial or toxic for some people or under certain circumstances. Results, on the contrary, refer to the outcome of a specific experiment where associated probabilities are adequate.
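A ratio of two likelihoods can be illustrated with a minimal sketch. It assumes a hypothetical experiment in which 4 of 10 test units respond and compares two simple hypotheses about the true response rate; the data, the two rates, and the even prior odds are all invented for illustration.

```python
from math import comb


def binomial_likelihood(rate, responders, n):
    """Likelihood of a hypothesised response rate given the observed result."""
    return comb(n, responders) * rate**responders * (1 - rate) ** (n - responders)


responders, n = 4, 10  # hypothetical experimental result

lik_high = binomial_likelihood(0.5, responders, n)  # hypothesis: response rate 50%
lik_low = binomial_likelihood(0.1, responders, n)   # hypothesis: response rate 10%

# Neither likelihood means much alone; their ratio (the Bayes factor) does.
bayes_factor = lik_high / lik_low

prior_odds = 1.0  # assumed: both hypotheses equally credible beforehand
posterior_odds = bayes_factor * prior_odds

print(f"Bayes factor (50% vs. 10%): {bayes_factor:.1f}")
print(f"posterior odds:             {posterior_odds:.1f}")
```

Here the data favour the 50% hypothesis over the 10% hypothesis by a factor of roughly 18. This says nothing about hypotheses that were never written down, which is exactly the limitation the quoted passage describes.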

Risk
In the context of toxicology, risk first has to be distinguished from hazard, which is not always easy, as many languages do not make this distinction. A hazard is a source of danger, e.g., a tiger, but it becomes a risk only with exposure, i.e., a possibility of loss or injury with a certain probability. The tiger in the cage is a hazard with negligible risk.
Risk is characterized by two quantities:

1. the magnitude (severity) of the possible adverse consequence(s), and

2. the likelihood (probability) of occurrence of each consequence.

Table 1 gives examples of risks with the different combinations of these two properties. Kaplan and Garrick (1981) defined risk in the context of toxicology as "risk is probability and consequences". So, it is about the severity of possible damage or, as former U.S. Environmental Protection Agency (EPA) Administrator William K. Reilly phrased it, "Risk is a common metric that lets us distinguish the environmental heart attacks and broken bones from indigestion or bruises" 4. For toxicology, risk is typically defined for an individual or a population. The consequences (hazards) are typically quite clear, but we struggle with the probabilities. Taleb (2007) phrased it outside of toxicology, "We generally take risks not out of bravado but out of ignorance and blindness to probability!"

The lack of certainty in toxicology
For the reader of this series of articles, this argument is a common thread. Some favorites in brief: In Hartung (2013, Tab. 1) we list 25 reasons why animal models, as the most common approach, do not reflect humans, and cite studies showing that 20% of drug candidates fail because of unpredicted toxicities and that, after passing clinical trials, ~8% are withdrawn from the market, mostly because of unexpected side-effects. Major studies by consortia of the pharmaceutical industry showed that rodents predict 43% of side effects in humans (n = 150) (Olson et al., 2000) and that all species together had a sensitivity of 48% and specificity of 84% (n = 182) (Monticello et al., 2017).
Animal tests cannot be more relevant for humans than they are reproducible themselves: we showed that of 670 eye corrosive chemicals, a repeat study showed 70% to be corrosive, 20% to be mild, and 10% to have no effect (Luechtefeld et al., 2016a). For skin sensitization, the reproducibility of the guinea pig maximization test was 93% (n = 624) and of the local lymph node assay (LLNA) in mice 89% (n = 296) (Luechtefeld et al., 2016b). Others reported for the cancer bioassay 57% reproducibility (n = 121) (cited in Basketter et al., 2012 and Smirnova et al., 2018). In our largest analysis (Luechtefeld et al., 2018b), we showed for the six most used Organisation for Economic Co-operation and Development (OECD) guideline tests and 3,469 cases where a chemical was tested more than twice, an average sensitivity of 69% (accuracy 81%); this means that the toxic property is missed in roughly one of three tests.
Obviously, we usually do not know how well animal studies predict human health effects. However, interspecies comparisons cited in the papers above and in Wang and Gray (2015) allow an estimate, as there is no reason to assume that any species predicts humans better than they predict each other. It is important to realize that failure to be realistic about uncertainty in toxicology has significant consequences: When a chemical is declared "safe" only to be determined years later to result in unexpected toxicity, this increases public skepticism about the ability of science to protect people (Maertens et al., 2021).
These reproducibility problems matter especially for the low-frequency events we study (Hoffmann and Hartung, 2005). The problem of rare events of big impact has been elegantly covered by Nassim Taleb (2007) in his popular book The Black Swan – The Impact of the Highly Improbable. Some pertinent quotes 5 were cited earlier in this series (Bottini and Hartung, 2009). A few others are sprinkled into this article. Furthermore, the reader is referred to Taleb's earlier book (2004) on randomness, where many of the same ideas are formulated in a less populistic way. With respect to certainty of our (animal) tools in toxicology, the most appropriate quote from Taleb (2007) is, "In the absence of a feedback process you look at models and think that they confirm reality".
Recently, the Evidence-based Toxicology Collaboration (EBTC 6 ) has tried a new approach to assessing certainty by evaluating rare toxicological events of drug-induced liver injury (DILI), which are poorly predicted by the mandated regulatory test battery. EBTC has put together a multi-stakeholder working group, which has searched for published evidence of DILI effects of drugs with DILI and no-DILI. The approach demonstrated that mechanistic tests reported in the U.S. EPA ToxCast database, and not the mandated regulatory animal tests, predicted rare DILI in humans (Dirven et al., 2021). This evidence-based approach has potential for broader application in toxicological methods validation.
The probabilistic approach is the most widely used method of uncertainty analysis in mathematical models. ProbRA has emerged as an increasingly popular analysis tool, especially to evaluate risks associated with every aspect of a complex engineering project (e.g., facility, spacecraft, or nuclear power plant) from concept definition, through design, construction, and operation, to end of service and decommissioning. It has its origin in the aerospace industry before and during the Apollo space program. ProbRA is a systematic and comprehensive methodology, which has only rarely been applied to substance safety assessments. ProbRA usually answers three basic questions as summarized by Michael Stamatelatos, NASA Office of Safety and Mission Assurance 7:

1. "What can go wrong with the studied technological entity, or what are the initiators or initiating events (undesirable starting events) that lead to adverse consequence(s)?

2. What and how severe are the potential detriments, or the adverse consequences that the technological entity may be eventually subjected to as a result of the occurrence of the initiator?

3. How likely to occur are these undesirable consequences, or what are their probabilities or frequencies?"

Quite obviously, these can be applied to toxicology, where the initiator is exposure, and the adverse / undesirable consequences are hazard manifestations. For the purpose of this article, question 3 is obviously key. However, we will include some thoughts below on applying an uncertainty concept to adverse outcome pathways (AOP), which can be seen as the toxicological mechanistic aspects of questions 1 and 2. Stamatelatos 7 further suggests the methodologies listed in Table 2 to answer the three questions above.
The modern Monte Carlo method / simulation was developed in the late 1940s by Stanislaw Ulam and John von Neumann in the nuclear weapons projects at the Los Alamos National Laboratory. It is based on the law of large numbers that a random variable can be approximated by taking the empirical mean of independent samples of the variable, where the input parameters are selected according to their respective probability distributions. This repeated random sampling to obtain numerical results uses randomness to solve problems that might be deterministic in principle. This way, it propagates variability or uncertainty of model input parameters and overcomes the uncertainty or variability in the underlying processes. For each combination of input parameters, the deterministic model is then solved, and model results are collected until the specified number of model iterations (shots) is completed. This results in a distribution of the output parameters, which is often parametrized using a Markov chain Monte Carlo (MCMC) sampler.
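The procedure described above (draw inputs from their distributions, solve the deterministic model, collect the results) can be sketched in a few lines. The dose equation, the three input distributions, and all parameter values below are assumptions made up for illustration; they are not taken from any assessment cited here.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed (arbitrary) for repeatability


def daily_dose(concentration, intake, body_weight):
    """Deterministic model: dose in mg per kg body weight per day."""
    return concentration * intake / body_weight


def one_shot():
    """One model iteration with inputs drawn from assumed distributions."""
    concentration = rng.lognormvariate(0.0, 0.5)     # mg/kg food, median 1.0
    intake = rng.uniform(0.1, 0.3)                   # kg food per day
    body_weight = max(rng.gauss(70.0, 12.0), 30.0)   # kg, floored at 30 kg
    return daily_dose(concentration, intake, body_weight)


shots = sorted(one_shot() for _ in range(20_000))

mean = statistics.fmean(shots)
p50 = shots[len(shots) // 2]
p95 = shots[int(0.95 * len(shots))]

print(f"mean dose:       {mean:.5f} mg/kg bw/day")
print(f"median dose:     {p50:.5f} mg/kg bw/day")
print(f"95th percentile: {p95:.5f} mg/kg bw/day")
```

The output is a distribution rather than a single number, so a risk manager can choose which percentile of the population to protect instead of inheriting that choice implicitly from a worst-case input.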
The Monte Carlo method, however, is just one of many methods for analyzing uncertainty propagation, where the goal is to determine how random variation, lack of knowledge, or error affects the sensitivity, performance, or reliability of the system that is being modeled. An alternative probabilistic methodology is the first- and second-order reliability method (FORM/SORM), a.k.a. Hasofer-Lind reliability index, a semi-probabilistic reliability analysis method devised to evaluate the reliability of a system. It estimates the sensitivity of the failure probability with respect to different input parameters. The method was suggested for ProbRA (Zhang, 2010).
Among the typically applied statistical techniques are (non-) parametric bootstrap methods. A parametric method assumes an underlying model (e.g., lognormal distribution); a nonparametric method only depends on the data points themselves. The term "bootstrap" is suggested to refer to the saying "to pull oneself up by one's bootstraps" as a metaphor for bettering oneself by one's own unaided efforts. As a statistical method, it belongs to the broader class of resampling methods. Bootstrapping assigns measures of accuracy (bias, variance, confidence intervals, prediction error, etc.) to sample estimates (Efron and Tibshirani, 1993; Davison and Hinkley, 1997). A great advantage of bootstrap is that it makes it easy to derive estimates of variability (standard errors) and confidence intervals for estimators of the distribution, such as percentile points, proportions, odds ratios, and correlation coefficients.
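A nonparametric bootstrap is short enough to sketch in full. The ten data points are hypothetical measurements invented for the example, and the simple percentile interval used here is only one of several bootstrap interval constructions.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed (arbitrary) for repeatability

# Hypothetical measurements, e.g. an effect concentration from ten repeat runs.
data = [2.1, 3.4, 1.9, 5.6, 2.8, 4.1, 3.0, 7.2, 2.5, 3.7]


def bootstrap_ci(sample, statistic, n_resamples=5000, alpha=0.05):
    """Percentile bootstrap confidence interval for an arbitrary statistic."""
    estimates = sorted(
        statistic(rng.choices(sample, k=len(sample)))  # resample with replacement
        for _ in range(n_resamples)
    )
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


low, high = bootstrap_ci(data, statistics.median)
print(f"sample median: {statistics.median(data):.2f}")
print(f"95% bootstrap interval: {low:.2f} to {high:.2f}")
```

No distributional model is assumed; the interval depends only on the data points themselves, which is why the same function works unchanged for a mean, a percentile, or a ratio.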
Similarly, maximum likelihood estimation 9 can characterize uncertainty estimates at low sample sizes by estimating the parameters of an assumed probability distribution (Rossi, 2018). Alternatives are least squares regression or the generalized method of moments. Advantages and disadvantages of maximum likelihood estimation are 10:
+ If the model is correctly assumed, the maximum likelihood estimator is the most efficient estimator. Efficiency is one measure of the quality of an estimator. An efficient estimator is one that has a small variance or mean squared error.
+ It provides a consistent but flexible approach that makes it suitable for a wide variety of applications, including cases where assumptions of other models are violated.
+ It results in unbiased estimates in larger samples.
- It relies on the assumption of a model and the derivation of the likelihood function, which is not always easy.
- Like other optimization problems, maximum likelihood estimation can be sensitive to the choice of starting values.
- Depending on the complexity of the likelihood function, the numerical estimation can be computationally expensive 11.
- Estimates can be biased in small samples.
9 https://towardsdatascience.com/a-gentle-introduction-to-maximum-likelihood-estimation-9fbff27ea12f
10 https://www.aptech.com/blog/beginners-guide-to-maximum-likelihood-estimation-in-gauss/
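The small-sample bias listed above can be demonstrated for a case where the maximum likelihood estimator has a closed form. The sketch assumes lognormally distributed data with made-up true parameters; for this model the estimator of the log-scale variance divides by n and is therefore known to underestimate the true variance by the factor (n - 1)/n on average.

```python
import math
import random
import statistics

rng = random.Random(3)  # fixed seed (arbitrary) for repeatability

TRUE_MU, TRUE_SIGMA = 1.0, 0.8  # assumed "true" log-scale parameters


def lognormal_mle(sample):
    """Closed-form maximum likelihood estimates of log-scale mean and variance."""
    logs = [math.log(x) for x in sample]
    mu_hat = statistics.fmean(logs)
    var_hat = sum((v - mu_hat) ** 2 for v in logs) / len(logs)  # divides by n
    return mu_hat, var_hat


def average_variance_estimate(n, repeats):
    """Mean of the variance estimate over many simulated samples of size n."""
    estimates = []
    for _ in range(repeats):
        sample = [rng.lognormvariate(TRUE_MU, TRUE_SIGMA) for _ in range(n)]
        estimates.append(lognormal_mle(sample)[1])
    return statistics.fmean(estimates)


small = average_variance_estimate(n=5, repeats=4000)
large = average_variance_estimate(n=200, repeats=500)

print(f"true variance:            {TRUE_SIGMA ** 2:.3f}")
print(f"average estimate, n=5:    {small:.3f}")
print(f"average estimate, n=200:  {large:.3f}")
```

With five observations the estimate is systematically too low, while with 200 observations the bias has practically vanished, matching the two list entries on bias in small and larger samples.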
The Bayesian network (BN) 12, 13, also called Bayes network, belief network, belief net, decision net or causal network, introduced by Judea Pearl (1988), is a graphical formalism for representing joint probability distributions. Based on the fundamental work on the representation of and reasoning with probabilistic independence originated by the British statistician A. Philip Dawid in the 1970s, BN aim to model conditional dependence and, therefore, causation, by representing conditional dependence by edges in a directed graph. Through these relationships, inference on the random variables in the graph is conducted by using weighing factors. Nodes represent variables (e.g., observable quantities, latent variables, unknown parameters or hypotheses). BN offer an intuitive and efficient way of representing sizable domains, making modeling of complex systems practical. BN provide a convenient and coherent way to represent uncertainty in models. BN have changed the way we think about probabilities.
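A very small network shows how such a graph encodes a joint distribution and supports inference. The sketch assumes a hypothetical three-node network in which one latent node ("toxic") points to two observable assay results; all probabilities are invented, and inference is done by brute-force enumeration rather than by any dedicated BN library.

```python
from itertools import product

# Hypothetical network: Toxic -> Assay A result, Toxic -> Assay B result.
P_TOXIC = 0.1                        # prior probability that a substance is toxic
P_A_POS = {True: 0.8, False: 0.1}    # P(assay A positive | toxic?)
P_B_POS = {True: 0.7, False: 0.2}    # P(assay B positive | toxic?)


def joint(toxic, a_pos, b_pos):
    """Joint probability, factorised along the edges of the graph."""
    p_t = P_TOXIC if toxic else 1 - P_TOXIC
    p_a = P_A_POS[toxic] if a_pos else 1 - P_A_POS[toxic]
    p_b = P_B_POS[toxic] if b_pos else 1 - P_B_POS[toxic]
    return p_t * p_a * p_b


def posterior_toxic(a_pos, b_pos):
    """P(toxic | observed assay results), by summing out the latent node."""
    evidence = sum(joint(t, a_pos, b_pos) for t in (True, False))
    return joint(True, a_pos, b_pos) / evidence


for a_pos, b_pos in product((True, False), repeat=2):
    print(f"A positive={a_pos!s:5}  B positive={b_pos!s:5}  "
          f"P(toxic)={posterior_toxic(a_pos, b_pos):.3f}")
```

Under these assumed numbers, two positive assays raise the probability of toxicity from the 10% prior to about 76%, and two negative assays lower it to below 1%. Each additional piece of evidence shifts the belief by an amount that reflects how informative that assay is.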
These different mathematical tools have been employed to carry out probabilistic approaches in risk assessment. In 2014, the EPA published Probabilistic Risk Assessment Methods and Case Studies (EPA, 2014) 14, describing ProbRA as "analytical methodology used to incorporate information regarding uncertainty and/or variability into analyses to provide insight regarding the degree of certainty of a risk estimate and how the risk estimate varies among different members of an exposed population, including sensitive populations or lifestages" applicable to both human health and ecological risk assessment. Two National Academy of Science reports influenced the report, namely, the National Research Council (NRC)'s report Science …

There are several comprehensive guides on how to actually do ProbRA (Jensen, 2002; Vose, 2008; Modarres, 2008; Vesely, 2011; Ostrom and Wilhelmsen, 2012). For our arguments, it suffices to say that in ProbRA at least one variable in the risk equation is defined as a probability distribution rather than a single number. However, the vision put forward is that more and more aspects of the risk equation should be seen as probability distributions that can be combined to estimate risk to an individual or, cumulatively, to a population. This is equally applicable to human health risk assessment and to the environment. The big questions are:
• Is the method sufficiently advanced for the different aspects of the chemical risk assessment context?
• What are the advantages and challenges?
• What does it take to make them acceptable for regulators and bring them to broader use?
Different stakeholders have embraced this new approach to different extents. EPA and EFSA are clearly at the forefront. EPA already in 1997 (!) started defining what makes ProbRA approaches acceptable to them (Box 1).

Software for ProbRA
Several free and commercial software packages are available for ProbRA (Tab. 3).

Freely available
US EPA has compiled a sizable list of freely available modeling tools for ProbRA, such as RIVM's ConsExpo and MCRA, ILSI's CARES, and EPA's PROcEED, to name a few. The complete list, descriptions, and links to models can be found on US EPA ExpoBox Website 28 . RIVM's MCRA model is a comprehensive probabilistic risk tool, while ConsExpo, DEEMS-FCID/Calendex, CARES, and SHEDS are probabilistic exposure modeling tools for various exposure scenarios (e.g., consumer products to dietary and residential exposures) (Young et al., 2012).
Probabilistic Reverse dOsimetry Estimating Exposure Distribution (PROcEED), developed by the US EPA, is used to perform probabilistic reverse dosimetry calculations. In essence, PROcEED estimates a probability distribution of exposure concentrations that would likely have produced the observed biomarker concentrations measured in a given population, using either a discretized Bayesian approach, or, when an exposure-biomarker relation is linear, a more straightforward exposure conversion factor approach.
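The two ideas named above can be illustrated with a toy calculation. The sketch below is not PROcEED's algorithm or interface; it assumes a hypothetical linear exposure-biomarker relation with lognormal scatter, a coarse exposure grid with a flat prior, and a single observed biomarker value, all invented for illustration.

```python
import math

SLOPE = 2.0    # biomarker units per exposure unit (hypothetical)
LOG_SD = 0.4   # log-scale scatter of the biomarker around its median (hypothetical)

exposure_grid = [0.25, 0.5, 1.0, 2.0, 4.0, 8.0]        # discretized exposure levels
prior = [1 / len(exposure_grid)] * len(exposure_grid)  # flat prior (assumed)


def lognormal_pdf(x, median, log_sd):
    """Density of a lognormal distribution given its median and log-scale SD."""
    z = (math.log(x) - math.log(median)) / log_sd
    return math.exp(-0.5 * z * z) / (x * log_sd * math.sqrt(2 * math.pi))


def posterior_exposure(biomarker):
    """Discretized Bayes: P(exposure level | observed biomarker)."""
    weights = [
        p * lognormal_pdf(biomarker, SLOPE * exposure, LOG_SD)
        for p, exposure in zip(prior, exposure_grid)
    ]
    total = sum(weights)
    return [w / total for w in weights]


observed = 4.2  # hypothetical biomarker measurement
posterior = posterior_exposure(observed)

# With a linear relation, a conversion factor gives a single point estimate.
ecf_estimate = observed / SLOPE

for exposure, p in zip(exposure_grid, posterior):
    print(f"exposure {exposure:>4}: posterior probability {p:.3f}")
print(f"conversion-factor point estimate: {ecf_estimate:.2f}")
```

The conversion factor returns one number, whereas the discretized Bayesian calculation returns a probability for every candidate exposure level and so shows how much neighbouring levels remain plausible given the scatter in the biomarker.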
iRisk is a web-based tool created by the FDA that assesses risk associated with microbial and chemical contaminants in food using a probabilistic approach. Users enter data for the various factors, such as food, hazard, dose-response, etc. to generate a prediction. Further, the model can evaluate the effectiveness of prevention and control measures; the results are presented as a population-based estimate of health burden.
28 https://www.epa.gov/expobox/exposure-assessment-tools-tiers-and-types-deterministic-and-probabilistic-assessments
mc2d is an R package for two-dimensional (or second-order) Monte-Carlo simulations 29. To reflect the natural variability of a modeled risk, a Monte-Carlo simulation can model both the empirical distribution of the risk within the population and the distributions reflecting the variability of parameters across the population. A second-order simulation then superimposes the uncertainty in the risk estimates stemming from parameter uncertainty.
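The separation of variability and uncertainty in a two-dimensional simulation can be sketched with two nested loops. This is a Python illustration of the general idea only and does not reproduce the mc2d interface; the lognormal exposure model and the ranges assumed for its uncertain parameters are invented for the example.

```python
import random

rng = random.Random(11)  # fixed seed (arbitrary) for repeatability


def population_p95(mu, sigma, n_individuals=2000):
    """Inner loop (variability): 95th percentile of exposure across individuals."""
    exposures = sorted(rng.lognormvariate(mu, sigma) for _ in range(n_individuals))
    return exposures[int(0.95 * n_individuals)]


def two_dimensional_mc(n_uncertainty=200):
    """Outer loop (uncertainty): the population parameters are themselves uncertain."""
    p95_values = []
    for _ in range(n_uncertainty):
        mu = rng.gauss(0.0, 0.2)       # uncertain log-scale mean (assumed)
        sigma = rng.uniform(0.4, 0.6)  # uncertain log-scale spread (assumed)
        p95_values.append(population_p95(mu, sigma))
    return sorted(p95_values)


p95_values = two_dimensional_mc()
lower = p95_values[int(0.025 * len(p95_values))]
median = p95_values[len(p95_values) // 2]
upper = p95_values[int(0.975 * len(p95_values))]

print(f"95th percentile of exposure: {median:.2f} "
      f"(uncertainty interval {lower:.2f} to {upper:.2f})")
```

The result is a percentile of the population (variability) together with an interval around it (uncertainty). A one-dimensional simulation would blend the two and could not say how much of the spread would shrink with better data.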

Commercial
Although not exhaustive, we outline some of the commercially available tools for ProbRA here. Agena Risk (Fenton and Neil, 2014) is a commercial software for Bayesian artificial intelligence (AI) and probabilistic reasoning for assessing risk and uncertainty in fields such as operational risk, actuarial analysis, intelligence analysis risk, systems safety and reliability, health risk, cyber-security risk, and strategic financial planning.
Oracle's Crystal Ball and Palisade's @Risk are commercially available applications used in spreadsheet-based tools to report and measure risk using Monte Carlo analysis. Advantages to these applications include multiple pre-defined distributions and the ability to use custom data distributions, which improves risk estimates. The user can also carry out a sensitivity analysis to identify the most impactful metrics. Paini et al. (2017) summarized a number of PBK modeling software packages (Tab. 4), noting that "the field as a whole has suffered from a fragmented software ecosystem, and the recent discontinuation of a widely used modelling software product (acslX) has highlighted the need for software tool resilience. Maintenance of, and access to, corporate knowledge and legacy work conducted with discontinued commercial software is highly problematic. The availability of a robust, free to use, global community-supported application should offer such resilience and help increase confidence in mathematical modelling approaches required by the regulatory community".

Probability of exposure
The concept that exposure has a certain probability for an individual and cumulatively for the population is intuitive and broadly used (Bogen et al., 2009). Cullen and Frey (1999) wrote a textbook, Probabilistic Techniques in Exposure Assessment, on the concept. The German Federal Institute for Risk Assessment (BfR) lauds probabilistic exposure assessment 40 : "Exposure assessment can help to determine the type, nature, frequency and intensity of contacts between the population and the contaminant that is to be assessed. Traditional exposure assessment (also called deterministic estimate or point estimate, 'worst case estimates') of risks from chemical substances estimates a value that ensures protection for most of the population. Deviations from the real values are tolerated in order to ensure protection of the consumer using simple methods by, in some cases, considerably overestimating actual exposure.
For some time now the use of probabilistic approaches (also called distribution-based or population-related approaches) has been under discussion for exposure assessment. These methods do not merely describe a single, normally extreme case but rather endeavour to depict overall variability in the data and, by extension, to present all possible forms of exposure. The mathematical tools used in this approach are Monte Carlo simulations, distribution adjustments and other principles taken from the probability theory.
In toxicology risks are normally described by establishing limit values. Below a limit value there should be no risk; above a limit value health effects through contact with the chemicals cannot be ruled out. This approach is frequently challenged. The question has been raised whether this approach does justice to transparent, realistic risk assessment. Probabilistic methods could highlight this supposed lack of clarity, help to characterise uncertainties and take them into account in risk assessment."

Exposure assessments are complex and have clearly limited throughput. They can typically target only a few substances, and individual exposures over time are highly diverse.
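The contrast between a worst-case point estimate and a distribution-based estimate, as drawn in the BfR statement quoted above, can be quantified with a short sketch. It assumes exposure is the product of three independent, lognormally distributed factors on a relative scale; the factors and their spread are hypothetical.

```python
import random

rng = random.Random(5)  # fixed seed (arbitrary) for repeatability
N = 50_000


def percentile(values, q):
    """Empirical percentile of a list (simple order-statistic definition)."""
    ordered = sorted(values)
    return ordered[int(q * (len(ordered) - 1))]


# Three hypothetical exposure factors, each on a relative scale with median 1.
concentration = [rng.lognormvariate(0.0, 0.5) for _ in range(N)]
frequency = [rng.lognormvariate(0.0, 0.5) for _ in range(N)]
amount = [rng.lognormvariate(0.0, 0.5) for _ in range(N)]

# Deterministic "worst case": multiply the 95th percentile of every factor.
worst_case = (percentile(concentration, 0.95)
              * percentile(frequency, 0.95)
              * percentile(amount, 0.95))

# Probabilistic estimate: 95th percentile of the combined exposure.
exposure = [c * f * a for c, f, a in zip(concentration, frequency, amount)]
probabilistic_p95 = percentile(exposure, 0.95)

share_below_worst_case = sum(e <= worst_case for e in exposure) / N

print(f"worst-case point estimate:        {worst_case:.1f}")
print(f"95th percentile of exposure:      {probabilistic_p95:.1f}")
print(f"population share below worst case: {share_below_worst_case:.4f}")
```

Stacking three 95th percentiles yields a value well above the 95th percentile of the combined exposure; under these assumptions it covers more than 99% of the simulated population. The point estimate is therefore protective, but the degree of conservatism stays hidden unless the distribution is calculated.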
Depending on the agent studied, either peak exposures or cumulative amounts are relevant. Metabolism of the chemical and interindividual differences add to the complexity. Notably, approaches for rapid exposure assessment exist, such as US EPA's ExpoCast project 41 , which allow chemicals with irrelevant exposure to be triaged (Wambaugh et al., 2015). Probabilistic approaches are again critical components here.
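The contrast between a deterministic worst-case estimate and a distribution-based estimate can be illustrated with a minimal Monte Carlo sketch. It is not taken from any of the cited assessments: the dietary exposure scenario, all distribution parameters, and the names `simulate_daily_exposure`, `percentile` and `WORST_CASE` are assumptions chosen purely for illustration.

```python
import math
import random

# Deterministic "worst case": high concentration, high intake, low body weight.
# All numbers in this sketch are hypothetical.
WORST_CASE = 3.0 * 0.5 / 50.0  # mg per kg body weight per day


def simulate_daily_exposure(n_individuals=10_000, seed=1):
    """Monte Carlo sample of daily exposures (mg per kg body weight per day)."""
    rng = random.Random(seed)
    exposures = []
    for _ in range(n_individuals):
        concentration = rng.lognormvariate(0.0, 0.5)     # mg/kg food, median 1.0
        intake = rng.lognormvariate(math.log(0.2), 0.4)  # kg food/day, median 0.2
        body_weight = max(rng.gauss(70.0, 12.0), 35.0)   # kg, floored at 35 kg
        exposures.append(concentration * intake / body_weight)
    return exposures


def percentile(values, p):
    """Empirical percentile (nearest-rank method) for p between 0 and 100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


if __name__ == "__main__":
    exposures = simulate_daily_exposure()
    for p in (50, 95, 99):
        print(f"P{p}: {percentile(exposures, p):.4f} mg/kg bw/day")
    share_above = sum(e > WORST_CASE for e in exposures) / len(exposures)
    print(f"worst case: {WORST_CASE:.4f}; share of population above it: {share_above:.4f}")
```

Under these assumed inputs, the point estimate lies well above the 95th percentile of the simulated population, which is the kind of overestimation the BfR text describes. The simulation additionally shows how exposure varies across individuals.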
With the rise of biomonitoring studies, internal exposures, especially blood and tissue levels of chemicals, are increasingly becoming available. These depend on exposure and bioavailability (and other biokinetic properties to be discussed next). They offer opportunities to focus on relevant exposures. The concept has been broadened to exposomics (Sillé et al., 2020), which often employs probabilistic analyses for our context here.
We have stressed earlier in this series and elsewhere the importance of pharmacokinetic modeling for modern toxicology (Basketter et al., 2012; Leist et al., 2014; Tsaioun et al., 2016; Hartung, 2017a, 2018a). Pharmacokinetic modeling plays a critical role in informing us whether a given dose of a chemical reaches a critical level at the target organ and, in reverse, what exposure is needed to reach concentrations that are active in vitro, i.e., quantitative in-vitro-to-in-vivo-extrapolation (QIVIVE) (McNally et al., 2018).
Here, the most important message in the context of ProbRA is that the most advanced body of probabilistic methods is available as physiologically based pharmacokinetic / toxicokinetic (PBPK/PBTK) modeling (McLanahan et al., 2012). PK / TK theoretical foundation, practical application, and various software packages have been developed in pharmacology (Leung, 1991) and later adapted to toxicology (Bogen and Hall, 1989) for the environmental health context by friends and collaborators such as Mel Andersen, Bas Blaauboer, Frederic Bois, Harvey Clewell, George Loizou, Amin Rostami-Hodjegan, Andrew Worth and others; please see their work for more substantial discussions. Several workshops have documented the field (Tab. 5). Most recently, a textbook became available (Fisher et al., 2020). Loizou et al. (2008) stress the need for kinetics in risk assessment: "The need for increasing incorporation of kinetic data in the current risk assessment paradigm is due to an increasing demand from risk assessors and regulators for higher precision of risk estimates, a greater understanding of uncertainty and variability …, more informed means of extrapolating across species, routes, doses and time …, the need for a more meaningful interpretation of biological monitoring data … and reduction in the reliance on animal testing … . Incorporating PBPK modelling into the risk assessment process can advance all of these objectives."
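The reverse direction of kinetic modeling (from an in vitro active concentration back to an external dose) can be sketched with a deliberately simple steady-state relationship. This is a one-compartment simplification, not a PBPK model, and it does not reproduce any of the cited software packages. The clearance and bioavailability distributions and the function name `oral_equivalent_doses` are hypothetical.

```python
import math
import random


def oral_equivalent_doses(active_conc_mg_per_l, n=10_000, seed=7):
    """Sorted Monte Carlo sample of oral doses (mg/kg/day) that would produce
    the given steady-state plasma concentration in a simulated population."""
    rng = random.Random(seed)
    doses = []
    for _ in range(n):
        clearance = rng.lognormvariate(math.log(5.0), 0.5)  # L/kg/day (assumed)
        bioavailability = rng.uniform(0.5, 1.0)             # fraction absorbed (assumed)
        # Steady state: Css = dose * F / CL, hence dose = Css * CL / F
        doses.append(active_conc_mg_per_l * clearance / bioavailability)
    return sorted(doses)


if __name__ == "__main__":
    doses = oral_equivalent_doses(active_conc_mg_per_l=1.0)
    median = doses[len(doses) // 2]
    # 5th percentile: individuals with slow clearance and high uptake
    sensitive = doses[int(0.05 * len(doses))]
    print(f"median oral equivalent dose: {median:.2f} mg/kg/day")
    print(f"5th percentile (sensitive individuals): {sensitive:.2f} mg/kg/day")
```

Reporting a lower percentile of the dose distribution rather than the median is one way to make interindividual kinetic variability explicit instead of covering it with a default factor.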

Probability of hazard
What indicates a probability of hazard? These four principal components come to mind:

1. Traditional test data on the given substance, which can range from physicochemical measurements to animal guideline studies.
2. Such information on similar substances enabling (automated) read-across.
3. Structural alerts such as functional groups or chemical descriptors enabling (quantitative) structure-activity relationships ((Q)SAR).
4. Mechanistic alerts typically from in vitro testing or (clinical) biomarkers.
How these (jointly) indicate a probability of hazard, and how to quantify it, is usually not clear. Some elements are more established. We have shown earlier how a combination of (1) and (2) can be used to derive probabilities of hazard (Luechtefeld et al., 2018a,b). These probabilities or, the other way around, measures of uncertainty are among the most remarkable features of the approach (Hartung, 2016) as they indicate whether more information is needed. The approach called read-across-based structure-activity relationship (RASAR) covers the nine most frequently used animal test-based classifications by OECD test guidelines. The method has been implemented as Underwriters Laboratories (UL) Cheminformatics Toolkit 42 ; it has been further developed utilizing deep learning, making (non-validated) estimates of potency as GHS hazard classes and handling applicability domains of chemicals more explicitly. Notably, the method has been included in the new Australian chemicals legislation 43 , the Industrial Chemicals Act 2019 or AICIS (Australian Industrial Chemicals Introductions Scheme) in effect since July 1, 2020. This law creates a new regulatory scheme for the importation and manufacture of industrial chemicals in Australia. Unlike in other jurisdictions, "industrial chemicals" includes personal care and cosmetics, and there is a full ban on new animal testing for these ingredients and for dual-use ingredients that are used both in cosmetics and industrial uses. However, broader international acceptance of read-across as promoted also by the EUToxRisk project 44 is still outstanding (Chesnut et al., 2018; Rovida et al., 2020). Other A.I.-based methods for hazard identification, which are more or less explicit in expressing probabilities of their predictions, are available (Zhang et al., 2018; Santin et al., 2021).
The approach under (3) is well-known as (Q)SAR, which has been covered earlier in this series of articles (Hartung and Hoffmann, 2009). (Q)SAR are based on structural alerts and physicochemical descriptors. Currently, we are exploring the integration of (Q)SAR as input parameters of the RASAR approach.
Most development is needed for (4). A read-across type of approach has been introduced for the US EPA ToxCast 45 data (Shah et al., 2016), which tested about 2,000 chemicals in hundreds of robotized assays. This was also termed generalized read-across 46 . Pioneering work showed how to use this to predict endocrine activity (Browne et al., 2015; Kleinstreuer et al., 2018a; Judson et al., 2020). However, it is not clear how to extend this to chemicals that were not included in the ToxCast program. We discussed the opportunities of read-across of such biological data earlier (Zhu et al., 2016).
Most toxicologists, out of habit, talk of a xenobiotic exposure "causing" a certain effect, e.g., genotoxins cause cancer, etc. Yet, in reality, this is rarely the case: even when chemical exposures have a clear role in both initiation and progression, there is still a strong stochastic element involved (Tomasetti et al., 2017). For example, bilateral breast cancer is very rare, although both tissues have identical exposures. For other endpoints, it is even more important to remain mindful of the uncertainty intrinsic to most of the causal associations we are looking for in toxicology: For most diseases (Alzheimer's and autism to name a few) we know that the environment plays an important role; however, decades of studies have failed to find any chemical "smoking gun". We are instead likely looking for multiple exposures, over a lifetime, each of which may be individually insignificant, but which can, in vulnerable individuals, act as a tipping point.
One conceptual alternative to asking which chemicals "cause" which diseases is instead thinking of potential chemicals as quantifiable liabilities in a threshold-liability model. The threshold-liability model holds that for a given disease there exists within the population some probability distribution of thresholds, with some individuals with a high threshold (the life-long smoker who fails to develop lung cancer or heart disease) and others with considerably lower thresholds. Disease happens when an individual's liabilities (which can include environmental exposures, stochastic factors, and epigenetic alterations) exceed their threshold. Such a model has been applied to amyotrophic lateral sclerosis (ALS), a disease that has no known replicable environmental factors and is likely best characterized as the result of a pre-existing genetic load that faces environmental exposures over a lifespan and eventually reaches a tipping point, wherein neurodegeneration begins. While the past decade has seen an enormous expansion in our understanding of the genetic load component thanks to large-scale genome-wide association studies, the environmental component remains poorly characterized. While this is no doubt in part due to the much larger search space for environmental exposures, it must be acknowledged that the tools toxicologists employ (for example, looking for chemicals that will cause an ALS-like neurodegenerative phenotype in rodents at very high doses) are likely not ideal (Al-Chalabi and Hardiman, 2013).
An area where ProbRA has shown important (but largely neglected) opportunities is the test battery of genotoxicity assays. Depending on the field of use, three to six in vitro assays are carried out and, typically, any positive result is taken as an alert, leading to a tremendous rate of false-positive classifications as discussed earlier (Basketter et al., 2012). Aldenberg and Jaworska (2010) applied a BN to the dataset assembled by Kirkland et al., showing the potential of a probabilistic network to analyze such datasets. Expanding on work by Jaworska et al. (2013, 2015) for skin sensitization potency, we earlier showed how probabilistic hazard assessment by dose-response modeling can be done using BN (Luechtefeld et al., 2015). Our contribution was more technical (using feature elimination instead of QSAR, hidden Markov chains, etc.), but it moved the model's potency predictions to standing cross-validation. Most recently, Zhao et al. (2021) compiled a human exposome database of > 20,000 chemicals, prioritized 13,441 chemicals based on probabilistic hazard quotient and 7,770 chemicals based on risk index, and provided a predicted biotransformation metabolite database of > 95,000 metabolites. While the importance of acute oral toxicity for ranking chemicals can be argued, it shows impressively how probabilistic approaches can be applied to large numbers of substances to allow prioritization.
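Why an "any positive is an alert" rule inflates false positives in a test battery, and how a probabilistic reading differs, can be shown with a small sketch. It uses a naive Bayes update that assumes the assays are conditionally independent. This is a much cruder model than the BN of Aldenberg and Jaworska (2010) and is not their method. The assay names, sensitivities, specificities and the prior are hypothetical.

```python
# Hypothetical battery: name -> (sensitivity, specificity). Not real assay data.
HYPOTHETICAL_ASSAYS = {
    "assay_A": (0.60, 0.75),
    "assay_B": (0.80, 0.40),
    "assay_C": (0.80, 0.55),
}


def any_positive_false_positive_rate(assays):
    """Probability that a non-hazardous chemical triggers at least one positive,
    assuming the assays err independently."""
    p_all_negative = 1.0
    for _sens, spec in assays.values():
        p_all_negative *= spec
    return 1.0 - p_all_negative


def posterior_probability(prior, results, assays):
    """Posterior probability of hazard after observing the battery results.

    results maps assay name -> True (positive) or False (negative).
    Each result multiplies the prior odds by its likelihood ratio.
    """
    odds = prior / (1.0 - prior)
    for name, positive in results.items():
        sens, spec = assays[name]
        if positive:
            odds *= sens / (1.0 - spec)
        else:
            odds *= (1.0 - sens) / spec
    return odds / (1.0 + odds)


if __name__ == "__main__":
    fpr = any_positive_false_positive_rate(HYPOTHETICAL_ASSAYS)
    print(f"false-positive rate of the any-positive rule: {fpr:.3f}")
    results = {"assay_A": False, "assay_B": True, "assay_C": False}
    post = posterior_probability(0.30, results, HYPOTHETICAL_ASSAYS)
    print(f"posterior probability of hazard: {post:.3f}")
```

With these assumed figures, a single positive from the least specific assay, accompanied by two negatives, lowers the probability of hazard below the prior, whereas the any-positive rule would raise an alert.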

Probability of risk
The prospect of ProbRA is increasingly recognized by regulators as shown earlier for EPA, EFSA and BfR (Tralau et al., 2015) and opinion leaders in the field (Krewski et al., 2014). A framework for performing probabilistic environmental risk assessment (PERA) was proposed (Verdonck et al., 2002, 2003). Risk assessment obviously requires combining hazard and exposure information; van der Voet and Slob (2007) suggested an approach where exposure assessment and hazard characterization are both included in a probabilistic way. Table 6 gives a few examples of ProbRA; notably they are very different in approach and quality, but they illustrate possible applications. Slob et al. (2014) used the ProbRA approach to explore uncertainties in cancer risk assessment. Together, this very incomplete list of examples of ProbRA in toxicology shows the potential of the technology.
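The idea of treating both exposure and hazard characterization probabilistically can be made concrete as the probability that an individual's exposure exceeds that individual's critical dose. The sketch below is a generic illustration of this idea, not the specific model of van der Voet and Slob (2007). Both lognormal distributions, their parameters and all names are assumptions.

```python
import math
import random

# Hypothetical population distributions (medians in mg/kg/day, log-scale SDs).
EXPOSURE_MEDIAN, EXPOSURE_LOG_SD = 0.05, 0.8
CRITICAL_MEDIAN, CRITICAL_LOG_SD = 0.5, 0.6


def exceedance_probability_mc(n=50_000, seed=11):
    """Monte Carlo estimate of P(exposure > individual critical dose)."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n):
        exposure = rng.lognormvariate(math.log(EXPOSURE_MEDIAN), EXPOSURE_LOG_SD)
        critical_dose = rng.lognormvariate(math.log(CRITICAL_MEDIAN), CRITICAL_LOG_SD)
        exceed += exposure > critical_dose
    return exceed / n


def exceedance_probability_exact():
    """Closed form: the log-ratio of two independent lognormals is normal."""
    mean = math.log(EXPOSURE_MEDIAN) - math.log(CRITICAL_MEDIAN)
    sd = math.sqrt(EXPOSURE_LOG_SD ** 2 + CRITICAL_LOG_SD ** 2)
    z = mean / sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF at z


if __name__ == "__main__":
    print(f"margin between medians: {CRITICAL_MEDIAN / EXPOSURE_MEDIAN:.0f}-fold")
    print(f"Monte Carlo estimate:   {exceedance_probability_mc():.4f}")
    print(f"closed-form value:      {exceedance_probability_exact():.4f}")
```

A deterministic reading of these inputs would report only the ten-fold margin between the medians. The probabilistic reading adds the fraction of individuals for whom that margin is used up, which here is roughly one in a hundred under the assumed variabilities.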
As discussed above, a key element of ProbRA is the analysis of how the system is challenged and can fail. This is reminiscent of the AOP approach, which can be seen as the implementation of the call for toxicity pathway mapping from the "Toxicity testing in the 21st century movement" (Krewski et al., 2020). Based on the respective National Academy of Sciences / NRC report (NRC, 2007), a change away from traditional animal testing toward new approach methodologies (NAMs) is suggested, which are based on mechanistic understanding, i.e., toxicity pathways, pathways of toxicity (PoT) (Hartung and McBride, 2011; Kleensang et al., 2014) or, increasingly, AOP (Leist et al., 2017).
A major obstacle to the introduction of NAMs in regulatory decision-making has been the lack of confidence, or substantial overall uncertainty, in their fitness-for-purpose. While some individual aspects of NAMs contributing uncertainty are assessed in a systematic and thorough manner, a comprehensive approach that maps all uncertainties involved is lacking. A generic framework that integrates current mechanistic knowledge, e.g., condensed into AOP, biological plausibility of NAMs in relation to that knowledge, and NAM reproducibility with well-established risk assessment-related uncertainties, such as intra- and interspecies differences, has the potential to provide a widely agreed basis for a realistic purpose-focused assessment of NAMs. For a given question, e.g., the determination of a specific health hazard, mapping available evidence for the various uncertainty sources onto the framework will provide a complete overview of strengths, weaknesses, and gaps in our mechanistic understanding and address the question of whether the NAM is relevant for the health effect. Such an understanding will not only guide future NAM development, but it also allows current regulatory practices, i.e., essentially animal-based approaches, to be uncoupled from the aim of assessing health effects in humans.
Animal-based approaches are deeply rooted in regulatory approaches, but also in toxicology and environmental health, so that they are often used as a surrogate aim, not making their strengths and weaknesses explicit and transparent. A clear separation of the two would enable a fair and transparent assessment of NAMs, unbiased by current animal-based practices, for the purpose of protecting human health. Depending on the complexity of the human health effect, this approach will provide a clear path to reducing the overall uncertainty in NAM to achieve sufficient confidence in their results (Fig. 2).
For the identification of sources of uncertainty, uncertainty in our mechanistic understanding of the biological events that lead to human health effects needs to be identified by systematically mapping the peer-reviewed literature that has addressed this topic. Outcomes of recent workshops organized by the EBTC 6 (de Vries et al., 2021;Tsaioun et al., in preparation), relevant information from national and international bodies, especially the guidance and case studies of the OECD, and the opinions of leading scientists should be incorporated. The sources of uncertainties in NAM need to be identified using a similar approach, with a focus on literature and other information on the assessment of individual NAM and combinations of NAM in testing strategies.
In order to build the generic framework, the literature can be screened for initiatives in the field of toxicology and environmental health that could be built upon, e.g., by Bogen and Spear (1987). A top-down approach is recommended that starts with a (close to) ideal situation: That is either the theoretical assumption that hazard or risk for a certain health effect upon exposure to a stressor X is known, i.e., quantifiable without uncertainty, or the more practical assumption of adapting the concept of a "target" trial, i.e., a hypothetical, not necessarily feasible or ethical trial, conducted on the population of interest, whose results would answer the question (see, e.g., Sterne et al., 2016). The aim of addressing a human health effect exclusively with NAM and identifying the uncertainties introduced by each step could be achieved by careful mapping of interdependence of sources of uncertainty and will be essential for their integration. This process needs to consider lessons learned from the deterministic and probabilistic integration of uncertainties of animal studies that can be transferred to NAM.
The resulting frameworks could be explored by applying a select one as a case study. For illustration, skin sensitization hazard identification and risk assessment lends itself to this purpose for the following reasons:
• low complexity of the etiology of skin sensitization
• availability of a well-described AOP (Fig. 3), including formal confidence assessment 47 (OECD 2014)
• availability of NAMs for the AOP events, many as OECD Test Guidelines (OECD 2018a,b, 2020)
• well-characterized NAMs, e.g., limitations, reproducibility, etc. (Hoffmann et al., 2018)
• availability of testing strategies, so-called defined approaches (DA) (Kleinstreuer et al., 2018b)
• next generation skin sensitization risk assessment (NGRA) approach of cosmetic ingredients (Gilmour et al., 2020)
Available evidence for the various sources of uncertainty needs to be collected and plugged into the framework. Interdependencies of uncertainties can be explored or modelled, where applicable, to inform a qualitative or semi-quantitative integration of all uncertainties to characterize the confidence in the final decision.
The main results would be a generic framework that maps all sources of uncertainty in NAM-based regulatory decisions on human health. Such an objective evidence-based framework enables a transparent fit-for-purpose assessment of NAM and NAM combinations, e.g., integrated approaches to testing and assessment (IATA) (OECD, 2017). Application of the framework will allow for mapping of NAMs and characterization of uncertainty in an integrative manner, while highlighting the strengths but especially the weaknesses and knowledge and NAM gaps. This in turn will help direct future research to address the identified shortcomings. Ultimately, such a comprehensive and transparent approach is a pre-requisite to increase the regulators' confidence in NAM-based decision-making to a level that will allow abandoning the traditional animal-based approaches, not least as it allows comparison of the approaches.

11 Evidence-based medicine / toxicology and the role of probability and uncertainty
Rysavy (2013) titled an editorial "Evidence-based medicine: A science of uncertainty and an art of probability". In fact, a lot of the change brought about by evidence-based medicine is replacing the eminence-based (authoritarian) black-and-white of "this is the diagnosis/ this is the treatment" to an acceptance of uncertainties, probabilities for differential diagnoses, treatment options, and associated odds for outcome etc., exactly what we describe for ProbRA and its challenge to classification and labeling of toxicities. By promoting transparency and mapping uncertainties and biases as well as broad evidence use, ProbRA promotes very similar goals to evidence-based toxicology.

12 Thresholds of toxicological concern (TTC) as probabilistic approaches
TTC represent a bit of a hybrid between the two worlds. They are based on the distribution of no adverse effect levels (NOAEL), and then the 5th percentile is used as a threshold, applying a safety factor of typically 100 (Hartung, 2017b). Future refinements of the concept might embrace uncertainty and probability considerations. As shown below, TTC might already now serve a role in the ProbRA approach.
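The arithmetic behind a TTC (5th percentile of a NOAEL distribution divided by a safety factor) can be sketched as follows. The NOAEL values are invented, the choice of a lognormal fit is an assumption, and the function name `ttc_from_noaels` is hypothetical. Published TTC values rest on curated databases and structural classes that this sketch does not attempt to reproduce.

```python
import math
from statistics import NormalDist, mean, stdev

# Hypothetical NOAELs (mg/kg/day) for a group of chemicals; not real data.
HYPOTHETICAL_NOAELS = [0.5, 1, 2, 5, 10, 20, 50, 100, 200, 500]


def ttc_from_noaels(noaels, percentile=0.05, safety_factor=100.0):
    """Fit a lognormal distribution to the NOAELs, take the requested lower
    percentile, and divide it by the safety factor."""
    logs = [math.log10(x) for x in noaels]
    fitted = NormalDist(mean(logs), stdev(logs))
    lower_percentile = 10 ** fitted.inv_cdf(percentile)
    return lower_percentile / safety_factor


if __name__ == "__main__":
    ttc = ttc_from_noaels(HYPOTHETICAL_NOAELS)
    print(f"illustrative TTC: {ttc:.4f} mg/kg/day")
```

The percentile is already a probabilistic statement about the chemical universe, while the fixed factor of 100 is a deterministic default. That combination is what makes the TTC a hybrid of the two worlds.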

Probabilistic avatars
Virtual representations of patients (avatars, digital twins) 48 , 49 are increasingly developed as an approach to personalized medicine and even virtual clinical trials (Brown, 2016;Bruynseels et al., 2018). The European DISCIPULUS Project 50 , 51 developed a roadmap for research and development. Earlier (Hartung, 2017c), we suggested that this is a logical extrapolation of the AOP concept: "A virtual patient is not far from the creation of a personal avatar for each patient, where the standard model is adapted to the genetic and pharmacokinetic parameters of the patients and where interventions can be modeled and optimized in virtual treatments. Certainly still largely science fiction, but these were any of the technologies of our current toolbox some decades ago too". Here, it is important to note that the key underlying concept is the probabilistic approach of PB-PK. Similar to modeling disease and treatment, the hazardous consequences of exposure might be modelled in the future.
Notably, this is also an interesting concept in the context of animal testing. Similar avatars of experimental animals might help with species extrapolations. Furthermore, we often point out that tests like the Draize rabbit eye test are not very reproducible. One source of variance is probably the animals themselves. Modeling the result of an animal test as a function of the chemical and animal tested (here avatar of the animal) would probably explain some of the uncertainty.
48 https://siliconangle.com/blog/2018/04/20/digital-twins-personalized-medicine-promising-caveats/
49 https://www.philips.com/a-w/about/news/archive/blogs/innovation-matters/20181211-the-digital-patient-will-we-one-day-have-ourown-health-avatars.html
50 http://www.digital-patient.net
51 https://www.vph-institute.org/upload/discipulus-digital-patient-research-roadmap_5270f44c03856.pdf

Artificial intelligence (A.I.) as the big evidence integrator delivers probabilities
A central problem of toxicology is evidence integration. More and more methodologies and results, some conflicting and others difficult to compare, are accumulating. We are facing this problem in more and more risk assessments; consider, for example, the tens of thousands of publications on bisphenol A. Similarly, systematic reviews (Hoffmann et al., 2017; Farhat et al., 2022; Krewski et al., 2022) need to combine different evidence streams (EFSA and EBTC, 2018; Krewski et al., in preparation). Last but not least, the combination of tests and other assessment methods in integrated testing strategies (Tollefsen et al., 2014; Rovida et al., 2015), a.k.a. IATA or DA by OECD, needs to integrate different types of information. Again, probabilistic tools lend themselves to all of these.
We have earlier discussed how probabilistic approaches can help with integrated testing strategies, for example by determining the most valuable (next) test. Briefly, we can ask how much the overall probability of the result can change with any outcome. Often, we might conclude that this is not actually worth the additional work, bringing an end to endless testing. Value of information analysis (Keisler et al., 2013) has enormous potential in toxicological decision-taking. This leads us to a type of information economics. Information economics is the discipline of modeling the role of information in an economic system as a fundamental force in every economic decision. We have stressed economic considerations earlier in this series of articles (Meigs et al., 2018). It seems like an interesting extension of this thinking if the investment into testing is contrasted quantitatively with the possible gain.
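The question of whether a further test is worth running can be phrased as an expected value of information. The following sketch is a textbook-style calculation for a binary hazard decision, not the procedure of Keisler et al. (2013) or of any testing strategy cited here. The priors, test performance, cost units and function names are hypothetical.

```python
def posteriors_after_test(prior, sensitivity, specificity):
    """Return (P(test positive), P(hazard | positive), P(hazard | negative))."""
    p_pos = prior * sensitivity + (1.0 - prior) * (1.0 - specificity)
    post_pos = prior * sensitivity / p_pos
    post_neg = prior * (1.0 - sensitivity) / (1.0 - p_pos)
    return p_pos, post_pos, post_neg


def expected_value_of_test(prior, sensitivity, specificity,
                           cost_false_negative=1.0, cost_false_positive=1.0):
    """Expected reduction in decision loss from running one more test."""

    def expected_loss(p_hazard):
        # The decision maker picks the label with the smaller expected loss.
        loss_if_called_safe = p_hazard * cost_false_negative
        loss_if_called_hazardous = (1.0 - p_hazard) * cost_false_positive
        return min(loss_if_called_safe, loss_if_called_hazardous)

    p_pos, post_pos, post_neg = posteriors_after_test(prior, sensitivity, specificity)
    loss_with_test = p_pos * expected_loss(post_pos) + (1.0 - p_pos) * expected_loss(post_neg)
    return expected_loss(prior) - loss_with_test


if __name__ == "__main__":
    for prior in (0.5, 0.95):
        gain = expected_value_of_test(prior, sensitivity=0.8, specificity=0.8)
        print(f"prior {prior:.2f}: expected gain from the test = {gain:.3f}")
```

With an undecided prior of 0.5, the assumed test reduces the expected loss considerably. With a prior of 0.95, neither outcome changes the decision and the expected gain is zero, so any testing cost above zero is not justified under these assumptions.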
In the extreme, toxicology is seeing the rise of big data, which is defined by the three Vs: volume, velocity, and variety. These are key to understanding how we can measure big data and just how very different big data is to traditional data. Different technologies fuel this, such as omics technologies, high-content imaging, robotized testing (e.g., by ToxCast and the Tox21 alliance), sensor technologies, curated legacy databases, scientific and grey literature of the internet, etc. (Hartung and Tsatsakis, 2021). A.I. is making big sense from big data (Hartung, 2018b). It is worth mentioning that machine learning approaches frequently struggle with probabilities. Several existing approaches attempt to merge machine learning methods with probabilistic methods by modeling distributions or using Bayesian updating 52 . Frequently the outputs of neural networks are interpreted as probabilities, which can be problematic. Here, more work needs to be done.
Most importantly, by adopting a probabilistic view on safety information, we might come to a more flexible use of new approaches over time. If we do not see an individual method as definitive but only changing probabilities, we might be able to avoid the "war of faith" on the usefulness of animal tests, for example. Over time, we will see how the individual evidence sources contribute to the result of our A.I.-based integration. This might allow phasing out those methods that do not deliver valuable information and implementing those that do.

Conclusions and the way forward
As soon as we accept that risk assessment occurs with uncertainty and give up on the illusion of absolute safety, we must deal with probabilities. This is what science can deliver, as every experiment can only approximate truth. Working with models of reality with limited resources and technologies, and inherent variabilities and differences introduces uncertainty. The advantage of ProbRA is making these visible and estimating their potential contribution. By quantifying these uncertainties, we do not always need to default to the most conservative "precautionary" approach but can define acceptable risks and deprioritize scenarios clearly below them. ProbRA of chemicals offers numerous advantages compared to traditional deterministic approaches as well as several challenges 53 (Tab. 7) (Kirchsteiger, 1999;Verdonck et al., 2002;Scheringer et al., 2002;Parkin and Morgan, 2006;Bogen et al., 2009;EPA, 2014).
The impressive list of advantages strongly encourages embracing the concept of ProbRA, especially as it makes more (transparent) use of evidence, something the authors have been arguing for in the context of evidence-based toxicology. This is reminiscent of "factfulness" as coined by Hans Rosling and coauthors (2018), who remind us in a very different context why we fail to recognize a changing world and grasp new insights. A major challenge is education, given the lack of familiarity with ProbRA among stakeholders and the public: "Many view PRA [ProbRA] as a highly technical discipline that uses sophisticated mathematics and requires extensive training to apply and understand. Single point estimates are easier to grasp for most people, based in part on familiarity with this approach over the history of EPA. Although some people initially have difficulty interpreting probability distributions of values, everyone has a common baseline experience with probability, uncertainty and variability from everyday life (e.g., weather forecasting, odds of winning a lottery), and this experience could be used to frame the discussion of results. It is not necessary to understand the underlying mathematics or even to include results as full distributions. Results can be distilled down to the critical essence or decision-meaningful input of interest." (EPA, 2014). To contrast this optimistic view on communicating our scientific uncertainty, Bertrand Russell stated, "The fundamental cause of the trouble is that in the modern world the stupid are cocksure while the intelligent are full of doubt".
Regulatory agencies play a key role for the implementation of ProbRA: The US EPA concluded in 2014 that "Strategic use of PRA [ProbRA] would allow EPA to send the appropriate signal to the intellectual marketplace, thereby encouraging analysts to gather data and develop methodologies necessary for assessing uncertainties" but also noticed: "A clear institutional understanding of how to incorporate the results of probabilistic analyses into decision making is lacking". ProbRA is a form of data analysis making use of probabilities. There are four major data analytics disciplines 54 :
To some extent, ProbRA touches on all four aspects, but the central argument here is its use to predict risks. Toxicology would be well-served to address the value of probabilistic approaches in all of these.
ProbRA is a key element of the European flagship project ONTOX 55 (Vinken et al., 2021) and the ASPIS cluster 56 formed with two sister projects. ONTOX shall deliver a generic strategy to create innovative NAMs in order to predict systemic repeated dose toxicity effects that, upon combination with tailored exposure assessment, enable human risk assessment. The six specific adversities addressed are in the liver (steatosis and cholestasis), kidneys (tubular necrosis and crystallopathy) and developing brain (neural tube closure and cognitive function defects). A workshop on ProbRA jointly organized by CAAT through the transatlantic think tank for toxicology (t4) 57 and ONTOX will further address this topic this summer. With a broad participation of regulators from both sides of the Atlantic in ASPIS, this promises to stimulate renewed discussion about ProbRA in regulatory sciences.
Here, we would like to put forward a vision for ProbRA. Figure 4 shows the combination of the different probabilistic approaches above. Notably, we see a key role for TTC to abrogate risk assessment where exposure and/or bioavailability (internal TTC) (Hartung and Leist, 2008; Partosch et al., 2015) is negligible. A.I. will play a key role for data extraction as well as for evidence integration. Here, especially Bayesian approaches lend themselves to the deduction of a probability of risk. Probability of hazard as the other starting point will be informed by data available on a given chemical including through (Q)SAR as well as data on similar chemicals through automated read-across. Here, we will build on the RASAR (Hartung, 2016; Luechtefeld et al., 2018a,b). An additional line of information on possible hazard will come from mechanistic alerts. The ontology approach of organizing such knowledge (Desprez et al., 2019) will be followed.
A key question for the future will be whether to employ a frequentist or a Bayesian ProbRA. Based on the discussion of the Bayesian approach above, the latter seems to be most promising but might overwhelm risk assessment practitioners with its additional complexities. In areas like evidence integration by BN and similar, it might already sneak in as part of the data analysis procedures. A big limitation of machine learning models is causal inference. BN can sometimes handle that better. There are relationships between probabilistic inference and causal inference. If the training data have only been built within a certain environment, then machine learning models (and even probabilistic methods) can learn conditional probability relationships that are not valid, which is basically the same thing as saying correlation is not causation. It is worth mentioning that the problems A.I. has with learning probability distributions also can apply to animal testing, particularly methods like weight of evidence.
Overall, there is great promise of Bayesian tools for risk assessment (Linkov et al., 2015).
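The practical difference between a frequentist and a Bayesian answer is easiest to see for a small dataset. The sketch below compares the observed frequency with a beta-binomial posterior for a hypothetical case of zero adverse findings among ten tested units. The uniform prior, the sample size and the function names are assumptions made only for illustration.

```python
import random


def frequentist_estimate(k, n):
    """Observed frequency of the event: k findings among n units."""
    return k / n


def bayesian_posterior(k, n, prior_a=1.0, prior_b=1.0, draws=20_000, seed=3):
    """Beta-binomial update with a Beta(prior_a, prior_b) prior.

    Returns the exact posterior mean and a sampled one-sided 95% upper bound.
    """
    a = prior_a + k
    b = prior_b + (n - k)
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    return {"mean": a / (a + b), "upper_95": samples[int(0.95 * draws)]}


if __name__ == "__main__":
    k, n = 0, 10  # hypothetical: no adverse finding among ten tested units
    print(f"frequentist point estimate: {frequentist_estimate(k, n):.3f}")
    posterior = bayesian_posterior(k, n)
    print(f"Bayesian posterior mean:    {posterior['mean']:.3f}")
    print(f"Bayesian 95% upper bound:   {posterior['upper_95']:.3f}")
```

The observed frequency of zero suggests that there is no risk, whereas the posterior keeps a non-negligible probability of an effect and states how large it could plausibly be. The price is the need to justify a prior, which is part of the additional complexity mentioned above.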
16 Is ProbRA the keystone, the capstone, or the cornerstone of a new risk assessment?
While well-defined in masonry 58 , these terms are sometimes used interchangeably in the figurative sense. It is worth considering what the different terms mean relative to "building" the new toxicology (Fig. 5). The cornerstone, i.e., "the first stone laid when constructing a masonry foundation. It is considered the most important stone in the building, as all other stones are laid in reference to this first, cornerstone", represents the hazards and exposures to protect against. The subsequent stones are the technologies and models, which allow the two to be assessed. As laid out above, ultimately, this leads to a probability of hazard and a probability of exposure for an individual by integration of the population. The two sides of the arch need to be combined by the keystone, i.e., "the central stone placed at the top of an arch. The keystone is the apex of an arch, without it the arch would not stand. The keystone is placed last when constructing an arch, locking all the other stones into place." This is, in the authors' view, the role of ProbRA, as the title of this article already gives away. Notably, "The word keystone is often used figuratively to mean the central idea of a philosophy, process, business proposition or principle upon which the entire philosophy, process, business proposition or principle stands." What about the capstone then? "A capstone is a finishing stone atop an exterior wall or roof or other exterior architectural feature. The capstone protects the masonry, causing water to flow in a certain way as to mitigate erosion." The best match would be the risk management implemented on the basis of probability of risk and policy decisions, i.e., what is best for society, the "polis". As laid out above, it is tempting to call for this to be an evidence-based risk management.
Let's close this reasoning about building ProbRA with a quote from the English author Walter Bagehot (1826-1877), "Life is a school of probability". We are looking forward to making probability a greater part of the life of toxicologists.

Author Manuscript

Adverse outcome pathway (AOP)
An AOP is a sequence of events from the exposure of an individual or population to a chemical substance through a final adverse (toxic) effect at the individual level (for human health) or population level (for ecotoxicological endpoints). The key events in an AOP should be definable and make sense from a physiological and biochemical perspective. AOPs incorporate the toxicity pathway and mode of action for an adverse effect. AOPs may be related to other mechanisms and pathways as well as to detoxification routes.

Applicability domain
The physicochemical, structural, or biological space and information that was used to develop a (Q)SAR model and for which that model gives predictions with a given level of reliability.

Bias
A systematic error or deviation in results or inferences from the truth.

Biokinetics (in toxicology)
Science of the movements involved in the distribution of substances.

Biomarker
Indicator signaling an event or condition in a biological system or sample and giving a measure of exposure, effect, or susceptibility.

Data analysis procedure (DAP)
DAP refers to a procedure incorporating both a data interpretation procedure (DIP) and a prediction model (PM).

Deterministic
A methodology relying on point (i.e., exact) values as inputs to estimate risk; this obviates quantitative estimates of uncertainty and variability. Results also are presented as point values. Uncertainty and variability may be discussed qualitatively or semi-quantitatively by multiple deterministic risk estimates.
59 https://www.cebm.ox.ac.uk/resources/ebm-tools/glossary

Frequentist (or frequency) probability
A view of probability that concerns itself with the frequency with which an event occurs given a long sequence of identical and independent trials.

Hazard
1) A biological, chemical, or physical agent with the potential to cause an adverse health effect.
2) The inherent characteristic of a material, condition, or activity that has the potential to cause adverse effects to people, property, or the environment.

Hazard identification
The risk assessment process of determining whether exposure to a stressor can cause an increase in the incidence or severity of a particular adverse effect, and whether an adverse effect is likely to occur.

Integrated testing strategy (ITS)
In the context of safety assessment, an integrated testing strategy is a methodology which integrates information for toxicological evaluation from more than one source, thus facilitating decision-making. This should be achieved whilst taking into consideration the principles of the Three Rs (reduction, refinement, and replacement).

Likelihood ratio
The likelihood that a given test result would be expected in a patient with the target disorder compared to the likelihood that the same result would be expected in a patient without that disorder.
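
To illustrate how a likelihood ratio can be used, the following minimal sketch derives positive and negative likelihood ratios from a test's sensitivity and specificity and uses them to update a prior probability via odds. The function names and the numerical values (80% sensitivity, 90% specificity, 20% prior probability) are assumptions chosen for illustration only and are not taken from any real test method.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) for a binary test."""
    lr_pos = sensitivity / (1.0 - specificity)   # positive result: true positive rate / false positive rate
    lr_neg = (1.0 - sensitivity) / specificity   # negative result: false negative rate / true negative rate
    return lr_pos, lr_neg


def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Update a probability via odds: post-test odds = pre-test odds * LR."""
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Hypothetical test characteristics and prior (illustrative values only)
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.80, specificity=0.90)
p_after_positive = post_test_probability(0.20, lr_pos)
p_after_negative = post_test_probability(0.20, lr_neg)

print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
print(f"Probability after positive result: {p_after_positive:.3f}")
print(f"Probability after negative result: {p_after_negative:.3f}")
```

With these assumed values, a positive result raises the probability well above the prior and a negative result lowers it, which is the sense in which a likelihood ratio quantifies the evidential weight of a test result.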

Model
A mathematical representation of a natural system intended to mimic the behavior of the real system, allowing description of empirical data and predictions about untested states of the system.

Modeling
Development of a mathematical or physical representation of a system or theory that accounts for all or some of its known properties. Models often are used to test the effect of changes of components on the overall performance of the system.

Monte Carlo analysis or simulation
A repeated random sampling from the distribution of values for each of the parameters in a generic exposure or risk equation to derive an estimate of the distribution of exposures or risks in the population.
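
The repeated sampling described above can be sketched in a few lines. The example assumes a generic exposure equation, dose = concentration × intake / body weight, and entirely hypothetical input distributions; the function name `simulate_doses` and all parameter values are illustrative assumptions, not recommended defaults.

```python
import random
import statistics


def simulate_doses(n_samples: int = 10_000, seed: int = 1) -> list[float]:
    """Sample each input from its distribution and evaluate the exposure equation."""
    rng = random.Random(seed)
    doses = []
    for _ in range(n_samples):
        concentration = rng.lognormvariate(0.0, 0.5)     # mg/L, hypothetical (median 1 mg/L)
        intake = max(0.1, rng.gauss(2.0, 0.5))           # L/day, hypothetical, floored at 0.1
        body_weight = max(30.0, rng.gauss(70.0, 12.0))   # kg, hypothetical, floored at 30
        doses.append(concentration * intake / body_weight)  # mg/kg/day
    return doses


doses = simulate_doses()
cut_points = statistics.quantiles(doses, n=100)  # 99 percentile cut points
median_dose = cut_points[49]                     # 50th percentile
p95_dose = cut_points[94]                        # 95th percentile

print(f"Median dose: {median_dose:.4f} mg/kg/day")
print(f"95th percentile dose: {p95_dose:.4f} mg/kg/day")
```

The output is a distribution of doses rather than a single point value, from which any percentile of interest can be read.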

One-dimensional Monte Carlo analysis
A method for making probability calculations by random sampling from one set of distributions, all representing uncertainty about non-variable quantities or categorical questions. A numerical method of simulating a distribution for an endpoint of concern as a function of probability distributions that characterize variability or uncertainty. Distributions used to characterize variability are distinguished from distributions used to characterize uncertainty.

Parameter
A quantity used to calibrate or specify a model, such as "parameters" of a probability model (e.g., mean and standard deviation for a normal distribution). Parameter values often are selected by fitting a model to a calibration data set.

Physiologically-based pharmacokinetic models (PBPK)
A computer model that describes what happens to a chemical in the body. This model describes how the chemical gets into the body, where it goes in the body, how it is changed by the body, and how it leaves the body.

Probability
Defined depending on philosophical perspective: (1) the frequency with which sampled values arise within a specified range or for a specified category; (2) quantification of judgement regarding the likelihood of a particular range or category. A frequentist approach considers the frequency with which samples are obtained within a specified range or for a specified category (e.g., the probability that an average individual with a particular mean dose will develop an illness).

Probability density function
In probability theory, a probability density function (pdf) of a continuous random variable is a function, often denoted as f(x), that describes the relative likelihood for this random variable to take on a given value.
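
As a brief illustration of the difference between a density and a probability, the sketch below uses Python's standard-library `statistics.NormalDist`. The distribution parameters are hypothetical. A density value can exceed 1, whereas a probability is obtained only by integrating the density over an interval (here via the cumulative distribution function).

```python
from statistics import NormalDist

# Hypothetical normal distribution of a log-transformed quantity
dist = NormalDist(mu=0.0, sigma=0.1)

density_at_mean = dist.pdf(0.0)                          # relative likelihood, not a probability
prob_within_one_sigma = dist.cdf(0.1) - dist.cdf(-0.1)   # probability of falling within +/- 1 sigma

print(f"f(0) = {density_at_mean:.3f}")
print(f"P(-0.1 < X < 0.1) = {prob_within_one_sigma:.3f}")
```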

Probabilistic modeling
A technique that utilizes the entire range of input data to develop a probability distribution of exposure to risk rather than a single point value. The input data can be measured values and/or estimated distributions. Values for these input parameters are sampled thousands of times through a modeling or simulation process in order to develop a distribution of likely exposure or risk. Probabilistic models can be used to evaluate the impact of variability and uncertainty in the various input parameters, such as environmental exposure levels, fate, and transport processes.

Probabilistic risk analysis (ProbRA)
A risk assessment that uses probabilistic methods (e.g., Monte Carlo analysis) to derive a distribution of risk based on multiple sets of values sampled for random variables. Calculation and expression of health risks using multiple risk descriptors to provide the likelihood of various risk levels. Probabilistic risk results approximate a full range of possible outcomes and the likelihood of each, which often is presented as a frequency distribution graph, thus allowing uncertainty or variability to be expressed quantitatively.

Sensitivity analysis
The process of changing one variable while leaving the others constant to determine its effect on the output. This procedure fixes each uncertain quantity at its credible lower and upper bounds (holding all others at their nominal values, such as medians) and computes the results of each combination of values. The results help to identify the variables that have the greatest effect on exposure estimates and help focus further information-gathering efforts.
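
The one-at-a-time procedure described above can be sketched as follows. The model (a generic dose equation), the nominal values, and the lower and upper bounds are hypothetical assumptions for illustration; `one_at_a_time` is an illustrative helper, not an established library function.

```python
def dose(concentration: float, intake: float, body_weight: float) -> float:
    """Generic exposure equation (assumed for illustration)."""
    return concentration * intake / body_weight


# Hypothetical nominal values (e.g., medians) and credible lower/upper bounds
nominal = {"concentration": 1.0, "intake": 2.0, "body_weight": 70.0}
bounds = {
    "concentration": (0.4, 2.5),
    "intake": (1.0, 3.0),
    "body_weight": (50.0, 100.0),
}


def one_at_a_time(model, nominal, bounds):
    """Vary each input between its bounds while holding the others at nominal values."""
    swings = {}
    for name, (low, high) in bounds.items():
        outputs = [model(**{**nominal, name: value}) for value in (low, high)]
        swings[name] = max(outputs) - min(outputs)
    # Rank inputs by the size of the output swing they cause, largest first
    return dict(sorted(swings.items(), key=lambda item: item[1], reverse=True))


ranking = one_at_a_time(dose, nominal, bounds)
for name, swing in ranking.items():
    print(f"{name}: output swing = {swing:.4f}")
```

Inputs at the top of the ranking are those for which further information gathering would be expected to matter most under these assumptions.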

Two-dimensional Monte Carlo analysis
A method for making probability calculations by random sampling from two sets of distributions, one set describing the variability of variable quantities, and the second set representing uncertainty, including uncertainty about the parameters of the distributions describing variability. An advanced numerical modeling technique that uses two stages of random sampling, also called nested loops, to distinguish between variability and uncertainty in exposure and toxicity variables. The first stage, often called the inner loop, involves a complete 1-D MCA simulation of variability in risk. In the second stage, often called the outer loop, parameters of the probability distributions are redefined to reflect uncertainty. These loops are repeated many times resulting in multiple risk distributions, from which confidence intervals are calculated to represent uncertainty in the population distribution of risk.
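
The nested-loop structure can be sketched as below. The inner loop simulates variability across individuals; the outer loop resamples an uncertain parameter of one variability distribution. The choice of which parameter is uncertain, all distributions, and the names `inner_loop` and `two_dimensional_mc` are assumptions made for illustration.

```python
import random
import statistics


def inner_loop(rng: random.Random, mean_log_conc: float, n_individuals: int) -> float:
    """1-D simulation of variability; returns the 95th percentile dose in the population."""
    doses = []
    for _ in range(n_individuals):
        concentration = rng.lognormvariate(mean_log_conc, 0.5)   # hypothetical variability
        intake = max(0.1, rng.gauss(2.0, 0.5))
        body_weight = max(30.0, rng.gauss(70.0, 12.0))
        doses.append(concentration * intake / body_weight)
    return statistics.quantiles(doses, n=20)[18]   # 19 cut points; index 18 is the 95th percentile


def two_dimensional_mc(n_outer: int = 200, n_inner: int = 1000, seed: int = 1) -> list[float]:
    """Outer loop: sample the uncertain parameter, then rerun the variability simulation."""
    rng = random.Random(seed)
    p95_values = []
    for _ in range(n_outer):
        mean_log_conc = rng.gauss(0.0, 0.2)   # hypothetical uncertainty about the log-scale mean
        p95_values.append(inner_loop(rng, mean_log_conc, n_inner))
    return p95_values


p95_values = two_dimensional_mc()
cuts = statistics.quantiles(p95_values, n=20)
lower, upper = cuts[0], cuts[18]   # 5th and 95th percentiles of the estimates

print(f"90% interval for the 95th percentile dose: {lower:.4f} to {upper:.4f} mg/kg/day")
```

The resulting interval expresses uncertainty about a population percentile, keeping it separate from the variability that the percentile itself describes.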

Uncertainty
Uncertainty occurs because of a lack of knowledge. It is not the same as variability. For example, a risk assessor may be very certain that different people drink different amounts of water but may be uncertain about how much variability there is in water intakes within the population. Uncertainty often can be reduced by collecting more and better data, whereas variability is an inherent property of the population being evaluated. Variability can be better characterized with more data, but it cannot be reduced or eliminated. Efforts to clearly distinguish between variability and uncertainty are important for both risk assessment and risk characterization, although they both may be incorporated into an assessment.

Uncertainty analysis
A detailed examination of the systematic and random errors of a measurement or estimate; an analytical process to provide information regarding uncertainty.

Value of information
An analysis that involves estimating the value that new information can have to a risk manager before the information is actually obtained. It is a measure of the importance of uncertainty in terms of the expected improvement in a risk management decision that might come from better information.
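
One common way to formalize this idea is the expected value of perfect information (EVPI), i.e., the reduction in expected loss if the uncertain state were known before deciding. The sketch below uses a hypothetical two-action, two-state decision; the prior probability, the loss values, and the function name are illustrative assumptions only.

```python
def expected_value_of_perfect_information(prior: dict, losses: dict) -> float:
    """EVPI = expected loss of the best action now - expected loss if the state were known first."""
    # Best action chosen under current uncertainty
    loss_without_info = min(
        sum(prior[state] * action_losses[state] for state in prior)
        for action_losses in losses.values()
    )
    # Best action chosen separately for each state, weighted by the prior
    loss_with_info = sum(
        prior[state] * min(action_losses[state] for action_losses in losses.values())
        for state in prior
    )
    return loss_without_info - loss_with_info


# Hypothetical prior probability of hazard and losses (arbitrary units)
prior = {"hazardous": 0.3, "safe": 0.7}
losses = {
    "restrict": {"hazardous": 10.0, "safe": 10.0},   # cost of restriction regardless of state
    "allow": {"hazardous": 50.0, "safe": 0.0},       # harm only if the substance is hazardous
}

evpi = expected_value_of_perfect_information(prior, losses)
print(f"EVPI = {evpi:.1f} loss units")
```

Under these assumptions, additional testing would be worthwhile only if it costs less than the computed EVPI, since no real test can be more valuable than perfect information.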

Variability
Refers to true heterogeneity or diversity, as exemplified in natural variation. For example, among a population that drinks water from the same source and with the same contaminant concentration, the risks from consuming the water may vary. This may result from differences in exposure (e.g., different people drinking different amounts of water and having different body weights, exposure frequencies and exposure durations), as well as differences in response (e.g., genetic differences in resistance to a chemical dose). Those inherent differences are referred to as variability. Differences among individuals in a population are referred to as inter-individual variability, and differences for one individual over time are referred to as intra-individual variability.

Fig. 4: A vision for probabilistic risk assessment (ProbRA) of substances
ProbRA is fueled by probability of exposure and probability of hazard and susceptibility. Exposure is first characterized by a population distribution (cumulative from the individuals' exposure distributions). Where exposures do not exceed applicable thresholds of toxicological concern (TTC), the assessment might be abrogated on the grounds of negligible exposure. Probabilistic physiologically-based pharmacokinetic (or toxicokinetic, respectively) modeling (PBPK) translates these into resulting tissue concentrations. This can be refined by absorption, distribution, metabolism & excretion (ADME) measurements or estimates.
Internal TTC again might allow the assessment to be abrogated in case of irrelevant tissue-level concentrations. The second line of evidence is establishing the probability of hazard. This can be based on mechanistic data, mechanistic tests, and read-across to similar chemicals and any combination thereof. This probability is ideally combined with a distribution of susceptibility of different individuals. Together, tissue-level concentrations and hazard probabilities give a probabilistic risk for an individual and cumulatively for the population. Low risk can lead to deprioritization depending on the use scenario, while high risk should lead to classification and risk management measures as appropriate. Intermediate probabilities of risk, i.e., high uncertainties, should be considered for additional testing, ideally considering the economics of possible information gain, or precautionary risk management.

Key questions addressed in ProbRA and associated tools

Question: What can go wrong?
Tools: Screen important initiators. Master logic diagrams (MLD) or failure modes and effects analyses (FMEA); in toxicology, these would be relevant exposures or molecular initiating events (MIE) triggered within the adverse outcome pathway (AOP) framework.

Question: What are the adverse consequences?
Tools: Deterministic analyses that describe the phenomena that could occur along the path of the accident (here hazard) scenario. In toxicology, this can be understood as the exposure-to-hazard path, more recently defined as AOP with their key events (KE).

Question: What is the probability of adverse consequences?
Tools: Boolean logic methods for model development (e.g., event tree analysis (ETA) or event sequence diagrams (ESD) analysis and deductive methods like fault tree analysis (FTA)) and by probabilistic or statistical methods for the quantification portion of the model analysis (deductive logic tools like fault trees or inductive logic tools like reliability block diagrams (RBD) and FMEA). The final result of a ProbRA is given in the form of a risk curve and the associated uncertainties. This is evidently least translated to toxicology.

ALTEX (Bessems et al., 2014)
The aim of the workshop was to critically appraise PBK modelling software platforms as well as a more detailed state-of-the-art overview of non-animal based PBK parameterization tools. Such as:
1) Identification of gaps in non-animal test methodology for the assessment of ADME.
2) Addressing user-friendly PBK software tools and free-to-use web applications.
3) Understanding the requirements for wider and increased take up and use of PBK modelling by regulators, risk assessors and toxicologists in general.
4) Tackling the aspect of obtaining in vivo human toxicokinetic reference data via micro-dosing following the increased interest by the research community, regulators, and politicians.

US FDA: Application of Physiologically-based pharmacokinetic (PBPK) modelling to support dose selection, 2014, Silver Spring, MD, USA (Wagner et al., 2015)
Workshop to (i) assess the current state of knowledge in the application of PBK in regulatory decision-making, and (ii) share and discuss best practices in the use of PBK modelling to inform dose selection in specific patient populations.

EURL ECVAM: Physiologically-based kinetic modelling in risk assessment - Reaching a whole new level in regulatory decision-making, 2016, Joint Research Centre, Italy
Strategies to enable prediction of systemic toxicity by applying new approach methodologies (NAM) using PBK modelling to integrate in vitro and in silico methods for ADME in humans for predicting whole-body TK behavior, for environmental chemicals, drugs, nano-materials, and mixtures.
(i) identify current challenges in the application of PBK modelling to support regulatory decision-making; (ii) discuss challenges in constructing models with no in vivo kinetic data and opportunities for estimating parameter values using in vitro and in silico methods; (iii) present the challenges in assessing model credibility relying on non-animal data and address strengths, uncertainties and limitations in such an approach; (iv) establish a good kinetic modelling practice workflow to serve as the foundation for guidance on the generation and use of in vitro and in silico data to construct PBK models designed to support regulatory decision-making.
Recommendations on parameterization and evaluation of PBK models: (i) develop a decision tree for model construction; (ii) set up a task force for independent model peer review; (iii) establish a scoring system for model evaluation; (iv) attract additional funding to develop accessible modelling software; (v) improve and facilitate communication between scientists (model developers, data provider) and risk assessors/regulators; and (vi) organize specific training for end users.
Critical need for developing a guidance document on building, characterizing, reporting, and documenting PBK models using non-animal data; incorporating PBK models in integrated strategy approaches and integrating them with in vitro toxicity testing and adverse outcome pathways.

Advantages and challenges for ProbRA in human health risk assessment

Advantage: Improves transparency and credibility by explicit consideration and treatment of all types of uncertainties; clearly structured; integrative and quantitative; allows ranking of issues and results; more information can be obtained by separating variability from uncertainty.
Challenge: Problem of model incompleteness; relatively time-consuming in performing and interpreting; this "might be a fertile ground for endless debate between utility and regulator" (Kafka, 1998); regulatory delays due to the necessity of analyzing numerous scenarios using various models.

Advantage: Cost effective by assuring that resources are focused on essential safety issues, focuses data collection.
Challenge: More complex and time-consuming analysis and decision-making process because more information and insights must be collected, processed and considered for decisions; requires more data than conventional approaches because distributions of values rather than single values are used.

Advantage: More realistic compared to the current deterministic RA: avoids worst-case assumptions, realistic exposure assessments; overall picture of risks in the population and not just of extreme cases; a probabilistic reference dose could help reduce the potentially inaccurate implication of zero risk below the reference dose.
Challenge: The incompleteness of the model is much more "apparent".

Advantage: Improves decision support enabling risk managers to evaluate the full range of variability and uncertainty instead of just using point estimates of exposure, effects, and eventually risk.
Challenge: More complex structure, the assumptions, methods and results are more difficult to understand and require some mathematical education; lack of understanding of the value of ProbRA for decision-making; personnel must be very well-informed scientifically and technologically to produce consistent application of standards; requires a different skill set than used in current evaluations, but limited resources (staff, time, training or methods) are available.

Advantage: Includes a systematic sensitivity analysis of the uncertainties in the input parameters, which identifies the main sources of uncertainty. Sensitivity analysis is the study of how uncertainty in the output of a model (numerical or otherwise) can be apportioned to different sources of uncertainty in the model input (Saltelli et al., 2008).
Challenge: Where extremely rare events must be considered, there are problems with the statistical significance of probabilistic data.

Advantage: Application of an optimization process (Apostolakis, 1990).
Challenge: Validation challenge; what to compare against? Good practices lacking.

Advantage: More effective risk management; enhances safety and helps manage operability; estimating the success of intervention measures is improved.
Challenge: Complicates decision-making where a more comprehensive characterization of the uncertainties leads to a decrease in clarity regarding how to estimate risk for the scenario under consideration.

Advantage: Works with limited data: Even if the amount of available adequate probabilistic data is relatively small, the absolute accuracy of the data is not an issue if probabilistic approaches are used as comparative tools, allowing one to make decisions between different design or operation alternatives.
Challenge: Minimum data requirements currently are a topic of debate; any quantitative risk estimate only makes sense when the employed data are statistically significant in a sense (i.e., sufficient observations available) and if they originate from similar events and have been analyzed with respect to a common criterion.

Advantage: Information economy: Enables estimating formally the value of gathering more information; better prioritize information needs by investing in areas that yield the greatest information value.
Challenge: Difficulty to quantify and weigh risks and benefits.
Challenge: Various communities have unique sets of perspectives, historical practices, terminologies, resources, and propensities, governed by overlapping set(s) of problems and decision-making goals, regulatory requirements, and legislative mandates being addressed, directly or indirectly, by these interrelated communities.