Chemical Hazard Prediction and Hypothesis Testing Using Quantitative Adverse Outcome Pathways

Current efforts in chemical safety are focused on using human in vitro data, or data from alternatives to animal testing, in a biological pathway context. However, it remains unclear how biological pathways, and toxicology data developed in that context, can be used to quantitatively facilitate decision-making. The objective of this work is to determine if hypothesis testing using adverse outcome pathways (AOPs) can provide quantitative chemical hazard predictions. Current methods for predicting hazards of chemicals in a biological pathway context were extensively reviewed, specific case studies examined, and computational modeling used to demonstrate quantitative hazard prediction based on an AOP. Since AOPs are chemically agnostic, we propose that AOPs function as hypotheses for how specific chemicals may cause adverse effects via specific pathways. Three broad approaches were identified for testing hypotheses with AOPs: semi-quantitative weight of evidence, probabilistic modeling, and mechanistic modeling. We then demonstrate how these approaches could be used to test hypotheses using high throughput in vitro data and data from alternatives to animal testing. Finally, we discuss standards in development and documentation that would facilitate use in a regulatory context. We conclude that quantitative AOPs provide a flexible hypothesis framework for predicting chemical hazards, which accommodates a wide range of approaches that are useful at many stages and build upon one another to become increasingly quantitative. This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 International license (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium, provided the original work is appropriately cited.


Introduction
Risk assessors attempt to identify causal relationships between stressors, for example a chemical, and an outcome of safety and regulatory interest, such as cancer or reproductive impairment, through the integration of a wide range of data and information (NRC, 1983; NRC, 2009; Abt et al., 2010). These efforts are facilitated by hypothesis driven approaches and conceptual frameworks that organize or describe data and information related to risk, e.g. sources and pathways of exposure, ADME (absorption, distribution, metabolism, and elimination), and the health hazard to individuals, communities, or populations (Suter and Cormier, 2011). Several conceptual frameworks have been proposed to support hazard and risk decision making, including mode of action analysis (Sonich-Mullin et al., 2001; USEPA, 2005; Meek et al., 2014), the human relevance framework (Meek et al., 2003), the Key Events/Dose-Response framework (Julien et al., 2009), mechanistic effect modeling (Forbes and Calow, 2012) and the Adverse Outcome Pathway (AOP) framework (Ankley et al., 2010). Simultaneously, hazard and risk assessments are evolving to focus on assays relevant to target species (e.g. human rather than rat), in vitro test data, 'omics data, and biological pathway perturbations leading to apical level changes (NRC, 2007; Krewski et al., 2014). As a result, many opportunities exist to better support decision-making through development of new approaches for data integration, incorporation of data from emerging technologies, and extrapolation of impacts to safety endpoints based on in vitro assays and chemical structures.
The AOP framework has been the focus of several research and regulatory organizations [e.g. the U.S. Environmental Protection Agency (EPA), the Organization for Economic Co-operation and Development (OECD), Health Canada, and the European Commission Joint Research Center] as an approach to document and categorize chemical hazards in a biological pathway context. To support this effort, a knowledge base, https://aopkb.oecd.org/, has been created to support the use of AOPs, along with guidance for developing AOPs and potential application of AOPs in chemical hazard testing and screening (Meek et al., 2014; Tollefsen et al., 2014; Villeneuve et al., 2014a;b; Becker et al., 2015; Patlewicz et al., 2015; Rovida et al., 2015; Perkins et al., 2015; Groh et al., 2015a;b). However, except for a recent description of the development of a quantitative AOP (qAOP) for aromatase inhibition leading to reproductive dysfunction in fish (Conolly et al., 2017), little guidance exists on how the AOP framework can be used in a quantitative manner to predict impacts on individuals and populations from in vitro data or even from information derived from clinical studies (OECD, 2018).
The development of AOP-based quantitative models is likely to face challenges similar to those found with biologically-based dose-response models, where extensive data requirements and long development times have limited their use in risk assessment (Crump et al., 2010). Indeed, development of a qAOP for aromatase inhibition took several years (Conolly et al., 2017), but this need not be the case. Complicating the prediction of adverse outcomes from AOPs is that the number of different AOPs is likely to be in the hundreds, many of which may interact as networks of pathways either due to the presence of chemical mixtures or activation of multiple events by a single chemical (Garcia-Reyero, 2015). As a result, there is a critical need to examine how quantitative AOPs can be used to meet these challenges, and how their development might be sped up.
Our objective in this paper is to examine how hypothesis testing using qAOPs can support hazard and risk decisions. We also argue that qAOPs can take many forms, from text-based, descriptive AOPs to probabilistic network models (e.g., Bayesian networks) to mathematical models with high biological fidelity, thereby enabling rapid development of quantitative approaches to examine chemical effects across multiple AOPs.

Methods
The state of predicting chemical hazards
Current methods for predicting hazards of chemicals in a biological pathway context were extensively reviewed based on the published literature. Specific case studies were identified and examined for quantitative measures of biological pathway-based hazards.

Mechanistic modeling of a chemical impact on a population via an AOP
To demonstrate the mechanistic modeling of an AOP, we constructed a coupled component qAOP model representing the AOP for aromatase inhibition leading to reproductive dysfunction, using mathematical and probabilistic models representing different KE within the AOP (Fig. 2; for complete details see supplementary file 1). Briefly, we extended a hypothalamus-pituitary-ovary model of 17β-estradiol synthesis for fathead minnow (Pimephales promelas) by Shoemaker et al. (2010) with a liver component that mathematically describes the synthesis rates of VTG mRNA and VTG protein in liver, controlled by plasma E2 concentrations with a time delay response (Equations S2 and S3). The resulting HPG-Liver model was then coupled to the fathead minnow population matrix model of Miller et al. (2007) using a simple linear regression model that describes the relationship between fecundity and plasma VTG concentrations.

Quantitative AOP frameworks to support decision making
Regulators involved in reviewing chemical use and permissible exposure levels are required to come to a sound and objective scientific judgment as to the potential of that chemical to cause adverse effects on humans and the environment (NRC, 2009). Many regulatory applications use a biological pathway-based hypothesis to examine the causal evidence for human and ecological hazards as part of a risk based approach (Boobis et al., 2009; Suter and Cormier, 2011; Meek et al., 2014). Since an AOP represents a plausible biological pathway causally leading to an adverse effect, it can be used, together with chemical specific evidence, as a hypothesis that a chemical causes an adverse effect (e.g. cancer, mortality). An AOP is composed of a molecular initiating event (MIE), where a stressor interacts with a receptor, enzyme or other biomolecule, which in turn causes the activation of a series of measurable events (key events or KE) leading to an adverse outcome (AO) of regulatory interest (Villeneuve et al., 2014a). Importantly for quantitative use of AOPs, MIE, KE and AO are linked by key event relationships (KER) that represent response-response relationships between events and that can be used to develop quantitative Adverse Outcome Pathways.
Arguably the greatest potential use of the AOP framework lies in using it to semi-quantitatively or quantitatively assess, in a transparent manner, the likelihood of a chemical causing an adverse effect, thereby supporting hazard and risk decisions. We anticipate that both existing and future quantitative toxicological models can be incorporated into an AOP framework, since AOPs describe biological pathways required for toxicological effects. Three general approaches can be used for hypothesis testing of AOPs in a quantitative manner, ranging from expert judgement-based approaches, requiring limited information, to biological models that require extensive data and development times (Gust et al., 2015): 1) In a semi-quantitative or quantitative Weight of Evidence (WOE) manner, where the evidence for a chemical acting through a specific AOP is given a weighted value based on expert opinion and well-documented criteria (e.g. Becker et al., 2017); 2) In a probabilistic manner, where statistical or sufficiency relationships between MIE or KE and the AO permit extrapolation from in vitro assays or other data to events relevant to safety assessment and of regulatory interest (e.g. Miller et al., 2007; Burgoon et al., 2017); and 3) In a mechanistic manner, where mathematical models of MIE, KE and KER can be used to quantitatively predict the risk of an adverse effect given specified initial conditions (e.g. Conolly et al., 2017). These three approaches are complementary and build upon one another to become increasingly quantitative.

Semi-quantitative or quantitative weight of evidence qAOPs
A large source of uncertainty can be the different ways in which the same data are valued by different people, which can lead to different assessments of risk (Weed, 2005). This gap between values and conclusions can be bridged through explicit and transparent description of the values and approaches used in assessing data, such as clear criteria for how one values toxicity test data. A WOE approach can provide clear criteria and valuations that can be used in reviewing available data supporting hypotheses that a chemical causes an adverse effect through a specific pathway (Weed, 2005). Semi-quantitative and quantitative WOE evaluations incorporate quantitative weighting and numerical assessments of the value of data in order to integrate multiple separate lines of evidence into a single value to support decision making (Linkov et al., 2011; Rhomberg, 2014). The use of transparent WOE approaches can increase certainty in scientific judgement by documenting how data are interpreted and integrated to arrive at a final assessment, and can overcome reluctance to place values on risk assessments (Linkov and Seager, 2011; Rhomberg, 2014; Linkov, 2015).
A simple and direct semi-quantitative approach for assessing the hypothesis that a chemical causes a health hazard through an AOP is to rank confidence in the key event relationships (KER) in the AOP using available evidence. The strength of relationships between events is of particular importance, as KER define how a perturbation proceeds from one event to the next (OECD, 2013; Villeneuve et al., 2014a;b). As a result, clearly defined criteria, including biological plausibility, essentiality of each KE, response-response concordance, temporal concordance, incidence concordance (the incidence of upstream KE observations is greater than that of the downstream KE) and causal evidence, can be used to assess the confidence in, or strength of, MIE, KE, KER, indirect KER (the ability to indirectly infer a KE or AO from a non-adjacent event) and ultimately the AOP (OECD, 2013; Becker et al., 2015; Patlewicz et al., 2015). Frameworks for assessing dose response relationships (Simon et al., 2014) or approaches for determining points of departure from controls (Thomas et al., 2007; Chepelev et al., 2014; Webster et al., 2015; Burgoon et al., 2017; Farmahin et al., 2016) may also be appropriate for placing a value of strength on the KERs. Once KEs and KERs are scored, either based on strict guidance or expert judgment, WOE can be used to assess the overall confidence in the ability of a chemical to activate an AOP or its components (Becker et al., 2015; Linkov, 2015). This WOE approach has long been used to evaluate the impact of different factors in environmental risk assessment (Linkov et al., 2006; Linkov and Seager, 2011). Becker et al. (2017) used this hypothesis driven quantitative WOE approach to assess whether clofibrate induces hepatocarcinomas in rodents by one of two hypothesized AOPs: PPAR alpha activation leading to liver tumors in rodents [(MIE PPAR alpha activation) → (KE of altered cell growth pathways) → (KE of perturbation of cell growth and survival) → (KE of clonal expansion of preneoplastic foci) → (AO of rodent liver tumors)] or mutagenesis leading to liver tumors in rodents [(MIE DNA reactivity leading to promutagenic adducts/lesions) → (KE of insufficient repair of DNA leading to mutations in key genes) → (KE of perturbation of cell growth and survival) → (KE of clonal expansion of preneoplastic foci) → (AO of rodent liver tumors)]. Becker et al. (2017) demonstrated that one could develop a transparent and quantitative assessment of the evidence supporting whether a chemical acted via a specific pathway. An additional value of WOE approaches is the identification of areas where sufficient information exists to develop statistical relationships, or where more resources should be invested to better define the pathway.
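The weighted scoring idea described above can be illustrated with a minimal Python sketch. It is not the scoring scheme of Becker et al. (2017); the criterion names, the weights, the score range of -3 to +3, and the example scores are all hypothetical values chosen only to show how per-event scores might be combined so that two competing AOP hypotheses can be compared transparently.

```python
from dataclasses import dataclass

# Hypothetical criterion weights (assumed for illustration; they sum to 1).
WEIGHTS = {
    "biological_plausibility": 0.4,
    "essentiality": 0.4,
    "empirical_support": 0.2,
}


@dataclass
class KeyEventEvidence:
    """Evidence scores for one key event, each assumed to lie in [-3, 3]."""
    name: str
    scores: dict


def key_event_score(evidence: KeyEventEvidence) -> float:
    """Weighted sum of the criterion scores for a single key event."""
    return sum(WEIGHTS[criterion] * score
               for criterion, score in evidence.scores.items())


def aop_confidence(events: list) -> float:
    """Mean weighted score across all key events of a hypothesized AOP."""
    return sum(key_event_score(e) for e in events) / len(events)


# Two hypothetical pathways scored against the same (invented) evidence base.
pathway_a = [
    KeyEventEvidence("MIE", {"biological_plausibility": 3, "essentiality": 3, "empirical_support": 3}),
    KeyEventEvidence("KE1", {"biological_plausibility": 2, "essentiality": 2, "empirical_support": 2}),
]
pathway_b = [
    KeyEventEvidence("MIE", {"biological_plausibility": -1, "essentiality": -1, "empirical_support": -1}),
    KeyEventEvidence("KE1", {"biological_plausibility": 0, "essentiality": 0, "empirical_support": 0}),
]

print("pathway A:", aop_confidence(pathway_a))
print("pathway B:", aop_confidence(pathway_b))
```

Because the weights and scores are written down explicitly, another assessor can see exactly why one hypothesis ranks above the other and can rerun the comparison with different valuations.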

Probabilistic quantitative AOPs
Key event relationships and indirect KER describe response-response relationships between events and the outcome, and are an essential part of establishing causality in AOPs. Response-response relationships permit development of statistical or probabilistic relationships that enable prediction of the likelihood that a later event or outcome would occur based on changes in an earlier event. The incorporation of statistical or probabilistic relationships into an AOP creates a probabilistic quantitative AOP. This can enhance decision making by requiring less data to support hypothesis testing for specific chemicals. Probabilistic qAOPs can be composed of predictive relationships that span a few events or an entire AOP, and can be combined with WOE analyses depending on the application. Probabilistic AOPs can be developed even when not all essential KE have been identified, provided that a predictive relationship links the events of interest to the AO or an event is causally linked to the AO. Significant statistical linkages between MIE and adverse outcomes have long been used in screening chemicals for the potential to cause hazardous effects. For example, the AOP for membrane disruption (narcosis) leading to respiratory failure describes a non-specific toxicity, characterized by generalized depression in biological activity that can lead to hypoxia and death, for which probabilistic qAOPs exist. Approximately 60% of industrial chemicals are thought to have the potential to exhibit only this mode of action (Van Wezel and Opperhuizen, 1995; Volz et al., 2011) given relevant exposures. Non-polar and polar narcotics diffuse into membranes based on lipophilicity, resulting in a significant correlation between a measure of the MIE, the hydrophobicity parameter octanol-water partition coefficient (logKow), and the adverse outcome of respiratory failure (Mackay et al., 2009). Highly predictive Quantitative Structure Activity Relationship (QSAR) models have been developed using the predictive relationship of logKow to non-polar narcotic acute toxicity, and these are used in hazard screening efforts to test the hypothesis that a chemical acts through the narcosis AOP (Verhaar et al., 1992; Dom et al., 2012; Claeys et al., 2013). QSAR modeling has been widely used to model the effects of chemicals on MIE and has provided substantial support for testing whether or not a specific chemical might initiate an AOP (Allen et al., 2016; Cronin et al., 2017).
The potential of a chemical to cause effects through endocrine signaling is an important hypothesis that has been tested by several different quantitative approaches. Agonism or antagonism of estrogen receptor signaling is involved in several AOPs, including reproductive dysfunction in mammals, fish and other species (Ankley et al., 2010; Becker et al., 2015). As there are strong causal linkages between estrogen receptor activation and these outcomes, many efforts have focused upon predicting chemical binding to the estrogen receptor using QSAR (e.g., Tong et al., 1997), machine learning methods (e.g., Zang et al., 2013), molecular docking (e.g., Shao et al., 2004) and other methods. Models of estrogen receptor binding and subsequent activation of estrogen receptor gene expression have also been used to develop a decision model to facilitate hazard identification and prioritization using a combination of structure activity relationships, receptor binding and vitellogenin gene activation assays (Schmieder et al., 2014).
High throughput assay data for estrogen and androgen receptor binding and activation have been used in statistical models and have been shown to have significant accuracy in predicting in vivo endpoints (Rotroff et al., 2013; Cox et al., 2014). Since estrogen receptor (ER) binding and activation is the primary MIE leading to estrogenic adverse outcomes in animals, these models have been proposed as prioritization tools. Judson et al. (2015) tested the hypothesis that a chemical activates the estrogen receptor by extending these efforts to create a partial qAOP describing the MIE of ER binding and activation, the KE of RNA transcription and translation of ER dependent genes, and the KE of ER-dependent cell proliferation. While this approach does not specifically model biological events, it does incorporate biological assay results that capture these KE, using a mathematical network model that integrates the areas under the curve for assay responses of 18 different high throughput assays to predict potential endocrine agonists and antagonists.
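The logKow-based narcosis QSAR discussed in this section can be sketched as a one-variable linear regression of toxicity, expressed as log(1/LC50), on logKow. The Python sketch below fits such a line by ordinary least squares and then flags a chemical whose observed toxicity exceeds the narcosis baseline, which would argue against narcosis being its only mode of action. The training values, the fitted coefficients they imply, and the one log unit margin are synthetic assumptions for illustration, not values from the cited QSAR studies.

```python
def fit_narcosis_qsar(log_kow, log_inv_lc50):
    """Ordinary least squares fit of log(1/LC50) = slope * logKow + intercept."""
    n = len(log_kow)
    mean_x = sum(log_kow) / n
    mean_y = sum(log_inv_lc50) / n
    sxx = sum((x - mean_x) ** 2 for x in log_kow)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log_kow, log_inv_lc50))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_baseline(log_kow, slope, intercept):
    """Predicted baseline (narcosis) toxicity for a given logKow."""
    return slope * log_kow + intercept


def exceeds_baseline(observed, log_kow, slope, intercept, margin=1.0):
    """True if observed log(1/LC50) exceeds the baseline prediction by more
    than `margin` log units (the margin is an assumed screening threshold)."""
    return observed - predict_baseline(log_kow, slope, intercept) > margin


# Synthetic training set lying exactly on a line (illustrative only).
train_log_kow = [1.0, 2.0, 3.0, 4.0, 5.0]
train_toxicity = [0.9 * x - 1.0 for x in train_log_kow]

slope, intercept = fit_narcosis_qsar(train_log_kow, train_toxicity)
print("slope:", slope, "intercept:", intercept)
print("excess toxicity flagged:", exceeds_baseline(5.0, 3.0, slope, intercept))
```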

Probabilistic quantitative AOP networks
Probabilistic approaches such as Bayesian network analysis are well suited to the AOP framework because an AOP, like a Bayesian network, is an intuitive graphical representation, and a Bayesian network is a graphical model that formally represents a joint probability distribution (Friedman, 2009; Pearl, 2010). Bayesian network approaches have been used to model outcomes in a wide range of complex systems (for review see Weber et al., 2012). Bayesian networks are useful for making probabilistic predictions as to whether one or multiple hypotheses are likely to be true, for providing diagnostic analysis of the evidence available for decisions, and for updating calculations based on new evidence such as additional in vitro tests. For example, based on empirical evidence showing that various events in the AOP are predictive of the potential of a chemical to be a skin sensitizer, several non-animal test methods that measure the impact of chemical sensitizers on these key events have been developed (Liebsch et al., 2011; Maxwell et al., 2014). However, individually, the assays are inconsistent in predicting the relative potency of a chemical. Consequently, integrated testing strategies for the AOP for skin sensitization caused by covalent binding to proteins have been developed in which Bayesian network models incorporate in vitro assays representing MIE and KE to predict the potency of a chemical in inducing a response in a local lymph node assay (Pirone et al., 2014). The value of the skin sensitization Bayesian network is that it provides a probabilistic estimate of the sensitization potential of a chemical and allows a quantitative examination of whether the addition of more tests would improve the predictive ability of the framework. A further example is the toxicokinetic/toxicodynamic modeling approach taken by MacKay et al. (2013), who extended an existing skin bioavailability model (Davies et al., 2011) so that the probability of allergy in a given human population can be predicted using a toxicodynamic model of skin protein haptenation, DC antigen presentation and CD8+ T cell activation, for application in skin sensitization risk assessment.
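The way a Bayesian network updates a hazard estimate as in vitro results accumulate can be shown with the simplest possible case: a naive Bayes structure in which each assay result depends only on whether the chemical is truly a sensitizer. This Python sketch is not the network of Pirone et al. (2014); the assay names, sensitivities, specificities, and prior are hypothetical, and the conditional independence of the assays is an assumption that a fuller network would relax.

```python
def posterior_sensitizer(prior, assays, results):
    """Posterior probability that a chemical is a sensitizer.

    prior   : prior probability of being a sensitizer
    assays  : dict mapping assay name -> (sensitivity, specificity)
    results : dict mapping assay name -> True (positive) or False (negative)

    Assumes assay results are conditionally independent given the true
    sensitizer status (naive Bayes).
    """
    p_sens = prior          # running joint probability: sensitizer and data
    p_non = 1.0 - prior     # running joint probability: non-sensitizer and data
    for name, positive in results.items():
        sensitivity, specificity = assays[name]
        if positive:
            p_sens *= sensitivity
            p_non *= 1.0 - specificity
        else:
            p_sens *= 1.0 - sensitivity
            p_non *= specificity
    return p_sens / (p_sens + p_non)


# Hypothetical assays covering an MIE and a downstream key event.
assays = {
    "assay_mie": (0.8, 0.8),
    "assay_ke": (0.8, 0.8),
}

one_test = posterior_sensitizer(0.5, assays, {"assay_mie": True})
two_tests = posterior_sensitizer(0.5, assays, {"assay_mie": True, "assay_ke": True})
print("after one positive assay:", one_test)
print("after two positive assays:", two_tests)
```

Comparing the posterior before and after adding an assay is one simple way to ask whether a further test would change the decision enough to justify running it.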
A major limitation of linear AOPs is that real world risks are due to multiple factors, such as diseases caused by interactions between susceptible genotypes and the presence of certain metals, cross-talk between pathways, chemicals that initiate multiple AOPs, or complex mixtures where an organism may be exposed to multiple chemicals. The flexibility of Bayesian networks is also valuable in examining multiple interacting variables and networks of interacting AOPs. For example, non-alcoholic liver steatosis can be caused via several AOPs, creating a complex network where multiple interactions could contribute to steatosis (Fig. 1; Hashimoto et al., 2000; Reddy, 2001; Grefhorst et al., 2002; Pineda Torra et al., 2003; Kay et al., 2011; Sharif et al., 2014). Understanding the potential contribution of multiple AOPs in the presence of chemical mixtures can be very challenging. Given the appropriate experimental data and formal relationship criteria (OECD, 2013; Becker et al., 2015), statistical relationships could be developed to enable construction of a probabilistic network that captures potential cross talk between pathways. For example, the AOP network for steatosis suggests that a mixture of PPARγ agonists and FXR antagonists could lead to a greater risk of steatosis by increasing lipogenesis through removing feedback repression of LXRα expression (Goodwin et al., 2000; Lu et al., 2000) and increasing PPARγ dependent expression of enzymes involved in lipogenesis (Morán-Salvador et al., 2011). If placed in a Bayesian network context, multiple such hypotheses can be tested.

Mechanistic quantitative AOPs
While probabilistic qAOPs are clearly useful in estimating whether an adverse outcome may occur given available data, they generally do not explicitly incorporate mechanisms of action and fail to account for the regulatory and feedback control mechanisms that underlie compensatory responses. Feedback regulation is a central component of many biological systems (Bhalla and Iyengar, 1999; Avraham and Yarden, 2011; Cowan et al., 2014), including the Hypothalamus-Pituitary-Gonadal (HPG) axis (Norris and Carr, 2013) and the Hypothalamus-Pituitary-Thyroid axis (Chiamolera and Wondisford, 2009; Carr and Patiño, 2011). Mechanistic qAOPs are typically more complex and time-consuming to construct than WOE or probabilistic qAOPs, include (first order) mathematical relationships, and generally represent a more accurate biological model (e.g. Conolly et al., 2017). Mechanistic modeling also facilitates a quantitative estimate of uncertainty in the risk assessment. Mechanistic qAOPs can also include toxicokinetic and toxicodynamic modeling that explicitly captures the details of absorption, distribution, metabolism, and elimination of chemicals. This level of modeling requires large amounts of data and remains a long-term effort for most current AOPs.

Hypothesis testing with mechanistic quantitative AOPs
The AOP for inhibition of aromatase leading to reproductive dysfunction in fish (Becker et al., 2015) provides a well characterized pathway with which to highlight essential features of a mechanistic qAOP (Conolly et al., 2017). Here, we constructed a mechanistic qAOP for aromatase inhibition using mathematical and probabilistic models representing different KE within the aromatase inhibition AOP (Fig. 2; details of model development and predictions in supplementary file 1). The biological effects of aromatase inhibition in fathead minnow have been extensively studied, and sufficient data exist to support mechanistic model development (Ankley et al., 2002; Villeneuve et al., 2009; 2013). Fadrozole (FAD) is a model endocrine disruptor that specifically inhibits aromatase, an enzyme that catalyzes the final step in synthesis of estradiol (E2), an estrogen required for reproductive function (Browne et al., 1991). Breen et al. (2007) developed a metabolic model of ex vivo fathead minnow ovary slice assays converting cholesterol to testosterone and estradiol that described the MIE of inhibition of aromatase by the inhibitor fadrozole and KE1, a decrease in estradiol synthesis. Shoemaker et al. (2010) extended this model to incorporate endocrine signaling feedback control from the ovary to the hypothalamus/pituitary, the luteinizing hormone/luteinizing hormone receptor signaling cascade, regulation of the steroidogenic acute regulatory protein responsible for cholesterol transport into mitochondria, and critical transcription factors into a dynamic mathematical model to predict the effects of fadrozole exposure on plasma E2 concentrations (Fig. 2). The Shoemaker model was able to accurately predict effects of aromatase inhibition on testosterone and E2 production by fathead minnow in the presence of 50 µg/l FAD over 6, 12, and 24 hrs (Shoemaker et al., 2010). The model is also generally predictive of plasma E2 concentration behavior after 8 d exposures to FAD, although it fails to predict compensation for FAD inhibition of aromatase at low (3 µg/l) concentrations and the slow recovery of normal E2 levels after removal of FAD (Fig. 3a). This is consistent with the findings of Breen et al. (2013) that additional regulatory or biological elements, not explicitly described in the AOP or the model, influence fathead minnow responses to FAD.
We extended the Shoemaker HPG model to create an HPG-Liver model that includes KE2, a decrease in estrogen receptor agonism, and KE3, reduced VTG production in liver, by mathematically describing the synthesis rate of VTG mRNA controlled by plasma E2 concentrations with a time delay response (Equation S2). The translation of VTG protein was also described with a time delay function in a separate equation (Equation S3). The resulting HPG-Liver model accurately predicted plasma VTG levels in the presence of low (3 µg/l) and high (30 µg/l) concentrations of FAD (Fig. 3b). As with predictions of plasma E2 levels, predicted plasma VTG levels rapidly returned to normal after removal of FAD, whereas observed plasma VTG levels took significantly longer to return to normal, indicating that additional biological mechanisms need to be incorporated into the models to accurately simulate the impact of FAD on plasma VTG levels.
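A time-delayed synthesis model of the kind described above can be sketched in Python with a simple forward Euler scheme. The equations below are an assumed generic form (delayed first-order production with first-order loss), not Equations S2 and S3 from the supplementary file, and every rate constant, delay, and E2 time course is a hypothetical value chosen for illustration.

```python
def simulate_vtg(e2, dt, k_m, d_m, tau_m, k_v, d_v, tau_v):
    """Forward Euler simulation of delayed VTG mRNA and protein dynamics.

    Assumed form (illustrative, not the paper's equations):
        dM/dt = k_m * E2(t - tau_m) - d_m * M(t)
        dV/dt = k_v * M(t - tau_v)  - d_v * V(t)

    e2 is a list of plasma E2 values sampled every dt time units.
    Returns two lists (mRNA, VTG protein) of the same length as e2.
    """
    lag_m = int(round(tau_m / dt))  # transcription delay in time steps
    lag_v = int(round(tau_v / dt))  # translation delay in time steps
    # Start both pools at the steady state implied by the initial E2 level.
    mrna = [k_m * e2[0] / d_m]
    vtg = [k_v * mrna[0] / d_v]
    for i in range(1, len(e2)):
        # Delayed inputs; before the delay has elapsed, use the initial value.
        e2_delayed = e2[max(i - 1 - lag_m, 0)]
        mrna_delayed = mrna[max(i - 1 - lag_v, 0)]
        mrna.append(mrna[-1] + dt * (k_m * e2_delayed - d_m * mrna[-1]))
        vtg.append(vtg[-1] + dt * (k_v * mrna_delayed - d_v * vtg[-1]))
    return mrna, vtg


# Hypothetical scenario: plasma E2 drops to 20% of baseline after one time unit.
dt = 0.1
e2_course = [1.0] * 10 + [0.2] * 200
mrna, vtg = simulate_vtg(e2_course, dt, k_m=2.0, d_m=1.0, tau_m=0.5,
                         k_v=3.0, d_v=0.5, tau_v=0.5)
print("initial VTG:", vtg[0], "final VTG:", vtg[-1])
```

In this form the downstream pools respond to a fall in E2 only after the delays have elapsed, which is the qualitative behavior a time delay response is meant to capture.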
Since a decrease in plasma VTG levels in female fathead minnows is highly correlated to fecundity, KE5, impaired ovulation and spawning (Fig. 2), can be modelled using a simple linear regression model developed by Miller et al. (2007), based on 21-day reproductive studies with different chemical stressors, that describes the relationship between fecundity and plasma VTG concentrations relative to untreated females (Equation S4). The population matrix model of Miller et al. (2007) can then be used to create a complete model of the AOP, from aromatase inhibition through the population level outcome, by linking the fecundity model, KE5, to population level effects, the AO (see supplementary file 1). We used the complete model (Fig. 2) to predict population trajectories for various levels of constant FAD exposure (Fig. 3). As observed by Miller et al. (2007), depression of VTG levels results in a decline of population trajectories, relative to no FAD exposure, which stabilize at lower population levels. Concentrations of FAD above 1 µg/l produced catastrophic effects, resulting in the total collapse of the population (Fig. 3).
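The coupling of a VTG-to-fecundity regression with an age-structured population projection can be sketched as follows. This Python example uses a plain Leslie-type projection without density dependence, so unlike the model of Miller et al. (2007) it will not stabilize at a lower population level; the regression coefficients, age-class fecundities, survival rates, and starting abundances are hypothetical and are not the values in Equation S4 or the published matrix.

```python
def relative_fecundity(relative_vtg, slope=1.0, intercept=0.0):
    """Linear regression of relative fecundity on relative plasma VTG,
    clipped to [0, 1]. Slope and intercept are assumed placeholder values."""
    return max(0.0, min(1.0, slope * relative_vtg + intercept))


def project_population(n0, fecundity, survival, rel_fec, steps):
    """Project an age-structured population with a Leslie-type model.

    n0        : starting abundance in each age class
    fecundity : offspring per individual in each age class (unexposed)
    survival  : survival from age class i to i + 1 (one fewer than classes)
    rel_fec   : fecundity of exposed fish relative to unexposed fish
    Returns the total population size at each time step, including step 0.
    """
    n = [float(x) for x in n0]
    totals = [sum(n)]
    for _ in range(steps):
        births = sum(f * rel_fec * x for f, x in zip(fecundity, n))
        survivors = [s * x for s, x in zip(survival, n[:-1])]
        n = [births] + survivors
        totals.append(sum(n))
    return totals


# Hypothetical three age-class population.
n0 = [100, 50, 25]
fecundity = [0.0, 2.0, 4.0]
survival = [0.5, 0.5]

unexposed = project_population(n0, fecundity, survival, relative_fecundity(1.0), steps=5)
exposed = project_population(n0, fecundity, survival, relative_fecundity(0.5), steps=5)
print("unexposed:", unexposed)
print("exposed:  ", exposed)
```

Passing the output of an upstream VTG model as `relative_vtg` is one way the component models could be chained from the key events to the population level outcome.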
The qAOP model described here did not capture the compensatory behavior found when fathead minnow were exposed to low doses of fadrozole (Villeneuve et al., 2009; 2013). Therefore, while the mechanistic qAOP incorporated many of the biological elements essential to describing the aromatase inhibition AOP, additional regulatory mechanisms, such as those proposed by Breen et al. (2013), are still required to capture dynamic behaviors at low doses of chemical inhibitors.

Data standards for development of qAOPs
An essential component of all three approaches to qAOP development is the need for extensive review and documentation of available data and literature. While expert judgment will remain central to AOP development, including qAOPs, transparency in how relationships and values were derived, in addition to the reproducibility of WOE, probabilistic, and mechanistic values, will be critical to gain acceptance for an AOP/qAOP. Systematic review has emerged as one approach to help ensure transparency and begin to provide guidelines for reproducibility. We view the systematic review standards proposed by the Institute of Medicine (IOM, 2011) as transferable and helpful in the development of AOPs, especially in the development and documentation of WOE, probabilistic, and mechanistic qAOPs.
Currently, AOPs are documented and reviewed via the AOP-Wiki under the auspices of the Organization for Economic Co-operation and Development (OECD), which also implements an internal and external expert review of AOPs. Many of the elements of the systematic review standards can be easily incorporated into the AOP-Wiki to provide transparency in the development of AOPs for risk assessment. Here we briefly outline our vision of what standards for AOPs might look like, along with some reasoning for their importance.

Standard 1
Initiating an AOP Systematic Review Project. The key element of this standard is to establish the systematic review protocol. This includes describing exactly which adverse outcome will be assessed, which species the literature search will cover, the literature search and data extraction strategies, any conflicts of interest that team members have, and the study quality protocols. The purpose of this standard is to ensure that a rigorous protocol is in place and that the investigators have thought it through. In an ideal world, the project team would put their protocol out for comment for a short period of time and incorporate the feedback obtained prior to starting the project.

Standard 2
Execute the Protocol. The AOP team executes their searches, performs study quality analyses (this might include a risk of bias analysis), extracts the data and places them into a centralized database. Any deviations from the protocol must be noted. To ensure transparency, the team will need to document their reasons for including and excluding studies. The goal is for teams to provide sufficient documentation of their reasoning that another scientist trained in the field could understand, though not necessarily agree with, how the team reached their conclusions. This is where the use of standardized questionnaires or score sheets becomes valuable, and tools such as the National Institutes of Health's Office of Health Assessment and Translation (OHAT) Risk of Bias Tool or the Systematic Omics Analysis Review (SOAR) tool (McConnell et al., 2014) are helpful.

Standard 3
AOP Synthesis. Here, the information from individual studies is synthesized into an overall AOP on the AOP-Wiki. Evidence for events and relationships should be assessed via weight of evidence frameworks as described in Becker et al. (2015). We strongly advocate for the use of meta-analysis and statistical methods to integrate evidence from multiple studies. Publication bias across the databases can be identified using the trim-and-fill method applied to a funnel plot (Duval and Tweedie, 2000). Analyses using Galbraith plots can help identify whether there are differences in standard errors from different studies as a function of effect size (Galbraith, 1990). In an ideal situation where multiple studies all report statistical analyses for the same potential key event, meta-analyses could be used to ascertain the strength of the evidence that a potential key event is or is not involved. For instance, where studies have used gene knock-out models, several individual studies might report that the knock-out of a protein representing a key event is necessary to block the adverse outcome, while a few studies may contradict this view. If all the studies used reasonable approaches, then the data could be integrated using Bayesian analysis, thereby permitting an objective determination of whether or not the data support the protein being used as a key event. The analysis and the results would then become part of the argument for or against inclusion of the key event.
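Integrating effect estimates from several studies of the same candidate key event can be sketched with a standard inverse-variance, fixed-effect meta-analysis, together with Cochran's Q as a simple heterogeneity check. This Python example is a generic illustration rather than the Bayesian integration mentioned above, and the effect sizes and standard errors are invented.

```python
import math


def fixed_effect_pool(effects, std_errors):
    """Inverse-variance fixed-effect pooling of study effect estimates.

    Returns (pooled_effect, pooled_standard_error, cochran_q).
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    # Cochran's Q: weighted squared deviations of studies from the pooled value.
    cochran_q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    return pooled, pooled_se, cochran_q


# Invented effect sizes from three hypothetical knock-out studies.
effects = [1.0, 3.0, 2.0]
std_errors = [1.0, 1.0, 0.5]

pooled, pooled_se, q = fixed_effect_pool(effects, std_errors)
print("pooled effect:", pooled)
print("pooled standard error:", pooled_se)
print("Cochran's Q:", q)
```

A large Q relative to the number of studies would suggest that the studies disagree by more than sampling error, in which case a random-effects or Bayesian hierarchical model may be the better choice.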
Standard 4 Submit the AOP. For AOPs/qAOPs to be useful, they need to be made publicly available so that others can use them. The AOP-Wiki is an excellent resource for making AOPs available and, potentially, qAOPs in the future. If an AOP/qAOP is made part of the OECD work plan, it will be peer-reviewed by experts. Although we do not propose making public availability of AOPs/qAOPs a requirement of the standard, we do see the need for data standards for sharing them. On the computational side, the AOP Ontology allows AOPs (and soon qAOPs) to be exchanged between computers and facilitates computational analyses. We would also like to have methods that can translate between future exchange formats and existing standards for systems biology, such as the Systems Biology Markup Language (SBML).
Although these systematic review standards are currently a work in progress, we feel they will help improve the quality of future AOPs/qAOPs, and will facilitate meaningful dialogue, discussion, and the identification of best practices as we continue to develop these tools into the future.

Conclusion
These examples of qAOPs demonstrate several important points: 1) Quantitative AOPs can be developed at many different stages (from weight of evidence to mechanistic), depending on the amount of information available and the intended use. 2) qAOPs can be based on a combination of multiple modeling approaches, including deterministic mathematical models and statistical models. 3) Regulatory mechanisms are not explicitly included as a separate element in an AOP, yet they are essential for developing accurate mechanistic qAOPs. If AOPs are to be useful in a mechanistic context, then the relationships between KEs must be understood and examined in the context of the regulatory interactions that are important in moving from one KE to another.
4) The application of qAOPs in a full risk assessment requires consideration of chemical exposure, since the amount of chemical present to initiate an AOP depends on exposure conditions, chemical bioavailability, and absorption, distribution, metabolism, and excretion parameters. Therefore, an AOP coupled to exposure events provides an intuitive framework for assessing the probability that exposure to a chemical will cause interaction with an MIE, thereby affecting molecular processes at the cellular level and leading to adverse outcomes such as cancer, decreased survival and reproduction, or declines in population size and growth. Future use in risk assessment and decision-making will require a level of confidence in model predictions at least as high as that of currently accepted methods. Further efforts will be required to benchmark the output from these mathematical models against relevant datasets, such as human clinical datasets (e.g. diagnostic patch test data) or datasets for ecological species where appropriate (Maxwell et al., 2014), although the availability of such data may be limited. We anticipate that in the near term, as high confidence AOPs begin to emerge, we will see the rise of computational predictions based on AOPs for limited decision-making (e.g., prioritization, screening and hazard identification), where assay data will be applied to these high confidence AOPs to make hazard predictions. Ultimately, this will lay the foundation for more quantitative use of AOPs in support of predictive toxicology and preliminary risk estimates that risk assessors can use as the starting point for future chemical risk assessments.

Fig. 1: Conceptual model of a quantitative steatosis Adverse Outcome network. Six possible AOPs are initiated by molecular initiating events (MIE, hexagonal boxes) leading to the adverse outcome steatosis. The probability that one event leads to another is represented by an arrow. The probability that an event will interfere with or inhibit another event is represented by a line and bar. The probability that any one AOP will result in steatosis is represented by the joint probability distribution across that AOP. Possible crosstalk between different AOPs is revealed in the network. The effect of complex mixtures could be assessed by examining the joint probability distribution across the entire network given the available data. PPARα = Peroxisome Proliferator-Activated Receptor alpha; PPARγ = Peroxisome Proliferator-Activated Receptor gamma; FXR = Farnesoid X Receptor; LXR = Liver X Receptor; SHP = Small heterodimer partner; DHB4/HSD17B4 = Hydroxysteroid (17-β) Dehydrogenase 4; NFE2L2/Nrf2 = nuclear factor, erythroid 2 like 2.
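The joint-probability idea in this caption can be sketched in a few lines under strong simplifying assumptions: each AOP is treated as a linear chain in which an event depends only on the preceding event, the separate AOPs are treated as independent (so crosstalk and inhibitory links are ignored), and all probability values are hypothetical placeholders rather than estimates from the steatosis network.

```python
from math import prod


def pathway_probability(edge_probs):
    """Probability that an MIE propagates to the adverse outcome along one AOP.

    edge_probs holds the conditional probability of each step given the
    previous one; under the chain assumption the joint probability is
    their product.
    """
    return prod(edge_probs)


def network_probability(pathways):
    """Probability that at least one AOP reaches the adverse outcome.

    Assumes the pathways act independently (a noisy-OR combination),
    which ignores crosstalk and inhibitory links between AOPs.
    """
    return 1.0 - prod(1.0 - pathway_probability(p) for p in pathways)


if __name__ == "__main__":
    # Hypothetical step probabilities for two AOPs leading to steatosis.
    aop_a = [0.9, 0.7, 0.6]
    aop_b = [0.8, 0.5]
    print(pathway_probability(aop_a))
    print(network_probability([aop_a, aop_b]))
```

A full treatment would replace these products with a Bayesian network over the whole graph so that shared events and inhibitory edges are handled explicitly.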

Fig. 2: Linkage of multiple models to create a mechanistic quantitative Adverse Outcome Pathway model for aromatase inhibition leading to reproductive dysfunction. The Adverse Outcome Pathway (AOP) for aromatase inhibition leading to reproductive dysfunction is diagrammed on the right of the figure, with arrowheads indicating equivalent sections of the mechanistic qAOP. (A) A mechanistic Hypothalamus-Pituitary-Gonad (HPG) model that simulates steroidogenesis (yellow) and conversion of testosterone (T) into estradiol (E2) by aromatase, together with a negative feedback to the hypothalamus/pituitary in which decreasing levels of E2 cause increased synthesis of Luteinizing Hormone (LH; pink). LH binds to the LH receptor in the ovary and stimulates a cAMP cascade (green, purple), resulting in phosphorylation (light green, red) of transcription factor SF1, increased transport of cholesterol into mitochondria by the Steroidogenic Acute Regulatory protein, and ultimately increased synthesis of E2. I represents a chemical aromatase inhibitor, here fadrozole. (B) Vitellogenin liver compartment model. (C) Model relating vitellogenin levels to fecundity. (D) Density-dependent population matrix model.

Fig. 3: Quantitative AOP modeling of Key Event and Adverse Outcome responses for aromatase inhibition leading to reproductive dysfunction in fish in the presence of fadrozole. Calculated responses of (A) plasma estradiol levels and (B) plasma vitellogenin levels in fathead minnow exposed for 8 days (192 h) to fadrozole followed by 8 days of recovery; black line = control, blue line = 30 μg/l fadrozole, red line = 30 μg/l fadrozole. Symbols indicate experimental data points (adapted from Villeneuve et al. 2010); circle = control, red triangle = 3 μg/l fadrozole, blue box = 30 μg/l fadrozole. The Y axis indicates the normalized value relative to control on a log2 scale. The bar below graphs A and B depicts exposure to fadrozole (black bar) and recovery without fadrozole (white bar). Relative population trajectory forecasted for fathead minnow (Pimephales promelas) under exposure to different levels of fadrozole: (A) no fadrozole, (B) 0.04 μg/l fadrozole, (C) 0.1 μg/l, (D) 0.2 μg/l, (E) 0.5 μg/l, (F) 1 μg/l, (G) 2 μg/l, (H) 3 μg/l FAD medium. The relative population stabilizes at a lower capacity (78%) under fadrozole exposure of 0.04 μg/l, while capacity falls to 40% at 0.5 μg/l of fadrozole exposure. The population reaches zero by 15 and 10 years under fadrozole exposure of 2 and 3 μg/l, respectively. The black bar beneath graph C indicates constant exposure to fadrozole.