Bayesian Network Integrated Testing Strategy and beyond

In a recent series of papers written by Jaworska with different coauthors, compelling reasons for adopting a probabilistic approach to Integrated Testing Strategies were detailed. In a case study on skin sensitization, a Bayesian Network proved to be effective in adapting testing strategies to the available evidence. There is no doubt that probabilistic Integrated Testing Strategies are one way to pursue the goals of 3Rs effectively; nevertheless, some issues deserve further comment to pinpoint statistical criticalities and to widen the methodological perspective towards Bayesian graphical models.


Introduction
In their seminal book, published in 1959, Russell and Burch lamented the delay in the use of some statistical methods (tests) that "... have probably not been exploited to the full, even in research immediately after their provision." While explaining the importance of the design of experiments, they emphasized that "every time any particle of statistical method is properly used, fewer animals are employed than would otherwise have been necessary." Equipped with their words, we make statistical remarks on three recent papers dealing with the integration of testing strategies.
In a recent paper, Jaworska and Hoffmann (2010b) stated that Integrated Testing Strategies (ITS) may be considered combinations of test batteries covering relevant mechanistic steps, organized in a logical and hypothesis-driven decision scheme, with the aim of providing a comprehensive information basis for making decisions on chemical hazard and risk management. In the same paper, the authors reviewed conceptual requirements for ITS and defined properties that ITS should have to meet the identified requirements.
Among the issues addressed by the authors, the need for context and interpretation to transform data into information, and therefore knowledge, seems to us preeminent for several reasons. The systematic analysis and use of the existing wealth of multifaceted biological data is a requisite for advances in the understanding of life processes and risks, and it appears to be impossible without setting proper contexts. The context deter-

a quantitative way is at the core of Bayesian model building (Garthwaite et al., 2005).
The probabilistic approach to ITS seems to have been first discussed in Jaworska et al. (2010a), where ITS was criticized for the lack of a principled information processing framework able to incorporate all relevant information while updating uncertainty in a coherent way. The authors built an information-theoretic approach strongly rooted in probabilistic modeling, called the "ITS inference framework," where Bayesian networks were proposed as the software tool to make the ITS inference framework operational. The framework proposed in Jaworska et al. (2010a) complies with the OECD (2008) recommendations that ITS development should be structured, consistent, transparent, and hypothesis-driven. Many of the above desiderata, if not all, are achieved by using probabilistic analysis and reasoning techniques that find their methodological foundations in the Bayesian paradigm. From the standpoint of applications, besides pursuing hypothesis-driven inferences, the approach supports the assessment of the value of collected information, with the possibility of calculating the expected value of information gain provided by alternative testing procedures. Jaworska et al. (2011) put the above concepts into practice by developing the "Bayesian Network Integrated Testing Strategy" (BNITS) to estimate skin sensitization hazard. The proof of concept case proved BNITS to be effective in adapting testing strategies to available evidence while combining in silico, in chemico, and in vitro data related to skin penetration, peptide reactivity, and dendritic cell activation. A key issue highlighted by the authors is that the search for an unlikely gold-substituting in vitro test or best testing strategy should be replaced by the optimal decision in the face of the available experimental evidence: this approach was even successful in a case study where missing values (data gaps) amounted to 50% of database records.
The above mentioned papers (Jaworska et al., 2010a, 2011; Jaworska and Hoffmann, 2010b) provided a wide-ranging account of the role played by probabilistic inference in ITS, masterfully framed within the perspective of validation strategies. Overall, we endorse the grand vision depicted as the "ITS inference framework," because several of its nice features follow from methodological results proper to the Bayesian field (see Robert, 1994; O'Hagan, 1994; Bernardo and Smith, 1994, for comprehensive accounts). Nevertheless, some issues deserve further refinement to unleash the full power of the Bayesian paradigm in ITS and to put the "Bayesian Network Integrated Testing Strategy" in perspective.

Bayesian Network Integrated Testing Strategy
The term "Bayesian Networks" (BNs) typically is used to indicate a class of statistical models in which the joint probability distribution of a vector of discrete random variables is represented as a product of conditional distributions of the form $p(x_v \mid x_{pa(v)})$, where $v$ is a node in the Directed Acyclic Graph (DAG) $G$ defining the network of random variables and $pa(v)$ is the collection of its parent nodes. Figure 2 in Jaworska et al. (2010a) shows a DAG of three nodes, Carcinogenic, T1Ames and T2MLA, where the directed edges are: Carcinogenic → T1Ames, Carcinogenic → T2MLA and T1Ames → T2MLA. It follows that the joint distribution of those three variables is decomposed into the product:
$$p(\text{Carcinogenic}, \text{T1Ames}, \text{T2MLA}) = p(\text{Carcinogenic})\; p(\text{T1Ames} \mid \text{Carcinogenic})\; p(\text{T2MLA} \mid \text{Carcinogenic}, \text{T1Ames})$$
because the required conditional distributions are straightforwardly read from the DAG (Cowell et al., 1999).
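As a minimal sketch of this factorization, the joint distribution of the three variables (taken here to be binary) can be assembled from one table per node. All probability values and the helper names `bernoulli` and `joint` are hypothetical, chosen only for illustration; they are not taken from Jaworska et al. (2010a).

```python
from itertools import product

# Hypothetical conditional probability tables (illustrative values only).
p_c = {True: 0.3, False: 0.7}            # p(Carcinogenic)
p_t1_given_c = {True: 0.8, False: 0.1}   # p(T1Ames = positive | Carcinogenic)
p_t2_given_c_t1 = {                      # p(T2MLA = positive | Carcinogenic, T1Ames)
    (True, True): 0.9, (True, False): 0.6,
    (False, True): 0.3, (False, False): 0.05,
}


def bernoulli(p_true, value):
    """Probability of a binary outcome given the probability of True."""
    return p_true if value else 1.0 - p_true


def joint(c, t1, t2):
    """Joint probability as the product of one factor per node of the DAG."""
    return (p_c[c]
            * bernoulli(p_t1_given_c[c], t1)
            * bernoulli(p_t2_given_c_t1[(c, t1)], t2))


# The eight joint probabilities must sum to one.
total = sum(joint(c, t1, t2) for c, t1, t2 in product([True, False], repeat=3))
```

Each factor depends only on a node and its parents, so the full table of eight joint probabilities never has to be specified directly.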
A secondary meaning of BN refers to the software implementation of one model (a specific instance in the class of BNs). Commercial, free, and open source software packages exist to support the creation and use of BNs through a graphical user interface. Calculations like conditioning and marginalization are exact (without approximations besides those due to floating point computations) and fast (performed by highly optimized algorithms), the so-called fast and exact propagation of evidences (FEPEs). Current software programs to implement BNs, besides FEPEs, also offer tools to infer the DAG structure and to estimate parameters of conditional distributions using actual observations, two tasks respectively called structural learning and parameter learning.
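On a network this small, conditioning and marginalization can be carried out by brute-force enumeration of the joint distribution; BN software reaches the same exact answers with far more efficient propagation algorithms. The sketch below reuses the three-node DAG of Jaworska et al. (2010a) with hypothetical probabilities stored as exact fractions. The function name `posterior_carcinogenic` and all numbers are assumptions made for illustration.

```python
from fractions import Fraction as F

# Hypothetical probabilities, stored as exact rationals (illustrative only).
P_C = F(3, 10)                                   # p(Carcinogenic = True)
P_T1 = {True: F(8, 10), False: F(1, 10)}         # p(T1Ames = positive | C)
P_T2 = {(True, True): F(9, 10), (True, False): F(6, 10),
        (False, True): F(3, 10), (False, False): F(1, 20)}  # p(T2MLA = positive | C, T1Ames)


def pr(p_true, value):
    """Probability of a binary outcome given the probability of True."""
    return p_true if value else 1 - p_true


def joint(c, t1, t2):
    """Joint probability from the DAG factorization."""
    return pr(P_C, c) * pr(P_T1[c], t1) * pr(P_T2[(c, t1)], t2)


def posterior_carcinogenic(t1=None, t2=None):
    """p(Carcinogenic = True | evidence); None marks an unobserved test."""
    t1_values = [True, False] if t1 is None else [t1]
    t2_values = [True, False] if t2 is None else [t2]
    # Marginalize unobserved tests, keep observed ones fixed (conditioning).
    weight = {c: sum(joint(c, a, b) for a in t1_values for b in t2_values)
              for c in (True, False)}
    return weight[True] / (weight[True] + weight[False])
```

With these placeholder numbers, `posterior_carcinogenic()` returns the prior 3/10, `posterior_carcinogenic(t1=True)` returns 24/31, and `posterior_carcinogenic(t1=True, t2=True)` returns 72/79, showing how belief is updated as test results arrive.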
Probabilistic reasoning with BNs and BN learning are distinct tasks, and they require quite different degrees of statistical expertise: the first is attainable after limited training without leaving the toxicological context, while BN learning involves far more statistical skills. Latent variables, the imputation of missing values, and model averaging over several DAG structures (structural uncertainty) are issues often present in actual applications, as they are in Jaworska et al. (2011). Last but not least, records of a database must be exchangeable to be properly processed by common software learning procedures.
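To give a sense of scale for structural uncertainty, the number of DAGs on n labeled nodes can be computed with Robinson's recurrence, $a_n = \sum_{k=1}^{n} (-1)^{k+1} \binom{n}{k} 2^{k(n-k)} a_{n-k}$ with $a_0 = 1$. The function name `count_dags` is ours; the sketch only counts structures and says nothing about how any particular software searches among them.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n):
    """Number of DAGs on n labeled nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    # Inclusion-exclusion over the k nodes that have no incoming edge.
    return sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
               for k in range(1, n + 1))


# Growth of the structure space for small networks.
sizes = {n: count_dags(n) for n in range(1, 7)}
```

Here `count_dags(6)` returns 3781503, and the count grows super-exponentially with the number of nodes, which is why exhaustive search over structures quickly becomes infeasible.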
Many of the above critical issues are present in Jaworska et al. (2011), so they are anything but immaterial, and the many ways BN learning can fail deserve proper consideration. For the sake of brevity, just a few critical issues are highlighted here. A model with latent variables may suffer from partial identification, thus the elicitation of prior information is highly recommended in such cases. Nevertheless, BN software often allows the specification of limited or no dependence among model parameters during the elicitation of the prior distribution and learning. The numerical optimization of the likelihood function, or other likelihood-related scores, may end at suboptimal points, especially in the huge search spaces of DAG structures. For example, six variables define a set of 3,781,503 different DAGs. From this standpoint the statement "... the structure of the BN and the probabilistic relationships between variables were extracted directly from the data" (Jaworska et al., 2011, p. 213, right column) may be perceived as the indication of an automatic and blind learning procedure, a dangerous attitude if wrongly pursued. Missing values and latent variables typically increase the uncertainty about an estimated BN, and the analysis of a dataset after single imputation (called data gaps filling in Jaworska et al., 2011) with estimated values may lead to overstated conclusions due to the single imputation. The uncertainty about structure and parameter values of a BN plays the same role as the uncertainty present in the toxicological problem domain. Full probabilistic coherence is achieved only if all relevant sources of uncertainty are properly taken into account, for example, by avoiding the substitution of unknown quantities through the plug-in of their point estimates, an issue apparently neglected in Jaworska et al. (2011). Sensitivity analysis, as performed in Jaworska et al. (2011, p. 223, left column), is useful to evaluate performances of variants of the DAG (different networks), but in general it does not substitute for model averaging over DAGs while performing probabilistic predictions.
The above discussion points towards the conclusion that Bayesian networks, as a probabilistic framework characterized by exact computation with discrete variables, are unnecessarily restrictive and have limited learning abilities, at least in most of the software currently available. Are online fast calculations really needed, as happens in emergency departments? Is the discretization of continuous variables causing minor loss of information? Are all relevant variables natively discrete? Is the uncertainty about model parameters negligible? Is the net structure (DAG) based on strong prior information or estimated using very large databases? If all the answers to the above questions are "yes," then BNs are likely to be the right tool; otherwise, the adoption of a wider framework, often indicated as Bayesian graphical models (Buntine, 1994), is recommended. This is not a substitution but an extension of the probabilistic approach provided by BNs, which keeps all the key features highlighted in Jaworska et al. (2011, p. 222, left column), such as the ability to deal with uncertainty in biological knowledge, to combine heterogeneous pieces of evidence, and to quantify uncertainty about target and relationships.
Bayesian graphical models include Bayesian networks as specialized instances, those in which the factorization of the joint distribution is determined by a DAG and where variables are discrete. Parameters may be included within the DAG if they are affected by uncertainty, and missing values are considered at the same level as model parameters. There is virtually no limitation on the kind of variables the scientist may use to properly represent his/her beliefs, collected observations, and specific features of a problem domain. FEPEs is no longer possible in general, but conditioning and marginalization may still be performed through approximate computations. In Jaworska et al. (2010a, p. 164, left column), the authors made an ambiguous statement because Gaussian Bayesian networks in which all variables are normal admit FEPEs. Similarly, mixed Bayesian networks made of discrete and Gaussian variables admit FEPEs if Gaussian variables are never parents of discrete variables (Cowell et al., 1999).
The generality of Bayesian graphical models comes at the price of a higher computational cost. The royal road to Bayesian computation is Monte Carlo simulation, possibly Markov Chain Monte Carlo (MCMC) (Brooks, 1998). The typically difficult integrals required by the application of the Bayes rule are avoided by sampling from the (unnormalized) posterior distribution of all the unknown quantities (parameters, missing values, and latent variables). The degree of approximation depends mainly on the quality of the sampler and on the sample size; thus, it is largely under the control of the scientist and it may be improved as needed. The availability of open source software to perform MCMC (Spiegelhalter et al., 2003; Plummer, 2003; Stan Development Team, 2013) makes model specification very fast, with templates already available for a large class of standard statistical models. In particular, WinBUGS (Spiegelhalter et al., 2003) may take a DAG as starting input during model specification. In this framework, FEPEs is substituted by calculations performed on the predictive distribution of the next (multivariate) observation, given cases already considered during learning.
In the end, the nice properties of probabilistic reasoning with Bayesian networks depend neither on the discrete nature of variables nor on the existence of FEPEs. Confusing model properties with features of the available software should be avoided.

Graphs for probabilistic and causal reasoning
Probabilistic and causal models can be represented by graphical models. This point is acknowledged in Jaworska et al. (2010a, p. 161, left column), where the authors stated that "BNs ... are defined as graphical models of probabilistic relationships between variables of interest ..." and a few lines after "BNs can be regarded as decision-support frameworks because of their ability to explain causal relationships and to serve as prediction models." A deeper appreciation of the above two aspects is possible by establishing an explicit connection between them.
Bayesian inference defines how the expert should rationally change his/her beliefs in the face of new evidence, whether randomized controlled experiments or observational studies are performed, with the extreme circumstance represented by the design of an experiment where only prior information is exploited. Conditional Independence (CI) (Dawid, 1979) and the Bayes rule are pillars of the probability calculus by which inferential answers are produced. Qualitative reasoning about CI relationships may be performed using DAGs, that is, without dwelling on algebraic manipulations of probability distributions but exploiting graph separation theorems (Cowell et al., 1999). Two remarks are in order here. The first is that not all the CI relationships in a distribution can always be represented by a DAG; hence the need for more general graphical representations. The second is that in the DAG of a Bayesian network, two variables $X_a$ and $X_b$ are represented as conditionally independent given $X_c$ only if such a relationship holds for all possible values $c_1, c_2, \ldots, c_k$ of the conditioning variable $X_c$. Thus only strong CI relationships are explicitly represented by a DAG.
A DAG has to be specified well before the numerical details pertaining to conditional distributions of a BN. Nevertheless, the representation of CI relationships does not cover all the needs in ITS, as clearly stated in Jaworska et al. (2011, p. 222, left
modularity-stability, that is, the property that intervention produces local changes in manipulated variables, thus leaving all other variables and relationships unchanged. In the words of Pearl (2009, p. 118), "the new ingredient that causal analysis brings to this tradition is the necessity of obtaining explicit judgments, not about properties of the distributions but about the invariants of a distribution..." The causal interpretation of a DAG or ID has to be justified by substantive reasons, especially if actual intervention studies are not feasible and the graph structure is inferred using observational data. Here the context plays a major role because a causal model is meaningless without a proper defining context. The context includes things like the specification of the protocol-equipment involved in the intervention and the collection of considered variables, those appearing as nodes of a DAG. This is not a trivial choice because, for example, the collection of considered variables determines the model granularity, that is, the level of detail under consideration, and thus two DAGs at different levels of model granularity may show two different sets of direct causes for the same variable. A detailed discussion including covariates in observational studies is provided by Pearl (2000).
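The local character of an intervention, and the difference between conditioning and intervening, can be sketched on the three-node DAG Carcinogenic → T1Ames, Carcinogenic → T2MLA, T1Ames → T2MLA of Jaworska et al. (2010a). The sketch assumes, purely for illustration, that the DAG is given a causal reading and that the outcome of T1Ames could be set externally; all probabilities and function names are hypothetical. Under an intervention, only the factor of the manipulated node is replaced, while every other factor is left unchanged (the truncated factorization).

```python
# Hypothetical probabilities (illustrative only).
P_C = 0.3                                    # p(Carcinogenic = True)
P_T1 = {True: 0.8, False: 0.1}               # p(T1Ames = positive | C)
P_T2 = {(True, True): 0.9, (True, False): 0.6,
        (False, True): 0.3, (False, False): 0.05}  # p(T2MLA = positive | C, T1Ames)


def pr(p_true, value):
    """Probability of a binary outcome given the probability of True."""
    return p_true if value else 1.0 - p_true


def joint_observational(c, t1, t2):
    """Joint distribution when all variables are passively observed."""
    return pr(P_C, c) * pr(P_T1[c], t1) * pr(P_T2[(c, t1)], t2)


def joint_do_t1(c, t1, t2, forced):
    """Joint distribution after forcing T1Ames: only its factor is replaced."""
    point_mass = 1.0 if t1 == forced else 0.0
    return pr(P_C, c) * point_mass * pr(P_T2[(c, t1)], t2)


def prob_carcinogenic(joint_fn, t1):
    """p(Carcinogenic = True | T1Ames = t1) under the given joint."""
    num = sum(joint_fn(True, t1, b) for b in (True, False))
    den = sum(joint_fn(c, t1, b) for c in (True, False) for b in (True, False))
    return num / den


seeing = prob_carcinogenic(joint_observational, True)
doing = prob_carcinogenic(lambda c, t1, t2: joint_do_t1(c, t1, t2, forced=True), True)
```

With these placeholder numbers, observing a positive T1Ames raises the probability of Carcinogenic from 0.3 to 24/31 (about 0.77), whereas forcing T1Ames to be positive leaves it at 0.3, because the intervention removes the dependence of T1Ames on its parent and touches nothing else.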

Conclusions
For reasons considered in the above sections, the operational framework indicated as Bayesian Network Integrated Testing Strategy can and should be broadened to include more general Bayesian graphical models. Most importantly, DAGs, IDs, and other graphical representations enable toxicologists to reason on important causal and probabilistic model features without resorting to specific model parameterizations or numerical details that typically require extensive statistical training. The intent was not to suggest that there is something wrong with BNs, as we applied BNs in fields as diverse as forensic science (Corradi et al., 2003) and breast cancer biomarkers (Stefanini et al., 2009). By recognizing DAGs and BNs as distinct tools, it becomes natural to consider other useful graphical representations and to emphasize that DAGs are important tools in and of themselves (see Luciani and Stefanini, 2012, for an example in medical knowledge engineering). Here, it was not possible to cover Bayesian graphical models in full depth, and Bayesian structural learning under sparse prior information on structure seems to be one of the most important exclusions (Stefanini, 2012).
The hope is to have provided an expanded perspective on BNITS to motivate many toxicologists to seriously consider Bayesian graphical models as a major methodological opportunity to strengthen ITS even further towards the fulfillment of Russell and Burch's 3Rs: Replace, Reduce, and Refine.