Formation of Mechanistic Categories and Local Models to Facilitate the Prediction of Toxicity



Introduction
The toxicity and fate of a chemical can be predicted by a number of in silico methods. These range from models such as quantitative structure-activity relationships (QSARs) built on a large number of compounds to smaller, more discrete models developed on a rational basis, such as a grouping of similar chemicals (van Leeuwen et al., 2009). There are no strict definitions or cut-offs, and this range of models can be thought of as a spectrum from global to local. In this sense (and in the context of this paper), global models can be thought of as being developed from data for large numbers of compounds (i.e. hundreds rather than tens), crossing broad structural classes and often mechanisms and modes of action. Local models are more likely to be built on smaller numbers of compounds, often with some element of structural and/or mechanistic similarity between them.
There are different reasons for developing and using global and local models, and each has advantages and disadvantages. Global QSARs are, by their nature, generalist in that they cover broad chemical space. They may include a number of descriptors, some of them with no direct physico-chemical significance. In addition, they may be formed with non-linear techniques such as neural networks. Local models are developed to restrict the domain of the model through careful selection of compounds (see below). They will include fewer data and may simply involve read-across or a simple linear technique such as regression analysis.
The distinction between local and global models should not be thought of as a recommendation that one or the other be used. There may be strong arguments in favour of using either or both types of model, depending on the chemical in question, the endpoint and the context. Global models have the advantage that they are applicable to large numbers of compounds across mechanisms of action and structure. There are many global models for toxicity that can be accessed "off the shelf", either commercially (e.g. expert systems such as TOPKAT and M-CASE) or freely (e.g. the CAESAR models available from www.caesar-project.eu). Local models have the advantage of often being more accurate, as they are restricted in domain. They may be more transparent and simpler, providing the user with greater confidence in their application. They do, however, normally require manual construction. A discussion of global versus local QSAR models is given in more detail by Enoch et al. (2008a).
Much has been written about the use of global QSARs to predict toxicity (Bassan and Worth, 2008), and much guidance is available from the European Chemicals Agency. The aim of this study, therefore, was to assess methods to form local QSAR models by the grouping of compounds into categories and to provide illustrations of their strengths and weaknesses.

Methods to form local models through the development of chemical categories
A rational method to group compounds together can result in a "category" being formed. If a category is formed and can be populated with data, then read-across may be attempted (in a quantitative or semi-qualitative sense). Quantitative models may be built within a category using either quantitative read-across (Enoch et al., 2008b) or through the development of local QSARs. These various types of read-across and QSARs built on chemical groupings or categories can be thought of as a significant source of local models. They have become more important as predictive toxicology takes on the challenges of issues such as REACH, especially where data may be sparse, and for the more complex toxicological endpoints such as chronic human health effects (Sakuratani et al., 2008; Johannsen et al., 2008). In addition, as the freely available tools described in this paper (i.e. mechanistic SMARTS strings (Enoch et al., 2008c), the OECD (Q)SAR Application Toolbox, Toxmatch etc.) become more frequently applied, there will be greater emphasis on understanding and predicting these complex effects.
The theory of grouping compounds together is simple, namely that similar compounds will have similar properties and activities (Enoch, 2009). The group of similar compounds is termed a "category" and may also be referred to as a group of analogues. Once formed, if a category can be populated with activity values, e.g. toxicity data, knowledge of activity within a category provides a method for interpolating effects, which is often termed "read-across". The utility and acceptance of these methods to group chemicals together are increasing (Schaafsma et al., 2009; van Leeuwen et al., 2009).
There are a number of methods to group "similar" compounds together to form categories (Enoch, 2009), and guidance in these areas is provided by the OECD and the European Chemicals Agency. The main bases on which to form categories are:
- On the basis of structural analogues and/or congeneric series: it is assumed that compounds sharing the same functional group(s) and varying only in chemical sub-groups, such as alkyl chain length, will have similar mechanisms of action and hence read-across can be performed. There are an increasing number of examples of this approach, including Fabjan et al. (2006), Sanderson et al. (2009), Veenstra et al. (2009) and Walker and Printup (2008).
- Compounds may be grouped together on the basis of being "structurally similar", i.e. chemical similarity, as defined by algorithms to identify them (Pavan and Worth, 2008).
- Mechanisms of action may be used to group compounds together. Grouping can be performed on the basis of structural alerts relating to a mechanism of action. For instance, the electrophilic chemistry underpinning respiratory sensitisation has been defined and provides a basis for category formation (Enoch et al., 2009b).
- Compounds can be grouped together with similar receptor-mediated modes of toxic action. This is the most complex grouping methodology, as it may require capturing 3-D structural information of molecules. An interesting recent example of how this may be achieved is given by Aladjov et al. (2009).
Details of two of these approaches to form categories, as performed in the CAESAR European Union FP6 Project, are described below.

Formation of categories on the basis of mechanisms of action for skin sensitisation
Skin sensitisation (or allergic contact dermatitis) is the clinical disease caused by the exposure of the skin to substances that are able to promote an immunological response (such substances are also known as contact allergens). Skin sensitisation is a complex immunotoxicological response, which is often simplified into a number of fundamental steps. The initial exposure to the substance, i.e. induction of contact allergy or sensitisation, is required to induce the immunological process. Subsequent challenge of the sensitised individual may result in the elicitation of a response (allergic contact dermatitis or positive test reaction). The dose sufficient for induction is generally larger than the dose sufficient for elicitation (Basketter, 2008).
To predict from a mechanistic standpoint whether a compound is a sensitiser, one must attempt to rationalise the processes that underpin sensitisation. Jowsey et al. (2006) rationalised the skin sensitisation process in terms of a number of important processes, namely bioavailability (i.e. skin permeation), protein reactivity, dendritic cell maturation and T-cell proliferation. In terms of making predictions of skin sensitisation, one must hypothesise which is the rate-limiting process, without which skin sensitisation will not occur. Of these processes (skin permeation, the ability to bind to a relevant protein, the ability to generate a danger signal and the ability of the antigen to be recognised), binding to the relevant protein can be considered a key process (Roberts and Aptula, 2008). Whilst this hypothesis is amenable to description by chemistry and hence convenient for computational approaches, it is as yet unproven, hence the development of a suite of complementary in vitro approaches (Natsch and Emter, 2008; Natsch et al., 2009).
The binding of a skin sensitiser to the relevant immunoprotein forms a covalent bond between the molecule and the protein. The formation of these bonds is usually through nucleophile-electrophile interactions. Cysteine (-SH) and lysine (-NH2) groups on proteins act as nucleophilic centres, amenable to electrophilic attack by the skin sensitisers (Roberts et al., 2008). The chemistry associated with skin sensitisation can therefore be rationalised, i.e. skin sensitisation is known to be related to a number of organic chemistry mechanisms of action (Schultz et al., 2006). Aptula et al. (2005) describe the possibility of six mechanisms of action (SN1, SN2, SNAr, Michael addition, Schiff base formation, acylation) being associated with sensitisation. This information can be rationalised further to provide an organic chemistry mechanistic basis separating different immunological effects (e.g. skin and respiratory sensitisation; Enoch et al., 2009b).
Since the chemistry underlying toxicological responses such as skin sensitisation can be rationalised, the types of molecules and structural features associated with the chemistry can be defined. As an example, the structural features associated with the Michael acceptor domain have been defined (Schultz et al., 2007, 2009). This information can be supplemented by chemical reactivity (in chemico) measurements to assist in the definition of the exact domains (Natsch et al., 2009). If such testing is performed within an "intelligent testing strategy", then the types of structures and the effects of substituents, patterns of substitution and steric hindrance on reactivity can be assessed. All such chemical information can be captured computationally through the use of very simple and freely available techniques for describing chemical information. For example, Enoch et al. (2008c) have described the chemical fragments associated with protein binding using SMiles ARbitrary Target Specification (SMARTS) patterns formed into strings.
We therefore have the mechanistic basis and computational techniques to describe and define compounds that may be associated with protein binding. These have been developed into tools to assist the user to form categories (or groupings) of molecules. The SMARTS strings from Enoch et al. (2008c) are available on request from the author. Toxtree (Pavan and Worth, 2008) contains rules to identify compounds as Michael-type acceptors (as defined by Schultz et al., 2007). In addition, and more significantly as a usable tool since it is linked to databases, the OECD (Q)SAR Application Toolbox contains a profiler for protein binding. These and other freely available technologies provide the means for the user firstly to profile a chemical to assess whether or not it belongs to one of these mechanisms and secondly to group chemicals together, so that activity may be rationally interpolated within the group, the so-called process of read-across (Koleva et al., 2008).
Once a category has been formed on a mechanistic basis, it can be populated. From this, local models may be produced, either in the form of a QSAR model or by applying quantitative read-across. For example, Patlewicz et al. (2003, 2004) brought together a group of compounds that are likely to act as Schiff base formers (e.g. aliphatic and aryl aldehydes). In this case, a two-parameter QSAR was developed incorporating hydrophobicity and electrophilicity descriptors (log P and the Taft σ* substituent constant, respectively) to predict potency in the local lymph node assay.
In addition, Enoch et al. (2008b) have demonstrated the applicability of quantitative read-across to predict the potency of skin sensitisers. This is a process whereby, once a category has been formed, the activity can be related to an appropriate descriptor, in the case of Enoch et al. (2008b) a descriptor of electrophilicity. If compounds are ordered according to that descriptor, then an interpolation can be made by considering the compounds with the immediately higher and lower descriptor values. This very simple approach to forming local models was shown to be very powerful.

Formation of categories on the basis of chemical similarity for teratogenicity
The use of a mechanistic profiler or rules assumes the user has some knowledge of chemical structure. A number of other methods can be applied to form chemical groupings or categories. The use of structural similarity may provide insights into groupings or categories without recourse to mechanisms of action. At first sight this may appear to be at odds with the concept of mechanistic transparency to form groupings. However, the assumption is that compounds with a "similar" structure will have similar mechanistic properties, even if those properties are not known (Fabjan et al., 2006; van Leeuwen et al., 2009).
A number of algorithms are available to determine the relative similarity of one chemical to another. These "similarity indices" can be calculated by a number of methods and techniques. Generally they provide a number (usually on a scale of 0 to 1, where 1 indicates identical molecules) expressing relative similarity. Thus, for a query molecule, a category can be formed around it by selecting the most similar molecules from a database. For an excellent review of this area, the reader is referred to Nikolova and Jaworska (2003).
An example of category formation using structural similarity is provided by Enoch et al. (2009a). They analysed results from the FDA/TERIS teratogenicity database (Arena et al., 2004) for 290 chemicals (mainly pharmaceuticals). The dataset had been split into a "training set" (from which categories were sought) and a test set for which read-across predictions were made. Teratogenic classifications, made according to FDA guidelines (Briggs et al., 2002), were available for the "training set".
The study indicated that structural similarity can be used to develop categories on which to base read-across predictions. These categories were transparent and usually associated with a mechanistic basis. However, it should be noted that categories could not be developed for all molecules. Whilst this may at first sight seem like a limitation of this method, it should actually be thought of as a strength, as predictions cannot be made (erroneously) for compounds which are not representative of the data set.

Strengths and limitations of chemical categories and local QSAR and/or read-across approaches
There is no doubt that chemical categories will be formed with increasing regularity to make assessments of toxicity. From such a category, either qualitative or quantitative read-across may be applied. In certain circumstances, it may also be possible to develop local QSARs. There are a number of advantages to the use of local models and QSARs:
- As noted in this paper, there is an increasing availability of tools to develop local QSARs and form categories. Many of these tools are free, e.g. the OECD (Q)SAR Application Toolbox, Toxmatch, Toxtree, the Analog Identification Method (from the US EPA) etc. The freely available tools can be supported and supplemented by commercial products.
- There is increasing acceptance across industry and regulatory agencies that in silico methods play an important role in providing toxicological information.
- Category approaches and local QSARs are by their nature transparent, i.e. the "algorithm" or grouping strategy is clear and obvious. Local QSARs are usually developed from a small number of chemicals with a small number of descriptors (three or fewer) using regression analysis.
- Such methods are mechanistically interpretable, i.e. the mechanism and/or mode of action (if known) can be attributed to the model or grouping and will increase confidence in the prediction.
- These grouping methods are easy to develop and describe, although they are not automated.
- Many of these factors make categories and local QSARs easy to characterise and evaluate under the OECD Principles for the Validation of (Q)SARs.
There are also a number of disadvantages:
- Local QSARs and categories will need to be created on a case-by-case basis. Whilst tools and software are available to develop them, they will require expert input for their development.
- They are limited by the availability of toxicity data to populate the category or chemical grouping.
- Categories may be limited by the tools available to develop them. The profilers within the OECD (Q)SAR Application Toolbox are, in many cases, at an initial stage of development.
- For many endpoints and chemicals, the mechanisms of action may not be known, thus restricting confidence in the category formed.
- There is an assumption that a positive prediction from an in silico approach will carry more weight and be more "acceptable" than a negative prediction. It will take a long time for the scientific community to have confidence that a chemical is not associated with a hazard.
- There is a lack of guidance and case studies to assist the (novice) user (although the educational material associated with the OECD (Q)SAR Application Toolbox is to be commended).
- At the time of preparation of this paper, it is not yet known if and how the predictions from categories and/or groupings and local QSARs will be accepted by regulatory agencies.
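To illustrate how transparent such a local model can be, the quantitative read-across described above for skin sensitisation (ordering the category members by a descriptor and interpolating between the immediate neighbours) might be sketched as follows. This is a minimal sketch and not the published implementation of Enoch et al. (2008b): the function name, the linear interpolation rule, the refusal to extrapolate outside the category and all numerical values are assumptions made for illustration only.

```python
from bisect import bisect_left


def read_across(category, query_descriptor):
    """Interpolate an activity for a query compound within a category.

    category: list of (descriptor, activity) pairs for tested members.
    Returns None when the query lies outside the descriptor range of the
    category, so that no extrapolated prediction is made (an assumption
    of this sketch).
    """
    members = sorted(category)            # order members by descriptor value
    xs = [x for x, _ in members]
    i = bisect_left(xs, query_descriptor)

    # Exact descriptor match: reuse the measured activity directly.
    if i < len(xs) and xs[i] == query_descriptor:
        return members[i][1]

    # No neighbour on one side: outside the domain of the category.
    if i == 0 or i == len(xs):
        return None

    # Linear interpolation between the immediately lower and higher members.
    (x_lo, y_lo), (x_hi, y_hi) = members[i - 1], members[i]
    frac = (query_descriptor - x_lo) / (x_hi - x_lo)
    return y_lo + frac * (y_hi - y_lo)


# Hypothetical category: (electrophilicity descriptor, potency) pairs.
category = [(1.0, 0.5), (2.0, 1.5), (1.5, 1.0)]
prediction = read_across(category, 1.75)
outside = read_across(category, 3.0)
```

Returning no prediction for a query outside the descriptor range mirrors the point made for the similarity-based categories, namely that declining to predict for unrepresentative compounds can be regarded as a strength.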

Conclusions
There are a number of approaches to form categories of compounds to allow for the creation of local models and/or QSARs for the prediction of toxicity. This study illustrates the use of mechanistic information based on protein reactivity and chemical similarity indices to develop usable categories for read-across. It is shown that local models are transparent and may provide more accurate results than global models. In addition, they have the advantage of being rationalised on a mechanistic basis. There are disadvantages to their use, not least that they are labour-intensive to create and are restricted to specific areas of chemistry.
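As a closing illustration of category formation from a similarity index, the sketch below selects the database compounds most similar to a query. It assumes the Tanimoto coefficient computed on sets of structural features, which is one common index on the 0 to 1 scale and is not necessarily the index used in the studies cited in this paper. The feature labels, the threshold of 0.5 and the category size limit are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto index between two feature sets (1.0 means identical sets)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def form_category(query, database, threshold=0.5, max_size=5):
    """Return up to max_size (name, similarity) pairs, most similar first.

    database: mapping of compound name to its set of structural features.
    An empty list means no sufficiently similar analogue was found, in
    which case no read-across prediction should be attempted.
    """
    scored = [(name, tanimoto(query, feats)) for name, feats in database.items()]
    members = [(name, s) for name, s in scored if s >= threshold]
    # Sort by decreasing similarity, then by name for a stable order.
    members.sort(key=lambda pair: (-pair[1], pair[0]))
    return members[:max_size]


# Hypothetical feature sets standing in for real structural fingerprints.
database = {
    "A": {"f1", "f2", "f3", "f4"},
    "B": {"f1", "f2", "f3"},
    "C": {"f7", "f8"},
}
query = {"f1", "f2", "f3", "f5"}
category = form_category(query, database)
no_category = form_category({"x"}, database)
```

In practice the feature sets would come from a fingerprinting tool, and the threshold would need to be justified for the endpoint in question.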