Integrated Strategy for Mutagenicity Prediction Applied to Food Contact Chemicals

Received July 17, 2017; Accepted September 11, 2017; Epub September 18, 2017; doi:10.14573/altex.1707171

constituents into food in amounts that could endanger consumer health and food quality. Regarding migrating substances, a distinction is made between those having a technological function in the manufacturing of the FCM (the intentionally added substances, IAS) and those originating from impurities in raw materials and from the reaction and degradation of substances in the intended use (the so-called non-intentionally added substances, NIAS). IAS are, in general, toxicologically well studied and can be assessed using standard risk assessment. In contrast, for most NIAS no toxicological data are available and risk assessment is therefore not straightforward. Consequently, there are significant uncertainties regarding the safety of NIAS, triggering increasing public, scientific and regulatory concern (Van Bossuyt et al., 2017).


Introduction
Food contact materials (FCM) are materials and articles that are intended to come into contact with food during its production, processing, storage, preparation and serving, before its consumption (EFSA, 2015). Amongst others, these include plastics, paper and board, glass, metal coatings, printing inks and adhesives (Van Bossuyt et al., 2016).
The Framework Regulation (EC) No 1935/2004 includes general requirements for all FCM (EC, 2004), but only a few harmonized legislations exist for specific types of FCM, such as the EU Regulation 1282/2011 on plastic materials (EC, 2011). The Framework Regulation states that FCM should be sufficiently chemically inert so that they do not release their

over single models (Amaury et al., 2007; Cassano et al., 2014; Kulkarni et al., 2016; Manganelli et al., 2016; Mazzatorta et al., 2007). Recently, EFSA's Scientific Committee has developed a guidance document promoting the use of the WoE in toxicological assessments combining both qualitative and quantitative approaches (EFSA, 2017). The EFSA guidance proposes a strategy for assembling, weighing and integrating different lines of evidence from testing and non-testing methods (NTM) and defines reliability, relevance and consistency in terms of their contribution to the overall assessment. It also provides an example of the use of NTM within a WoE framework, which proposes the integration of a number of (Q)SAR models and read-across for mutagenicity estimations. This example highlights strengths and weaknesses of both methods, which may vary on a case-by-case basis.
Aside from integration, automation allows quick and efficient analysis of chemicals for activity across a battery of in silico methods. This is useful for rapidly evaluating large numbers of chemicals. It is commonly realized with the support of pipeline tools, e.g., KNIME and Pipeline Pilot (Warr et al., 2012). Besides saving time, automation also reduces the inconsistencies and errors that arise from the manual building, validation and application of in silico methods (Cox et al., 2013; Dixon et al., 2016; Romano, 2008; Zhang et al., 2006). In the case of read-across, automating the key steps (e.g., data search methods that use similarity measures and fragment search) overcomes one of the main drawbacks of this method, i.e., the lack of reproducibility (Benfenati, 2016; Gini et al., 2014).
In this context, the present work aimed to draw up an automated strategy for integrating a number of (Q)SAR models for Ames mutagenicity predictions applicable to large sets of compounds. The dataset compiled by Price et al. (2014), containing a list of substances migrating from plastic FCM isolated from the FACET dataset (Hearty et al., 2011) plus their mutagenic analogues, was selected as a good candidate to develop, validate and test the approach. In our study, these chemicals were assembled and processed by using three in silico (Q)SAR consensus models for mutagenicity. Hence, a scheme to integrate mutagenicity estimations into a single final assessment was defined and applied to toxicologically uncharacterized FCM chemicals. Finally, the overall strategy of integration will be automated through its implementation into a freely available software application.

Chemical structures
For the present work, a list of 183 chemicals obtained from the database identified by Price et al. (2014) was used. It includes substances migrating from plastic FCMs from the FACET dataset (Hearty et al., 2011) and mutagenic structural analogues. Indeed, all compounds used in plastic food packaging go through a rigorous assessment by expert panels, so experimental data, where available, were mainly non-mutagenic. Thus, in order

Taking into account the time constraint and the pressure to avoid the use of laboratory animals, the development of alternative methodologies to establish a rapid and cost-efficient level of safety concern of identified NIAS appears critical to ensure adequate consumer protection without undue over-conservatism. In this context, computational toxicology methods are recognized as the most promising solutions and are increasingly applied by academic and regulatory scientists (Benfenati et al., 2009; JRC, 2010). In silico methods have been most prominently promoted by the European Registration, Evaluation, Authorization and Restriction of Chemicals (REACH, EC 1907/2006) regulation on chemicals (Cassano et al., 2014). They are successfully employed for early identification of toxicological hazard in other regulatory frameworks, such as in the qualification of potentially genotoxic impurities in drug substances (ICH, 2017; Sutter et al., 2013) to limit potential carcinogenic risk.
In the food context, the most commonly applied method to establish the level of safety concern of chemicals in the absence of experimental data has been the threshold of toxicological concern (TTC) (Kroes et al., 2004). More recently, the use of computational models has been highlighted (Van Bossuyt et al., 2017; Schilter et al., 2014). The first and common step of these approaches is the identification of a possible structural alert for genotoxicity. Genotoxicity and, more specifically, mutagenicity are particularly relevant endpoints because chemicals in this category are assumed, in theory, to lack a threshold of effect. For IAS, genotoxicity data are always requested, regardless of the estimated migration level and resulting exposure (EFSA, 2012). In the frame of risk assessment, the hazard identification step considers genotoxicity and mutagenicity via direct DNA reactivity as the default assumption in the absence of sufficient data to the contrary (Schilter et al., 2014; Jacobs et al., 2015). It is commonly accepted that DNA-reactive mutagenic agents do not exhibit a dose below which no effect is anticipated. Although all mutagens are genotoxic, not all genotoxic substances are mutagenic. The bacterial reverse mutation assay (Ames test) is considered a reliable predictor of genotoxic potential (Schilter et al., 2014), and is the most common in vitro test to detect gene mutations (OECD, 1997). Also, mutagenicity is one of the most modelled endpoints due to the quantity and quality of experimental data available. This is why we addressed this specific endpoint in our study. The main categories of in silico methods for the prediction of mutagenic potential of chemicals are (Q)SAR models, based on numerical descriptors; rule-based expert systems, making use of structural alerts associated with adverse outcomes; and hybrid models combining both approaches (Benfenati et al., 2016; Mombelli et al., 2016).
The integration of models based on complementary algorithms, i.e., statistical and structure activity relationship (SAR) based, is often a mandatory requirement, e.g., for the genotoxicity qualification of pharmaceutical impurities (ICH, 2017). Up to now, different strategies combining a number of (Q)SAR models for predicting Ames mutagenicity have been proposed, moving towards a weight of evidence (WoE) approach. In general, such combinations resulted in improved predictions

experimental values and estimations. For example, 2-isopropyl thioxanthone (ITX) was classified in vitro as experimentally equivocal/mutagenic. Based on in vivo experimental data, ITX is generally considered non-genotoxic. However, non-genotoxic experimental data did not refer to the Ames results, but to in vivo tests. Indeed, EFSA states that ITX induced a borderline increase of revertant colonies in a bacterial reversion test and was inactive in adequate genotoxicity tests in liver and bone marrow (EFSA, 2005).
Assessing borderline substances in more detail, we reclassified two experimentally equivocal compounds, methyl methacrylate (MMA) (EFSA, 2008) and propionic acid (EFSA, 2014), as negative based on registration data reported in the ECHA CHEM database with reliability 1 according to the Klimisch score (Klimisch et al., 1997).
Moreover, analogues were examined to have a positive control and to increase the number of experimental values. ChemIDPlus/Toolbox (OECD, 2017) database matches provided positive mutagenic values for most of the analogues and negative data for two of them; the two non-mutagenic compounds for which all of the Ames assays were conducted according to OECD TG 471 (OECD, 1997) with and without metabolic activation were included in the final list.
Finally, mutagenicity measured values were available for about half of the compounds in the dataset (97 compounds). All details are reported in Table S2. Therefore, we used them to validate the predictions of the three consensus models, moving towards a weight of evidence approach.

Consensus models
All chemicals in the dataset were then processed using the following battery of models: Robust hybrid classifier (RHC),

to validate our consensus approach, we also considered mutagenic analogues of the FCM chemicals as a positive control in accordance with Price et al. (2014). Overall, experimental data referring to Ames mutagenicity were available for 97 (29 mutagenic and 68 non-mutagenic) of the 183 chemicals. The remaining 86 substances were toxicologically uncharacterized for this endpoint and the developed approach was applied to the screening of potential mutagenic compounds.
In detail, the curation of the chemicals was performed as follows:
− Name to structure conversion was executed using Marvin View / JChem for parent FCM (if available as single substances), migrants and structural analogues. Parent compounds existing as mixtures, oligomers and polymers were also identified and converted into structures (i.e., single constituents, monomers), partially with the help of chemical databases such as ChemSpider, ChemIDplus and PubChem.
− SMILES (Weininger et al., 1988) generated by these tools were compared against the original ones from Price et al. (2014).
− An in-house software (Floris et al., 2014) was used to identify and then remove duplicates within parents, migrants and analogues.
− Canonical SMILES were obtained using the istMolBase software (Kode, 2013), based on the VEGA core libraries and Chemistry Development Kit libraries (CDK) (Benfenati et al., 2015).
The final list of 183 chemicals with names and structures, and the related SMILES, is reported in Table S1.

Data curation
In the curation process, some of the experimental values were modified and new ones were introduced. Indeed, some gaps emerged from the analysis of further experimental sources and/or from database and literature updates. The single models in part rely on experimental data that often do not cover the complete set of strains required by official protocols. However, we believe that the large number of substances (thousands) underlying each model, together with the combination of different models, will cover the data gaps for certain strains on individual substances. Most important when combining different predictions is consistency in the endpoint selection (e.g., not mixing the genotoxicity with mutagenicity endpoints). Since the databases of the models used in this study were built using experimental data from the Ames test, we only considered data for mutagenicity obtained with this assay, even if other in vitro or in vivo tests gave different results for the substances under examination. This is necessary to allow a fair comparison between

− ISS-VEGA, which is based on a series of rules defined by Benigni and Bossa detecting mutagenic chemicals originally implemented within the Toxtree application (Benigni et al., 2008; Benigni and Bossa, 2011);
− k-NN, which performs a k-nearest neighbors prediction with a weighted integration of the experimental values of the four chemicals most similar to the target (Manganaro et al., 2016).
The CAESAR and SARpy VEGA models were developed based on 4,204 chemicals extracted from the Bursi dataset (Hansen et al., 2009). The k-NN VEGA model was built on a dataset of 5,770 chemicals from the Hansen dataset (Hansen et al., 2009) and from data produced within the Ames QSAR project organized by the National Institute of Health Sciences of Japan. The training set of the ISS VEGA model was extracted from the Toxtree software (v 2.6), and consists of 670 compounds.
Predictions from single VEGA models are associated with three possible levels of reliability, based on the definition of their applicability domains: low, moderate and high. The consensus algorithm gives toxicity estimations based on these levels of reliability. It also assigns a numerical score (ranging from 0 to 1) to each estimation, which depends on the number of convergent predictions and on their reliability. If an experimental value is provided by at least one model (because the target molecule has been found in the training/test set of that model), it is kept as the final consensus result.
In our evaluation, we considered predictions with a consensus score higher than 0.3 as reliable; otherwise we assigned them low reliability. Indeed, the cutoff value of 0.3 was able to discard consensus estimations based on the prevalence of predictions associated with low reliability.
We preferred the consensus to the single VEGA models to estimate mutagenicity since its algorithm produces a final assessment influenced by the most reliable individual predictions.

T.E.S.T. consensus model
T.E.S.T. estimates Ames mutagenicity using four QSAR methods: the hierarchical method, the Food and Drug Administration (FDA) method, the nearest neighbor method and the consensus method. The consensus method takes an average of the predicted toxicities from the above QSAR methods (taking into account the AD of each method). The dataset of T.E.S.T. mutagenicity models is taken from the Hansen dataset (Hansen et al., 2009). T.E.S.T. provides continuous prediction values to be interpreted as follows:

We used this value as an indicator for predictive relevance, assigning the highest uncertainty to predictions equal or close to the 0.5 cutoff. We considered predicted values greater than

VEGA and T.E.S.T. consensus models, each based on the combination of different algorithms. Hence, predictions were evaluated based on information on the applicability domain and reliability of each model, and the related compounds used to build the model. A brief description of the mutagenicity models and of the parameters considered to assign reliabilities to their predictions is provided below.

Robust hybrid classifier (RHC)
The RHC model, developed by Mazzatorta et al. (2007), integrates (i) the Structural Alerts model (SAm), including the list of improved structural alerts (SA) gathered by Kazius et al. (2005), and (ii) the Artificial Intelligence model (AIm), which is a modified k-nearest neighbor based on the LAZAR system developed by Helma (2004) (Mazzatorta et al., 2007). The training set of 4,337 substances used for building RHC was collected by Kazius et al. (2005), and the test set of 753 chemicals used for its validation was assembled and curated by Young et al. (2002), as described in detail by Mazzatorta et al. (2007).
RHC returns the Ames prediction together with a confidence level, which depends on the ratio between the number of mutagens containing a given toxicophore and the total number of compounds in the test set with that moiety and takes into account the error associated with the prediction of each SA (Mazzatorta et al., 2007). If both models predict the compound as non-mutagenic, RHC considers it negative with a confidence equal to 0.85, which refers to the overall specificity of the system; if both models converge on mutagenicity, RHC considers the compound mutagenic and the confidence is equal to the sensitivity of RHC weighted by the product of the individual error associated with the SAs present in the compound. In case of non-consensus prediction, SAm prevails, because it is based on well-documented experimental evidence and has a superior accuracy, but the confidence of the prediction is accordingly lowered.
Based on criteria chosen by Mazzatorta et al. (2007) to define different levels of confidence, 0.65 was chosen as cutoff value for prediction reliability; estimations with a confidence level greater than or equal to 0.65 were considered reliable, otherwise they were associated with low reliability. The model does not indicate if the predicted chemical is included in its training/test sets.

VEGA consensus model
The VEGA consensus model integrates predictions from the following (Q)SAR models:
− CAESAR, which integrates a support vector machine (SVM) algorithm coupled with two sets of structural alerts aimed to reduce the false negative rate (Ferrari and Gini, 2010);
− SARpy (SAR in python), which extracts a set of structural alerts related to a specific activity from data without any a priori knowledge (Ferrari et al., 2013);

All models correctly classified 75 out of 97 experimentally known chemicals, 19 as mutagenic and 56 as non-mutagenic. However, 17 out of the 19 mutagenic ones were included in the training/test sets of at least one model; 22 chemicals were not correctly predicted by at least one model. Table S2 contains the list of 183 chemicals with predicted values from the three models with their "reliability/confidence scores" and experimental values. RHC generated nine false negatives; one was in common with T.E.S.T. However, these inaccurate predictions were overridden by the correct ones from the VEGA consensus model based on the presence of experimental values. T.E.S.T. produced two false negatives (one in common with RHC), which were overridden by positive predictions of VEGA, which contains the experimental data.
Overall, the models gave 11 false positives: four were generated by VEGA, two by T.E.S.T., and seven were misclassified by RHC, all but one with high uncertainty.Three of these chemicals were alkyl phenyl sulfonates, suggesting that the RHC model may encounter problems when predicting this chemical class.
Based on the available measured mutagenicity values, we examined the possibility to combine and validate the predictions of the three consensus models, moving towards a weight of evidence approach. We drew up a strategy to combine predictions from the individual consensus models. Essentially this integration scheme first checks the presence of experimental data and then the prediction reliabilities for each model.
The algorithm we developed involves the following steps:

In this case, if the estimations from the other models are convergent and at least one of them is reliable, this is kept as final assessment; otherwise the integrated model does not provide any prediction. If the other two predictions are divergent, the one with high reliability is taken.
(d) Two models are unable to predict the molecule. In this case, the only available estimation is kept as final assessment only if it is reliable; otherwise it is discarded.

Table 1 lists the statistical performance of the three consensus models plus the combined one. We calculated statistics on curated experimental data with and without information about reliability of prediction.

0.7 as mutagenic with high reliability and lower than 0.3 as non-mutagenic with high reliability. Prediction values between 0.3 and 0.7 were considered uncertain. In this case, if the experimental value was present in the model's dataset, it superseded the predicted value in the final assessment.
We evaluated the results obtained by in silico predictions based on the information on the applicability domain and the uncertainty provided by the models.
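The reliability cutoffs stated for the three consensus models (RHC confidence of at least 0.65, VEGA consensus score above 0.3, T.E.S.T. values above 0.7 or below 0.3) can be summarized in a minimal Python sketch. The function and constant names are hypothetical and are not part of any of the three tools; assigning a class to uncertain T.E.S.T. values by comparison with 0.5 is an assumption based on the 0.5 cutoff mentioned in the text.

```python
from typing import Tuple

# Cutoffs as stated in the text.
RHC_CONFIDENCE_CUTOFF = 0.65   # reliable if confidence >= 0.65
VEGA_SCORE_CUTOFF = 0.3        # reliable if consensus score > 0.3
TEST_LOW, TEST_HIGH = 0.3, 0.7 # reliable outside the 0.3-0.7 band


def rhc_is_reliable(confidence: float) -> bool:
    """RHC: confidence greater than or equal to 0.65 counts as reliable."""
    return confidence >= RHC_CONFIDENCE_CUTOFF


def vega_is_reliable(consensus_score: float) -> bool:
    """VEGA consensus: score strictly higher than 0.3 counts as reliable."""
    return consensus_score > VEGA_SCORE_CUTOFF


def classify_test_value(value: float) -> Tuple[str, bool]:
    """T.E.S.T.: map a continuous value to (call, reliable)."""
    if value > TEST_HIGH:
        return ("mutagenic", True)
    if value < TEST_LOW:
        return ("non-mutagenic", True)
    # Uncertain band. ASSUMPTION: the call itself follows the 0.5 cutoff,
    # with values equal to 0.5 treated as mutagenic.
    call = "mutagenic" if value >= 0.5 else "non-mutagenic"
    return (call, False)
```

Keeping the three rules in separate functions reflects that each model reports a different kind of score, so the cutoffs are not interchangeable.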

Algorithms for evaluation of classification models
The performance of the three consensus models was evaluated using Cooper's parameters (Cooper et al., 1979), which include accuracy, sensitivity and specificity. These parameters take into account the number of correctly classified mutagens (true positive = TP) and non-mutagens (true negative = TN) and the number of non-mutagens misclassified as mutagenic (false positive = FP) and mutagens misclassified as non-mutagenic (false negative = FN). The Matthews correlation coefficient (MCC) was also assessed.
These are calculated as follows:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\text{Sensitivity} = \frac{TP}{TP + FN}$$

$$\text{Specificity} = \frac{TN}{TN + FP}$$

Accuracy (concordance or "Q") measures the overall proportion of correct classifications, while models with high sensitivity produce fewer false negatives, i.e., mutagenic compounds that are predicted as non-mutagenic. Models with high specificity give fewer false positives (non-mutagenic chemicals incorrectly predicted as mutagens).
The Matthews correlation coefficient (MCC) evaluates the quality of binary classifications and is generally considered a balanced measure, which can be used even for classes of very different sizes (Matthews et al., 1975). This parameter is robust to imbalance in the data classes, which may lead to misleading values of accuracy. It is calculated as follows:

$$\text{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$

MCC values vary between -1 and +1: +1 indicates exact classification, -1 results from complete misclassification and 0 implies a random result.
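As an illustration of how these parameters follow from the four counts, a minimal Python sketch is given here. The function name and the example counts are hypothetical and are not taken from Table 1; returning an MCC of 0 when the denominator is zero is a common convention adopted here as an assumption.

```python
import math
from typing import Dict


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Compute accuracy, sensitivity, specificity and MCC from confusion counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    # Sensitivity: fraction of mutagens correctly predicted as mutagenic.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: fraction of non-mutagens correctly predicted as non-mutagenic.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # ASSUMPTION: MCC is reported as 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Hypothetical counts, for illustration only.
example = classification_metrics(tp=8, tn=9, fp=1, fn=2)
```

With these illustrative counts, accuracy is 17/20, sensitivity 8/10 and specificity 9/10, while the MCC is about 0.70.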

Results
Overall, the three consensus models gave convergent predictions for 144 out of 183 compounds, corresponding to 79% of the total dataset: 21 were mutagenic and 123 were non-mutagenic.
First, we compared predicted and experimental values where available and evaluated the statistical performance of the three models.

All the models showed good statistical performance. The use of information about reliability based on the selected cutoffs led to enhancement of statistical parameters. Moreover, data curation allowed fixing a number of misclassifications of the used models.

Besides the statistical improvement, the use of the integrated strategy provided higher prediction coverage compared to single prediction taking into account the information on reliability. In addition, measured data available in VEGA and T.E.S.T. training/test sets filled the gap of experimental information from the RHC model within this integrated scheme. The combined model was unable to assess two chemicals, bis(2,6-diisopropylphenyl)carbodiimide and 2,2,3-trifluoro-3-(trifluoromethyl)oxirane, reported as negative in the ECHA CHEM database

First, the performance of the individual predictive models was affected by the quality of experimental data and by the information on prediction uncertainty, where available. The information about prediction reliability improved all the statistical parameters. Moreover, a fair comparison between measured and estimated values is not a simple matter. Indeed, it is important to identify the experimental protocol used to measure or estimate the endpoint that is being examined, in accordance with OECD principles (OECD, 2014). This example illustrates that the use of data curation gives a more objective estimation of actual predictive power of the models, often accompanied by an improvement of their statistical behavior.
It is increasingly recommended to combine models based on complementary algorithms (ICH, 2017).This is often considered a default option to minimize the risk of producing false negatives and therefore to ensure optimal consumer protection.However, this may potentially be at the expense of generating numerous false positive predictions and reducing overall accuracy.This could potentially result in over-conservative and non-discriminative predictions, preventing their most efficient use for decision-making.
In the present study, enhanced (Q)SAR model performance was observed by applying a new algorithm for model integration, taking into account the different reliabilities of orthogonal methods. A possible drawback of using a highly accurate model may be a loss of chemical structure coverage. The use of the combination approach developed in this study together with the use of the available information on applicability domain/reliability provided a higher prediction coverage compared to single model estimations. Indeed, the aim of the paper was to study the strengths of a new consensus model based on easy rules, taking into account the convergence and the reliability of predictions. In this way, our strategy relates more strongly to the strength of the more reliable models. It shows that most Ames mutagenicity (Q)SAR models already perform quite well (the error of the models is very close to the experimental one) and the benefit of our approach is mainly represented in the increase of the applicability domain. As a first application, the consensus model was applied to a limited number of chemicals. In the near future we are planning to test it on bigger datasets and to include other kinds of applications.
Finally, the integrated strategy was applied to 86 chemicals. All of them could be predicted. Three were predicted as mutagenic (reported in Tab. 2) and were analyzed in more depth. The results obtained allow further assessment of the safety of these toxicologically untested molecules through the application of the TTC approach (Kroes et al., 2004) and/or the conceptual scheme developed by Schilter et al. (2014).

Conclusions
In the framework of understanding and managing risks for consumer health posed by untested food contact chemicals such as NIAS, the present study provides an algorithm combining existing models for a time- and cost-efficient evaluation of Ames

with reliability 1 according to the Klimisch score (Klimisch et al., 1997). In the framework of developing a strategy of toxicity assessment for food contact chemicals, experimental data supersede predicted data.
The high level of accuracy of the integrated strategy provided a rationale to apply it to evaluate the remaining 86 experimentally untested compounds in our dataset to identify mutagenic chemicals. The new integrated model gave 83 non-mutagenic and three mutagenic predictions. Nine out of the 83 substances predicted as non-mutagenic were part of the training/test sets of the model(s). The chemicals predicted positive were not included in the models' databases (training and/or test sets) and were among the possible migrating substances. These three positively classified food contact chemicals lacking experimental information are shown in Table 2. All of them contain structural alerts, which have been associated with mutagenic activity based on mechanism of toxicity (Benigni, 2008) or on statistical evidence (Benfenati et al., 2015). In particular, these include thioxanthones, which are present in 2,4-diethyl-9H-thioxanthen-9-one and 4-isopropylthioxanthone (4-ITX), and an α,β-unsaturated carbonyl moiety, which occurs in 5-chloro-2-methyl-2H-isothiazol-3-one (CIT).
Both 4-ITX and 2,4-diethyl-9H-thioxanthen-9-one are structural analogues of 2-ITX (contained in our dataset). Consequently, the reasoning on mutagenicity for 2-ITX can be extended to these chemicals through a read-across approach, because of the high structural similarity and the presence of the thioxanthone ring as structural alert shared by the three molecules. As in the case of 2-ITX, the two related chemicals might exhibit their mutagenic potential in vitro but not in vivo (as explained in Section 2.2).
CIT is a component of a biocide with CAS number 55965-84-9, a 3:1 mixture with 2-methyl-2H-isothiazol-3-one (MIT). According to the EFSA Scientific Opinion (EFSA, 2010), the biocide gave positive results in genotoxicity tests in vitro in bacteria, while no significant genotoxicity was observed in vivo. The other component of the mixture, MIT, was predicted as non-mutagenic by the combined model. Based on these estimations, CIT might be considered responsible for the positive in vitro results of the CIT/MIT mixture.

Discussion
In this study, an integrated strategy for mutagenicity prediction was developed and validated on about a hundred experimentally known chemicals, including mostly non-mutagenic migrating substances from FCM plus their positive analogues. Even if our aim is to integrate QSAR and read-across in the frame of the WoE approach, in the present study, we focus only on the integration of QSAR models because read-across is quite difficult to automate. Comparing the results obtained by the QSAR consensus model and read-across approach can surely increase the accuracy of the final prediction. Such a procedure is strongly recommended for compounds predicted with low reliability. This study highlighted some other key aspects to take into account in the evaluation of predictions from in silico models.

mutagenicity. The integration scheme resulted in an increased domain of applicability. Moreover, we are planning to test the model in the near future on a larger number of chemicals and to include other kinds of applications (not only food contact chemicals). These results will improve the implementation of a tool, such as VEGA, to integrate predictions from different models. Indeed, we believe that such a strategy may be applied as the first step of a more complex screening strategy aiming to establish the level of safety concern of experimentally untested substances according to the broadly accepted TTC concept (Kroes et al., 2004) and/or other more recently developed non-testing approaches (Schilter et al., 2014) aiming at identifying a dose to be compared with estimated level of exposure within a Margin of Exposure (MoE) approach. Each step of the overall scheme will be sequentially automated through implementation in the publicly available VEGA platform in the near future. This will not only save time, but also minimize inconsistencies and errors due to the manual building, validation and application of in silico methods.
1. If an experimental value is present in the dataset(s) of at least one model, it takes the place of the predicted one(s) in the final assessment.
2. If there is no experimental value, processing a molecule by the three models gives rise to different possibilities:
(a) All estimations are convergent (all mutagenicity positives or negatives); in this case, these become the final prediction, regardless of their reliability.
(b) Two predictions are convergent (both associated with mutagenicity/non-mutagenicity) and one is divergent; the convergent estimations are used as final prediction if at least one of them is reliable, otherwise the divergent one supersedes them if it is reliable. If both convergent and divergent estimations are uncertain, they are discarded and the model is unable to estimate the molecule.
(c) One of the three models cannot provide any prediction.
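The integration rules described in this paper (experimental values first, then cases (a) to (d) depending on how many models return a prediction) can be written as a short decision function. The following Python sketch is an illustration under stated assumptions, not the authors' implementation: the `Prediction` class and `integrate` function are hypothetical names, the first experimental value found is used if several models report one, and two divergent predictions that are both reliable yield no prediction, since the text does not cover that case.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Prediction:
    mutagenic: bool             # True = mutagenic, False = non-mutagenic
    reliable: bool              # reliability according to the model's cutoff
    experimental: bool = False  # value taken from the model's training/test set


def integrate(predictions: Sequence[Optional[Prediction]]) -> Optional[bool]:
    """Combine up to three model outputs; None means 'no prediction'."""
    available = [p for p in predictions if p is not None]

    # Step 1: an experimental value replaces any predicted value.
    # ASSUMPTION: the first experimental value found is used.
    for p in available:
        if p.experimental:
            return p.mutagenic
    if not available:
        return None

    positives = [p for p in available if p.mutagenic]
    negatives = [p for p in available if not p.mutagenic]

    if len(available) == 3:
        # (a) All convergent: accepted regardless of reliability.
        if not positives or not negatives:
            return available[0].mutagenic
        # (b) Two convergent, one divergent.
        if len(positives) > len(negatives):
            majority, minority = positives, negatives
        else:
            majority, minority = negatives, positives
        if any(p.reliable for p in majority):
            return majority[0].mutagenic
        if minority[0].reliable:
            return minority[0].mutagenic
        return None

    if len(available) == 2:
        # (c) One model gives no prediction.
        first, second = available
        if first.mutagenic == second.mutagenic:
            return first.mutagenic if (first.reliable or second.reliable) else None
        # Divergent: take the reliable one.
        # ASSUMPTION: no prediction if both or neither are reliable.
        if first.reliable != second.reliable:
            return first.mutagenic if first.reliable else second.mutagenic
        return None

    # (d) Only one model gives a prediction: keep it only if reliable.
    only = available[0]
    return only.mutagenic if only.reliable else None
```

For example, `integrate([None, Prediction(True, True), Prediction(False, False)])` returns `True`, because the single reliable prediction decides between two divergent ones.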

Tab. 1: Statistical performance of VEGA, RHC and T.E.S.T. models for mutagenicity evaluated using accuracy, sensitivity, specificity and the Matthews correlation coefficient (MCC)
These parameters are derived from the number of correctly classified mutagens (true positive = TP) and non-mutagens (true negative = TN) and the number of non-mutagens misclassified as mutagenic (false positive = FP) and mutagens misclassified as non-mutagenic (false negative = FN).