Evaluation of the GARD Assay in a Blind Cosmetics Europe Study

Received January 12, 2017; Accepted February 14, 2017; Epub February 17, 2017; doi:10.14573/altex.1701121


Introduction
Chemical hypersensitivity is a disease state induced by the human immune system in response to chemical sensitizers, which most frequently gives rise to the clinical symptoms of allergic contact dermatitis (ACD). The molecular and cellular mechanisms of sensitization have been reviewed extensively (Ainscough et al., 2013; Martin, 2015; Martin et al., 2011). Briefly, sensitization involves skin penetration of the sensitizing agent with a subsequent haptenization of endogenous proteins. Protein-hapten complexes are taken up by resident dendritic cells (DCs), which upon maturation migrate to local lymph nodes where antigen presentation to naïve T cells occurs. This results in the induction of an immunologic memory towards the specific sensitizer. Upon repeated exposure, a sensitized individual will suffer from ACD-associated symptoms.
A link has been made between the prevalence of ACD and the increased exposure of the population to the abundance of chemical sensitizers in consumer products (Lunder and Kansky, 2000; Nguyen et al., 2008). In order to limit hazardous effects of chemicals, risk assessments aim at safeguarding humans and the environment by eliminating and mitigating risks of exposure. The European REACH (EU, 2006) legislation requires all manufactured substances to undergo safety testing in order to identify, e.g., chemical sensitizers. Historically, such tests have been conducted in guinea pig (Magnusson and Kligman, 1969) and murine (Basketter et al., 2002) models. The murine Local Lymph Node Assay (LLNA), in particular, continues to be used today. However, the use of animals for testing cosmetic ingredients has been banned in the EU since 2013 (EU, 2009), and the REACH legislation urges other industries to use animal testing only as a last resort when no relevant alternative testing methods exist, thereby clearly stating an intent to comply with the 3R principles (Russell and Burch, 1959).
As a consequence, the field of predictive toxicology has recently seen a surge in the development of novel non-animal assays for the assessment of chemical sensitization potential. The Direct Peptide Reactivity Assay (DPRA) (Gerberick et al., 2004), KeratinoSens™ (Natsch, 2010) and the human Cell Line Activation Test (h-CLAT) (Ashikaga et al., 2006) have been validated by the European Reference Laboratory for Alternatives to Animal Testing (EURL ECVAM) and have recently been accepted by the OECD as test guidelines, which demonstrates that these tests are adequately reproducible and transferable (DPRA, OECD TG 442C; KeratinoSens, OECD TG 442D; h-CLAT, OECD TG 442E).
However, none of the aforementioned assays are thought to fully cover the complexity of the skin sensitization process as stand-alone tests. Rather, it is widely proposed that assessment of hazard and/or risk should be carried out using integrated testing strategies (ITS), also referred to as integrated approaches to testing and assessment (IATA) (Jaworska and Hoffmann, 2010; Hartung et al., 2013; Rovida et al., 2015; Ezendam et al., 2016). However, the overall predictive performance of an ITS will invariably depend on the predictivity of its assay constituents. In addition, being based on a single or a few biomarkers, current methods provide only limited predictive information, as well as sometimes overlapping mechanistic information. Thus, when designing an ITS, tests with high predictive performance and information content, covering one or more of the key events of the adverse outcome pathway (AOP) (OECD, 2012), would clearly be an advantageous option (Lindstedt and Borrebaeck, 2011).
The Genomic Allergen Rapid Detection (GARD) assay is a cell-based in vitro assay for assessment of chemical sensitizers (Johansson et al., 2011). The readout of the assay is based on differentially regulated transcriptional changes of selected genomic biomarkers, referred to as the GARD prediction signature (GPS), induced in a myeloid dendritic cell-like cell line in response to chemical stimulation. GARD has been shown to be functional and able to accurately predict sensitizing chemicals in blind evaluations (Johansson et al., 2014) and exhibits high predictive performance in comparison with in vitro counterparts (Johansson and Lindstedt, 2014). Following a thorough evaluation of technological platforms (Forreryd et al., 2014), the assay was recently adapted to a medium-to-high throughput format in order to meet industrial and regulatory demands of reliability, resource effectiveness and sample capacity (Forreryd et al., 2016). Furthermore, an adaptation of GARD using identical cellular protocols but a different biomarker signature to differentially classify respiratory sensitizers from a set of skin sensitizers and non-sensitizers has been demonstrated (Forreryd et al., 2015). This illustrates the unparalleled flexibility of applications of genomics-based platforms, which is due to the massive amount of information that multivariate readouts deliver.
In an attempt to evaluate the performance of currently validated assays, as well as selected assays that are currently in the validation process or are being considered for validation, the Cosmetics Europe Skin Tolerance Task Force (CE STTF) recently published a comparative study in which a limited set of chemicals were classified as sensitizers or non-sensitizers (Reisinger et al., 2015). Based on this study, the best-performing assays, among them GARD, were selected for a second evaluation phase comprising a larger number of blinded chemicals with human and LLNA data. Here, we report the predictive performance of GARD on this Cosmetics Europe dataset as well as an updated overall predictive accuracy of the assay, calculated using strictly independent sets of test chemicals.

Chemicals and datasets
A dataset for model training, consisting of 40 different cell stimulations in biological triplicates, was defined previously and the respective dataset details are described elsewhere (Johansson et al., 2011; Forreryd et al., 2016). In this study, a total of 73 chemicals (see Tab. 1 for details) were assayed blindly using the above-mentioned training dataset. All chemicals were provided by the CE STTF, which also kept the code for the blinded chemicals. All chemicals were stored according to the suppliers' recommendations. In addition to the blinded chemicals of the test set, a set of non-blind benchmark controls (see Tab. S1 at doi:10.14573/altex.1701121s for details) was included. The purpose of the benchmark controls was to calibrate the prediction model to the current batch of cells, as described (Forreryd et al., 2016). All chemicals used as benchmark controls were purchased from Sigma-Aldrich (St. Louis, MO, USA) and were stored according to the manufacturer's instructions.

Table column headings: Chemical identifiers (Substance ID, CAS); References (LLNA, HP, GHS/CLP); Assay parameters (vehicle, c.max, c.rv90, c.input); GARD output (cDV (± SD), Prediction).

Cell maintenance, chemical stimulations, phenotypic analysis and total RNA isolation
All GARD protocols for cell maintenance, cellular stimulation with chemicals, required phenotypical quality control of cells prior to chemical stimulation, and isolation of total RNA have been described previously (Johansson et al., 2011, 2013; Forreryd et al., 2016) and were followed without deviation in this study. The myeloid cell line used in this study was derived from MUTZ-3 (DSMZ, Braunschweig, Germany) and is available via SenzaGen AB (SenzaGen AB, Lund, Sweden). All cellular stimulations were performed in biological triplicates, using separate cell batches for each replicate. Following chemical stimulation, cells were harvested and lysed with TRIzol reagent (Thermo Scientific, Waltham, MA, USA), and stored at -20°C until RNA extraction. Total RNA was isolated from lysed samples using the Direct-zol™ RNA MiniPrep column purification kit (Zymo Research, Irvine, CA, USA) according to protocols provided by the manufacturer. Total RNA concentrations and RNA integrity were assessed using the Agilent Bioanalyzer 2100 (Agilent Technologies, Santa Clara, CA, USA). Total RNA was stored at -80°C until NanoString nCounter analysis.

Gene expression analysis using NanoString technology
The design of a custom NanoString CodeSet, corresponding to the GARD prediction signature (GPS), was described recently (Forreryd et al., 2016). All NanoString-associated protocols for gene expression analysis were performed according to instructions by the manufacturer. In short, the custom CodeSet was hybridized with 100 ng total RNA (5 µl at 20 ng/µl) and incubated at 65°C for 24 h. Hybridized samples were processed in the NanoString GEN2 nCounter Prep Station 5s, using the High Sensitivity protocol, and analyzed in the NanoString Digital Analyzer 5s for digital quantification of each transcript of the GPS, using maximal resolution (555 fields of view). All required equipment, CodeSet and master kit reagents were obtained from NanoString Technologies (NanoString Technologies, Seattle, WA, USA).

Data pre-processing, normalization and analysis
Raw nCounter gene expression data was imported into the R statistical environment (R Development Core Team, 2014), in which all downstream analysis was performed. Data was normalized using a counts per total counts (CPTC) algorithm, which reports normalized values for any given gene of the GPS as the ratio of digital counts for the specific gene and the total counts of all measured genes within that sample. Generation of prediction calls for each sample (sensitizer/non-sensitizer) was performed as described previously. Briefly, a support vector machine (SVM) (Cortes and Vapnik, 1995) was trained on the training dataset and used to generate decision values (DVs) for each sample of the benchmark control dataset and test dataset, respectively. The predictive performance of the model was evaluated on the benchmark control dataset using the additional R package ROCR (Sing et al., 2005). Observations of the receiver operating characteristic (ROC) (Lasko et al., 2005) allowed the identification of the prediction model cutoff that achieves the highest accuracy of predictions of the benchmark control dataset, which was subsequently subtracted from all DVs generated

Fig. 1: GARD predictions correlate with potency classifications
Box-and-whisker plots of mean GARD cDVs, grouped by sensitizing potency as defined by the GHS/CLP classification system. Only test substances for which such classifications are available are included (see Table 1; n = 52 chemicals). The color of each data point is mapped to the GARD input concentration (µM) used for that test substance.
CLP system.The observed differences in mean cDVs indicate that the GARD predictions correlate with potency classifications.
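The grouping behind Figure 1 can be illustrated with a minimal Python sketch. It is not the authors' code: the mean cDVs below are hypothetical, the helper names `boxplot_summary` and `summarize_by_category` are introduced for illustration only, and the whiskers are assumed to span minimum to maximum (plotting libraries often use 1.5 × IQR instead).

```python
from collections import defaultdict
from statistics import quantiles


def boxplot_summary(values):
    """Five-number summary of a list of mean cDVs (needs at least two values)."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": med, "q3": q3, "max": max(values)}


def summarize_by_category(records):
    """Group (GHS/CLP category, mean cDV) pairs and summarize each group."""
    groups = defaultdict(list)
    for category, mean_cdv in records:
        groups[category].append(mean_cdv)
    return {category: boxplot_summary(vals) for category, vals in groups.items()}


# Hypothetical mean cDVs, NOT data from this study.
records = [
    ("1A", 4.0), ("1A", 6.0), ("1A", 8.0),
    ("1B", 1.0), ("1B", 2.0), ("1B", 3.0),
]
summary = summarize_by_category(records)
print(summary["1A"]["median"], summary["1B"]["median"])
```

With these made-up values the 1A group has the higher median, which is the kind of difference the figure is meant to show.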

Accumulated GARD performance parameters across historical datasets
In order to relate the current results to previously published figures of predictive performance, an update of accumulated Cooper statistics for independent GARD assessments across various datasets is presented in Table 3. Combining datasets from a total of 127 chemicals, the accuracy of GARD was calculated to be 86%.
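Cooper statistics of the kind reported for GARD are computed by comparing binary assay calls with a binary reference. The Python sketch below shows this together with the composite reference used for the blinded panel (HP 1-4 counts as sensitizer, HP 5 only if the LLNA is positive, HP 6 as non-sensitizer). The function names and the eight-substance panel are hypothetical and chosen for illustration; they are not data from Tables 1-3.

```python
def composite_reference(hp, llna_positive):
    """Return True if a substance counts as a sensitizer under the composite reference."""
    if hp not in (1, 2, 3, 4, 5, 6):
        raise ValueError("human potency (HP) category must be 1-6")
    if hp <= 4:
        return True
    if hp == 5:
        return bool(llna_positive)
    return False  # HP 6


def cooper_statistics(predicted, reference):
    """Accuracy, sensitivity and specificity from paired binary calls.

    Assumes both sensitizers and non-sensitizers occur in the reference.
    """
    pairs = list(zip(predicted, reference))
    tp = sum(p and r for p, r in pairs)
    tn = sum((not p) and (not r) for p, r in pairs)
    fp = sum(p and (not r) for p, r in pairs)
    fn = sum((not p) and r for p, r in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical panel: (HP category, LLNA positive?, GARD call is sensitizer?)
panel = [
    (1, True, True), (3, True, True), (5, True, False), (5, False, True),
    (6, False, False), (6, False, False), (2, False, True), (4, True, True),
]
reference = [composite_reference(hp, llna) for hp, llna, _ in panel]
predicted = [call for _, _, call in panel]
stats = cooper_statistics(predicted, reference)
print(stats)
```

In this toy panel, one sensitizer is missed and one non-sensitizer is called positive, giving an accuracy of 0.75, a sensitivity of 0.8 and a specificity of 2/3.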

Discussion
In the last decade, substantial efforts have been made to develop and validate alternative non-animal assays for the assessment of chemical sensitizers in order to meet changing regulatory and industrial demands. The current leading opinion is that no single assay is likely to provide sufficient information for accurate safety assessment of chemicals as a stand-alone test. This notion is supported by the data generated by currently validated tests and the subsequent recommendations given by EURL ECVAM (EC, 2013, 2014, 2015). For this reason, it is of great importance to continuously compare and evaluate novel and already established test methods using coherent reference chemical panels in order to prioritize assays that display superior functionality and predictivity when designing IATAs, or in the quest for stand-alone tests.
In this report, we present novel data regarding the functionality and predictive performance of GARD, generated in a blind study performed in association with the CE STTF. In this independent dataset, GARD accurately classified 83% out of a total of 72 chemicals for skin sensitization hazard. Adding this figure to previously published data from independent evaluation studies, GARD displays an accumulated accuracy of 86%, based on the classification of a total of 127 chemicals.
It is appropriate at this point to consider the gold standard of sensitization assessment, i.e., the reference against which such performance estimations are calculated. In this report, comparisons have been made with both LLNA classifications and human potency (HP), as defined by Basketter et al. (2014). The concordance of GARD with these metrics was 76% and 81%, respectively. Of note, the concordance between LLNA and

from samples of the test dataset. Thus, final predictions were performed on calibrated DVs (cDVs). A specific chemical used for stimulation was classified as a sensitizer if the mean cDV from biological triplicates was greater than zero. The predictive performance of the model's classifications of the test dataset was assessed using Cooper statistics (Cooper et al., 1979).
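The normalization, calibration and classification steps described above can be illustrated with a minimal Python sketch. It is not the authors' R implementation. The SVM itself is omitted, all counts and decision values are hypothetical, and the function names (`cptc_normalize`, `best_accuracy_cutoff`, `classify`) are introduced here for illustration. Taking the midpoints between adjacent benchmark DVs as candidate cutoffs, and the lowest candidate when accuracies tie, are likewise assumptions.

```python
from statistics import mean


def cptc_normalize(counts):
    """Counts per total counts: each gene's count divided by the sample total."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("sample contains no counts")
    return {gene: count / total for gene, count in counts.items()}


def best_accuracy_cutoff(dvs, is_sensitizer):
    """Cutoff giving the highest accuracy on the benchmark controls.

    Assumption: candidates are midpoints between adjacent sorted DVs;
    ties are resolved in favour of the lowest candidate.
    """
    values = sorted(set(dvs))
    if len(values) < 2:
        raise ValueError("need at least two distinct decision values")
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def accuracy(cutoff):
        hits = sum((dv > cutoff) == label for dv, label in zip(dvs, is_sensitizer))
        return hits / len(dvs)

    return max(candidates, key=accuracy)


def classify(replicate_dvs, cutoff):
    """Subtract the cutoff from each replicate DV; call a sensitizer if the mean cDV > 0."""
    mean_cdv = mean(dv - cutoff for dv in replicate_dvs)
    label = "sensitizer" if mean_cdv > 0 else "non-sensitizer"
    return mean_cdv, label


# Hypothetical values, NOT data from this study.
normalized = cptc_normalize({"GENE_A": 25, "GENE_B": 75})
cutoff = best_accuracy_cutoff([-1.5, -0.5, 1.5, 3.5], [False, False, True, True])
call_a = classify([1.0, 0.75, 1.25], cutoff)   # triplicate DVs of one test substance
call_b = classify([0.25, 0.0, 0.5], cutoff)
print(normalized, cutoff, call_a, call_b)
```

Because the cutoff is subtracted before averaging, the classification threshold on the calibrated scale is always zero, whatever the benchmark controls give for the current cell batch.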

GARD classifications of the blinded CE-reference panel of chemicals
A set of blinded chemicals was classified as sensitizers or non-sensitizers by the GARD assay using established protocols. GARD predictions of the chemicals used in this study are presented in Table 1. Calculations of various predictive performance parameters based on Cooper statistics are presented in Table 2. For the purpose of binary predictions, a composite reference was defined to classify a sensitizer as a compound that is categorized as having a human potency (HP) (Basketter et al., 2014) of 1-4, or being categorized as HP 5, if it is also predicted as a sensitizer by the LLNA. Consequently, compounds categorized as HP 5, predicted as non-sensitizers by the LLNA, are here defined as non-sensitizers, together with all compounds of HP 6. This binary classification system perfectly correlates with the Globally Harmonized System (GHS) / Classification, Labelling and Packaging (CLP) classifications. By this definition, based on the current data, the accuracy, specificity and sensitivity of GARD are 83%, 56% and 93%, respectively. Comparing GARD predictions strictly with either HP or LLNA, the concordance was estimated to be 81% and 76%, respectively. The mean magnitudes of the cDVs are visualized in box-and-whisker plots in Figure 1, grouped according to their sensitizing potency as defined by the GHS/

sensitizing capacity of Tween 80 has been closely examined and confirmed to be evident both before and after oxidation (Bergh et al., 1997). Consequently, the inherent difficulty of accurately assessing these compounds should rather be regarded as general. Naturally, these aspects were a contributing factor to including such compounds in the blinded dataset used in this study, likely skewing the estimated specificity within the dataset towards lower figures compared to what would be expected in broader chemical domains.
During GARD development, it was observed that the relative magnitude of the GARD decision values correlates with sensitizing potency (Johansson et al., 2011), a hypothesis that has been maintained since. In light of the above discussed ambiguities regarding sensitizing potency, as estimated by current gold standards, GARD development towards potency assessment focuses on the distinction between strong and weak sensitizers in accordance with the GHS/CLP classification system. In Figure 1, the cDVs of the test substances are grouped according to this system. From the current data it is clear that the hypothesis based on earlier observations prevails, since strong sensitizers (1A) on average generate higher DVs compared to weak sensitizers (1B). Furthermore, it is evident that the cytotoxicity of a chemical is also related to its sensitizing potency. In current GARD protocols, cytotoxic compounds are used at concentrations that maintain 90% relative cell viability. From Figure 1, it is evident that strong sensitizers (1A) are on average assayed at lower concentrations compared to weak sensitizers (1B), due to their higher levels of cytotoxic effects. While the GARD platform indeed holds information regarding sensitizing potency, there is an overlap between the different categories, which presently hampers its utilization for accurate potency assessment. However, the harnessing of accurate potency information is currently being refined for accurate sub-categorization (manuscript in preparation).
In conclusion, we here report data of GARD performance on an extended, blinded set of chemicals. Taken together, GARD is consistently functional across datasets, with a predictive accuracy of 83% in this Cosmetics Europe dataset and an average predictive accuracy of 86% in a combined dataset of 127 chemicals for skin sensitization hazard.