Exploration of the GARDskin Applicability Domain: Indirectly Acting Haptens, Hydrophobic Substances and UVCBs

Hazard assessments of skin sensitizers are increasingly being performed using new approach methodologies (NAMs), with several in chemico, in vitro and most recently also defined approaches (DAs) being accepted for regulatory use. However, keeping track of potential limitations of each method in order to define applicability domains remains a crucial component to ensure adequate predictivity and to facilitate the appropriate selection of method(s) for each hazard assessment task. The objective of this report is to share test results generated with the GARD™skin assay on chemicals that have traditionally been considered difficult to test in some of the conventional in vitro and in chemico OECD Test Guidelines for skin sensitization. Such compounds may include, for example, indirectly acting haptens, hydrophobic substances, and substances of unknown or variable composition, complex reaction products or biological materials (UVCBs). Based on the results of this study, the sensitivity for prediction of skin sensitizing hazard of indirectly acting haptens was 92.4% and 87.5%, when compared with LLNA (n=25) and human data (n=8), respectively. Similarly, the sensitivity for prediction of skin sensitizing hazard of hydrophobic substances was 85.1% and 100%, when compared with LLNA (n=24) and human data (n=9), respectively. Lastly, a case study involving assessment of a set of hydrophobic UVCBs (n=7) resulted in a sensitivity of 100%, as compared to available reference data. Thus, it was concluded that these data provide support for the inclusion of such chemistries in the GARD™skin applicability domain, without an increased risk of false negative classifications.


Introduction
A chemical substance able to directly or indirectly act as a hapten and induce allergic contact dermatitis (ACD) by the process of skin sensitization is referred to as a skin sensitizer (UN GHS, 2015). The key biological events underlying skin sensitization by small organic molecules have been extensively studied and the existing knowledge has been summarized in the form of an adverse outcome pathway (AOP) (OECD, 2014). Proactive hazard classification and characterization of skin sensitizers is an important aspect of risk assessment of chemicals, a task which is increasingly being performed by the use of so-called new approach methodologies (NAMs), including e.g., in silico, in chemico and in vitro methods. Several such methods have gained regulatory acceptance as OECD TGs and provide acceptable data to support classification (OECD, 2015a,b, 2017a) and subcategorization (OECD, 2021a) in the context of integrated approaches to testing and assessment (IATA) for several chemical classes.
To further facilitate regulatory uptake in specific industry sectors, and to provide guidance for end-users attempting to select the most appropriate assay for their specific chemistry, careful characterization of the applicability domain of such assays is pivotal in providing confidence in classification outcomes. To date, it is recognized that a number of substances, for various reasons, either remain difficult to accurately assess or belong to a chemical space that has hitherto not been thoroughly explored in the existing validated NAMs (Mehling et al., 2019). Such limitations, as far as they have been identified, are incorporated into the individual TGs, and may include for example hydrophobic substances which cannot be tested at sufficiently high concentrations in submerged cell systems, or indirectly acting haptens that are not inherently protein reactive but require abiotic or biotic activation to initiate the molecular initiating event of skin sensitization. In addition, while testing of complex or undefined test items such as mixtures, formulations and UVCBs may be technically compatible with test methods, such items may require customized alterations of protocols, and the appropriate interpretation of the test outcome may not always be straightforward.

ALTEX, accepted manuscript published April 21, 2022 doi:10.14573/altex.2201281
The Genomic Allergen Rapid Detection™ (GARD™) assay for assessment of skin sensitizers (GARD™skin) is a NAM-based predictive method addressing KE3 of the AOP. The method is based on test chemical exposure of a surrogate in vitro DC-like cell line, followed by quantification of gene expression patterns of endpoint-specific genomic biomarkers. The quantified levels of transcription of the genomic biomarkers are then used for classification of the test chemical with the aid of a machine learning-based prediction algorithm (Forreryd et al., 2016, Johansson et al., 2011, 2019). The GARD™skin method has been validated and reviewed by the EU Reference Laboratory for alternatives to animal testing (EURL ECVAM) Scientific Advisory Committee (ESAC) (Corsini et al., 2021) and is currently progressing towards adoption as an OECD TG. Therefore, the appropriate monitoring of potential limitations, such as those listed above, is a crucial aspect of defining the method's applicability domain.
This report describes retrospective analyses of available GARD™skin data, as well as previously unpublished data, aiming to explore the method applicability in the domains of indirectly acting haptens, hydrophobic substances, and UVCBs.

Selection of chemicals and dataset composition
Available GARD™skin data (Corsini et al., 2021, Forreryd et al., 2016, Johansson et al., 2017), as well as previously unpublished data, were mined for test chemicals identified as indirectly acting haptens in the curated reference dataset compiled by the OECD Expert Group for Defined Approaches for Skin Sensitization (OECD, 2021b). Similarly, the same curated reference dataset (OECD, 2021b) was used to identify hydrophobic substances, as defined based on their log P value. Each union set of data was amended with reference classifications based on LLNA and human data extracted from the above-mentioned dataset. Taken together, a total of 25 indirectly acting haptens with available GARD™skin data and reference classifications could be identified. Annotations as indirectly acting haptens, as well as a systematic assignment as either pro- and/or pre-haptens, were based on the proposed assignments in the respective above-mentioned publications, when applicable. Similarly, a total of 25 hydrophobic substances with available GARD™skin data and reference classifications could be identified.
The performance of the GARD™skin assay was also evaluated on a subset of complex and hydrophobic UVCB substances. A total of seven test items were kindly provided by The Lubrizol Corporation. Approximate molecular weights were available for all substances, except for LUB-4. All test items had either proprietary in vivo reference data or human data from human repeated insult patch testing (HRIPT) available.
The identities of each test chemical included in the analyses of this report, along with relevant physico-chemical properties, reference classifications and GARD™skin classifications, are listed in Tables 1-3.

Generation of historical GARD™skin data
The available GARD™skin data for all studies in this report were generated according to the validated GARD™skin assay protocol (EURL ECVAM, 2021) and in compliance with the Draft GARD™skin OECD TG. While some of the historical data (Forreryd et al., 2016, Johansson et al., 2017) were generated prior to the establishment of the current GARD™skin assay protocol and the drafting of the GARD™skin OECD TG, they were acquired using cellular protocols identical to those described in these documents. Of note, however, the data analysis pipeline was updated with the implementation of a Batch Adjustment by Reference Alignment (BARA) pre-processing and normalization procedure, prior to the finalization of the Standard Operating Procedure (SOP) used for method validation. For the purpose of this work, the historical data from the cellular exposure experiments were reanalyzed using the validated and updated data analysis pipeline, in order to generate coherent datasets that were in complete concordance with the GARD™skin assay protocol and the Draft GARD™skin OECD TG.

Generation of novel GARD™skin data
All novel GARD™skin data of studies presented in this report were generated by experiments carried out at SenzaGen's GLP compliant laboratory. When applicable, commercially available test chemicals were purchased from Sigma Aldrich (St. Louis, Missouri). With the exception of the Lubrizol study for testing of UVCB samples, data were collected from experiments conducted according to the GARD™skin assay protocol (EURL ECVAM, 2021) and in compliance with the Draft GARD™skin OECD TG. Data from the Lubrizol study for testing of UVCB substances were generated according to the GARD™skin assay protocol and the GARD™skin draft OECD TG, with two deviations motivated by the solubility properties and complex nature of the test items. Firstly, due to the limited solubility and the complexity of the samples, alternative vehicles, previously not used for method validation, were explored. In addition to DMSO, a mixture of dimethyl formamide (DMF)/glycerol (4/1 (vol%), LUB-2, -5 and -6) and xylene (LUB-4) were utilized in the study based on expert input from Lubrizol, both at a final in-well concentration of 0.1%. The experimental vehicles were included as additional negative controls at corresponding in-well concentrations. Secondly, the complex nature of the test items motivated approximations of the appropriate molecular weights used for calculations of concentrations. Here, weighted mean molecular weights were approximated, taking into account the relative concentrations of each component of the multiconstituent test items. For one test item, LUB-4, no information regarding the molecular weights or relative concentrations of components was available. Based on the approximated molecular weights of similar substances in the study, and in a conservative approach to ensure that the tested concentration was sufficiently high, the molecular weight for this test item was approximated to 2000 g/mol.
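The molecular weight approximation described above can be illustrated with a minimal Python sketch. It is not the procedure used in the study: the function names, the example components and their relative fractions are hypothetical, and the relative concentrations are assumed to act directly as weights in an arithmetic weighted mean. The second helper only converts a molar concentration into a mass concentration for a given approximated molecular weight.

```python
def weighted_mean_mw(components):
    """Arithmetic weighted mean molecular weight (g/mol).

    components: list of (molecular_weight_g_per_mol, relative_fraction).
    Assumption: relative fractions are used directly as weights; they are
    normalized here, so they do not need to sum to 1.
    """
    total = sum(fraction for _, fraction in components)
    if total <= 0:
        raise ValueError("relative fractions must sum to a positive value")
    return sum(mw * fraction for mw, fraction in components) / total


def molar_to_mass_conc(conc_um, mw_g_per_mol):
    """Convert a concentration in µM to mg/mL for a given molecular weight.

    µM x g/mol gives µg/L; dividing by 1e6 gives g/L, which equals mg/mL.
    """
    return conc_um * mw_g_per_mol / 1e6


# Hypothetical two-component test item (values are illustrative only).
example_item = [(1500.0, 0.6), (2500.0, 0.4)]
approx_mw = weighted_mean_mw(example_item)

# Mass concentration corresponding to 100 µM at an approximated 2000 g/mol.
mass_conc = molar_to_mass_conc(100.0, 2000.0)
```

Overestimating the molecular weight raises the mass of test item dosed per well for a given nominal molar concentration, which is consistent with the conservative choice described for LUB-4.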
All test results included in the herein summarized studies met the defined acceptance criteria of the method and were based on a minimum of three replicate biological samples, as defined and described (EURL ECVAM, 2021).
Table notes: Values extracted from OECD, 2021b, unless otherwise indicated. Sensitizer and non-sensitizer classifications are denoted 1 and 0, respectively. pre: chemical primarily acting as a pre-hapten. pro: chemical primarily acting as a pro-hapten. pre/pro: chemical acting as both pre- and pro-hapten, or otherwise not able to be subcategorized. MA: chemical primarily reacting as a Michael acceptor. NA: missing value. For additional details regarding GARDskin experimental parameters for each test chemical included in the studies, please refer to Tables S1-S3.

Statistics
The predictive performance of GARDskin in each chemical domain was described by Cooper Statistics (Cooper et al., 1979). As certain test chemicals had been assayed in more than one historical study, the results were based on weighted calculations, preventing individual chemicals with multiple test results from biasing the summarizing statistics. For example, benzyl salicylate (Table 2) had been assayed in four independent GARDskin studies, three of which resulted in a non-sensitizer classification and one of which resulted in a sensitizer classification. The summarized classification used for calculation of Cooper Statistics was therefore reported as 0.25, where the integer 1 denotes a sensitizer classification and the integer 0 denotes a non-sensitizer classification. This weighted approach was adopted from the methodology that is used by the OECD when developing test guidelines (OECD, 2015c, 2017b) and utilized by the ESAC during the peer-review of the GARD™skin method (Corsini et al., 2021).
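The weighted calculation can be sketched in Python as follows. This is a minimal illustration and not the analysis code used for this report: the function names are hypothetical, and it is assumed that each chemical contributes its mean classification fractionally to the confusion matrix. The benzyl salicylate calls follow the example above; the remaining chemicals and all reference labels in the second example are placeholders.

```python
from collections import defaultdict


def summarize_calls(results):
    """Average repeated classifications per chemical.

    results: iterable of (chemical, call), with call 1 = sensitizer and
    0 = non-sensitizer. Returns {chemical: mean call}.
    """
    calls_by_chemical = defaultdict(list)
    for chemical, call in results:
        calls_by_chemical[chemical].append(call)
    return {c: sum(v) / len(v) for c, v in calls_by_chemical.items()}


def cooper_statistics(summary, reference):
    """Weighted sensitivity, specificity and accuracy.

    Assumption: a chemical with summarized call s adds s to the positive
    calls and (1 - s) to the negative calls of its reference class.
    """
    tp = fn = tn = fp = 0.0
    for chemical, score in summary.items():
        if reference[chemical] == 1:
            tp += score
            fn += 1.0 - score
        else:
            fp += score
            tn += 1.0 - score
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else None,
        "specificity": tn / (tn + fp) if (tn + fp) else None,
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Four studies of benzyl salicylate: three negative calls, one positive.
benzyl = summarize_calls([("benzyl salicylate", c) for c in (0, 0, 0, 1)])

# Placeholder chemicals and reference labels, for illustration only.
summary = {"chem_A": 1.0, "chem_B": 0.25, "chem_C": 0.0}
reference = {"chem_A": 1, "chem_B": 1, "chem_C": 0}
stats = cooper_statistics(summary, reference)
```

With this scheme, a chemical tested many times carries the same total weight as a chemical tested once, which explains why fractional counts can appear in the numerators of the reported statistics.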

Results
The applicability domain of GARD™skin was evaluated with a specific focus on indirectly acting haptens and hydrophobic substances by combining historical and novel data generated from each domain. In addition, the GARD™skin functionality in [...]
Taken together, it was found that these estimates of performance are in line with previously reported estimates of GARD™skin's accuracy, which are typically in the range of ~80-95%, depending on the chemical subset when considering a wide chemical space. Therefore, these data support the inclusion of indirectly acting haptens and hydrophobic substances in the GARD™skin applicability domain.

Discussion
The field of in vitro toxicology has seen a great surge of innovation in the last decade, allowing NAMs to become a new normal when performing hazard assessment of potential chemical skin sensitizers. Several methods have been validated and adopted into OECD TGs, with the most recent addition of TG 497, providing a guideline for the use of several DAs for skin sensitization (OECD, 2021a). Still, regardless of the regulatory context, or if DAs or stand-alone screening methods are used in non-regulatory settings, the appropriate selection of methods used for assessment of specific test chemicals is highly dependent on an understanding of the applicability domain of each test method. To this end, this report describes the use of the GARD™skin assay for hazard assessment of indirectly acting haptens and hydrophobic substances, both of which are known to be potentially incompatible with individual methods (Mehling et al., 2019). Thus, the major aim of this report was to provide complementary GARD™skin data to support the inclusion of the above-mentioned chemical space in the applicability domain of the GARD™skin method, and to serve as guidance for end users in the selection of the most appropriate assay for their specific chemistry. Indirectly acting haptens, which are not inherently electrophilic but require either abiotic or biotic activation to gain peptide reactivity, were initially considered to be outside the applicability domain of currently OECD adopted in vitro assays for skin sensitization. However, studies have demonstrated that the majority of such compounds can be accurately detected in at least one of these assays, but with important differences in efficacy between individual assays (Patlewicz et al., 2016, Urbisch et al., 2016).
Based on the results presented in this report, the sensitivity of GARD™skin for prediction of skin sensitizing hazard of indirectly acting haptens was 92.4% (23.1/25.0) compared with LLNA references, and 87.5% (7/8) compared with human data. Furthermore, attempts were made to specifically characterize the subgroup of indirectly acting haptens requiring metabolic activation (pro-haptens), since the metabolic capacity of the in vitro cell system has currently not been fully characterized. The available data included a subset of at least three indirectly acting haptens, which were considered to act exclusively as pro-haptens. One of these chemicals, ethylenediamine, was misclassified in 9 out of 10 studies. Ethylenediamine is assumed to act as a pro-Schiff base electrophile by conversion of the amine entity to the corresponding aldehyde. The same mechanism, involving the conversion to the corresponding aldehyde, has also been assumed for the chemical 3-dimethylaminopropylamine, which was correctly classified, indicating that the misclassification of ethylenediamine cannot solely be attributed to a general lack of capacity to detect chemicals acting via a pro-Schiff base reaction mechanism (Patlewicz et al., 2016). Furthermore, it can be assumed that additional pro-haptens were present in the current dataset, but based on their structures, it was not possible to determine if the main pathway of activation required abiotic or biotic activation. In addition, the chemical aniline was misclassified (false negative) in one GARD™skin study. Considering the otherwise concordant classifications with reference data in the chemical space, no immediate explanation for this misclassification has been identified.
Hydrophobic substances have also been considered challenging to accurately assess in the currently OECD validated in vitro assays, which are largely based on submerged cell cultures (Mehling et al., 2019, Takenouchi et al., 2013). In this respect, the major theoretical and regulatory concern is related to the requirement of the test substance to be soluble to a sufficiently high concentration in the aqueous cell media to exceed the limit of detection in the assay, and hence prevent a false negative classification due to testing at too low a concentration. Based on the herein reported results, the sensitivity for detecting hydrophobic substances was 85.1% (20.4/24.0) compared with LLNA references and 100% (9/9) compared with human data, indicating a low risk of false negative classifications because of limitations in solubility of test chemicals. While it is acknowledged that limited substance solubility may indeed be challenging in submerged cell culture systems, the capacity of an assay to detect such compounds is dependent on assay sensitivity, defined as the lowest detectable concentration, and the soluble concentration in the surrounding media, likely resulting in different performances for such compounds for various assays. Considering GARD™skin, recent data suggest that the majority of sensitizers, irrespective of sensitizing potency, are detected at concentrations below 100 µM (Gradin et al., 2021). Thus, if a substance is soluble to this level in the cellular media, GARD™skin is expected to provide accurate predictions, avoiding false negatives. In addition, while not specifically applied in the above-described testing, but as illustrated in the UVCB study discussed below, potential solubility issues may be further mitigated by selection and evaluation of experimental and less polar solvents.
The available data indicate that specificity when compared to human data is notably low, in contrast to the expectation that sensitivity may be lacking when assessing hydrophobic substances. It should be recognized that the subset of expected non-sensitizers is small (n=4), precluding any decisive conclusions from the herein investigated data. Indeed, additional testing of chemicals with negative reference values may be warranted, in order to further and more accurately estimate specificity, and thereby also the overall accuracy, within the applicability domain. Nonetheless, it is evident that LLNA and human references, both of which were extracted from the reference dataset provided by TG 497, are in conflict for the three false positives, as compared to human references: benzyl benzoate, citronellol and hexyl salicylate. Additionally, they are all considered weak human sensitizers (categories 4-5) according to the human potency categories suggested by Basketter et al. (2014). Thus, these GARDskin data are in line with comparable sources of information, corroborating the inherent borderline nature of the compounds, with different results obtained depending on the considered reference classification. Of important note, however, the sole chemical considered as a non-sensitizer by both LLNA and human data references included in the dataset, n-hexane, is also accurately classified as such by GARDskin.
For most investigated test chemicals, GARDskin classifications are unambiguous and reproducible. However, exceptions do exist. The most notable borderline classifications include those for test chemicals benzyl benzoate, benzyl cinnamate and benzyl salicylate, as is evident both from individual replicate samples within studies (Fig. S2) and from the observed reproducibility between studies (Table 2), indicating the difficulty of reaching a conclusive result for these test chemicals also in GARDskin. Additional borderline classifications have been obtained also for other test chemicals; however, they have had little to no impact on the correct final classification, which is based strictly on the mean DV. Furthermore, to the extent that repeated studies are available, the majority of borderline classifications are also reproducible between studies, e.g., 2,5-diamino toluene sulfate, ethylenediamine, geraniol and n-hexane (Fig. S1-S2).
Here, the term borderline classification is used to describe a result which is based on individual replicate samples from both sides of the classification threshold (DV=0). It should be stressed, however, that a formal procedure for identification of inconclusive results has not been implemented in the validated and the herein used GARDskin protocols. All figures of reproducibility and predictive capacity, in the herein reported studies as well as in the draft GARDskin TG, have been generated without acknowledging so-called borderline classifications. For future work, however, attempts to further define a borderline range from which conclusive results cannot be obtained may indeed be a relevant adaptation of GARDskin protocols.
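The distinction between the final classification and the borderline annotation described above can be sketched as follows. This is a hypothetical Python illustration rather than the validated GARDskin prediction pipeline: the function name and the replicate decision values (DVs) are invented, and the handling of a DV exactly equal to the threshold (counted here as a sensitizer call) is an assumption.

```python
from statistics import mean


def classify_with_borderline_flag(replicate_dvs, threshold=0.0):
    """Classify on the mean DV and flag borderline results.

    replicate_dvs: decision values of the replicate biological samples.
    Returns (call, is_borderline), where call is 1 (sensitizer) or
    0 (non-sensitizer) and is_borderline is True when the replicates
    fall on both sides of the classification threshold.
    """
    if not replicate_dvs:
        raise ValueError("at least one replicate DV is required")
    # The final call is based strictly on the mean DV.
    call = 1 if mean(replicate_dvs) >= threshold else 0
    # Borderline: at least one replicate on each side of the threshold.
    has_positive = any(dv >= threshold for dv in replicate_dvs)
    has_negative = any(dv < threshold for dv in replicate_dvs)
    return call, has_positive and has_negative


# Invented replicate DVs, for illustration only.
clear_positive = classify_with_borderline_flag([2.0, 1.5, 3.1])
borderline_positive = classify_with_borderline_flag([1.2, -0.3, 0.8])
clear_negative = classify_with_borderline_flag([-1.0, -0.5, -2.0])
```

In this sketch the flag is purely descriptive and never changes the call, mirroring the statement that no formal procedure for inconclusive results is part of the protocols used here.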
In an attempt to put the obtained GARDskin results into context, corresponding data from the three validated in vitro methods included in TG 497 were extracted (OECD, 2021b). Compiled data for indirectly acting haptens and hydrophobic substances are summarized in Tables S4-S5, respectively. Both datasets are complete, without missing values, allowing for a robust method comparison based on harmonized underlying data. The performances of each method are summarized with confusion matrices and Cooper statistics in Tables S6-S8, corresponding to the GARDskin statistics summarized in Table 4.
For the subset of indirectly acting haptens, accuracy was 56.0% and 50.0% for DPRA, 72.0% and 62.5% for KeratinoSens and 88.0% and 87.5% for h-CLAT, to be compared with GARDskin estimates of 92.4% and 87.5%, using LLNA and human references, respectively. For the subset of hydrophobic substances, accuracy was 48.0% and 69.2% for DPRA, 60.0% and 61.5% for KeratinoSens and 72.0% and 76.9% for h-CLAT, to be compared with GARDskin estimates of 85.7% and 80.0%, using LLNA and human references, respectively. Overall, the summarized results indicate that these chemical subsets may indeed be regarded as difficult to accurately assess in one or several methods. These concerns are well recognized and, when applicable, also specifically addressed in the respective Test Guidelines. For these very reasons, an important conclusion from the herein reported results may be that GARDskin can contribute predictive capacity in a wide chemical space where complementary methods may be lacking, highlighting its possibly advantageous use in such contexts.
Lastly, a set of UVCB substances, comprising generally hydrophobic specialty chemicals from the petroleum industry, was investigated. For these substances, the main challenge in terms of in vitro assessment was associated with their limited water solubility, as well as their complex and undefined nature, which in turn was hypothesized to increase the risk of false negative classifications. Therefore, the majority of the test items in this study were indeed sensitizers, and few non-sensitizers were evaluated. However, based on the results from this study, the limited solubility of some of the test items did not appear to have a major impact on the potential of the GARD™skin assay to accurately identify them as skin sensitizers, and no false negative classification was observed in the dataset.
Of important note, while these substances may support GARD™skin functionality in a chemical space similar to that which was investigated, i.e., hydrophobic substances originating from the petroleum industry, this dataset is too small and homogeneous to draw any conclusions regarding GARD™skin applicability in the domain of UVCBs at large. Indeed, the term UVCB may represent a vast chemical space, and care should be taken when considering the selection of an appropriate test method, depending on the specific test chemical chemistry at hand. However, in addition to supporting the GARD™skin applicability in the domain of hydrophobic substances, the examination of this small dataset serves to illustrate two important aspects of the GARD™skin protocol and its adaptability. Firstly, it highlights the opportunity to explore experimental vehicles in order to increase compatibility with otherwise insoluble test items. While complex and highly hydrophobic substances such as the UVCBs investigated here may remain partly insoluble, different solvents may be advantageously explored to generate better dispersions, facilitating enhanced transfer of test chemical molecules from the dispersion to the hydrophobic membranes of the cells, thereby enhancing bioavailability. Secondly, the UVCB case study exemplifies a possible protocol adaptation for complex mixtures for which a molecular weight is not defined. Of important note, however, any such adaptations of the method protocol should be scientifically justified, either by estimates of similarity to compatible sources of information, as was done here, or by any other rationale which may be found scientifically appropriate.

While considered a non-animal method, similar to methods currently included in TGs 442D-E, GARDskin is currently dependent on animal components, including Fetal Calf Serum (FCS) and monoclonal antibodies. However, important work has demonstrated the feasibility of completely animal component-free adaptations of both KeratinoSens (Belot et al., 2017) and h-CLAT (Edwards et al., 2018). For future work, exploring similar adaptations of GARDskin may be a way to further push the boundaries of NAMs, allowing for a sensitive and animal component-free method for assessment also of test chemicals typically considered difficult to test.
In conclusion, based on the herein presented datasets, estimates of GARD™skin's predictive performance when evaluating indirectly acting haptens, as well as substances with limited water solubility, are in line with previously reported estimates for other datasets comprising organic low-molecular-weight chemicals from a wider chemical space. Importantly, reported data indicate that the rate of false negative classifications associated with the investigated chemistries is relatively low, suggesting that negative GARD™skin results for such chemistries can be used for decision making without compromising safety. Thus, available data support the inclusion of indirectly acting haptens and hydrophobic substances in the applicability domain of the test method.