Developmental Neurotoxicity Testing: Recommendations for Developing Alternative Methods for the Screening and Prioritization of Chemicals

Kevin M. Crofton1, William R. Mundy1, Pamela J. Lein2, Anna Bal-Price3, Sandra Coecke3, Andrea E. M. Seiler4, Holger Knaut3*, Leonora Buzanska5 and Alan Goldberg6
1 National Health and Environmental Effects Research Laboratory, U.S. Environmental Protection Agency, Research Triangle Park, NC, USA
2 UC Davis School of Veterinary Medicine, University of California, Davis, CA, USA
3 In-Vitro Methods Unit, European Centre for the Validation of Alternative Methods (ECVAM), Institute for Health and Consumer Protection, European Commission Joint Research Centre, Ispra, Italy
4 Center for Alternative Methods to Animal Experiments – ZEBET, Federal Institute for Risk Assessment (BfR), Berlin, Germany
5 Mossakowski Medical Research Centre, Polish Academy of Sciences, Warsaw, Poland
6 Center for Alternatives to Animal Testing, Johns Hopkins University, Baltimore, MD, USA


Summary
Developmental neurotoxicity testing (DNT) is perceived by many stakeholders to be an area in critical need of alternative methods to current animal testing protocols and guidelines. An immediate goal is to develop test methods that are capable of screening large numbers of chemicals. This document provides recommendations for developing alternative DNT approaches that will generate the type of data required for evaluating and comparing predictive capacity and efficiency across test methods and laboratories. These recommendations were originally drafted to stimulate and focus discussions of alternative testing methods and models for DNT at the TestSmart DNT II meeting (http://caat.jhsph.edu/programs/workshops/dnt2.html), and this document reflects critical feedback from all stakeholders that participated in this meeting. The intent of this document is to serve as a catalyst for engaging the research community in the development of DNT alternatives, and it is expected that these recommendations will continue to evolve with the science.
Keywords: developmental neurotoxicity, screening, in vitro models

Introduction
The development of alternative test methods has been widely acknowledged as a critical need for toxicity testing (NRC, 2007). Two major issues are driving this need. The first is the need to provide more efficient testing methods that can provide hazard information for the thousands of untested chemicals currently used in commerce (Andersen and Krewski, 2009; Kavlock et al., 2009). The second is a need to reduce the use of animals in toxicity testing (Goldberg, 2002; Balls, 2009).
Over the past two decades, expert panels have provided criteria and guidelines for validating new testing methods. The most prominent of these are the ICCVAM guidelines on validation of alternative test methods (ICCVAM, 1997). These guidelines provide a framework for assessing the applicability domain and performance criteria of new methods, which are especially critical if the method is to be used in a regulatory context. These guidelines, however, have been criticized for implementing a tedious and lengthy process that may actually impede efficient adoption and use of alternative test methods. The need to validate the predictive nature of these methods has required the collection of extensive and expensive data sets for regulatory acceptance (Hartung, 2007). An alternative approach is to use in vitro and QSAR methods for prioritization of chemicals for additional testing (Kavlock et al., 2009; Judson et al., 2010; Aschner et al., 2010; Lein et al., 2007). This approach requires a paradigm shift in chemical hazard assessment, where initial testing will be based on efficient, high-throughput methods, and the results used to prioritize thousands of chemicals for additional testing. Positive "hits" in these high-throughput assays will be followed by additional targeted testing based, as much as possible, on known pathways of toxicity (Judson et al., 2010). However, negative chemicals cannot be viewed as being without concern; rather, less evidence to drive a concern would lower prioritization for additional testing (Aschner et al., 2010). Complete validation of alternative methods as replacements for current regulatory tests, per ICCVAM guidelines, should not be required for screening chemicals for testing prioritization.
Developmental neurotoxicity testing is an area that is widely recognized as in need of alternative methods (Coecke et al., 2007; Lein et al., 2005; Lein et al., 2007; Bal-Price et al., 2010; Aschner et al., 2010). One issue that has restricted progress in development of new alternative methods for DNT is that funding is skewed heavily towards research on basic biological and toxicological mechanisms. This has led to the development of a wide variety of methods (Coecke et al., 2007) that are not necessarily amenable to testing large numbers of chemicals. Organotypic cultures, while providing a good model for 3-dimensional tissue organization (Sundstrom et al., 2005), also require animals for tissue harvesting and are low-throughput (Coecke et al., 2007).
The current document provides a set of principles, which, if embraced by the larger research community, will enhance the development of alternative test methods suitable for screening of large numbers of chemicals. It includes recommendations to facilitate development of alternative testing methods for screening substances for potential developmental neurotoxicity. These recommendations are not intended to be used for validation of test methods and should not be used to circumvent or substitute for any existing test method validation criteria (e.g., ICCVAM, 1997; Hartung et al., 2004; OECD, 2005). Test method validation for regulatory use involves a series of specific stages that commence with development of a method, and then proceed sequentially through test method optimization and standardization; protocol transferability assessment; studies in multiple labs to establish reliability, specificity and sensitivity; and finally peer review and regulatory review for acceptance into a regulatory framework. In contrast, this document primarily focuses on the early stages of this process: research and development, protocol optimization and protocol standardization. Taken together, these research efforts will develop sufficient data to demonstrate "proof of principle" that the test method performs adequately for the intended purpose and to facilitate the comparison of data between laboratories. A favorable review of a test method's performance at the "proof of principle" stage paves the way for moving forward towards developing the data needed for regulatory acceptance.
The goal of this document is to engage the research community in the process of developing alternative methods for developmental neurotoxicity that are amenable to high-throughput screening. The primary focus is to provide recommendations and guidance to catalyze development of a suite of assays that can be used for prioritization. Prioritization would differentiate substances that are of high concern (and may need further evaluation for a particular toxicity pathway or endpoint) from substances that are of lower concern (by virtue of having lower potential for exerting biological actions on relevant physiological or pathological processes). These same test methods could be considered for use in screening drugs or chemicals prior to commercialization and to aid in grouping commercial chemicals for read-across, or replacement of in vivo animal testing for initial hazard screening.
The rest of this document provides definitions for important terms that describe test systems and a series of recommendations that can be used when developing new alternative test methods. Consideration of the recommendations made herein, while important for development of all test methods, should assist in the transition of methods from early development stages to use in screening and validation efforts. For each recommendation there are examples and references.

Definitions
For the purpose of this draft we defined the following terms:
1. Endpoint (E): The biological or chemical process, response, or effect assessed by a test method (OECD Guidance Document 34, 2005).
2. Test system (TS): Any animal, cellular, subcellular, chemical, or physical system, or a combination thereof, used in a study (modified from OECD, GLP principle, directive 87).
3. Test method (TM): A process or procedure used to obtain information on the biological effects of a substance or agent. Toxicological test methods generate information regarding the ability of a substance or agent to produce a specified biological effect under specified conditions. Used interchangeably with "test" and "assay" (OECD Guidance Document 34, 2005). The test method should assess one or more key aspects of human neurodevelopment.

4. Parametric controls (E). Assay parameters that result in predictable changes in the endpoint should be characterized. These experimental parameters can be used to optimize the test method. Examples: i) Increasing nerve growth factor (for NS-1 cells) or retinoic acid (for SH-SY5Y cells) concentration will increase neurite outgrowth; ii) Increasing days in culture yields greater neurite growth; iii) Cell density influences neurite outgrowth.
5. Response characterization (E). The level of change in the response associated with an effect should be characterized. This is the degree of change that, if exceeded, results in a positive response (a "hit"). Importantly, one needs to have a fairly robust understanding of the variability in the control response levels in order to interpret results. Generally, there are two ways to determine the positive response level. The first approach, commonly used in pharmaceutical screening, defines a hit as any response greater than 3 SD from the control. This conservative statistical approach is used to ensure a very small number of false positives, which would be costly to pursue. In toxicological screening and prioritization for further testing, it may be acceptable to have a higher rate of false positives. Thus, a second approach defines a positive response level based on biological significance. Judgment should be used to balance the biological and statistical relevance of the response level. References: Tierno et al., 2007; Breier et al., 2008
6. Concentration range (TM, E). Each test method should be designed to characterize the concentration-response relationship. One recommendation is to test at least five concentrations ranging from the solubility limit to five logs below the solubility limit. Concentration-response is critical to comparison of sensitivity between test methods, or endpoints within a test method. Concentrations above the level known to induce cytotoxicity should not be used.
7. Endpoint selectivity (E). This is the ability of the test method to discriminate the endpoint of concern from other outcomes. Example: The ability to determine the concentration-response relationships for both cytotoxicity and the endpoint in the same assay provides assurance that chemical-induced endpoint changes do not primarily result from cell death (Fig. 1). References: Schmuck et al., 2000; Cristòfol et al., 2004

Fig. 1 (panels: Cadmium; Methyl mercury): Methyl mercury decreased neurite outgrowth specifically, relative to cell viability. Cadmium chloride decreased neurite outgrowth in a nonspecific manner, i.e., only at concentrations that also induced cell death (redrawn from Parran et al., 2001).

8. Endpoint-selective controls (TM, E). Endpoint-selective control chemicals reliably and consistently alter the endpoint by known mechanisms. Both positive and negative control chemicals should be tested. A positive control is a chemical or stressor which is known from previous experience to reliably affect the endpoint. A negative control is a chemical that reliably causes no effect on the endpoint of interest. A negative control demonstrates the base-line result obtained when a test chemical does not produce a measurable positive result. Examples: i) NGF-induced neurite outgrowth in PC12 cells requires MAP kinase signaling, therefore the MAP kinase inhibitor U0126 could be used as a positive control; ii) NGF-induced neurite outgrowth in PC12 cells does not involve signaling via the JAK/STAT signaling pathway, thus a JAK/STAT inhibitor could be used as a negative control.
9. Training set of chemicals (TS, TM, E). Once the method has been demonstrated to exhibit the correct characteristics described above (see sections 1-8), a "training set" of chemicals should be developed and tested. This training set should be composed of two types of chemicals: chemicals that are known to reliably elicit a response of concern (needed to assess sensitivity) and chemicals that are known to reliably elicit no response of concern (needed to assess specificity). Evidence for an effect, or lack thereof, should come from in vitro data. However, additional evidence from in vivo studies, if available, is highly recommended. Selection of the training set of chemicals should also consider the purpose of the assay, as discussed above in the introduction. The goal of the training set is to evaluate the test method, including: 1) testing the practical ability of the method to efficiently process moderate numbers of chemicals; 2) confirming positive and negative controls; and 3) generating historical control data to characterize the inherent response range for the endpoint. References: Breier et al., 2008
10. Testing set of chemicals (TS, TM, E). The testing set of chemicals should include a large number of substances known to affect endpoints of developmental neurotoxicity in vivo, as well as chemicals that reliably do not affect developmental neurotoxic endpoints. For in vitro screening assays (as opposed to replacement assays), it is important at this stage to demonstrate the ability of the method to rapidly and efficiently screen large numbers of chemicals with an adequate degree of sensitivity and specificity, and to provide data that can be used in determining future steps in the process of method development, validation, and regulatory acceptance of the endpoint (E) and test method.
Cooper et al., 1979; ICCVAM, 1997
12. High throughput (TS, E). In the case of in vitro screening assays, it is highly desirable for the test method to have the potential for automation. For any new, revised, or replacement assay, it is critical that the method be more efficient than the current testing scheme (OECD Test Guideline 426, 2007). Testing one chemical with this guideline can take up to 6 months, require hundreds of animals, and cost hundreds of thousands of dollars. Automation of the method could lead to testing of hundreds or even thousands of chemicals for one endpoint in one day.
13. Documentation (TM). For any test method published with the intent to demonstrate feasibility for screening large numbers of chemicals, the test method needs to be fully documented and readily available to allow for implementation across laboratories. Experimental details critical to replication of methods must be included. This kind of information

Chemicals on this list were derived from published reports or regulatory data from humans, non-human primates, or laboratory mammals. Findings were deemed to be suggestive of adverse neurological outcomes following developmental exposure. To be included on the list there must be positive results from more than one laboratory (Mundy et al., 2009).
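The quantitative parts of the recommendations on response characterization, concentration range, and the training set can be illustrated with a short sketch. It is not a method from this document. The function names, the two-sided reading of "greater than 3 SD from the control", the even log spacing of concentrations, and the example numbers are all assumptions made for illustration.

```python
from statistics import mean, stdev


def hit_bounds(control_responses, n_sd=3.0):
    """Return (low, high) limits n_sd sample standard deviations from the control mean."""
    m = mean(control_responses)
    sd = stdev(control_responses)  # sample SD of the control wells (assumed choice)
    return m - n_sd * sd, m + n_sd * sd


def is_hit(response, control_responses, n_sd=3.0):
    """Statistical hit call: response lies more than n_sd SD from control (two-sided, assumed)."""
    low, high = hit_bounds(control_responses, n_sd)
    return response < low or response > high


def concentration_series(solubility_limit, n_points=5, logs_below=5.0):
    """Evenly log-spaced concentrations from the solubility limit down to logs_below logs lower."""
    if n_points < 2:
        raise ValueError("need at least two concentrations")
    step = logs_below / (n_points - 1)
    return [solubility_limit * 10 ** (-i * step) for i in range(n_points)]


def sensitivity_specificity(calls, truth):
    """Sensitivity and specificity of hit calls against known positives and negatives.

    Assumes the chemical set contains at least one known positive and one known negative.
    """
    pairs = list(zip(calls, truth))
    tp = sum(1 for c, t in pairs if c and t)
    fn = sum(1 for c, t in pairs if not c and t)
    tn = sum(1 for c, t in pairs if not c and not t)
    fp = sum(1 for c, t in pairs if c and not t)
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    controls = [100, 102, 98, 101, 99]  # hypothetical control responses (% of plate mean)
    print(hit_bounds(controls))
    print(is_hit(90, controls), is_hit(103, controls))
    print(concentration_series(100.0))  # hypothetical solubility limit of 100 (arbitrary units)
```

A hit threshold based on biological significance, the second approach described under response characterization, would replace `hit_bounds` with a fixed effect size chosen by the investigator. The sketch covers only the statistical rule.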