Assessment of a 3D Neural Spheroid Model to Detect Pharmaceutical-Induced Neurotoxicity

Drug-induced neurotoxicity is a leading cause of safety-related attrition for therapeutics in clinical trials, often driven by poor predictivity of preclinical in vitro and in vivo models of neurotoxicity. Over a dozen different iPSC-derived 3D spheroids have been described in recent years, but their ability to predict neurotoxicity in patients has not been evaluated nor compared with the predictive power of nonclinical species. To assess the predictive capabilities of human iPSC-derived neural spheroids (microBrains), we used 84 structurally diverse pharmaceuticals with robust clinical and preclinical datasets with varying degrees of seizurogenic and neurodegenerative liability. Drug-induced changes in neural viability and phenotypic calcium bursts were assessed using 7 endpoints based on calcium oscillation profiles and cellular ATP levels. These endpoints, normalized by therapeutic exposure, were used to build logistic regression models to establish endpoint cutoffs and evaluate probability for clinical neurotoxicity. The neurotoxicity score calculated from the logistic regression model could distinguish neurotoxic from non-neurotoxic clinical molecules with a specificity as high as 93.33% and a sensitivity of 53.49%, demonstrating a very low false positive rate for the prediction of seizures, convulsions, and neurodegeneration. In contrast, nonclinical species showed a higher sensitivity (75%) but much lower specificity (30.4%). The neural spheroids demonstrated higher likelihood ratio positive and inverse likelihood ratio negative values compared with nonclinical safety studies. This assay has the potential to be used as a predictive assay to detect neurotoxicity in early drug discovery, aiding in the early identification of compounds that eventually may fail due to neurotoxicity.

prediction of neurotoxicity for drug discovery. However, their predictive values to detect drug-induced toxicity, especially the approximate context of use, have not been assessed (Lancaster et al., 2013;Lancaster and Knoblich, 2014;Schwartz et al., 2015;Kilic et al., 2016;Qian et al., 2016Qian et al., , 2018Sirenko et al., 2019). Thus, we sought to evaluate the performance of a neural spheroid model in the detection of neurotoxicity to test whether it could be used in a predictive context in drug discovery. In this study, we investigated an integrated assessment of calcium oscillation endpoints as a surrogate for measuring neuronal electrophysiology, as well as cell viability to determine the neurotoxic potential of various compounds. We also developed a supervised learning algorithm using classifiers rooted from clinical therapeutic exposure and EC 50 (half maximal concentration) values determined in in vitro assays and produced an integrated neurotoxicity score (NT score). Overall, our work suggests that utilizing the NT score with a neural spheroid model can be readily incorporated into the pharmaceutical toxicological screening paradigms, allowing for early detection of neurotoxicity during the discovery and development of small molecule therapeutics.

Reagents and chemicals
In this study, the compound library consisted of 84 commercially available pharmaceuticals, hereafter denoted as "clinical compounds", and 33 compounds selected from Takeda's internal chemical library, hereafter denoted as "Takeda preclinical compounds." The clinical compounds were ordered from Sigma-Aldrich Company Ltd (St. Louis, MO, USA), R&D Systems (Minneapolis, MN, USA), or SelleckChem.com (Houston, TX, USA). The compounds were selected to reflect a diverse group of chemicals with various targets and multiple modes of action. The lists of clinical compounds and Takeda preclinical compounds are shown in Table S1 1 and Table 5, respectively.

Compound selection and neurotoxicity categorization
The 84 clinical compounds tested in this study were categorized into CNS-targeting or test sets (Tab. 1). Compounds placed into the test set have been in clinical use for years and have well-established mechanisms. Drugs placed in the CNS-targeting set are neurology drugs recently approved by the US FDA for marketing or being actively investigated in clinical studies. The mechanisms of their CNS toxicity are not well known; however, they likely are able to cross the blood-brain barrier as they are neurology medications. All 84 pharmaceuticals were also categorized into one of four categories based on clinical frequencies of CNS-related undesirable effects using information extracted from public online databases (i.e., SIDER 4.1: Side Effect Resource 2 , PharmaPendium 3 ) and data given in the product labels. Based on FDA and EMA labels, 17 com-but also frequently drugs in the cardiovascular and gastrointestinal therapeutic areas (Cook et al., 2014;Walker et al., 2018).
As highlighted by data on attrition and as evidenced by the complexity of neurotoxic liability, currently available experimental models for the detection, prediction, and clinical monitoring of neurotoxicities are still limited. During GLP toxicity studies, neurotoxicities are primarily identified as structural and functional disruptions detected in histopathological and rodent functional observation battery and modified Irwin (FOB/Irwin) behavioral observations. Timing is critical for histopathology because the time at which samples are collected may miss potential structural changes, as timing of neuronal cell death varies among compounds. Moreover, evidence shows that the neurotoxic profiles of a compound cannot be easily predicted by known profiles of similar compounds (Switzer, 2011). Overall, this complicates neurotoxicity detection by histopathology (FDA, 2001;Moscardo et al., 2007;Tontodonati et al., 2007). FOB/Irwin behavioral batteries are used to assess behavioral effects and potential neurotoxicity of compounds at specified timepoints, typically around the timepoint post-dose of the maximum drug plasma concentration (C max ) (Mathiasen and Moser, 2018). However, extensive analyses have shown that the FOB/Irwin assay has low concordance for predicting the occurrence of the five most commonly reported AEs during phase 1 clinical trials (i.e., headache, nausea, dizziness, fatigue/somnolence, and pain), as well as potentially life-threatening adverse drug reactions (ADRs) such as seizures in patients (Ross, 2000;Mead et al., 2016). Although convulsions, tremors, and twitching can be observed in the FOB/Irwin assay, these instances could be missed during unobserved periods. Moreover, sub-clinical seizure activity can occur without observable convulsion. In a retrospective analysis study, approximately 75% of clinical compounds with seizure liability did not cause convulsions in nonclinical safety studies (Nagayama, 2015).
Currently, in vitro assays for the detection of neurotoxicity are not widely used prior to performing animal studies in drug development. Rather, the traditional derisking approach for seizurogenesis involves a literature-based target safety review, in vitro pharmacological profiling, in vivo zebrafish or ex vivo brain slice electrophysiology, and FOB/Irwin assays combined with an electroencephalogram (EEG) (Easter et al., 2009). However, several shortcomings and technical challenges are associated with this strategy. For example, only samples from hippocampus and cortical regions are available for analysis using the brain slice assay, limiting detection of neurotoxicity in other regions (Accardi et al., 2018). EEG monitoring is rarely used in early drug discovery due to its low throughput, high cost, technical challenges, and relatively low sensitivity (Smith, 2005). The development of a high-throughput functional assay with better predictive value would play a critical role in pharmaceutical research and help address the gaps in evaluating drug safety.
Recently, advances in physiologically relevant in vitro models derived from human iPSCs offer promising tools to enhance the non-neurotoxic (negative), and those falling into neurotoxicity severity 2-4 are neurotoxic (positive).
The disease indications for each drug were collected from the Prescribers' Digital Reference 4 . Table S1 1 shows the clinical compounds, neurotoxicity severity, and primary molecular targets along with disease indications. Human total plasma C max (tpC max ) at therapeutic dose levels along with fraction unbound (Fu) were obtained from PharmaPendium 3 . Human free plasma C max (fpC max ) was derived by multiplying the tpC max with the corresponding Fu for those compounds. Table S2 1 shows therapeutic dose levels and their corresponding clinical C max including tpC max , Fu, and fpC max for all the selected clinical compounds. Physicochemical properties of the compounds such as lipophilicity (cLogP) and topological polar surface area (TSPA) at pH = 7.4 were obtained from DrugBank 5 . Table S5 1 shows the simplified molecular-input line-entry system (SMILES), cLogP, and TPSA data for all the clinical compounds.
To compare performance of nonclinical safety assessments in animal models with 3D human cortical spheroids for the detection of neural safety liabilities, the potential neural liabilities identified with animal toxicology data (mouse, rat, dog, and non-pounds in the CNS-targeting set and 13 compounds in the test set have no Black Box warnings for neurotoxicity. In addition, there were no reported CNS-associated severe AEs, so they were categorized as severity category 1 (neurotoxicity negative). Neurotoxicity severity was further classified within body system categories and ranked in order of increasing frequency using the following definitions: Compounds falling into category 2 (rare severity) were those causing CNS-related AEs in < 0.01% of patients (9 drugs in the CNS-targeting set and 2 drugs in the test set); category 3 (infrequent severity) are those drugs causing neurotoxicity occurring in 0.01% to 1% of patients (13 drugs in the CNS-targeting set and 1 drug in the test set); category 4 (frequent severity) are those drugs causing neurotoxicity occurring in > 1% of patients (8 drugs in the CNS-targeting set and 10 drugs in the test set). 11 drugs were anticonvulsants based on their disease indications. It was difficult to classify whether the reported seizures or convulsions were drug-induced, the onset of disease, or withdrawal effects. Therefore, we categorized these 11 drugs as severity category 5 (anticonvulsant). We further used a binary classification method to categorize the compounds. Compounds in severity category 1 are been matured for 10-12 weeks in vitro before use in screening (Sirenko et al., 2019). Upon arrival, plates were processed according to the manufacturer's instructions. Briefly, plates were centrifuged for 10 min at 200x g (Sorvall Legend XTR) and were inspected for leakage. The exteriors of the plates were decontaminated with 70% ethanol before unwrapping. After unsealing, each well was examined by light microscopy to ensure the presence of a single spheroid. Culture medium was replaced three times on the day of arrival using neural spheroid complete culture medium. Plates were incubated at 37°C (95% humidity, 5% CO 2 ) for a 7-day recovery period before the start of the assay. Culture medium was replaced every other day. The test compounds were dissolved in cell culture grade dimethyl sulfoxide (DMSO, Sigma-Aldrich Co. Ltd., Cat# D2650), unless otherwise specified. The compound stock solutions were made 1000x more concentrated than the top concentration tested unless limited by the solubility and stored in -20°C. The concentration range tested for each compound reached up to 100x and down to 0.1x of the tpC max observed in human or preclinical species for the clinical compounds and Takeda preclinical compounds, respectively. Half-log dilutions were performed to generate a seven-point concentration-response curve. Assays were run in triplicate. EC 50 values were calculated from the concentration-response curve using non-linear regression analysis in GraphPad Prism 8.2.1. Vehicle controls were included on each assay plate and were used for normalization of plate-specific phenotypic readouts. DMSO was less than 0.1% of the final volume.

Immunofluorescence and immunohistochemistry
For immunofluorescence experiments, neural spheroids were collected and rinsed once with PBS. Subsequently, spheroids were fixed with 4% paraformaldehyde in PBS for 20 min, permeabilized with 0.1% Triton X-100 for 20 min, blocked with 5% bovine serum albumin (BSA) for 30 min, and incubated with the primary antibody at 4°C overnight under gentle agitation. Anti-GFAP (abcam, Cambridge, MA, USA, Cat# ab7260) and anti-MAP2 (R&D Systems, Cat# MAB8304) antibodies were diluted in 1% BSA according to the vendor's recommendation. Following washing, the spheroids were stained with Alexa 488or Alexa 568-labeled secondary antibody (Thermo Fisher Scientific, Waltham, MA, USA Cat# A32723 and A10042, respectively) and ProLong™ Diamond Antifade Mountant with DAPI solution (Thermo Fisher Scientific, Cat# P36971). 3D videos were captured using a Nikon Eclipse Ti2 imager (Nikon Metrology Inc. Brighton MI, USA).
To examine the potential presence of necrotic cores, 6-10 spheroids were pooled and rinsed once with PBS. Spheroids were then fixed in 4% paraformaldehyde for 20 min and embedded in Richard-Allan Scientific™ HistoGel™ Specimen Processing Gel (Thermo Fisher Scientific Cat# HG-4000). Histo-Gel-embedded spheroids were rinsed in serial ethanol dilutions, and further embedded in paraffin blocks. For analysis, 5-μm sections were prepared and underwent routine hematoxylin and eosin (H&E) staining following standard protocols as previously described (Mongan et al., 2008). Images were reviewed by a certificated pathologist. human primates (NHP)) for the clinical compounds were collected from FDA or EMA approval documents compiled in Phar-maPendium 3 . Data are summarized in Table S7 1 . There were 16 compounds with no nonclinical safety data available in Phar-maPendium. Therefore, we denoted the nonclinical safety data for those compounds as "not in PharmaPendium". Twenty-one compounds demonstrated no seizure or neurodegenerative liabilities in preclinical studies, and we denoted these compounds as having "no preclinical findings". The neural liabilities (seizure, epilepsy, convulsion, or neurodegenerative disorders), as well as the species in which the risks were identified, were provided for all other compounds. If the compound caused neural liabilities in at least one nonclinical species, the compound was considered neural toxic. We adopted a multi-parameter approach developed by Monticello et al. to evaluate the potential impact of animal testing in predicting clinical outcomes, since there is no single statistical measure that is optimal for comparing the concordance of animal data to humans (Olson et al., 2000;Monticello et al., 2017). We calculated sensitivity, specificity, likelihood ratio positive (LR+), and inverse likelihood ratio negative (iLR-) for the assessment of nonclinical to clinical concordance rate.
Thirty-three Takeda preclinical compounds were selected from Takeda's internal chemical library and defined as neurotoxic or non-neurotoxic based on the results of single or repeat-dose toxicity studies (up to 4 weeks) in rats, dogs or NHP. Twenty-three compounds were classified as neurotoxic since convulsions were observed in at least one preclinical species, and 10 compounds were classified as non-neurotoxic because no convulsions or any pathological changes in CNS were observed in any of the preclinical species tested. Table 5 summarizes the neurotoxicity classifications as well as preclinical species in which neurotoxicity findings were identified in the Takeda preclinical compounds. 15 unique molecular targets were covered by these internal compounds, of which 6 were not CNS-targeted.

Cell culture and compound treatment
Pre-plated 3D human cortical spheroids were obtained from Ste-moniX, Inc. (Maple Grove, MN, USA) in the form of the Stem-oniX "microBrain" Neural Spheroid 3D Assay Ready 384-Plate (Cat# BSARX-AA-0384). The cells were derived from a healthy male donor over the age of 18 with a normal genetic background, and all experiments utilized the same donor source of cells. The neural spheroid plates were shipped overnight at room temperature as ready-to-use format. The manufacturing batch numbers for the neural spheroid lots used in the screening were 20171221-01, 20180122-01, and 20180503-01. The neural spheroids were cultured in serum-free BrainPhys™ Neuronal Medium with SM1 Kit (StemCell Technologies, Vancouver, BC, Canada; Cat# 05792) containing 1× penicillin-streptomycin and supplemented with recombinant human brain-derived neurotrophic factor (BDNF) and recombinant human glial cell line-derived neurotrophic factor (GDNF) (StemCell Technologies, Vancouver, BC, Canada; BDNF: Cat# 78005; GDNF: Cat# 78058), each at a final concentration of 20 ng/mL. Each well of the plate contained a single, free-floating human iPSC-derived cortical 3D culture generated from neural progenitor cells (NPC). The cultures had Dye interacts with intracellular Ca 2+ , and the fluorescence changes as a result of any increase or decrease in Ca 2+ concentrations. The kinetics of intracellular Ca 2+ oscillations were determined by measuring fluorescence intensity at 515 to 575 nm following excitation at 470 to 495 nm for 10 min at a frequency of 3 Hz using the FLIPR Tetra High-Throughput Cellular Screening System (Molecular Devices LLC). The exposure time per read was 0.4 s, the gain was set to 2,000, and the excitation intensity was set to 30%. The instrument temperature was kept constant at 37°C. For acute exposure to pharmaceuticals, cells were pre-loaded with Calcium 6 Dye for 2 hours prior to compound treatment. Baseline for calcium oscillations without treatment was measured prior to the addition of compounds. The effects on calcium oscillations were measured 30 min after exposure. The 30-min timepoint was selected based on preliminary evaluation of multiple timepoints from 30 min to 6 h post compound administration (data not shown). Quantitative data signatures including peak count (per 10 min), average peak amplitude, average peak spacing (time between peaks), average peak width at 10% amplitude, average peak rise time (from 10% to 90% amplitude), and average peak decay time (from 90% to 10% amplitude) were automatically derived using the ScreenWorks Peak Pro (4.2) software package (Molecular Devices LLC). Data were normalized to vehicle control (0.1% DMSO) wells. Concentration-response curves and EC 50 estimates for each single data signature were generated by non-linear regression of log-transformed inhibitor concentrations (8-point serial dilutions including vehicle) vs. normalized response with variable Hill slopes and the top and bottom constrained to a constant value of 100 and 0, respectively (Prism™ GraphPad, GraphPad Software, La Jolla, CA, USA). The EC 50 values of each calcium data signature for the clinical compounds and Takeda preclinical compounds are summarized in Table S2 1 and Table 5, respectively.
For the initial characterization of the effects of the clinical compounds on the neural spheroid Ca 2+ burst profile, the normalized peak count and peak amplitude were hierarchically clustered based on their changes relative to vehicle control as a function of the compound concentration. Clustering was performed using custom-written R programs (R code team, version 3.3.2).
Cytotoxicity assessment in 3D human cortical spheroids 10-12-week-old neural spheroids were treated with compounds for 7 days with half-medium change performed every other day. Viability based on ATP levels was determined at the end of the 7-day treatment using the CellTiter-Glo ® 3D cell viability assay kit according to the manufacturer's instructions (Promega, Madison, WI, USA, Cat# G9682). The 7-day timepoint was selected based on preliminary evaluation of multiple timepoints from 24 h to 144+ h post compound administration (Fig. S4 1 ). Luminescence was read on a Cytation™ 5 multi-mode reader (BioTek, Winooski, VT, USA). Data from compound-treated neural spheroids were normalized to the respective vehicle controls (0.1% DMSO). The EC 50 values were calculated in GraphPad Prism as described above. The EC 50 values of cell viability for the clinical compounds and Takeda preclinical compounds are summarized in Table S2 1 and Table 5, respectively.

Total RNA extraction and bulk mRNA sequencing as well as data analysis
Total RNA of human total brain, human brain cerebral cortex, and human brain frontal lobe were purchased from Takara-bio (Mountain View, CA, USA, Cat# 636530, 636561, 636165, respectively). iPSC-derived iCells including astrocytes, GABA-Neurons, GlutaNeurons, microglia, and cardiomyocytes were purchased from FUJI FILM Cellular Dynamics (Madison, WI, USA, Cat# R1092, R1013, R1016, R1131 and R1017, respectively). 14 day-old primary human hepatocyte-derived liver microtissues were purchased from Insphero (Brunswick, ME, USA, Cat# MT-02-002-01). Human primary ventricular endothelia and cardiac ventricular fibroblasts were from PromoCell (Cat# C-14029 and C-14036, respectively). Total RNA was extracted with miRNA Mini Kit (Qiagen, Valencia, CA, USA, Cat# 217004) according to the manufacturer's specifications. All steps of library construction, cluster generation, and HiSeq (Illumina, San Diego, CA, USA) sequencing were performed with triplicate samples by Q 2 Solutions (Morrisville, NC, USA). RNA sequencing libraries were prepared with the TruSeq RNA Kit according to the manufacturer's specifications (Illumina, Cat# RS-122-2001/2002. Input RNA had an integrity number greater or equal to 7. To quantify the library concentration for clustering, the library was diluted 1:100,000 in a buffer containing 10 mM Tris-HCl, pH 8.0, and 0.05% Tween20, and analyzed by quantitative PCR (qPCR) with a KAPA Library Quantification Kit (Kapa Biosystems, Woburn, MA, USA, Cat# KK4824) using a QuantStudio7 Flex real-time PCR machine (Applied Biosystems, Waltham, MA, USA). Equal amounts of six individually indexed cDNA libraries were pooled for clustering in an Illumina cBot system flow cell at a concentration of 8 pM using Illumina's TruSeq SR Cluster Kit V3 and sequenced for 50 cycles using a TruSeq SBS kit on the Illumina HiSeq system. Each sample generated approximately 30 million sequencing reads.
Sequencing reads were demultiplexed and exported to fastq files using CASAVA 1.8 software (Illumina). Data analysis was performed using OmicSoft ArraySuite software, version 10.2.3.10 (QIAGEN ® , Cary, NC) and R, version 3.6.1. Briefly, the reads were aligned to the reference genome (hg19) using the OSA4 algorithm in OmicSoft. Expressed genes were filtered using default parameters in filterByExpr (Chen et al., 2016) and counts normalized using the trimmed mean of M (TMM), both methods in edgeR (version 3.26.8) (Robinson and Oshlack, 2010). The TMM for the expressed genes were further used for principal component analysis (PCA) using the PCAtools library (version 2.6.0; removing lower 1% of genes by variance) to check the overall variance of each sample.

Calcium oscillation assay
The calcium oscillation assay was conducted by StemoniX with compound identity blinded. The intracellular Ca 2+ oscillations in 10-12-week-old 3D cortical spheroids were assessed using the FLIPR ® Calcium 6 Evaluation Kit (Molecular Devices LLC, San Jose, CA, USA, Cat# R8190) as previously described (Sirenko et al., 2013;Grimm et al., 2015). Calcium 6 Statistical evaluation consisted of ROC analysis to determine a classification boundary between the two classes. The criterion for defining the discrimination threshold minimized the distance in the ROC curve from the perfect assay (sensitivity 100% and specificity 100%). Statistical analysis was performed using R version 3.3.2 (R code team). To compare the utility of each data signature to identify CNS toxicity, we calculated the likelihood ratio (LR). LR represents the ratio of the probability of a positive assay result for compounds associated with CNS toxicity to the probability of a positive assay result for compounds that do not cause CNS toxicity. The LR summarizes sensitivity and specificity to characterize the utility of a predictor for increasing certainty about a diagnosis and is less dependent on clinical frequency (Tab. S4 1 ).

Characterization of 3D neural spheroids on morphology and global gene expression
In the present study, we characterized 3D neural cultures by performing immunostaining on the spheroids for MAP2 (microtubule-associated protein 2, a neurogenesis marker) and GFAP (glial fibrillary acidic protein, an intermediate filament protein expressed in astrocytes of the CNS) from a 3D perspective (Fig. S1 6 ). Figure 1A shows an enhanced global neural network in the 3D neural cultures. The neural spheroid contains both astrocytes and neurons as previously reported (Sirenko et al., 2019).
Spheroids greater than 200 μm may have proliferating cells in their outer layer and leave a necrotic core at the center due to the gradient of both nutrient and oxygen levels (Kim, 2005;Chaicharoenaudomrung et al., 2019). To address this concern, we evaluated the centers of the spheroids and examined if a necrotic core was present by H&E staining on 5-µm paraffin sections throughout the entire spheroids. Representative images Figure 1B,C depict the neural spheroids, which were characterized by round to oval structures 400-500 µm in diameter and composed of fiber tracts that resembled the myelinated axonal meshwork that forms the white matter. The fiber tracts contained a moderate number of intact cells that were mostly composed of glial cells and fewer scattered neuronal bodies. Cellular density was higher in the core of the structure. Occasional cross sections of intact axons were also noted. There was no evidence of degenerative or necrotic changes. Distinct blood vessels were not present ( Fig. 1B,C).
Previous analysis demonstrated concordance between neuronal gene expression in spheroids and adult human brain samples (Sirenko et al., 2019). To further evaluate the physiological relevance of neural spheroids, we examined the global gene expression profile of 4-to 10-week-old spheroids by RNA-seq and compared neural spheroid global gene expression with 14 other different human cell types and primary tissues. We performed PCA to identify the major sources of variation in our RNA-seq dataset. As shown in Figure 2A, the first and second components explained 71.82% of total variation, with the first principal com-

Statistical methods for data analysis: receiver operating characteristic (ROC) analysis and logistic regression
The objective of the statistical analysis was to analyze the performance of neural spheroids in the detection of neurotoxicity. The ratio of the fpC max (nM) to the in vitro EC 50 (µM) is termed the "margin of exposure" (MOE) and calculated for each compound (shown in Tab. S3 1 ). In some cases, the EC 50 is greater than the highest concentration tested, in which case the MOE is too small to determine and is thus set to 0. Binary classification as either neurotoxicity negative 0 (NT negative, neurotoxicity category 1) or neurotoxicity positive 1 (NT positive, neurotoxicity categories 2 to 4) were used as dependent variables. Therefore, with the 7 predictors and a binary response variable, we had an 84×7 data matrix (summarized in Tab. S3 1 ).
Because of the binary nature of neurotoxicity for each compound, we built logistic regression models using the MOE as independent variables and clinical neurotoxic outcome as dependent variables. To test the predictivity of a single predictor, 7 univariate logistic regression models for each predictor were fitted using custom-written R programs (R code team version 3.3.2). The univariate logistic regression model was defined as: The probability predicted from each single independent variable was: To test the predictivity of combined independent variables, a multivariate logistic regression model was built with all 7 independent variables: π Logit(π) = log ( 1−π ) = β 0 +β 1 *x 1 +β 2 *x 2 +β 3 *x 3 +β 4 *x 4 +β 5 *x 5 +β 6 *x 6 +β 7 *x 7 Eq.3 Hence, the probability of neurotoxicity (also defined as NT score) was In the regression models, the x's are independent variables and denote the predictors fpCmax , β's are the associated slope parameters, and i is the number of the data signature extrapolated from the two assays, including peak count (when i = 1), peak amplitude (when i = 2), peak rise time (when i = 3), peak decay time (when i = 4), peak spacing (when i = 5), peak width (when i = 6), and ATP (when i = 7). The assumptions for the model are that the binary responses (non-toxic or toxic) are Bernoulli random variables with probability determined by the regressors. EC50i ponent explaining up to 57.05% variance. This is consistent with previous studies in that the majority of variation is explained by the first few principal components (Lukk et al., 2010;Lenz et al., 2016). Samples of hepatic cells separated from the other tissue/cell types, indicating a transcriptomic dissimilarity from others. Neural spheroids clustered with cells originated from the brain including human total brain, human brain frontal lobe, human brain cerebral cortex, iPSC-derived astrocytes, GABAergic neurons, glutamatergic neurons, and microglia. The PCA results demonstrated that the transcriptome of the neural spheroid closely resembles the brain rather than the liver. Interestingly, the iPSC-derived cardiomyocytes, primary ventricular endothelia, and ventricular fibroblasts clustered closer to primary brain tissues and iPSC-derived neurons. We removed samples with liver origins and performed PCA a second time with brain and heart samples. Figure 2B shows that the first component explained 48.96% of the total variation. The expression profile of neural spheroids was closer to primary brain samples and iPSC-derived iCell neurons and astrocytes rather than samples with heart origin. Interestingly, we observed a time-dependent alignment of the neural spheroids with 10-week-old spheroids closer to the sample types from primary human brain, heart, and liver, cell lines, iPSC-derived microtissues and cells, and (B) 14 different sample types from (A) without samples of liver origin. For liver microtissues and neural spheroids, 4-6 microtissues/spheroids were pooled together as a single sample. Technical replicates were prepared for each sample to generate RNA-seq data. cellular calcium levels using calcium indicators based on fluorescence. The calcium oscillation traces at two concentrations (approximately C max and 10-fold of C max ) after 30 min exposure are shown in Figure 4. We identified a diverse range of compound effects on 3D neural spheroids. Table 2 presents the EC 50 values calculated from concentration-response curves for the six defined parameters and cell viability ( Fig. 4; Fig. S2(A-J) 6 ).
We tested if some neurodegenerative antineoplastic agents might affect the calcium oscillation pattern of 3D neural spheroids after 30 min exposure. Doxorubicin did not induce apparent alterations in Ca 2+ tracings at the C max level (200 nM) but significantly reduced the peak amplitude at 20 µM ( Fig. 4; Fig. S2(A) 6 , respectively). Patients treated with platinum drug-based therapies frequently experience neurotoxic symptoms like peripheral neuropathy (Kanat et al., 2017). Interestingly, carboplatin (clinical C max = 10 µM) significantly slowed down the Ca 2+ burst at 5 µM and completely blocked calcium oscillations at 15 µM. From the concentration-response curve shown in Figure S2(B) 6 , carboplatin treatment induced a robust decrease in peak count and an increase in peak amplitude, peak rise time, and peak width in a concentration dependent manner. Exposure to MPTP in animals and humans causes permanent symptoms of Parkinson's disease due to the active metabolite MPP+ insult on dopaminergic neurons in the substantia nigra of the brain (Przedborski et al., 2001). Interestingly, although there was no significant effect on cell viability ( Fig. S2(J) 6 ), exposure to MPTP resulted in an approximately 40% increase in peak count at 1.5 µM. But at 15 µM, MPTP significantly reduced peak count and induced an increase in peak spacing and peak irregularity. MPTP also induced a concentration-dependent decrease in peak amplitude. The response of 3D neural spheroids to MPTP suggests astrocytes within the neural spheroids are metabolically active and functionally responsive to the neurodegenerative compound MPTP.
Network activity was acutely altered by adding neurotransmitters known to modulate hippocampal activities (Gulyas et al., 1999;Drever et al., 2011). To examine if neural spheroids are responsive to neurotransmitters, we treated the neural spheroids with AMPA, an AMPA receptor agonist, or acetyl-primary brain samples than those collected at earlier timepoints. Since neural spheroids are composed of a mixture of neurons and astrocytes, we do not expect them to display an identical expression profile with iCell neurons or astrocytes.

Statistical analysis on margin of exposure employing ROC assessments
Dose and drug exposure, as measured by the tpC max levels, has been shown to be associated with drug-induced liver injury (DILI) (Chen et al., 2013;Shah et al., 2015;Proctor et al., 2017). Accordingly, we tested the predictive value of C max levels of the clinical compounds in the absence of any assay assessment. Sensitivity and specificity were optimized across either the tpC max or fpC max for the binary neurotoxicity classified compound set using ROC analysis. ROC curve analysis of the tpC max (Fig. 3A) demonstrated overall sensitivity and specificity of 58.14% and 63.33%, respectively, at an optimized threshold of 0.6415 µM, with the highest area under the ROC curve (AUC) value (56.51%). fpC max demonstrated a slightly better predictivity, with overall sensitivity and specificity of 69.77% and 60%, respectively, at an optimized threshold of 52.78 nM with the highest AUC value 60.54% (Fig. 3B).

Calcium oscillation measurements in 3D neural spheroids exposed to pharmaceuticals
Previous data from Sirenko et al. (2019) showed that neural spheroids are responsive to agonists and antagonists targeting a panel of receptors including kainite, NMDA, GABA, dopamine, and Na + receptors, confirming that the expressed receptors are functional in neural spheroids. In the present study, we aimed to evaluate the response of 3D neural spheroids to neuroactive drugs. We tested a set of 9 well-characterized neuroactive compounds whose targets have been highly studied in the neuroscience therapeutic area. Among these 9 compounds, MPTP, FPL64176, acetylcholine, and AMPA are tool compounds or neurotransmitters and do not have clinical exposure data. The compounds (Tab. 2) cover 8 common mechanisms of neurotoxicity. We evaluated calcium oscillation by monitoring changes in intra-

Fig. 3: Optimized receiver operating curve (ROC) for tpCmax (A) and fpCmax alone as predictor of clinical neurotoxicity
Optimized ROC curve for 84 commercial compounds with and without associated clinical neurotoxicity. The ROC curve was generated from tpCmax (A) and fpCmax (B) data for the clinical compound set, and an optimized threshold was determined.

Fig. 4: Representative spontaneous calcium oscillation signal traces shown for selected pharmaceuticals of neuronal activity
Graphs display phenotypic responses of neuronal activity after exposure to selected drugs reflecting a diverse range of drug effects on 3D neural spheroids. Traces are shown 30 min after exposure to select drugs at designated concentrations that vary 3or 10-fold: 1x Cmax (green); 10x Cmax (red). Unaffected regular Ca 2+ oscillation patterns are represented by vehicle control (DMSO, 0.1%). Calcium oscillation assay was conducted in triplicate on three individual neural spheroids for each compound.
The concentration-response curves for cell viability are shown in Fig. S2(J) 6 . MPTP, acetylcholine, and acetaminophen had no effect on cellular ATP levels up to the highest concentration tested. Interestingly, we observed a 50% increase in cellular ATP levels in spheroids treated with 100 µM gabapentin, which is not surprising given the 7 days of incubation with gabapentin prior to ATP measurement and that blockade of the calcium alpha 2-delta subunit has been shown to decrease synaptogenesis, which could explain conservation of energy and the apparent increase in ATP (Dolphin, 2018). AMPA, carboplatin, and doxorubicin reduced cellular ATP levels in spheroids in a concentration-dependent manner with a maximum at 50-60% relative to DMSO controls, in agreement with their neurodegenerative effects reported earlier.

Assay variability
We evaluated variability among endpoints related to calcium oscillation by calculating the inter-and intra-assay variability of neural spheroid plates. The coefficient of variation (% CV) for control samples across 3 separate experiments was determined for each of the six measurements. We pooled the vehicle control wells from 8 microplates to calculate the overall % CV (Fig. S3) 6 . As shown in the figure, the % CV values were ≤ 20% for most parameters, consistent with results shown in Sirenko et al. (2019). Peak rise time and peak decay time exhibited higher variabilities compared with other endpoints, 41.61% and 47.78%, respectively, in our study, higher than the values reported by Sirenko et al.,i.e.,35.6% and 13.7%, respectively.

Unsupervised learning to characterize commercial compounds on calcium burst
We applied an unsupervised hierarchical clustering method to examine the pattern of pharmaceutical-induced effects on neu-choline, an agonist for muscarinic-R (M1). 10 μM acetylcholine or 500 nM AMPA caused partial desynchronization of the calcium activity, consistent with the effects of neurotransmitters in 2D primary hippocampus neuronal culture (Verstraelen et al., 2014). The desynchronization quantitatively demonstrated as a 40-50% decrease in peak count and peak amplitude, as well as an increase in peak spacing ( Fig. 4; Fig. S2(F) 6 ). 5 µM AMPA completely blocked calcium activities in spheroids ( Fig.  4; Fig. S2(E) 6 ).
FPL64176, a calcium channel blocker, caused a concentration-dependent decrease in peak count and peak amplitude, and an increase in peak spacing and irregularity ( Fig. S2(H) 6 ). At 10 µM concentration, calcium burst was completely blocked (Fig. 4). However, when we tested gabapentin, which specifically targets the alpha 2-delta subunit of voltage-gated calcium channels (gene name CACNA2D1), we did not observe a significant alternation in calcium burst up to 100 µM (Fig. S2(D) 6 ;  Fig. 4), even though expression of CACNA2D1 was confirmed in the neural spheroids in our RNA-seq data (data not shown). Our results showing lack of acute response (within 30 min of treatment) within the neural spheroids appear to be in line with literature describing that a longer duration of incubation with gabapentin (of at least 17 hours) is required in vitro to produce a reduction of calcium currents (Heblich et al., 2008;Dolphin, 2018). 4-Aminopyridine, a potent convulsant that is a relatively selective blocker of voltage-sensitive potassium channels induced a concentration-dependent increase in peak count and a decrease in peak spacing, peak rise, and decay time (Fig. S2(G) 6 ), indicating an increase in neural activity when the potassium channel is blocked. Acetaminophen, which serves as a negative control in this assay, did not significantly alter calcium oscillation at both 100 μM and 1 mM (Fig. S2(I) 6 ; Fig. 4).  trations. Interestingly, the 30 drugs with low CNS risk were evenly distributed among the three groups. Hierarchical clustering on peak amplitude also resulted in three groups (Fig. 5B). Cluster 1B consisted of compounds that did not significantly reduce peak amplitude at all concentrations. Cluster 2B included compounds that completely inhibited peak amplitude at the highest concentration. Cluster 3B represented compounds that completely inhibited amplitude from mid to top concentrations in a concentration-dependent manner. Of the 30 drugs with low neurotoxicity risk, 14 compounds were in 1B, 9 in 2B, and 7 in 3B. Like in Figure 5A, the 30 drugs with low neurotoxic risk were distributed among all three clusters, indicating that a single calcium burst data signature is not sufficient to predict neurotoxicity in the clinic. Table S6 1 summarizes where each compound localized in each cluster of peak count and peak amplitude. Scrutiny of the compo-ral spheroid calcium burst for the clinical compounds. The effect of pharmaceuticals on peak count (Fig. 5A) and peak amplitude (Fig. 5B) were hierarchically clustered using Euclidian distance based on concentration-dependent response. As shown in Figure 5A, all the compounds were grouped into three main clusters with approximately a third of compounds falling into each cluster. Compounds in cluster 1A did not induce apparent concentration-dependent effects on peak count. Compounds in cluster 2A partially reduced peak count at the top concentration. Compounds in cluster 3A completely blocked calcium burst from mid to top concentration in a concentration-dependent manner. Aminopyridine and pimavanserin do not fall into these three clusters. Aminopyridine induced a concentration-dependent increase of peak count, and pimavanserin induced an increase in peak count at mid-concentration followed by a sharp drop at higher concen- sition of each cluster (i.e., cluster 1A vs 1B, 2A vs 2B, and 3A vs 3B) showed that clusters with the same pattern of effects on peak amplitude and peak count did not contain the same compounds, indicating that the two endpoints are independent.

Unsupervised and supervised learning to model seizurogenic and neurodegenerative liabilities
Our findings on the concentration-response effects on peak count and peak amplitude demonstrated the need for a statistical modeling strategy to better understand the 3D neural culture system and improve its utility in drug discovery. We aimed to develop a fitted model to describe the relationship between a discrete outcome variable (non-neurotoxic or neurotoxic) and a set or subset of independent variables extrapolated from the calcium oscillation profiles and cellular cytotoxicity assays subsequently referred to as endpoints. To root clinical exposure data into neural spheroid exposures and analyses, we calculated the "margin of exposure" (MOE) defined as fpC max /EC 50 (Tab. S3 1 ). To examine the overall trend of MOE across all the compounds, we conducted unsupervised hierarchical analysis by evaluating Euclidian distance (Fig. 6). Hierarchical clustering grouped the compounds into 3 large clusters. Cluster A consisted of compounds that induced changes in 5-6 endpoints. Compounds in Cluster B induced changes in 1-3 endpoints, mostly peak count and/or peak amplitude. Cluster C contained compounds that did not induce any changes in the 7 endpoints. There were 9 compounds with low neurotoxicity that fell into Cluster C, i.e., diclofenac, acetaminophen, pilocarpine, ketorolac, reserpine, levodopa, duloxetine, sertindole and L-DOPS. The other 13 compounds with clinical neurotox-

Fig. 7: Receiving operating curve (ROC) analysis for the clinical compounds in the CNS-targeting set
The area under the curve (AUC) value of 72.9% summarizes the entire location of the ROC rather than a specific operating point and represents the combined measure of sensitivity and specificity describing the inherent validity for neurotoxicity determination. ROC analysis of the predicted CNS toxicity using a logistic regression analysis threshold value of 0.6415 demonstrated overall sensitivity and specificity of 50% and 94.12%, respectively.

Fig. 8: Summary of the predicted CNS risk for clinical compounds analyzed in the neural spheroid 3D model
The predicted CNS risk for the combined CNS-targeting and test sets using a cutoff threshold value of 0.6415 for the logistic regression analysis. Category 5 (anticonvulsant) medications were excluded from this analysis. nels that might not be expressed in the spheroids. In comparison, other sodium channel compounds like riluzole, ropivacaine, and lamotrigine induced robust changes of calcium signals, suggesting that only a subgroup of Na + channels are functional in neural spheroids. Other compounds in cluster C were compounds targeting enzymes like reductase, fatty acid amide hydrolase (FAAH), phosphodiesterase and acetylcholinesterase.
The binary nature of the dependent variable indicated that lo-icity in this cluster were not active in neural spheroids. Interestingly, seizurogenic antibiotics targeting bacterial bioprocessing like isoniazid, cefazolin, and enoxacin did not alter calcium activities in neural spheroids, suggesting that the off-targets that drove the seizure risk in humans might not be expressed in neural spheroids. Divalproe, topiramate, and lidocaine, which target voltage-gated Na + channels, were not active in neural spheroids, suggesting they may only target a subgroup of Na + chan- 17 non-toxic compounds was misclassified. It is interesting that aspirin demonstrated a very strong alteration in calcium oscillation and the predicted probability from logistic regression was 1. The model showed 61.54% sensitivity at four threshold levels when used on the test set, relatively higher compared to the sensitivity in the CNS-targeting set.
Combining the two compound sets together, the overall performance for the predictability at all 4 threshold values is summarized in Table 3C. The range in sensitivity and specificity was 33.3% to 56.6% and 76.66% to 96.67%, respectively across all 4 thresholds. Using a probability threshold set at 0.6415, the predicted CNS risk for all compounds is shown in Figure 8. 93.33% (28/30) of non-neurotoxic compounds were predicted as negative, with aspirin and galantamine false positive. 23 out of 43 neurotoxic compounds were predicted as positive, demonstrating a true positive rate of 53.49%.

Concordance parameter comparison between the neural spheroid model with nonclinical safety data
To compare the performance of neural spheroids and nonclinical safety studies in the detection of human neural liabilities in the clinic, we conducted concordance analysis by assessing a few concordant parameters (shown in Tab. 3D). Nonclinical safety studies demonstrated higher sensitivity than neural spheroids. However, nonclinical safety studies showed significantly less specificity compared with neural spheroids (30.4% vs 93.3%), suggesting a high false positive rate and a tendency to falsely identify neurotoxic liability in compounds that have been found to be well tolerated clinically. Further, neural spheroids showed higher LR+ and iLR− than nonclinical species. Especially, the LR+ of the spheroids was 4.7 times higher than the value for nonclinical species. The concordance analysis suggests that neural spheroids have higher odds of predicting a positive clinical finding than nonclinical safety studies.

Evaluation of the neural spheroid model using Takeda preclinical compounds
After demonstrating the utility of the neural spheroid model in screening neuroactive pharmaceuticals for potential neurotoxicity with high specificity, we tested 33 Takeda preclinical com-gistic regression would be a feasible approach. First, we conducted a univariate analysis for single predictors of neurotoxicity using the CNS-targeting set (Tab. S4 1 ). The univariate analysis showed that all 7 predictors yielded an AUC > 50%, suggesting that each predictor can distinguish between a non-neurotoxic and neurotoxic compound effect and that the pharmaceutical-induced alteration for each individual predictor was not random. Peak rise time and peak spacing showed the largest AUCs of 63.92% and 63.73%, respectively. Peak rise time showed the highest LR+ of 6.8, followed by peak amplitude with an LR+ of 4.533. Peak decay time demonstrated the smallest AUC and LR+ compared with other predictors. Sensitivity ranged from 20% to 80%, and specificity ranged from 23.35% to 94.12%. Each predictor demonstrated a higher specificity than sensitivity, except for peak decay time, which had 80% sensitivity and 23.35% specificity. The high specificity, except for the peak decay time, indicates a low false positive rate for the individual predictors.
Since each of the predictors was useful, we continued to test whether the predictivity improved if we applied all the independent variables by fitting a multivariate logistic regression model using the CNS-targeting set matrix data. The coefficient for each independent variable is shown in Table 4. The fitted values (also defined as NT score) for each of the compounds in the CNS-targeting set are summarized in Table S3 1 . Assay sensitivity and specificity were optimized across the predicted probability of CNS toxicity calculated from logistic regression for the binary clinical compound CNS-targeting set using ROC analysis. If we set the cutoff threshold value at 0.6415, ROC analysis of the predicted CNS toxicity demonstrated overall sensitivity and specificity of 50% and 94.12%, respectively (Fig. 7). The high specificity indicates a very low false positive rate. The multivariate analysis showed that the neural spheroid model experienced increased sensitivity to identify clinical neurotoxicity at the lower cutoff value. The summary of predictivity for the CNS-targeting set across 4 threshold values is presented in Table 3A.
To examine the performance of the logistic regression model fitted from the CNS-targeting set, we used the logistic regression to calculate the NT score for the compounds of the test set using the same threshold (Tab. 3B). At a threshold greater than 0.5944, the specificity was still high (up to 92%). Only 1 (aspirin) out of ies showed high sensitivity; however, they were limited by the relatively low number of compounds tested in the assays. In most cases, the authors did not select balanced numbers of true negative control compounds to appropriately determine specificity. In addition, most of these studies did not compare their in vitro results with those from preclinical in vivo studies. Therefore, there is a need for statistically powered study designs with large numbers of well annotated pharmaceuticals to assess the predictive power of in vitro models for neural liability risk assessment. Our goal was to evaluate iPSC-derived 3D neural spheroids as an in vitro platform for investigating neurotoxicity of compounds using functional phenotypical screening. We used 10-to 12-weekold spheroids as they are the closest model recapitulating not only human physiological ion channel and neurotransmitter receptor genes (Sirenko et al., 2019) but also global gene expression as compared to native human brain samples (Fig. 2). iPSC-derived 3D models offer the opportunity to better mimic in vivo physiological characteristics and help identify toxicant-induced changes (Luo et al., 2016Renner et al., 2017;Leite et al., 2019). In this study, we monitored spontaneous electrical activity of the spheroids via live cell calcium imaging and evaluated the effects of pharmaceuticals on spontaneous calcium activities. This technique allowed the detection of subtle changes in neuronal physiology with a higher sensitivity compared to monitoring neural viability. Highly synchronized oscillatory activity has been described in different brain regions in rodents and humans, as well as in in vitro cultured brain slices and even in dissociated primary neuronal cultures (Silva et al., 1991;Ben-Ari, 2001;Khazipov and Luhmann, 2006;Cohen et al., 2008;Verstraelen et al., 2014;Kuijlaars et al., 2016). Verstraelen et al. (2014) reported a random calcium dynamic in early neuron and astrocyte monolayer cultures isolated from mouse hippocampi that became synchronized over time. Interestingly, it seems the synchronization only occurred within cells of the same type, as the calcium waves of neurons had faster kinetics than those of non-neuronal cells (Verstraelen et al., 2014). We did not observe separate waves from neuronal and non-neuronal cells in the neural spheroids' spontaneous calcium oscillation, indicating synchronization between neuronal cells and astrocytes. Moreover, the synchronization in neural spheroids was global, while the 2D cultured synchronization mentioned above was still regional (Verstraelen et al., 2014, Kuijlaars et al., 2016, Ryan et al., 2016, Sirenko et al., 2019. In the present study, we monitored live cell spontaneous calcium burst in vitro and showed that the neuronal activity changes induced by pharmaceuticals can be detected by calcium imaging.

Tab. 4: Estimate of βi fit in the multivariate logistic regression model and the associated standard error, z-value, and p-value
A logistic regression model trained on the clinical compound set demonstrated utility in predicting neurotoxicity based on the integration of the 7 endpoints. Expression of the human fpC max concentration relative to the EC 50 values for the 7 endpoints enabled determination of an MOE (Tab. S3 1 ). The initial goal of the analysis was to identify an optimal threshold of EC 50 and MOE values that would best separate the known neurotoxicants from those without clinical neurotoxic risk. It became clear that this metric would be highly dependent on the compound set employed and concentration ranges achievable in vitro, and ac-pounds to further evaluate this in vitro model in drug candidate selection for preclinical studies in the neurotherapeutic realm. The drug candidates have never been tested in clinical settings and lack human exposure data. The compounds were not identified as toxic in the traditional pharmaceutical assays that assess endpoints such as cytotoxicity and mitochondrial dysfunction. Instead, they were prioritized to be tested in preclinical species over other candidates for the same indication. Table 5 lists the 33 drug candidates and their preclinical neurotoxicity classification, the species in which they were found to be neurotoxic, and the EC 50 values for the 7 endpoints. The drug candidates covered 15 unique targets that spanned both the CNS and non-CNS therapeutic area. There was no overlap of the Takeda preclinical compound set targets with hits in the clinical compound set. In addition, 8 targets had only 1 compound hit, and the others had more than 1 compound hit. We considered compounds as neurotoxic positive in the preclinical studies if seizures or convulsions were observed in at least one of the species tested. If a compound did not cause seizures or convulsions in any of the species tested, it was classified as neurotoxicity negative. However, this definition is limited by the lack of clinical data and the limited preclinical studies conducted. Thus, it is difficult to determine neurotoxicity negative with certainty. Out of the 33 compounds, 23 were reported to be neurotoxic in vivo with many demonstrating species-specific seizure liabilities. Columns 7-13 of Table 5 show the EC 50 values of each in vitro endpoint. The threshold for classifying a positive compound in vitro was EC 50 < 30 µM. The last column shows the overall model prediction. 20 of the 23 in vivo positive compounds tested positive in the in vitro assay. The sensitivity of the neural spheroid model with the Takeda preclinical compound set was 86.96%. Out of the preclinically negative compounds, 5 tested negative in vitro in agreement with the in vivo findings, while the other 5 tested positive.

Discussion
A major challenge associated with identifying neurotoxicity risks in drug discovery is the poor concordance between in vivo studies in preclinical species and human-related neurotoxicity (Olson et al., 2000;Monticello et al., 2017). Major efforts have been undertaken to improve prediction of potential neurological AEs during drug discovery and development. These efforts include, but are not limited to, development of imaging and fluidic biomarkers and incorporation of more sophisticated human in vitro cell-based models with multi-parametric endpoints in 2D or 3D format, with many platforms incorporating human iPSC (Walker et al., 2018). Although many neuronal cell-based models have been developed and demonstrate great potential for predicting neurotoxicity, most of these models have not been assessed in the context of drug discovery (Lancaster et al., 2013;Lancaster and Knoblich, 2014;Schwartz et al., 2015;Kilic et al., 2016;Qian et al., 2016Qian et al., , 2018Sirenko et al., 2019). Table 6 summarizes the performance assessment of some of the in vitro models using cutting-edge technologies in recent years. Some of the stud-

Tab. 5: Neurotoxicity classification of Takeda preclinical compounds according to preclinical studies and neural spheroid prediction, with in vivo results and in vitro EC50 values of 7 endpoints
The threshold for classifying a positive compound in vitro was EC50 < 30 μM in any of the seven endpoints. In the "Overall in vitro assay prediction" column, cells highlighted in red show where the in vitro prediction was not consistent with the in vivo results; green shows where the in vitro and in vivo results were consistent. The minimum and maximum fpC max values were also reported for rotigotine and vigabatrin. The maximum over minimum ratios for both tpC max and fpC max were 4x10 9 . High exposure was also reported for painkillers like ibuprofen, acetaminophen, and naproxen with values greater than 90 µM with a low risk of neurotoxicity. These high exposure values are outliers contributing to the relative low specificity of exposure as an indicator for neurotoxicity. A wide range was also observed when fpC max was normalized by neural spheroid EC 50 values. This could explain the relatively high p-value for each coefficient in the multivariate logistic regression model (Tab. 4).

Com-
Due to the lack of clinical exposure data, we could not apply the logistic regression algorithm to the Takeda preclinical compound set. We employed an empirical EC 50 < 30 μM as a threshold to define a neurotoxic positive detected by the assay. One advantage of working with the internal compounds was that we could recruit structural analogues in the screening and examine if the model could differentiate the subtle structural differences. For example, TKD13 is the inert analogue of TKD14, with both targeting T6. TKD14 resulted in seizure in rat while TKD13 did not. The neural spheroid assay was able to distinguish between matched pairs of drugs, with the EC 50 values for TKD13 16-times higher than for the seizurogenic compound TKD14. In particular, the findings highlight the capability of 3D neural spheroids to distinguish neurotoxic from non-neurotoxic compounds with structural similarity, which can greatly help medicinal chemists to generate structure-activity relationships (SAR).
The nonclinical safety packages showed high sensitivity in identifying human neural liabilities in clinical compounds (75%, Tab. 3D). Our analysis integrated nonclinical safety data from multiple species to identify concordant neurotoxicity in patients more holistically, which may contribute to the high true positive concordance. In line with our findings, Olson et al. (2000) reported that the combination of results from rodent and nonrodent species highly improved the true positive concordance rate over that of rodent or nonrodent alone. In another study where species differences were considered, Monticello et al. (2017) reported 74%, 69% and 63% sensitivity to detect human nervous system risk utilizing rodent, dog and NHP, respectively, when using a nonclinical to clinical translational database with 182 molecules to determine the translation between animal liabilities to firstin-human clinical risk. Although the nonclinical safety package demonstrated a higher sensitivity than neural spheroids in identifying human liabilities, neural spheroids showed a much higher specificity than the nonclinical safety package (93.3% vs 30.4%), which indicates a very high false positive rate from animal testing. In contrast, Monticello reported a higher specificity from rodent and nonrodent species using the nonclinical to clinical translational database, largely because of the dramatic difference in compounds selected in the two studies. In the Monticello study, cordingly on the number of EC 50 or MOE values obtained from the test set for each assay. As such, we presented the predictive value of neural spheroids based on MOE fitting in a logistic regression model for external compounds with known human exposure data as well as practical thresholds of EC 50 values for internal compounds without human exposure data that would likely be implemented in drug discovery at different stages of lead-optimization. Across the validation, the neural spheroid model exhibited high specificity and midrange sensitivity in distinguishing between known clinical neurotoxic compounds and non-neurotoxic compounds. These results and the imbalance between neurotoxic and non-neurotoxic compounds in the validation set could be indicative of over-prediction by the model and warrants future assessment against a larger, more balanced compound set when available. Using the practical EC 50 cutoff values on the Takeda preclinical compound set, the neural spheroids showed improved sensitivity in distinguishing positive from negative neurotoxicants. Proctor et al. reported that prolonged drug exposure of a 3D human liver microtissue enhanced sensitivity of the detection of liver toxicants versus short-term culture (Gunness et al., 2013;Valdivia et al., 2014;Frank et al., 2017;Proctor et al., 2017). In the present study, we observed that long-term exposure of the spheroids to the compounds helped improve sensitivity for detection of decreased cell viability via ATP (72 h vs 144+ h, Fig. S4 1 ) but did not enhance sensitivity for detection of calcium oscillation (30 min vs 6 h timepoints).
Both Proctor et al. (2017) and Shah et al. (2015) demonstrated that tpC max alone was a good predictor for potential DILI risk. The sensitivity and specificity of distinguishing DILI positive from DILI negative compounds reproduced with a similar tpC max threshold in their studies even though they tested different compounds. Inspired by their results, we analyzed whether tpC max and fpC max are also good predictors for drug-induced neurotoxicity. We found that fpC max performed slightly better than tpC max as a predictive indicator for neurotoxicity, but both showed < 70% specificity and sensitivity, not comparable with the reported higher sensitivity and specificity observed in predicting DILI. One potential explanation might be that liver is a highly perfused organ and serves as the primary site for metabolism and clearance of xenobiotics. High drug exposure therein (as parent and/or metabolites) can potentially interfere with its function and increase the risk for injury. Another potential explanation for the relatively low predictivity of exposure as an indicator for neurotoxicity might be that the drug concentration in plasma may not be identical to the concentration in brain, as neuronally active drugs must penetrate blood-brain and blood-cerebrospinal fluid (CSF) barriers. Maurer et al. (2005) examined the unbound fractions of 31 CNS-active drugs and 2 active metabolites in mouse brain and plasma. They found that most CNS-active agents freely equilibrate across the blood-brain and blood-CSF barriers such that unbound drug concentrations in brain approximate those in the plasma. The slightly better predictive value of fpC max in our study is in consensus with Maurer's finding that fpC max is a better indicator of brain exposure for neuronal active reagents. In addition, simple statistical analysis to test coculture seizure compounds of hiPSC-derived neurons and astrocytes lack of activity of BIA 10-2474 in neural spheroids. Moreover, the alternation of the lipid network by BIA 10 2474 may not manifest as calcium oscillation or ATP changes. Therefore, it is highly desirable to expand and validate more endpoints to cover various mechanisms of neurotoxicity. Different from the extensive investigation to understand the potential mechanism driving BIA 10-2474's neurotoxicity, there is little or no information available regarding the off-target profile of other compounds in our clinical compound set, which makes it impossible to determine if the false negativity is due to lack of off-target expression in the neural spheroids. Understanding the promiscuity of the compounds tested in our model will provide more information to understand the predictive value of the spheroid model. Future iterations of this model may also benefit from having a more diverse array of iPSC donors to better understand how variability in the population may contribute to neurotoxicity.
In conclusion, the neural spheroids demonstrate high specificity, while maintaining mid-range sensitivity in the detection of human clinical neural liabilities. This investigation suggests that the high-throughput neural spheroid assays may be used as an effective early screen for potential drug candidates before in vivo safety studies are performed. This will provide project teams with early indicators of potential neurotoxic liabilities and reduce the number of animals exposed to potentially neurotoxic drugs. Measuring the alternation of calcium oscillation to report electrophysiological changes following exposure to seizurogenic drugs demonstrated in principle that this 3D neural model has the potential to recapitulate in vivo findings in vitro. Taken together, the data produced in this comprehensive evaluation support that 3D neural spheroids have a low false positive rate in identifying clinically relevant neurotoxicants when measuring calcium oscillation and cytotoxicity as endpoints. This is an important finding considering that neurotoxicity remains a major source of safety attrition for drug development and post-market withdrawal of drugs. Therefore, our study supports the use of neural spheroid models to aid with neurotoxicity hazard identification in drug discovery.

References
Accardi, M. V., Huang, H. and Authier, S. (2018). Seizure liability assessments using the hippocampal tissue slice: Comparison only 25% of the compounds were CNS therapeutics, while the majority of the compounds in our study were for CNS indications. Looking at other concordant parameters, LR+ of neural spheroids was 4.7 times higher than in nonclinical safety studies, indicating increased odds of a positive clinical finding predicted by a positive signal from spheroids than a nonclinical safety finding to predict neural liabilities in patients. LR+ of neural spheroids was higher than that of rodent and dog reported in the Monticello study, but lower than NHP. Although the neural spheroids demonstrated high predictive value in identifying neurotoxicants, the logistic regression model failed to properly classify approximately 50% of the neurotoxic compounds tested. In particular, 20 out of 43 compounds were false negative in the logistic regression model even when correcting for clinical exposure. This is not surprising given the diverse physiological mechanisms and inter-individual variability known to play a role in clinical neurotoxicity. For example, the medical application of antibiotics may trigger epileptic seizures or status epilepticus by decreasing inhibitory transmission in the brain (Czapinska-Ciepiela, 2017). The seizurogenic antibiotics enoxacin, cefazolin and isoniazid tested in our study turned out to be false negatives. Interestingly, a previous study conducted by Easter et al. (2009) found that only the brain slice assay, out of four preclinical seizure assays, was capable of correctly annotating these three antibiotics. Only 1 or 2 tested antibiotics tested positive in a zebrafish model, mouse spontaneous seizure model, and mouse EEG model (Easter et al., 2009). Another example that is worthy of discussion is BIA 10-2474, an experimental fatty acid amid hydrolase (FAAH) inhibitor. It caused mildto-severe neurological symptoms in the clinic. In our study, BIA 10-2474 demonstrated no effect on neural spheroid calcium oscillation, and there was no alternation in cellular ATP at the maximum concentration tested. Activity-based protein profiling revealed that this compound is a promiscuous lipase inhibitor that may cause a metabolic dysregulation of the nervous system resulting in the neurotoxicity observed in the clinic (van Esbroeck et al., 2017). Closely examining the expression of primary target and potent off-targets in neural spheroids demonstrated that FAAH and PNPLA6 are 2-fold and AHBD6 is 4-fold less expressed in neural spheroids than in human brain tissues. FAAH2, CES2 and ABHD11 showed no difference. The lower expression of primary and potent off-targets in spheroids may explain the