Optimizing Drug Discovery by Investigative Toxicology: Current and Future Trends

Investigative toxicology describes the de-risking and mechanistic elucidation of toxicities, supporting early safety decisions in the pharmaceutical industry. Recently, investigative toxicology has contributed to a shift in pharmaceutical toxicology, from a descriptive to an evidence-based, mechanistic discipline. This was triggered by high costs and low throughput

to satisfy the specific requirements of individual regulatory authorities (OECD, 2005).
This positive development of streamlined testing strategies, however, also brought some challenges. Many companies are reluctant to integrate new technologies or assays alongside the established safety assessment consisting of GLP in vivo studies, owing to the perception that non-GLP or not fully validated assays might compromise the pivotal studies and endanger the approval process. The high standard of harmonization and validation developed for GLP studies is often demanded of new technologies and assays as well. On the other hand, safety-related attrition in preclinical and clinical drug development phases still represents a major factor in the overall loss of projects, and there is therefore pressure to improve the predictive power of preclinical studies, including through new screening strategies. In addition, the high costs and rather low throughput of GLP in vivo studies, together with intensifying demands to address the 3Rs, have increased the push towards new screening strategies (Sewell et al., 2017).

Introduction
Tremendous progress in preclinical development across the pharmaceutical industry has been achieved over the past three decades. This pivotal phase, which prepares the transition into first-in-man trials, has been strongly harmonized under the umbrella of the International Conference for Harmonisation (ICH) of Technical Requirements for Pharmaceuticals for Human Use 1 (Ohno, 2002). The ICH has contributed to an internationally accepted set of submission-relevant guideline documents generally concerning in vivo drug safety studies, which are based on Organisation for Economic Co-operation and Development (OECD) test guidelines for the individual study conduct and are strongly connected to OECD documents for Good Laboratory Practice (GLP). The whole framework of harmonization has led to an increase in mutual acceptance of preclinical submission documents in the three regions involved (European Union, the United States, and Japan) and, as a consequence, to elimination of studies performed

Fig. 1: A visual illustration of the continuum of investigative toxicology in the drug discovery and development pipeline
Plain arrows represent the forward feed of information to move to the next step, while dashed arrows represent back feed of knowledge to improve predictivity. Organ-on-chip, 3D tissues, and MPS have the potential to complement and, perhaps to some extent, replace certain steps of research and development. FiH, first-in-human trial; MPS, microphysiological systems; M&S, modeling and simulation.
As a consequence, most pharmaceutical companies have established specific toxicology functions, which complement the experimental GLP functions. Some companies have even gone so far as to fully outsource GLP activities and to focus in-house preclinical safety activities on what is termed "discovery", "exploratory", "mechanistic", or "investigative" toxicology. While the tasks and organizational set-up of these functions differ from company to company, it has become evident that the value of these activities lies not only in screening assays preceding regulatory activities, but also in an enhanced understanding of the mechanism of toxicity, which is equally relevant for later phases of clinical development. In fact, this is shifting pharmaceutical toxicology from a purely descriptive to an evidence-based mechanistic discipline. For this reason, the authors of this publication prefer the term "investigative toxicology" over "discovery toxicology", since it avoids the perceived limitation to serving only the early phases of safety assessment. The continuum of investigational toxicology in the drug development process is illustrated in Figure 1. Thus, from a pharmaceutical industry point of view, the term "investigative toxicology" can be defined as a complementary effort to regulatory toxicology, encompassing both a prospective (screening for de-risking) and a retrospective approach (mechanistic investigations of adverse effects) (Moggs et al., 2012).
To foster awareness, development, and implementation of investigational toxicology and to share expertise, knowledge, and best practice in a pre-competitive space, a group of European-based investigative toxicology leaders from the pharmaceutical industry (see Fig. 2 for participating companies) founded the Investigative Toxicology Leaders Forum (ITLF) (Roth, 2017). This open, non-exclusive forum aims to enhance interaction with experts from academia and regulatory bodies in the field of investigative toxicology. The objective of the ITLF is to elaborate robust, reliable, and accepted investigative toxicology concepts and practices for decision-making related to early safety-related attrition, de-risking, and mechanistic elucidation of effects as shown in Figure 3. The figure illustrates how investigative toxicolo-

Fig. 2: Companies participating in the pre-competitive Investigative Toxicology Leaders Forum (ITLF) as of July 2018
Objectives of the ITLF are to jointly elaborate robust, reliable, and accepted investigative toxicology concepts for decision-making for early safety-related attrition, de-risking, and mechanistic elucidation of safety-related effects to increase the understanding and improve the translation of in vitro to in vivo mechanistic data. Furthermore, the adoption of new technologies/platforms into the drug discovery backbone is targeted by the forum to increase the knowledge and awareness of investigative toxicology as a discipline (e.g., through publications, meetings, and conferences).

Fig. 3: Key objectives of investigative toxicology during drug discovery and development
es to developing testing strategies (Brennan et al., 2015;Dixit and Boelsterli, 2007;Bussiere et al., 2009). The following outlines the perceived gaps in identifying NCE/NBE hazards for target-organ toxicities, limiting risk assessments and prediction of human safety, mitigation strategies to manage risk, and current governance for investigative toxicological sciences. The mechanisms of ADRs are extensively reviewed elsewhere (Atienzar et al., 2016;Hornberg et al., 2014a,b).

Target organ toxicity models
Although 70% of human-relevant toxicities are detected in experimental species (Olson et al., 2000), the translational relevance of these toxicities is highly dependent on the affected target organ. Significant human ADRs are predominantly associated with the liver, heart, and nervous system (Cook et al., 2014; Olson et al., 2000; Sacks et al., 2014). The detection of dose-dependent drug hepatocellular cytotoxicity by in vitro cell-based models and animal studies is well accepted (Antoine et al., 2013; Ward et al., 2014), yet the multifactorial nature of drug-induced liver injury (DILI) and known species differences are notable gaps that require the development of humanized models to detect liver injuries associated with immune or patient-specific susceptibilities. For the identification of cardiovascular drug liabilities, the concomitant use of both in vitro models and animal studies is well established (Laverty et al., 2011; Valentin et al., 2010). Whereas effects on ion channels can be easily identified by in vitro (e.g., patch clamp) models, the complex interplay between heart rate, ejection volume, and blood pressure eventually causing heart or kidney damage can currently only be assessed by in vivo models. Similarly, early in vitro prediction of neurological ADRs is still challenging, as many side effects can only be detected in clinical trials because they are caused by interactions with rare targets or occur only after chronic administration, which is difficult to achieve in in vitro assays (Schmidt et al., 2017). Regarding other toxicities such as hematologic or hematopoietic disorders or carcinogenic risk, few in vitro models exist due to the nature and complexity of the underlying pathology (see Tab. 1). However, there is a surge in the development of organotypic and microphysiological systems (MPS) (Marx et al., 2016), including multiple organ systems. High expectations are placed on these innovative three-dimensional (3D) models for improved detection of drug liabilities in safety assessment (Hardwick et al., 2017; Lin et al., 2015; Mueller et al., 2014; Soldatow et al., 2013).

Disease models
Disease models are required to emulate organ-level functions and recapitulate key phenotypic features of human disease in cell or tissue-based as well as conventional and transgenic animal models. Disease status can impact considerably on the toxicity of substances and thus the target population of a novel drug candidate. Nevertheless, animal models established to reflect human disease often appear to have had limited success (Benam et al., 2015;Morgan et al., 2013) and likely contribute to the poor predictivity of efficacy and safety of drugs in later human clinical trials. The future incorporation and use of humanized in vitro disease models gy adds to the traditional drug development process. Investigative toxicology supports the entire process by early assessments of target-and chemical class-related toxicological concerns and front-loading of assessments as prospective risk anticipation. Furthermore, alerts from later stages of development and market surveillance can trigger a retrospective de-risking process, which will typically include the elucidation of toxic mechanisms to assess their relevance to humans and possible mitigation strategies.
The goal of investigative toxicology is to improve preclinical decision-making, which coincides with the notion of animal-free safety testing. Currently, many compounds are ruled out based on results from animal models obtained during the preclinical phase without knowledge of how the compounds would behave in humans, i.e., the false-positive rate of animal studies cannot be assessed. In addition, significant attrition occurs in clinical phases due to safety issues that were not adequately identified during the preclinical phase (false-negative) (Waring et al., 2015). Progress in investigative toxicology towards humanized in vitro test systems promises a better rate of human-relevant predictions.
For this reason, the ITLF teamed up with CAAT-Europe to hold an "Investigative Toxicology Think Tank" in July 2017, which assembled 34 experts from academia, the pharmaceutical and other industries, regulatory authorities, and technology providers to develop a definition of "investigative toxicology" and to align academic and expert stakeholders with the needs for a predictive and mechanistic investigational toxicology. Although the focus of the meeting was on investigative toxicology in drug development, progress in this field may also influence safety assessment in other industry sectors (industrial, consumer or agro-chemical compounds). This report represents a position paper for investigative toxicology based on the topics of and discussions during the workshop. It starts with a gap analysis, followed by a critical assessment of new technologies, and finishes by summarizing challenges and presenting perspectives and recommendations.

Gap analysis
The pharmaceutical industry has made substantial efforts towards the implementation of in vitro based models, which has improved the hazard identification and risk assessment of drug candidates prior to non-clinical development (Hornberg et al., 2014a,b;Goh et al., 2015). However, much remains to be accomplished to address the substantial gaps in our mechanistic understanding of adverse drug reactions (ADRs) and to support the development of biomedical tools that are truly predictive of inter-individual human susceptibility to ADRs. The rapid "design-make-test-analyze" cycle time in drug discovery also places greater emphasis to further the understanding of the mechanisms of toxicity and chemical liabilities, and to facilitate the decision-making processes on candidate selection and development of new chemical entities (NCEs) and new biological entities (NBEs). In addition, the increasing diversity of biopharmaceutics, which now include cell and gene therapies, chimeric antigen receptor T (CAR-T) cells and vectors, antibodies, and anti-sense oligonucleotides presents new and significant risks, such as cytokine release syndrome (CRS) and tissue cross-reactivity issues, with a variety of new challeng-change on which to establish a safety margin (Dorato and Engelhardt, 2005). In the absence of well-characterized safety biomarkers 4 with clearly defined mechanistic and translational relevance to humans, preclinical findings in animals will, at best, only permit rough estimates of the safety margins. The pre-requisite for non-clinical safety testing is to select candidate drugs with large safety margins to improve the likelihood of clinical success. However, in vivo toxicological and clinical findings can result in unexpected and reduced safety margins in target organs during the clinical phase. Progress towards the identification of novel, sensitive biomarkers with mechanistic and translational relevance may help to improve the monitoring of drug safety profiles.
into toxicity assessments has the potential to concomitantly facilitate pharmacological discovery and safety evaluation of drugs (Hübner et al. 2018). This will improve the identification of safety margins with the potential to extrapolate phenotypic differences in patient populations and lead to mechanistically-driven safety margins in patient populations.

Safety margins
Dose-limiting toxicity and the "no adverse effect level" (NOAEL) define the safety margins 2 and the toxicological profile for risk-benefit assessment of a drug candidate 3. However, the NOAEL is often a subjective assessment of a biochemical or histopathological
2 In other, non-pharma sectors, the common expression is "margin of safety" (MoS). Instead of NOAEL, which is the highest experimental dose in an in vivo study that is without observable adverse effect, the Benchmark Dose Lower Confidence Limit based on benchmark dose modelling (BMDL) is more frequently used in these sectors.
3 http://www.fda.gov/downloads/drugs/guidances/ucm078932.pdf
4 A biomarker is a defined characteristic that is measured as an indicator of normal biological processes, pathogenic processes, or responses to an exposure or intervention, including therapeutic interventions. Molecular, histologic, radiographic, or physiologic characteristics are types of biomarkers. Safety biomarkers are applied to indicate the likelihood, presence, or extent of toxicity as an adverse effect (for definitions see: https://www.fda.gov/Drugs/DevelopmentApprovalProcess/DrugDevelopmentToolsQualificationProgram/BiomarkerQualificationProgram/default.htm)

Tab. 1: Categories of safety attrition challenges
Carcinogenicity: In vitro models good for detection and prediction of genetic toxicity but poor for carcinogenicity risk identification.
X, no model; ο, models yet to be evaluated for application in drug R&D; •, models routinely available/in use. 1, large species (dog or monkey); 2, rodent; 3, lagomorph.
identification of novel safety biomarkers with translational value from early in vitro safety assessment to non-clinical as well as clinical safety assessment. More recently, gene editing technologies such as CRISPR/Cas9 have allowed refined engineering of animal disease models. For example, pig models have been established for cystic fibrosis research, which are reported to be superior to the existing mouse models regarding their similarity to the human phenotype (Klymiuk et al., 2016). These technologies have the potential to bridge the gap between proof-of-concept studies in animals and clinical trials in patients, thus supporting translational medicine.

Monitoring safety signals
Biomarkers permit identification and monitoring of potential safety signals (see Section 3.2) by employing a broad spectrum of technologies. Biomarkers detected by imaging and molecular techniques have advanced in recent years for on-target and off-target assessments. Despite a rapid rise in the use of omics technologies, several challenges remain before their routine adoption and application can be achieved (Khan et al., 2014). For example, genomic and pharmacogenomic screening have found use in clinical trial enrolment for an indirect assessment of drug-metabolizing enzyme activity, yet the measurement of enzyme activity and drug-drug interaction (DDI) (Ward et al., 2014) only results in predictive values of around 40% when relying on protein and transcriptomic data alone (Weaver, 2001). In contrast, transcriptomics has yielded more success in the detection of organ-specific or selective pathologies, but nevertheless appears only to match the sensitivity of established biomarkers. A further limitation to the use and implementation of omics is that they currently require invasive biopsies.

Data transparency
The conduct and design of experimental studies has often drawn criticism due to the incompleteness of published data and the lack of reproducibility of results. In addition, the lack of data standards, definitions, and ontologies represents a major hurdle for modelling and simulation exercises. However, the reuse and sharing of available public and private data, both within and across organizations, is progressively recognized as a valuable source of information for read-across, hazard identification, and risk mitigation. The described hurdles are increasingly addressed through data governance frameworks. These efforts towards harmonization of study design, data curation, and controls are more widely applied with public and public-private data repositories (Steger-Hartmann and Pognan, 2018). Pharmaceutical companies' decision-making processes increasingly rely on these data repositories to help support and complement internal research programs. EFPIA (European Federation of Pharmaceutical Industries and Associations) activities to facilitate data-sharing across companies will equally encourage high-value projects for cooperative data-sharing, which in turn are likely drivers towards greater harmonization of operating protocols and use and re-use of data in support of public health and drug research.
A better understanding of mechanistic toxicokinetics and toxicodynamics (TKTD) relationships in combination with pharmacokinetics and pharmacodynamics (PKPD) should establish improved quantitative monitoring of safety margins in non-clinical and clinical research.

Adverse outcome pathways and pathways of toxicity
The organization of mechanistic knowledge into temporal events includes pathways of toxicity (PoT) (Kleensang, 2014), mode of action (MoA), and adverse outcome pathways (AOPs) (Ankley et al., 2010;Burden et al., 2015;Villeneuve et al., 2014). An AOP describes a sequential chain of causally linked events at different levels of biological organization that lead to an adverse effect on human health. AOPs best define the qualitative organization of information, whilst PoT relates more to quantitative, dynamic, and molecularly defined systems. The application of AOPs with an understanding of mechanisms can help adopt novel biomarkers for use in the identification and monitoring of safety signals. Nevertheless, these are of limited value unless signals identified in in vitro and animal models can be linked to human ADRs through either target-based or phenotypic-based testing as weight of evidence (WoE) to facilitate improved risk assessment of human target organ toxicities.
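The chain-of-events structure of an AOP lends itself to a simple directed-graph representation. The following Python sketch is illustrative only: the event labels are loosely based on the skin sensitization AOP, and the names `EXAMPLE_AOP` and `aop_paths` are hypothetical and do not belong to any AOP tool or database.

```python
from typing import Dict, List

# Directed edges between events. Labels are illustrative assumptions,
# loosely modeled on the skin sensitization AOP.
EXAMPLE_AOP: Dict[str, List[str]] = {
    "MIE: covalent binding to skin proteins": ["KE: keratinocyte activation"],
    "KE: keratinocyte activation": ["KE: dendritic cell activation"],
    "KE: dendritic cell activation": ["KE: T-cell proliferation"],
    "KE: T-cell proliferation": ["AO: allergic contact dermatitis"],
}


def aop_paths(edges: Dict[str, List[str]], start: str, outcome: str) -> List[List[str]]:
    """Enumerate every causal chain from `start` to `outcome` (depth-first)."""
    paths: List[List[str]] = []

    def walk(node: str, trail: List[str]) -> None:
        trail = trail + [node]
        if node == outcome:
            paths.append(trail)
            return
        for downstream in edges.get(node, []):
            if downstream not in trail:  # guard against cycles
                walk(downstream, trail)

    walk(start, [])
    return paths


if __name__ == "__main__":
    for path in aop_paths(
        EXAMPLE_AOP,
        "MIE: covalent binding to skin proteins",
        "AO: allergic contact dermatitis",
    ):
        print(" -> ".join(path))
```

Because the events are stored as a graph rather than a fixed list, several initiating events can feed into a shared key event without changing the traversal code.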

Idiosyncratic and hypersensitivity reactions
Difficulties in the detection of hypersensitivity reactions and idiosyncratic toxicities arise due to ADR events that often occur already at low therapeutic dose levels in only small numbers of individuals during clinical development or post-registration (Pallardy and Bechara, 2017;Park et al., 2000;Uetrecht, 2013). The "non-existence" of relevant humanized pre-clinical models for early testing of drug candidates, coupled with the absence of clear dose-related toxicities and the complex dimensionality of immune-drug response necessitates urgent research to establish innovative diagnostic assays for drug discovery and continued efforts towards understanding mechanisms to support research and development of safer drugs. A successful example of such research is the specific case of hypersensitivity, namely skin sensitization, where the application of the AOP concept has led to a series of approved in vitro assays replacing animal studies (OECD, 2014).

Translational gap
Significant gaps remain on the path to achieving fully integrated and characterized humanized organ-specific panel(s) of in vitro models. Use of such models will require arrays of qualified mechanistic translational safety biomarkers, while dose (exposure) dependent toxicities will continue to rely on observational or phenomenological-based endpoint tests and WoE approaches to assess human drug safety (see above). Therefore, much work remains to be done towards establishing the next generation of in vitro models for target organ safety testing. This includes the in the Innovative Medicines Initiative (IMI) project, eTOX 5 . Predictive models were built to cover hundreds of clinical safety events linked to drugs and their pharmacological properties (Garcia-Serna et al., 2015). Linking different data sources (toxicity, on-target and off-target, drug metabolism and pharmacokinetics (DMPK)) using computational methods will allow toxicologists to go beyond traditional structural alerts and move towards an understanding of toxicity cascades. This could eventually contribute to AOP development and validation and ultimately to the interpretation of the underlying mechanism(s). The multifactorial origin of drug toxicity can thus be analyzed by combined approaches or network models to identify the causality of a toxic effect, ultimately shedding light on the likely mechanisms by which NCEs generate a safety risk.

Machine learning and artificial intelligence
A number of recent developments in quantitative pharmacology modeling have the potential to further embed these tools into an in silico drug development framework, thus contributing to an early assessment of drug candidates regarding the differentiation between on-target and off-target liabilities (Murphy, 2011). The standardization and automation of the development of quantitative pharmacology models, together with their validation and reporting, will facilitate the acceptance and uptake of QSARs (Kausar and Falcao, 2018). As a complement to the traditional QSAR models relating a chemical to a biological property, molecular docking models allow the rapid calculation of the binding potential of drugs to a target protein. Studies assessing the performance of commonly used molecular docking programs (e.g., Glide, GOLD, FlexX, eHiTS, PDBbind database) indicate that these programs can perform precise protein conformation, but their scoring functions are still too inaccurate for a reliable prediction across a variety of targets (Plewczynski and Klingström, 2011).

New technologies
The development of improved, innovative models for the detection of toxicity of drugs, industrial or consumer chemical products is crucial to efficiently bring new products safely to the market in a cost-effective and timely manner. Figure 4 illustrates some of the modern technologies going into investigative toxicology.
This non-exhaustive list of technologies, especially when used in combination, constitutes a strong toolbox for mechanism-based, human-relevant investigative toxicology approaches.

Overview
The prediction of mutagenic activity of new chemical entities (NCEs) based on their structure and potential reactivity towards DNA has been used for some decades, and in silico tools are now accepted for regulatory decision-making in the area of genotoxicity of drug candidates and impurities in pharmaceuticals (Amberg et al., 2014). Besides this, considerable effort has been put into the prediction of organ toxicities, such as DILI, using different computational models (Kotsampasakou et al., 2017; Mulliner et al., 2016a), which achieved accuracies in the range of 70 to 80%.
New perspectives for in silico, read-across, and modeling approaches are resulting from the emerging availability of "big data" in toxicology (Hartung, 2016; Clark and Steger-Hartmann, 2018). One opportunity to push investigative toxicologists to embrace the 3Rs principles lies in developing new in silico approaches, and also in effectively integrating existing in silico tools with in vitro technologies, as well as with preclinical and clinical databases (Rovida et al., 2015), conceivably within an AOP-like framework (Tollefsen et al., 2014). For example, the in silico prediction of on/off-target liabilities was, in part, addressed
AOP, adverse outcome pathway; GCCP, Good Cell Culture Practice; HTS, high-throughput screening; IATA, integrated approaches to testing and assessment; ITS, integrated testing strategy; MoA, mode of action; MPS, microphysiological systems; PBPK, physiology-based pharmacokinetic modeling; PoT, pathways of toxicity; (Q)SAR, (quantitative) structure activity relationships
er, AOPs, especially quantitative AOPs, may also prove beneficial as a framework to build in silico tools and in vitro testing batteries for drug discovery (Hartung, 2017b). AOP networks based on shared KEs are in active development (Knapen et al., 2018). Systems biology models, such as neural networks, have been a focus in drug development but require comprehensive, complex tools for their quantification (Hartung et al., 2012). Although not formally applied thus far, toxicokinetic-toxicodynamic (TKTD) modelling (Tsaioun et al., 2016; Kretschmann et al., 2012) may prove to be a useful tool to quantify KEs.
These models simulate processes leading to toxicity in organisms over time, where (a) uptake and elimination rate constants for an NCE/NBE in an organism are determined to estimate the time course of a toxicant at a target (e.g., molecular initiating events (MIEs)) and (b) damage accrual and recovery rate constants for an effect across biological scales are determined to estimate the time course of an effect.
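The two coupled steps, (a) toxicokinetics and (b) toxicodynamics, can be sketched as a pair of ordinary differential equations. The example below is a minimal illustration and not a published TKTD implementation: it assumes first-order kinetics for all four rate constants, a simple forward-Euler integrator, and arbitrary parameter values. The names `TKTDParams` and `simulate_tktd` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class TKTDParams:
    k_uptake: float    # uptake rate constant (1/h), assumed first-order
    k_elim: float      # elimination rate constant (1/h)
    k_damage: float    # damage accrual rate constant (damage per conc unit per h)
    k_recovery: float  # damage recovery rate constant (1/h)


def simulate_tktd(
    params: TKTDParams,
    external_conc: Callable[[float], float],
    hours: float,
    dt: float = 0.01,
) -> List[Tuple[float, float, float]]:
    """Forward-Euler integration of the assumed model

        dC_int/dt = k_uptake * C_ext(t) - k_elim * C_int        (toxicokinetics)
        dD/dt     = k_damage * C_int    - k_recovery * D        (toxicodynamics)

    Returns a list of (time, internal concentration, damage) tuples.
    """
    steps = int(round(hours / dt))
    c_int, damage = 0.0, 0.0
    trajectory = [(0.0, c_int, damage)]
    for i in range(steps):
        t = i * dt
        d_c = params.k_uptake * external_conc(t) - params.k_elim * c_int
        d_d = params.k_damage * c_int - params.k_recovery * damage
        c_int += d_c * dt
        damage += d_d * dt
        trajectory.append(((i + 1) * dt, c_int, damage))
    return trajectory


if __name__ == "__main__":
    # Illustrative values only: a 10 h exposure pulse followed by recovery.
    demo = TKTDParams(k_uptake=0.5, k_elim=0.25, k_damage=0.1, k_recovery=0.2)
    run = simulate_tktd(demo, lambda t: 1.0 if t < 10.0 else 0.0, hours=48.0)
    t_end, c_end, d_end = run[-1]
    print(f"t={t_end:.0f} h  C_int={c_end:.4f}  damage={d_end:.4f}")
```

In this sketch the damage variable stands in for a measurable key event. In practice, the rate constants would have to be fitted to time-resolved in vitro or in vivo data.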

QIVIVE and PBPK/PD
A quantitative understanding of the progression of biological events from MIEs to adverse outcomes allows us to derive tissue-specific points of departure (PoD) from organ-specific in vitro assays assessing perturbations of relevant KEs. The PoD is used to mark the beginning of extrapolation to determine the risk associated with expected human exposures. Quantitative AOPs will help answer what level of in vitro perturbation should be used as a PoD for quantitative in vitro to in vivo extrapolations (QIVIVE) (Hartung, 2017a). An understanding of the activity threshold is required that pushes the toxicity pathway onward from one molecular event in this pathway to the next and the internal dose of the drug or toxicant that affects the probability and severity of an event perturbation.
PBPK modelling is becoming indispensable for QIVIVE (Basketter et al., 2012; Leist et al., 2014). Specifically, reverse dosimetry PBPK is being used to estimate human exposures that lead to concentration-time profiles that are equivalent to sufficiently
There have been significant advances in machine or deep learning technology in recent years. Although deep learning approaches have been shown to yield accurate predictions (Mayr et al., 2016), they require large, costly datasets. When it is practical to generate only a relatively small dataset, researchers often seek to test a diverse set of compounds in their assay. Because of the complexity of compound space as well as the assay results within that space, diversity selection of compounds does not always yield an optimally predictive model. One solution to this problem is the use of transfer learning. With this approach, data from biologically similar assays can be used to inform predictions for one another. This allows for the effective expansion of chemical space for toxicities for which data are more limited (Kangas et al., 2014). A second solution to the problem of generating data for learning predictive models is the use of active machine learning (Murphy, 2011). In essence, a machine learning algorithm can be used to identify which tests will yield the most informative data. By focusing experimentation primarily on the informative experiments that yield the best data, far fewer experiments are needed to learn an accurate predictive model. In practice, these active machine learning approaches can significantly reduce in vitro and in vivo experimentation, while also increasing prediction accuracy, and they are not strictly limited in application to investigative toxicology.
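The idea of letting the algorithm pick the next experiment can be illustrated with a deliberately simple toy case. The sketch below is not the approach of the cited work; it assumes a single compound with a monotonic response (every concentration at or above an unknown threshold is toxic), and the function names and the stand-in assay are hypothetical. Under that assumption, the most informative next test is the candidate that halves the range of thresholds still consistent with the data.

```python
from typing import Callable, Sequence, Tuple


def active_threshold_search(
    candidates: Sequence[float],
    run_assay: Callable[[float], bool],
) -> Tuple[int, int]:
    """Locate the lowest toxic concentration with as few assays as possible.

    Assumes a monotonic response. Returns (index of the first toxic
    concentration in the sorted candidate list, number of assays run).
    An index equal to len(candidates) means no candidate was toxic.
    """
    doses = sorted(candidates)
    lo, hi = 0, len(doses)  # the first toxic index lies somewhere in [lo, hi]
    n_assays = 0
    while lo < hi:
        mid = (lo + hi) // 2  # most uncertain candidate given current knowledge
        n_assays += 1
        if run_assay(doses[mid]):
            hi = mid          # toxic: the threshold is at or below this dose
        else:
            lo = mid + 1      # not toxic: the threshold is above this dose
    return lo, n_assays


if __name__ == "__main__":
    pool = [0.1 * i for i in range(1, 101)]  # 100 candidate concentrations

    def hypothetical_assay(conc: float) -> bool:
        # Stand-in for a real experiment; the 3.75 threshold is made up.
        return conc >= 3.75

    index, used = active_threshold_search(pool, hypothetical_assay)
    print(f"first toxic concentration: {sorted(pool)[index]:.1f}, assays run: {used}")
```

Testing every candidate would need 100 assays here, whereas the guided search needs only a handful. Real compound spaces are high-dimensional and noisy, so practical active learning replaces the interval with a statistical model and an acquisition criterion.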
The power of machine-learning approaches in drug discovery lies in their integration with network modeling (Fig. 5). A well-curated, comprehensive molecular interaction network can reveal causes and effects of protein interactions in signaling and metabolic pathways, thus allowing network-based screening to systematically identify target proteins of a drug and their impact (Hsin et al., 2013).

AOPs and their role in network models
To date, AOPs have been applied in the safety assessment of chemicals but less so in drug discovery. AOPs serve as a mostly linear concept to identify measurable key events (KEs). Howev-

Application and classification of safety biomarkers
Safety biomarkers for use in investigative toxicology fall largely into the category of "response biomarkers" (Amur et al., 2015). A drug liability identified early in discovery, which shall be monitored and ideally de-risked during non-clinical and clinical research, requires robust and reliable safety biomarkers that are of translational relevance to humans. The same biomarkers would conceivably also support monitoring of patient populations and positively impact therapeutic safety margins. As biomarkers provide valuable information on drug safety, they are increasingly integrated as part of drug discovery and non-clinical development. Establishing the use of novel safety biomarkers with target organ specificity and mechanistic insight for use in non-clinical (Blaauboer et al., 2012) and clinical studies nevertheless remains challenging.
Biomarkers include messenger and micro RNAs, proteins, metabolites, and clinical chemistry parameters (Brooks et al., 2017), measured as single endpoints or as multiplexed processes in microarray and microfluidic platforms. Whatever the biomarker selected, preclinical confirmation of the comparative molecular biology, translational relevance of the mechanism of toxicity, target organ, and time course with known histopathology in humans is required for later qualification (Matheis et al., 2011). The classification of biomarkers as exploratory, probably valid, or valid defines how biomarkers are applied in research and development (Chau et al., 2008). Classification of the increasing numbers of qualified biomarkers helps define how emerging and future biomarkers can be used to support decision-making and their acceptance by regulatory authorities (Edwards et al., 2016).

Safety biomarkers for the three key target organs
The development of safety biomarkers for the organs that contribute to the highest attrition, i.e., heart, liver, and CNS (central nervous system), has been pivotal owing to both the severity and occurrence of these target organ toxicities across many classes of drugs (Marrer and Dieterle, 2010). The progress towards the development of biomarkers among these three target-organ toxicities is highlighted below.
Heart (cardiovascular toxicity)
Cardiovascular toxicities accounting for ADRs, drug attrition, and withdrawal relate to all components of the cardiovascular system (Laverty et al., 2011; Valentin et al., 2010) and can be broadly categorized into i) structural damage, ii) functional deficits with or without histopathological correlates, and iii) altered cell or tissue homeostasis in the absence of obvious structural or functional deficits (Wallace et al., 2004). The diversity of
active concentration-time profiles in vitro (Louisse et al., 2017). Recent efforts in the US EPA ToxCast program 6 illustrate the integration of in vitro activity concentrations with reverse dosimetry PBPK for risk assessment. In vitro determined hepatic clearance and plasma protein binding parameterized a TK model to predict the chemical steady-state concentrations (Css) in plasma resulting from repeated daily exposure. Reverse dosimetry PBPK tools were subsequently used to estimate human equivalent doses (in mg/kg/day) required to achieve blood Css levels identical to in vitro bioactive concentrations.
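The reverse dosimetry step described above can be illustrated with a minimal Python sketch. It assumes a simple one-compartment steady-state model in which total clearance is the sum of renal filtration of the unbound fraction and well-stirred hepatic clearance, with complete oral absorption. The function names, the default physiological values (body weight, glomerular filtration rate, liver blood flow), and the compound parameters are illustrative assumptions, not values taken from the text or from the ToxCast program.

```python
def steady_state_css_um(dose_mg_per_kg_day, fub, clint_l_per_h, mw_g_per_mol,
                        body_weight_kg=70.0, gfr_l_per_h=6.7, q_liver_l_per_h=90.0):
    """Plasma steady-state concentration (uM) for a constant daily dose.

    Assumptions (illustrative): 100% oral absorption, renal clearance equal to
    GFR times fraction unbound (fub), and a well-stirred liver model.
    """
    dose_rate_mg_per_h = dose_mg_per_kg_day * body_weight_kg / 24.0
    renal_cl = gfr_l_per_h * fub
    hepatic_cl = (q_liver_l_per_h * fub * clint_l_per_h
                  / (q_liver_l_per_h + fub * clint_l_per_h))
    css_mg_per_l = dose_rate_mg_per_h / (renal_cl + hepatic_cl)
    # mg/L divided by g/mol gives mmol/L; multiply by 1000 for umol/L (uM)
    return css_mg_per_l / mw_g_per_mol * 1000.0


def human_equivalent_dose(bioactive_um, fub, clint_l_per_h, mw_g_per_mol, **physiology):
    """Reverse dosimetry: dose (mg/kg/day) whose Css equals the in vitro bioactive concentration.

    Relies on the model being linear in dose, so Css at 1 mg/kg/day is a scaling factor.
    """
    css_per_unit_dose = steady_state_css_um(1.0, fub, clint_l_per_h, mw_g_per_mol, **physiology)
    return bioactive_um / css_per_unit_dose


# Hypothetical compound: 10% unbound, intrinsic hepatic clearance 50 L/h, MW 300 g/mol,
# in vitro bioactive concentration 5 uM.
hed = human_equivalent_dose(bioactive_um=5.0, fub=0.1, clint_l_per_h=50.0, mw_g_per_mol=300.0)
print(f"Human equivalent dose: {hed:.2f} mg/kg/day")
```

Because the model is linear in dose, the reverse step reduces to a single division. A full PBPK model with nonlinear kinetics would instead require numerical root finding.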

Big data
Besides the omics applications and the concomitant pathway analysis, future use of big data in safety science will encompass two fields: early compound (drug candidate) assessment and translation concordance analysis. On the one hand, mining of large preclinical data sets will result in automated read-across procedures (Hartung, 2016), which will enable the assessment of new chemical structures, including structural moieties, for their potential toxicity liabilities. Such tools will enable medicinal chemists to guide their hit-to-lead search, not only for criteria of pharmacophore, drug metabolism and pharmacokinetics (DMPK), and physico-chemical properties, but also for specific safety aspects, also termed "green toxicology" (Crawford et al., 2017; Maertens and Hartung, 2018). An example of how such read-across approaches might be applied for optimizing drug candidate selection to reduce toxicity liabilities in early phases has recently been published (Steger-Hartmann and Pognan, 2018).
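As a rough illustration of what an automated read-across procedure could look like, the following Python sketch scores a query structure by a similarity-weighted vote of its nearest neighbours in a reference set. It assumes that structures are already encoded as sets of fingerprint feature identifiers and uses the Tanimoto coefficient as the similarity measure. The function names, the neighbour count, the similarity cut-off, and the toy data are all hypothetical and are not drawn from the cited publications.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint features."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def read_across(query_fp, reference, k=3, min_similarity=0.3):
    """Similarity-weighted toxicity score from the k most similar reference compounds.

    reference: list of (fingerprint_set, label) with label 1 = toxic, 0 = non-toxic.
    Returns (score, neighbours); score is None if no neighbour passes the cut-off.
    """
    scored = sorted(((tanimoto(query_fp, fp), label) for fp, label in reference),
                    key=lambda pair: pair[0], reverse=True)
    neighbours = [(s, y) for s, y in scored[:k] if s >= min_similarity]
    if not neighbours:
        return None, []
    total = sum(s for s, _ in neighbours)
    score = sum(s * y for s, y in neighbours) / total
    return score, neighbours


# Toy reference set (hypothetical feature identifiers and labels)
reference = [({1, 2, 3, 4}, 1), ({1, 2, 3, 5}, 1), ({7, 8, 9}, 0), ({1, 2, 6, 7}, 0)]
score, neighbours = read_across({1, 2, 3}, reference)
print(score, neighbours)
```

Returning `None` when no neighbour is similar enough makes the limits of the applicability domain explicit, rather than forcing a prediction for structures unlike anything in the reference data.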
The other area of interest is the automated analysis of animal-human translation or concordance. Questions such as, "Tell me how an n-fold decrease in white blood cell count in species x at dose y corresponds with effects in humans?" with all subsequent ramifications (e.g. "Can results be grouped according to preclinical species, pharmacology, mode of action,…?") or "What is the most sensitive preclinical species for a specific organ toxicity?" can be approached by analyses of big data sets (Clark and Steger-Hartmann, 2018).
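One simple way to frame such concordance questions is a 2x2 table that cross-tabulates whether a finding occurred in a preclinical species against whether the corresponding adverse event occurred in humans. The Python sketch below computes sensitivity, specificity, and likelihood ratios from such a table. The function name and the counts are hypothetical and serve only to show the arithmetic; real analyses would also need to handle data curation and term mapping.

```python
def concordance_metrics(tp, fp, fn, tn):
    """Concordance statistics for a preclinical finding versus a human adverse event.

    tp: finding in animals and event in humans
    fp: finding in animals, no event in humans
    fn: no finding in animals, event in humans
    tn: neither finding nor event
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one human-positive and one human-negative case")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    lr_pos = sensitivity / (1 - specificity) if specificity < 1 else float("inf")
    lr_neg = (1 - sensitivity) / specificity if specificity > 0 else float("inf")
    return {"sensitivity": sensitivity, "specificity": specificity,
            "lr_positive": lr_pos, "lr_negative": lr_neg}


# Hypothetical counts for one species / organ toxicity pair
metrics = concordance_metrics(tp=30, fp=20, fn=10, tn=140)
print(metrics)
```

Repeating this calculation per species, or per pharmacological class, is one way to compare which preclinical species is most informative for a given organ toxicity.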
Big data analyses, however, require:
− accessibility of large preclinical and clinical data sets, while safeguarding aspects of intellectual property and personal data protection
− automated procedures for data curation
− integration of controlled vocabularies and ontologies to enable cross-analyzing data
− quality control of data by scientific experts
Such efforts can only be achieved via consortia approaches and should be run in parallel to data sharing guidelines and principles. Examples of such initiatives are DruSafe 7 (Monticello, 2015), eTransafe 8, or the initiative to make data "FAIR" (= "findable, accessible, interoperable, reusable" 9).

CNS (Neurotoxicity)
There is a need for more sensitive and specific biomarkers that can help diagnose and predict neurotoxicity, that are relevant across animal models, and that can be translated to the clinic (Schmidt et al., 2017). Some traditional functional biomarkers with established non-clinical to clinical translational value, e.g., electroencephalogram, electroretinogram, and brainstem auditory evoked potential, can be used. Fluid-based biomarkers, such as miRNAs, F2-isoprostanes, translocator protein, glial fibrillary acidic protein, ubiquitin C-terminal hydrolase L1, myelin basic protein, microtubule-associated protein-2, and total tau, hold great potential due to the relative ease of sampling. However, some of these biomarkers (such as those in the cerebrospinal fluid) require invasive sampling or are specific to one disease such as Alzheimer's or Parkinson's disease, while others require further validation. In addition, neuroimaging methodologies may also provide potential biomarkers and, coupled with functional, genetic- and protein-based biomarker assessments, offer an exciting way forward to predict, detect, and monitor drug-induced neurotoxicity.

Future perspectives
Continued efforts towards the discovery and characterization of novel, sensitive and relevant biomarkers to effectively bridge in vitro to preclinical to clinical testing would strengthen our ability to predict, detect, and monitor drug-induced organ injuries (Park et al., 2000). The principal challenges ahead include the identification and qualification of these biomarkers for use not only as "response biomarkers" but as predictive of ADR outcomes and prognosis.

Novel cell models
Generating physiologically relevant models is a promising approach to improving our ability to detect and predict drug-induced toxicity, as well as to unravel specific mechanisms of toxicity. Therefore, there is an increasing desire to move away from the use of cell lines that form part of screening cascades within the drug discovery process and towards primary cells with their known limitations (e.g., limited source, variability, etc.) (Eskes et al., 2017; Pamies et al., 2018; Coecke et al., 2007).
A robust, reproducible, and relatively "unlimited" source of cells with defined phenotypes and genotypes would greatly benefit the field of toxicity testing and assist in standardizing early investigational toxicological research (Pamies and Hartung, 2017). Differentiation of various types of human stem cells into the desired somatic cells might be a solution.
Moreover, introducing further complexity by culturing cells in 3D, microphysiological and organoid model systems is an approach that is gaining ground within the investigative toxicology community (Alépée et al., 2014). The idea is that such 3D and organoid models display more physiologically relevant attributes, including cell polarization, cell-cell or cell-microenvironment interactions (Anton et al., 2015; Duval et al., 2017; Retting et al., 2018) that are important drivers of tissue differentiation and function. Microfluidic and tissue printing techniques have been used to increase complexity of tissue models by add-
ADRs necessitates a range of biomarkers to detect, predict, and monitor ADRs in non-clinical and clinical testing. Biomarkers of hemodynamic effects include monitoring of blood pressure, heart rate, and ejection fraction using semi-invasive approaches. Cardiac electrophysiological effects such as QTc prolongation or shortening, QRS widening, PR prolongation, and arrhythmias such as torsade de pointes and ventricular fibrillation are detectable via an electrocardiogram. For some of these endpoints, predictive in vitro screens are well established; for instance, there is a good relationship between the free plasma concentration associated with significant QT prolongation and torsade de pointes in the clinic and in vitro IKr IC50 values (Webster et al., 2002). More recently, safety testing in stem cell derived cardiomyocytes has been suggested as part of a new integrated risk assessment of pro-arrhythmic liability (Sager et al., 2014). Degenerative or inflammatory lesions can be monitored via body fluid sampling and measurement of N-terminal pro-brain natriuretic peptide (NT-proBNP), miRNAs, creatine kinase (CK), aspartate aminotransferase (AST), troponin, and pro-atrial natriuretic peptide (pro-ANP) / brain natriuretic peptide (BNP) in both non-clinical species and humans.
Although numerous biomarkers of drug-induced cardiotoxicity have been proposed and are being used, some lack sensitivity and/or specificity; the quest for mechanism-based cardiotoxicity biomarkers therefore continues.
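The relationship noted above between free plasma concentration and in vitro IKr IC50 values is often expressed as a safety margin, i.e., the ratio of the two. The Python sketch below shows that arithmetic under simple assumptions: the free concentration is the total plasma concentration multiplied by the fraction unbound, and a compound is flagged when the margin falls below a user-chosen threshold. The function names, the example values, and the threshold of 30 are placeholders for illustration, not values taken from the text.

```python
def free_concentration_um(total_plasma_um, fraction_unbound):
    """Free (unbound) plasma concentration in uM."""
    return total_plasma_um * fraction_unbound


def ikr_safety_margin(ikr_ic50_um, total_plasma_um, fraction_unbound):
    """Ratio of the in vitro IKr IC50 to the free plasma concentration."""
    return ikr_ic50_um / free_concentration_um(total_plasma_um, fraction_unbound)


def flag_low_margin(margin, threshold=30.0):
    """True if the margin is below the (hypothetical) threshold."""
    return margin < threshold


# Hypothetical compound: IC50 3 uM, total plasma concentration 1 uM, 5% unbound
margin = ikr_safety_margin(ikr_ic50_um=3.0, total_plasma_um=1.0, fraction_unbound=0.05)
print(f"Safety margin: {margin:.0f}-fold, flagged: {flag_low_margin(margin)}")
```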

Liver (hepatotoxicity)
DILI represents one of the most significant ADRs. Attrition of promising drug candidates due to DILI occurs in preclinical and clinical development (Clarke et al., 2016; Pognan, 2018). DILI is classified as either intrinsic, with clear dose-dependent hepatocellular injury (Corsini et al., 2012), or idiosyncratic, with low incidence rates in humans, which cannot be predicted with current in vitro and in vivo tests. The phenotypic assessment of DILI in patients relies on measurement of alanine aminotransferase (ALT), AST, alkaline phosphatase (ALP), and bilirubin (BIL). Despite wide acceptance, ALT, AST, and ALP are not specific measures of liver injury, and BIL is detected only after extensive liver injury has occurred (Church et al., 2018). In non-clinical testing, detection of DILI relies substantially on histology (Weaver et al., 2017).
The use of ALP, ALT, and BIL as biomarkers is insufficient for the detection of human DILI and predicting outcome. Efforts to improve upon these liver safety biomarkers have yielded promising, novel biomarkers with additional mechanistic information: High mobility group box 1 (HMGB1) for detection of necrosis (Scaffidi et al., 2002) and its acetylated form in immune DILI (Lu et al., 2012). The value of these and other novel biomarkers, such as keratin-18 and miR122, is presented in detail elsewhere (Antoine et al., 2013; Clarke et al., 2016; Ward et al., 2014). These novel biomarkers are best defined as "response biomarkers" and further work is encouraged to extend knowledge towards their translational and predictive value as qualified biomarkers of DILI (Matheis et al., 2011). The prospect of translationally relevant safety biomarkers for use in the prognosis of DILI outcomes in patients is encouraging (Ozer et al., 2008).
2017, where the most advanced example is probably the CIPA (comprehensive in vitro pro-arrhythmic assay) initiative of the US FDA (Wallis et al., 2018).
Fully functional organ-specific cells derived from iPSCs will become a valuable tool for drug development or evaluation of the contribution of genetic variation to variable responses. Moreover, the technology provides a unique opportunity to distinguish between gender, ethnic background, and potentially even disease background. The field is progressing rapidly, with varying levels of limitations still remaining. However, even though the introduction of induced pluripotent or embryonic stem cells for toxicological and pharmacological studies seems inevitable, efforts towards standardization, validation, and regulation are still necessary in order to make them a widely accepted option for toxicological and pharmacological studies.

Organoids
Organoids are a recent paradigm in tissue culture, with culture conditions assuring preservation of the (adult) stem cell niche, while proliferation and differentiation to the essential cellular subtypes of a specific organ still occur. For example, intestinal organoids accurately predict therapy response in cystic fibrosis and were used to establish living biobanks of tumor tissue that was genetically stable over time (Artegiani and Clevers, 2018). Whereas these intestinal organoids were expanded from primary human cells or human stem cells, pluripotent stem cells have also been used to generate organoids with impressively realistic in vivo-like microanatomy for the brain (Lancaster and Knoblich, 2014) and kidney (Takasato et al., 2015;Freedman et al., 2015). As of today, many organoid systems have been developed, including liver (Huch et al., 2013;Takebe et al., 2013), intestine (Spence et al., 2011), thyroid (Antonica et al., 2012), pancreas (Greggio et al., 2013), lung (Lee et al., 2014), and retina function (Nakano et al., 2012).
Whereas most of these systems are currently being used mainly in the context of basic developmental and stem cell research or disease modeling (Artegiani and Clevers, 2018), it is evident that these technologies are poised to play a role in the field of toxicology. What is required is a full and thorough evaluation of physiological and pharmacological characteristics of these organotypic models alongside human tissues to establish whether such models are "fit-for-purpose", i.e., improve the prediction of target organ toxicities (Carragher et al., 2018). The utility of these in vitro models can be enhanced by understanding the AOPs/PoT covered by the models of interest (Hartung and McBride, 2011;Kleensang, 2014;Hartung, 2017b).
There are obvious hurdles to overcome (Lancaster and Knoblich, 2014; Carragher et al., 2018):
− artificial organoids currently mimic some, but not all, of the physiological functions of the respective human organs,
− they lack physiological vasculature and, consequently, whole blood perfusion, which is essential to nutrient supply, waste transport, and several other physiological processes, including creating a dynamic microenvironment,
ing aspects such as co-culture of multiple cell types, flexibility for compartmentalization and higher-order tissue architecture, flow, gradient formation, and mechanical strain. This increase in complexity ultimately leads to improved functionality and has been demonstrated for various types of approaches in hepatocyte models including 3D spheroids (Messner et al., 2013; Bell et al., 2016; Proctor et al., 2017), 3D printed systems (Retting et al., 2018), organoids (Huch et al., 2015), and MPS (Huh et al., 2010; Vernetti et al., 2016). Each of these systems has added value for biological relevance, although their routine implementation in toxicological testing remains to be established.

Stem cell models
Cell lines and primary cells have long been the main source of cells in cell-based experiments. Cell lines provide a relatively stable and continuous source of biological material, but vary widely in the degree to which they maintain the features associated with their tissue of origin. Cells that are freshly isolated from primary tissue are generally considered a gold standard for their physiological relevance. The time span over which these properties are maintained, however, is typically limited. Moreover, physiological properties may disappear under certain storage conditions, logistics around these primary cells are cumbersome, and the quality of isolated cells can be highly variable. Induced pluripotent stem cells (iPSCs) promise to be a renewable source of cells and could potentially provide large numbers of cells with well-characterized physiological properties and with genotypes that correspond to specific individuals. Today, various cell types are used for iPSC production, e.g., germ lines, liver cells, skin cells, and lymphocytes (Takahashi and Yamanaka, 2006; Yu et al., 2007; Gadue and Cotsarelis, 2008; Okita et al., 2007; Loh et al., 2009; Aoi et al., 2008). Various protocols that guide iPSC differentiation towards specific cell lineages have been published. Cell types including endothelial cells and smooth muscle cells, neuronal cells, cardiomyocytes, and hepatocyte-like cells can be differentiated with specific supplements and growth factors (Patsch et al., 2015; Hu et al., 2011; Mauritz et al., 2008). Phenotypes of various diseases, such as familial hypercholesterolemia, Wilson's disease, and alpha-1-antitrypsin deficiency, have been generated from iPSC-derived hepatic cells, which could be used as cellular disease models (Cayo et al., 2012). Fundamental research on these cells could help our understanding of various disease types, leading to the development of novel drugs.
Best established for toxicity testing are cardiomyocytes (Millard et al., 2018) and neurons (Wevers et al., 2016). The quality of hepatocyte differentiation is progressing. However, expression levels of xenobiotic metabolism genes in iPSCs are still not equal to those found in organs or freshly isolated primary cells. Other obstacles continue to impede progress towards using these cells for in vitro toxicology (variability in lines, incomplete programming within cell populations, uncharacteristic response to prototype toxicants, etc.).
Despite these limitations, iPSC-derived cells are now suggested for use in toxicological screening and may provide an understanding of an individual patient's ADRs (van Hasselt and Iyengar,
sue architectures from monolayer to 3D and artificial organoids. Because of their specialized microenvironment, they have recently been shown to be a tool that can enhance stem cell maturation (Sances et al., 2018; Ronaldson-Bouchard et al., 2018). While typically of limited throughput, higher throughput systems have been developed and applied for toxicity testing of 3D gut tubules and iPSC-derived neuronal models (Wevers et al., 2016). Multiple organs can be combined on one chip (Wagner et al., 2013; Skardal et al., 2017) to investigate the mechanisms that drive organ toxicity at organ cross talk. Finally, the impact of biological feedback loops such as the insulin-glucose regulation of liver performance can be studied using the respective organ combinations (Bauer et al., 2017).
Organs-on-chips have also been developed to evaluate drug-induced toxicity (Esch et al., 2015). Organ-specific examples are the heart-on-chip (Zhang et al., 2015) and the lung-on-chip model developed by Huh and coworkers (Huh et al., 2010). A 3D bio-printed, cell-based mammalian skeletal muscle strip was successfully generated that is able to exert muscular force (Cvetkovic et al., 2014). In addition, three-dimensional bio-printed human models of liver, kidney proximal tubule, and intestinal tissue have been described for use in modeling native physiology and compound-induced toxicity (Nguyen et al., 2016; King et al., 2017; Madden et al., 2018). The progress in MPS hepatocyte culture systems (including non-parenchymal co-culture and bio-physical constraints such as oxygen tension) has led to additional improvement of tissue and organ level function (Vernetti et al., 2016, 2017; Lee-Montiel et al., 2017). 3D liver and neuronal spheroids
− they lack key cell types, such as immune cells (resident or circulating), and neuronal innervation,
− primary cell-derived artificial organoids face the shortage of human cell supply,
− they lack in vivo relevant cellular architecture and cell-cell interactions,
− they lack mechanical forces,
− stem cell-derived artificial organoids replicate only the early stages of organ development, remaining "fetal-like" owing to lack of essential cues for final differentiation.
Therefore, organoid systems still represent a trade-off between throughput and physiological relevance, and in many cases, the effects of a drug depend on factors such as metabolic competence or tissue specific distribution and interaction that cannot be achieved within single organoids.

Microphysiological systems (MPS)
Within the last 5 to 10 years, advances in microfluidic and micro-engineered technology have enabled the development of organ-on-chip models or MPS (Marx et al., 2016; Smirnova et al., 2018; Esch et al., 2015). By applying engineering principles, models can now be created that accurately represent the cellular microenvironment of an organ (Bhatia and Ingber, 2014). In doing so, cells theoretically retain their physiological phenotype and respond in comparable ways to their in vivo counterparts. Application of these models within the investigative toxicology and safety assessment process has been reviewed recently (Ewart et al., 2018). MPS are cell source agnostic and support various tis-
Fig. 6: The current cell model landscape. Traditional systems for evaluation of toxicity include cell line- and primary cell-based models. Developing technologies such as 3D organoids, bio-printed tissues, and single- and multi-organ MPS will result in models with greater biological relevance, for which full validation and routine implementation remain to be established. Human body-on-chips are still at an early research stage of development.
using such MPS-based personalized patient equivalents for studies to mimic Phase 1 and Phase 2 clinical trials.
Eventual success along this path may enable us to perform individualized studies mimicking clinical trials of a particular donor using statistically relevant numbers of almost identical replicates of donor or patient "bodies" on chips. This is in line with the current use of inbred, genetically identical laboratory animals in preclinical evaluation, with the difference that such miniaturized "bodies"-on-a-chip are of personalized human origin. Furthermore, it might allow head-to-head analysis of the outcome of the real donor or patient study with its body-on-a-chip counterparts. Finally, the use of "body" equivalents from donors and patients of different gender, ethnic groups, and genetic backgrounds will allow an evaluation of the impact of such parameters on the safety and efficacy of an NCE/NBE in the preclinical setting, which illustrates the high potential of such tools for the drug development cycle (Marx et al., 2016).

Imaging technologies
The past decades have seen enormous development and integration of high-content imaging in investigative toxicology departments (van Vliet et al., 2014; Uteng et al., 2014). The development of a variety of small molecule fluorescent probes allows the detection of numerous biochemical perturbations and live/dead endpoint measurements. For example, probes have been used to follow the accumulation of fatty acids in cells leading to steatosis (Germano et al., 2015), one of the critical endpoints of DILI. Fluorescent bile acids have been applied to visualize the accumu-
have successfully been co-cultured on MPS for long term toxicity testing (Materne et al., 2015). Human intestinal organoids, liver spheroids, human skin biopsies, and monolayer proximal tubular cell barriers have been combined on a four-organ MPS platform for evaluation of systemic long term toxicity (Maschmeyer et al., 2015). Hepatic and cardiac cell types have been differentiated from iPSCs using MPS (Giobbe et al., 2015). It has been hypothesized that exposure of in vitro assembled premature iPSC-derived organoids to the physiological cues of an MPS, such as perfusion, shear stress, electrical stimulation, and organoid cross talk in interconnected arrangements, might constitute the missing step for their final and complete in vitro differentiation. First progress has been made to vascularize microfluidic systems (Schimek et al., 2013; Van Duinen et al., 2017). Figure 6 schematically illustrates the current cell model landscape and future perspectives discussed in this section.

Envisioned progress of in vitro models
The described progress in human iPSC generation at a robust large scale, their differentiation into a broad variety of premature organ-specific somatic cell based artificial organoids, and the steady increase in the number of organ equivalents on MPS platforms has created an historically unique opportunity for the introduction of humanized models in safety assessment (Miller and Shuler, 2016; Xiao et al., 2017; Edington et al., 2018). The combination of these three approaches may well lead to the establishment of personalized minute equivalents of a healthy donor- or a patient-on-a-chip. Figure 7 summarizes the long-term vision of
transcriptome analysis, and sequencing becoming much cheaper, new applications of toxicogenomics are emerging and may require new attention and funding. In addition, improved bioinformatics tools that integrate large omics datasets into co-regulated gene networks allow quantitative analysis of the association between such gene networks and adverse outcomes (Stiehl et al., 2017; Sutherland et al., 2018). There is a rapid development of sequencing strategies, where chromatin-immuno-precipitation (ChIP) sequencing will contribute to a further refinement of the transcription factors that drive these transcriptional networks in different target tissues. These complementary sequencing approaches should ultimately define the quantitative relationships between both safe and adverse ranges of pathway activation that will determine the safety margins. The IMI TransQST project will contribute to these quantitative systems toxicology evaluations 12 .
In concert with transcriptome analysis, sensitive proteomics platforms have also evolved that have allowed the analysis of cell and tissue proteomes under healthy and disease settings and after drug exposure (Cox and Mann, 2011). In particular, phosphoproteomics has allowed the assessment of early signals of cell signaling activation in relation to drug exposure (Pines et al., 2011). To date (phospho)-proteomics is not yet a common tool in drug safety assessment and investigative toxicology. However, integration of proteomics with transcriptomics has helped to gain a more precise understanding of drug action (Puigvert et al., 2013). Recent integration of biology information with proteomics has allowed the identification of drug targets of a large panel of kinase inhibitors (Klaeger et al., 2017). The integration of activity-based target profiling in investigative toxicology with the help of proteomics will further clarify the spectrum of off-targets of candidate drugs and contribute to an improved drug safety prediction.
Metabolomics is defined as analysis (identification and quantification) of active metabolites, including carbohydrates, lipids, and more complex bioactive molecules, such as hormones. Its role in toxicology is increasing (Bouhifd et al., 2013;Ramirez et al., 2013), fueled also by increasing quality assurance demands (Bouhifd et al., 2015). The metabolome can be determined in human and animal matrices (e.g., blood, plasma, urine, or sweat) with the focus on the entire body but also on organ-specific toxicity. Moreover, organ specific metabolomes for in vitro systems have been reported (Ramirez et al., 2013).
In parallel, targeted approaches for metabolomics have been developed, with increased sample throughput, enhanced analytical robustness, and facilitated data analyses. Targeted metabolomics carries the promise of a high translational potential for clinical studies. An example for targeted metabolomics is the application of multiplexed LC (liquid chromatography) MS/MS methods for bile acid analysis (both unconjugated and conjugated) for the assessment of the cholestatic or steatotic potential of drug candidates (Schadt et al., 2016). While the metabolome analysis of plasma and urine requires animal testing, it is recommended to consider the 3Rs strategy (focus on reduction) and therefore include omics technology in animal studies. Hence
lation of bile acids as a consequence of bile acid transport inhibition and may contribute to identifying compounds with a liability for drug-induced cholestasis (Germano et al., 2015). Likewise, fluorescent probes have been used for assessment of phospholipidosis (Morelli et al., 2006), oxidative stress, and mitochondrial membrane potential (Billis et al., 2014). For the pharmaceutical industry, this high-content imaging approach has become an essential tool within the field of predictive toxicology with the aim to design and prioritize drug candidates with a superior safety profile (Persson and Hornberg, 2016). While the technologies have primarily used 2D cell systems (either cell lines or primary cells; Pampaloni et al., 2007), the challenge for the future is to capture this in advanced 3D cell models and allow sufficient resolution for single-cell-based quantification of probe activity. Novel high-content imaging machines still have the limitation that they cannot capture the fluorescence of cells in the center of multicellular 3D spheroids.
Challenges for the future are to bring lightsheet microscopy to the level of high-content screening and integrate this in screening labs (Joshi and Lee, 2015) to allow the detailed analysis of biochemical changes in complex MPS. Novel approaches involve phenotypic screening of cell morphologies, allowing the quantification of hundreds of (related) parameters in parallel (Joshi and Lee, 2015;Leary et al., 2018). Further challenges include the integration of other mechanistic biomarkers that would represent key events of AOPs into high-content imaging strategies.
Further advances in molecular imaging capability and deployment are also continuing through the development of label-free bio-imaging of tissues and cells (and potentially single cells and organelles) using mass spectrometry (MS) based approaches (Passarelli and Ewing, 2013).

Omics profiling
Omics technologies, which have gained prominence and relevance over the last few years, can be divided into four parts that focus on different steps in the generation of bioactive molecules: genomics, transcriptomics, proteomics, and metabolomics. Systematic studies on transcriptome analysis within large consortia and industry settings have helped establish extensive datasets of drug-induced transcriptome profiles in different target organs as well as in vitro in cells. This is exemplified by the TG-GATEs 10 and DrugMatrix 11 datasets that are available in the public domain (Igarashi et al., 2015; Ganter et al., 2005). The initial hope for toxicogenomics as the solution for the ultimate prediction of target organ toxicity has not been completely fulfilled, as the technology and the diversity of transcriptional profiles have turned out to be more complex than anticipated (Pognan, 2007). While toxicogenomics-based predicted gene profiles have been established for several adverse outcomes, including genotoxicity testing (Ellinger-Ziegelbauer et al., 2009), a widespread generic application in drug safety testing has not been implemented. Generally, toxicogenomics is applied to support the mechanistic understanding of identified target organ toxicities. With RNA sequencing being the current major tool in
words, is used in each respective context. Even when discussed by toxicologists within the pharmaceutical industry, "investigative toxicology" is used with different meanings. Hence the challenge that emerged at the beginning of the think tank was to find a common understanding of the keywords related to investigative toxicology. The need to harmonize descriptions of the keywords to enable easier discussion among all stakeholders became obvious. This is particularly important in the effort to implement investigative toxicology also into the regulatory process when discussing safety aspects with regulatory authorities.
As aforementioned and much discussed during the workshop, a holistic definition of investigative toxicology in drug development could be "the complementary discipline to 'regulatory toxicology' that includes all aspects of scientific investigations into drug candidates starting with screening and selection of tolerable compounds (prospective approach) and leading to the mechanistic elucidation of adverse effects observed in preclinical or clinical phases (retrospective approach)". Whether (or how) this definition can be translated to academia and other industry sectors will be seen in the near future. Such translation is of high importance since both approaches (prospective and retrospective) are currently in a transition phase, triggered by exponentially evolving testing options, e.g., complex human in vitro models, gene targeting, or enhanced screening possibilities. For all stakeholders, a common, more precise understanding of needs and objectives of these approaches is necessary to successfully align development efforts in academia and industry. Cooperative advancement is necessary to deliver comprehensive tools that can support all stakeholders according to their objectives.
Furthermore, the terminologies used in the context of investigative toxicology need to be harmonized since different (sub)disciplines may have a different understanding of key terms, including biomarkers, safety assessment, mode of action, off-target/on-target toxicity, hampering the focused integration of new technologies. But, also in the field of regulatory toxicology, ontologies of toxicological observations are not fully harmonized (Hardy et al., 2012a,b). How this can be improved was demonstrated in the ontology discussion within the IMI eTOX 5 project. Histopathology ontology was developed with the intent to standardize histopathology findings, and it has now been made openly available (Ravagli et al., 2017). This is a key enabler, making searches on one of the most important data types in toxicology studies possible and providing a database for new in silico and in vitro models 13 .
Further aspects that were found to be critical and challenging for an effective development of new tools (e.g., models, methods) include the definition of performance standards that lead to successful proof-of-concept studies. These need to be clearly defined to demonstrate the value of an evolving complex cell model or entire testing method for the respective objective. In turn, proof-of-concept studies may form the basis of the critical aspect of validation of a test method. The best-case scenario is a formal validation that would open the door for regulatory use [...]

[...] needs to be clearly defined to not only deliver reproducible results but to ensure the [...]

[...] in vitro metabolomics is an important step towards avoiding animal studies and therefore supporting the 3Rs strategy (focusing on replacement).

Intracellular sensors
Perturbations of normal cell physiology leading to perturbations of biology on an organ or body level culminate in an adverse outcome. There is a limited set of cellular perturbations that will drive this adversity. The establishment of AOPs and AOP networks, with the help of omics and cell biology, has already defined some critical pathways related to toxicity. This involves both biochemical perturbations and cellular disturbances that drive cell signaling and the onset of adaptive rescue programs, or alternatively the onset of pathways that drive cellular demise. Cell biologists and toxicologists have taken advantage of this information and integrated biomarker genes tagged with fluorescent proteins that represent various adaptive cellular stress response pathways. Bacterial artificial chromosome genome editing technologies have allowed the green fluorescent protein (GFP) tagged expression of sensors, transcription factors, and downstream target genes of different cellular adaptive stress response pathways, including the Nrf2-mediated oxidative stress pathway, the p53-mediated DNA damage response pathway, NFκB-mediated inflammatory signaling, and the ATF4 and XBP1-based unfolded protein response (Wink et al., 2014). These fluorescent reporter cell systems can be integrated with high-content imaging and allow the dynamic analysis of stress response pathway activation (Wink et al., 2017). Wink and collaborators recently demonstrated the application of a panel of these reporters in the prediction of DILI using a panel of > 120 DILI compounds (Wink et al., 2018). A full coverage of cellular components that drive the adverse responses of drugs would contribute to the toolbox to evaluate drug safety.
Developments in cell and molecular biology have allowed the refinement of fluorescent protein probes based on fluorescence resonance energy transfer (FRET) (Ni et al., 2017). This allows the dynamic imaging of cell signaling activity. The integration of such tools in high-throughput microscopy setups would allow a refined understanding of the balance between cell adaptation and adversity; such probes have already been used in the safety assessment of drugs (Shuhendler et al., 2014).

Definitions, terminology and the need for ontologies
Toxicology as a subject is important in many industry sectors besides the pharmaceutical area, such as the chemical, cosmetics, consumer products, and food industries. Moreover, it plays an important role in environmental health. Since every sector discusses specific needs and issues, the terminology around toxicology varies widely. Therefore, it is not surprising how differently the term "investigative toxicology", together with its related key- [...]

[...] included in a clinical trial of the respective patient. Whether these cell models still need to serve as organ toxicity models is debatable, since the concept of AOPs is not restricted to individual organs and in the best case an appropriate cell model may depict the fundamental toxicity mechanism in the body across different organs. However, the most important challenge for the AOP concept is the quantitative aspect (Hartung, 2017b). What is the cutoff value at which a read-out or key event indicates a safety risk? In other words, the pharmaceutical industry asks, for example: "How can one define a safe human dose from an AOP investigation?" This question is still unanswered. Considering the importance of this aspect, investigative toxicology should strive to make the AOP concept a powerful tool to answer this question.

Communication
The think tank intensively discussed the need for "disruptive technologies". In this context, "disruptive" describes a technology that has the potential to entirely change the current toxicity testing strategies, which mainly rely on animal studies for risk assessment. "Big data", which has become a buzzword across very different industry sectors, sparks hopes of contributing to disruptive technologies, such as in silico modelling for precise prediction of specific organ toxicities (Mulliner et al., 2016b). However, valuable expertise in handling and connecting big data with relevant information lies outside the life science community (Haslehurst and Johnson, 2018). For example, the technology and know-how of social network and internet companies may be useful to develop new strategies to translate toxicity data into meaningful context. Therefore, one suggestion is to bring together different industry sectors for new discussions about the needs and opportunities that the internet and big data offer to toxicology. Examples of such new fields of collaboration are smartphone apps, which have found use in the storage, tracking, and sharing of diagnostic data of diabetes patients, resulting in improved self-care (Osborn et al., 2017). IT companies like Apple, Amazon, and Google have already made initial steps towards the life science and healthcare sectors, e.g., by successfully applying their automated deep learning algorithms to diagnostic image analysis for the detection of retinopathy (Gulshan et al., 2016).

Qualification and validation
Over the last few years, many novel cell systems and technologies have emerged onto the market, most with little validation or supportive data. As a discipline, investigative toxicology needs to become proactive in finding the right balance between project-oriented issue resolution and ready-to-use/fit-for-purpose technology evaluation. Problem orientation also requires moving away from the formal requirement of validation, which, although important, often slows down the implementation of 3Rs methods, towards a "fit-for-purpose" 14 evaluation of new approaches. The ICH "Note for guidance on Non-clinical Safety Studies for the [...]

[...] results are relevant and informative for hazard identification and risk assessment (Amur et al., 2015).
Many keywords that have emerged in the field in recent years are barely defined across stakeholders/sectors. For example, translational biomarkers and AOPs are frequently used in innovative publications (Antoine et al., 2013). However, at least in the case of safety biomarkers, the underlying understanding is still diverse, ranging from very specific (e.g., FDA-approved kidney toxicity marker (Brott et al., 2014)) to broad use (e.g., gene expression pattern to define toxicity pathway (Ferrario et al., 2014)). Similarly, the AOP concept introduced by the OECD describes the dissection of molecular toxicity pathways and defines key molecular events (OECD, 2017). But to make it a useful concept for risk assessment, one that allows differentiation between adaptation and adversity across the different sectors, a common understanding is needed that finally leads to quantitative read-outs or thresholds.
Taken together, the think tank discussion, although initiated by one industry sector only, clearly showed the need to harmonize the use of key terms in the field of investigative toxicology. A common understanding of the relevant vocabulary was stated to be the basis to drive this discipline.

Study design
Before starting an investigative study or incorporating the investigative part into a regulatory GLP toxicology study, the expectation and the conclusions that might be drawn from the results should be anticipated, clearly described in a study plan, and scientifically justified. This places more weight on the scientific justification of the experimental design and may obviate the need for a full validation or description of the methods in regulatory guidelines. In case a model is more advanced, performance of multi-center studies to prove the robustness of the method is a key requirement. Thus, the community of investigative toxicologists together with the developers of the new models shall enable the pre-validation of these models.
For stand-alone in vitro studies, a fit-for-purpose validation (see Section 5) including scientifically justified negative and positive controls is key, whereas for experimental biomarkers, histopathological correlates may be sufficient. In the field of investigative toxicology, very often an iterative investigation cycle with multiple steps leads to the generation of a hypothesis and then to the generation of mechanistic data that support the mechanism of toxicity.
Another important aspect regarding the design of a preclinical safety study is the question whether to use disease models or not. Although disease models are not often used in animal toxicity studies, the underlying disease may be important to identify the most critical key events, as well as thresholds, leading to an ADR (Morgan et al., 2013). The technologies for human in vitro disease models as described in Section 2.1.2 as well as animal disease models humanized by gene editing technologies (see Section 2.2.3) will provide new translational approaches for the identification of the mentioned critical key events. This idea, combined with the aspect of human diversity, may eventually lead to the development of individual patient-derived cell models that may be [...]

14 "Fit-for-purpose" is defined as a level of validation which proves that the assay or biomarker is sufficient for use in a particular defined context. It does not require a regulatory submission (for reference see footnote 13).

[...] innovative method or tool has data to support its specific context of use in drug development (Ohno, 2002). The qualification concept also applies to the submission of biomarkers; however, in this case it is independent of the specific test or assay performing the measurement 15.
Overall, while the terminology (i.e., qualification versus validation) differs and the processes evolve to keep pace with scientific progress and to benefit from it, the underlying purpose and principles of qualification and validation remain relatively constant as laid down in the modular approach to validation and in the OECD Guidance Document (Hartung et al., 2004; OECD, 2005). As described by Hartung et al. (2013), the validation procedure should follow fit-for-purpose approaches. A procedure that strictly follows the modular approach, comparing in vitro data with animal data, still relies on the concept of one-to-one replacement. New concepts need to be developed, taking into account combinations of assays (IATA) and human data. A combination of in silico and in vitro methods might be the key to solving this problem.

Recommendation: the way forward
Investigative toxicology is a recent discipline in drug safety assessment that strives for a holistic view on properties of candidate molecules (NCEs/NBEs) and their predicted effect in humans by combining in silico, in vitro, in vivo, and clinical data and making use of innovative technologies and novel approaches (Fig. 8). Thus, it should be seen as complementary to regulatory toxicology, which is often confined by international guidelines. Investigative toxicology can embrace novel technologies more readily. It is thereby able to evaluate these technologies and produce evidence for their further use, also in the regulatory context.

[...] Conduct of Human Clinical trials for Pharmaceuticals" 1 (ICH M3(R2)) states that "(…), consideration should be given to use of new in vitro alternative methods for safety evaluation. These methods, if validated and accepted by all ICH regulatory authorities, can be used to replace current standard methods." But what is "validated"? ICH makes almost no reference to validation, except for analytical methods (ICH Q2A, Q2B, M10). Validation is defined by OECD (2005) as "the process by which the reliability and relevance of a particular approach, method, process or assessment is established for a defined purpose".
The European Medicines Agency (EMA) (EMA, 2016) defines "reliability" as "a measure of the extent that a test method can be performed reproducibly over time when using the same protocol". The reliability of current pivotal in vivo toxicology studies should be assured by compliance with GLP. Even though GLP principles are general and applicable in many areas, it has become evident that GLP, which was mainly tailored for in vivo methods available at the time of its development, needs to be adapted to the requirements of in vitro assays (OECD, 2004). More recently, the EU Joint Research Center (JRC), at the request of OECD, developed a guidance on Good In Vitro Method Practices, GIVIMP (OECD, 2018). The GIVIMP document describes the factors relevant to reliability and relevance of in vitro data generated for human safety assessment purposes and has been written with different users in mind, including GLP test facilities and research laboratories developing new in vitro methods.
"Relevance" is defined by EMA (2016) as "the extent to which the test correctly measures or predicts the biological effect of interest". So, how should relevance be established? According to EMA, "Relevance incorporates consideration of the accuracy (e.g. concordance with comparable validated test method with established performance standards) of a test method". However, this assumes that the existing "validated test method" is adequately relevant, which is not always the case for some in vivo animal test systems. A direct example of this is AOP-based in vitro skin sensitization testing, where concordance of the in vitro results with the standard in vivo mouse local lymph node assay (LLNA) is poor (Dumont et al., 2016) due to both limited relevance and high variability of the reference data. In this case, concordance of the in vitro data with human data is much better (concordance of the in vivo LLNA with human data is relatively poor) (Natsch and Emter, 2015) as was already shown for the guinea pig assay preceding the LLNA (Luechtefeld et al., 2016;Adriaens et al., 2014;Hoffmann, 2015). In the meantime, several testing and assessment strategies have been published (Urbisch et al., 2015;Roberts and Patlewicz, 2018) that have contributed to the development of a consolidated integrative approach to testing and assessment (IATA) for skin sensitization by OECD (2014).
For the pharmaceutical sector, EMA and FDA have put in place a qualification process that addresses innovative drug development methods and tools developed for a specific intended use in a pharmaceutical research and development context (non-clinical or clinical studies) (EMA, 2014). These are voluntary, scientific pathways leading to a regulatory conclusion that an innovative [...]

[...] understanding of adverse drug effects and permit translational, exposure-based modeling of toxicological impact,
− Education based on a new integrated teaching strategy.

The participants of the workshop envisage an increasing importance of investigative toxicology in supporting the entire drug development process. Cross-industry collaboration in the pre-competitive part of this work is helping to accumulate experience with novel approaches faster and more reliably. This process can influence the complementation and replacement of traditional methods in regulatory toxicology. This also means that the promotion of new technologies supports less reliance on animal studies, moving the industry to more human-relevant assessment approaches.

References
Adriaens, E., Barroso, J., Eskes, C. et al. (2014)
1093/hmg/ddy187
Atienzar, F. A., Blomme, E. A., Chen, M. et al. (2016). Key

In a prospective manner, efforts in investigative toxicology focus on early screening tools allowing candidate selection as well as de-risking activities, where tailored work packages aim to unravel mechanisms of adverse events observed during pre-clinical and clinical stages and assessment of potential ways forward. The key to success of these activities is close interaction with experts from academia and regulatory bodies. The objective is to elaborate robust, reliable, and accepted investigative toxicology concepts for decision making by virtue of multiple and diverse tools, technologies, and readouts.
In the rapidly evolving field of drug development, with new drug modalities arising, new chemical spaces being explored, complex pharmacological strategies being pursued, and diseases and pathways involving the immune system becoming more prominent, it is essential that safety assessment is continuously innovated to address the new challenges arising from these trends. Together with the high regulatory burden, cost and time pressures, and the need to reduce animal testing, this demands novel concepts, strategies, and tools, and investigative toxicology plays a key role here. Collaborative efforts in the pre-competitive space are considered necessary to explore and establish these innovative new tools and concepts.
An important role is played by education to create a new generation of scientists (Daneshian et al., 2011). Novel technologies, innovative scientific approaches, and new terminologies demand modern and updated ways of teaching and learning (Flecha, 2018). Professional knowledge and expertise in this discipline are mandatory, considering that the possible target audience is wide and includes both undergraduate students and graduates (M.Sc., Ph.D., post-docs), as well as academic and industry scientists, and regulators. Teaching must go hand in hand with research, and common strategies and novel approaches are urgently required. Currently, there are few university graduate and post-graduate level courses in life science that provide adequate training, although their implementation is fundamental for a competency-based education and, consequently, social impact.
Among the key themes are:
− Making use of clinical and real-world data to inform and improve testing algorithms early on, establishing modeling approaches to predict the human response from in vitro data,
− Use of new human cell models, such as microphysiological systems and human organs-on-chips, for in vitro safety and efficacy assessment and generation of a cellular therapeutic index, modeling disease aspects by use of patient-derived tissues and iPSCs for patient stratification, and performing phenotypic screens and identification of translational biomarkers,
− Use of image-based technologies and omics readouts to support in vitro to in vivo assessment as well as back translation from clinical specimens to cell models to understand safety and disease as well as patient-specific, personalized aspects; monitoring AOPs and identifying key events from early in vitro discovery stages up to clinical phases,
− Focusing on quantitative systems biology level-based risk assessments that integrate diverse pharmacological and toxicological data sets, including data from advanced humanized models and human tissue, to underpin mechanistic understanding [...]