Template for the Description of Cell-Based Toxicological Test Methods to Allow Evaluation and Regulatory Use of the Data

Only few cell-based test methods are described by Organisation for Economic Co-operation and Development describe following: Which toxicological target test system readout(s) biological process(es) neurite outgrowth, differentiation) and/or toxicological events (e.g. oxidative stress, cell death) are modelled/reflected by your test method? (human) adverse outcome(s) is your test method or could be test test

continuous malformation index.Any combination of these independent elements results in a different overall test method.

Compliance issues with OECD Guidance Document 211 (GD211)
The key literature on how to describe non-guideline methods is OECD Guidance Document 211 (GD211) (OECD, 2017). In line with reports by many others (Freedman et al., 2015; Vogt et al., 2016; Hair et al., 2019), our own survey of the scientific literature has indicated that method descriptions still show an enormous heterogeneity in quality, detail and scope. Moreover, the clarity of the information, as well as the format that is used to provide the information, varies widely. This makes the extraction of necessary information as well as the use and interpretation of data produced using the method difficult. We further found that the items required to be addressed in GD211 are understood and interpreted in different ways by the users, or sometimes not understood at all.
Part of this might be attributable to the fact that GD211 looks at non-guideline methods from a regulatory perspective, i.e., it explicitly provides a format for reporting non-guideline methodology so that it can be used by regulatory toxicologists for the safety assessment of chemicals. It asks for information that is of specific value for regulators (e.g., information on validation, predictivity, standardization, applicability domain, etc.), but does not provide detailed background information on why regulators need such information and what they use it for. Experience has shown that the target audience, which includes method developers in academia or fundamental research, is not necessarily familiar with the underlying regulatory background.
Another reason for apparent non-compliance is that GD211 is a brief and highly condensed document. It often covers several distinct features of tests in a single question. Test developers may not consider all the different aspects of such a complex question without more specific guidance, and thus some issues may be missed entirely.
From the information recipient's point of view, non-compliance is not the only problem. Finding and retrieving the information within a report prepared in a not fully standardized format can also be a difficult task. For instance, it may be time-consuming to find out whether some information is absent or whether it is merely placed or mentioned in a context other than the usual one.

Main test elements
Test, assay, test system, test method… All these terms are found in the literature and in discussions, but they need some definition to allow for specification of their background and requirements. "Test" is the shortest term and thus a good place to start: a toxicological "test" is a procedure to determine, in a quantifiable manner (with respect to damage and dose/concentration), whether a substance may harm/incapacitate an organism, a cell, or an essential component thereof. The terms "assay" and "test method" are used interchangeably with "test".
A test has various elements that are independent of one another to a large degree (Schmidt et al., 2017). The five main elements are: the test purpose (see Section 4), the test system, the exposure scheme, the endpoint, and the prediction model. Thus, the "test system" is one element of the "test method" and must not be confused with it. As this often causes confusion among non-specialists, it deserves some further explanation. The in vivo test method for acute toxicity assessment is a good example to explain the problem. The overall test method is defined by a test guideline, e.g., OECD TG 423 (acute oral toxicity) (OECD, 2002). The test system is, e.g., mouse, fasted for > 3 h prior to dosing; the exposure scheme involves single dosing by gavage and continued observation for 14 days; the endpoint is the number/percentage of dead animals; and the prediction model converts the test data into toxicity classes defined by the United Nations' Globally Harmonized System (GHS), e.g., category 2 (comprising compounds with an LD50 in the range of 5-50 mg/kg bodyweight). Clearly, the mouse is the test system and not the test method. Each element of the test method may be modified independently, e.g., changing from oral to dermal exposure; using time to death as endpoint; employing a binary prediction model (toxic/non-toxic); or using rats instead of mice. Any such change results in a new, different test method that may no longer be in accordance with the guideline method.
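The independence of the five elements can be illustrated with a small data structure. The following Python sketch is illustrative only: the class `MethodDescription`, the helper `changed_elements` and the wording of the dermal variant are assumptions made for this example and are not taken from any guideline.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class MethodDescription:
    """The five main elements of a test method (hypothetical container)."""
    purpose: str
    test_system: str
    exposure_scheme: str
    endpoint: str
    prediction_model: str


def changed_elements(a: MethodDescription, b: MethodDescription) -> list:
    """Return the names of the elements in which two methods differ."""
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]


# Elements as named in the acute oral toxicity example above.
oral_method = MethodDescription(
    purpose="acute toxicity assessment",
    test_system="mouse, fasted for > 3 h prior to dosing",
    exposure_scheme="single dosing by gavage, observation for 14 days",
    endpoint="number/percentage of dead animals",
    prediction_model="GHS toxicity classes",
)

# Changing a single element (assumed wording) yields a different test method.
dermal_method = replace(
    oral_method, exposure_scheme="single dermal dosing, observation for 14 days"
)

print(dermal_method == oral_method)                  # the two methods are not equal
print(changed_elements(oral_method, dermal_method))  # only the exposure scheme differs
```

Because the description is an immutable record, any modification produces a new object, which mirrors the point that a modified element defines a new test method.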
The same principles apply to cell-based assays, i.e., in vitro methods or new approach methods (NAM). For example, the test system may be neurons in 2D culture or neural organoids or liver cells; the exposure scheme may be 24 h exposure of cells cultured in standard medium plus test chemical or 72 h exposure in a special medium, with or without re-addition of test chemical every 24 h; the prediction model may be binary (toxic/non-toxic) or it may use a composite measure of several endpoints to define a continuous malformation index. Any combination of these independent elements results in a different overall test method.
for the purpose of safety assessment. As GD211 is targeted mainly at regulators, it leaves scientists less familiar with regulation uncertain as to what level of detail is required and how individual questions should be answered. Moreover, little attention was given in the guidance to the description of the test system (i.e., cell culture) and the steps leading to it being established. To address these issues, an annotated toxicity test method template (ToxTemp) was developed (i) to fulfill all requirements of GD211, (ii) to guide the user concerning the types of answers and detail of information required, (iii) to include acceptance criteria for test elements, and (iv) to define the cells sufficiently and transparently. The fully annotated ToxTemp is provided here, together with reference to a database containing exemplary descriptions of more than 20 cell-based tests.
In the domain of methods developed for regulatory purposes, the element "test purpose" plays a special role: Beyond the primary test purpose (e.g., determination of cytotoxicity), the results of a test may also be used for a secondary, regulatory purpose (regulation), and even a tertiary purpose (e.g., modelling a potential hazard in the population). Consider, e.g., in vitro tests used to predict Globally Harmonized System (GHS) classifications (e.g., moderate or strong skin sensitizer or eye irritant) (secondary purpose), which in turn are used to model the potential chemical hazard to which workers or the general population may be exposed (tertiary purpose).
Test developers might not be familiar with the limitations and requirements demanded of their test when it is to be used for secondary (or tertiary) purposes, and this may in the end lead to misinterpretation of results obtained from a test method used for purposes that it was not originally designed for. The problem is further complicated by the fact that an apparently simple regulatory statement (e.g., "The substance is a skin sensitizer.") in reality represents the outcome of a highly complex decision system based on a regulatory framework that has evolved over several decades, with paradigms and implicit assumptions that are often not obvious to the outsider.
To avoid such problems, communication between developers and regulatory recipients needs to be as transparent, comprehensive and precise as possible. Description of a test method, especially of the test purpose, in a way that allows such communication is essential to achieve this.

Distinctions between a test method description and the overall testing process
Reproducible toxicological research necessitates the comprehensive description of the testing process. Good information on this can be found in the series of guidance documents on Good Laboratory Practice (GLP) (e.g., OECD, 2005) or in the OECD Guidance Document on Good In Vitro Method Practices (GIVIMP) (OECD, 2018). Several helpful tools are available, e.g., from the Science in Risk Assessment and Policy (SciRAP) web resource, the DB-ALM methods summary (adapted from GD211), the EURL-ECVAM test submission template (used for structuring information for test validation), or the ALTEX BenchMarks series (Kisitu et al., 2019; Krebs et al., 2018).
Several earlier EU-funded projects devoted considerable resources to harmonizing test method descriptions (Kinsner-Ovaskainen et al., 2009; Rovida et al., 2014), and similar activities are taking place in the USA (Flood et al., 2017). The overall reporting of in vitro experiments (data and methods) has been addressed by an NC3Rs initiative (RIVER) (Prior et al., 2019), large stakeholder workshops organized by CAAT-Europe (Hartung et al., 2019), and on the regulatory level (OECD harmonized templates for data reporting (OHT); OHT 201-Intermediate effects is especially relevant).

Annotations and guiding questions for a test method template
The problems that test developers face when trying to comply with the questions raised in GD211 may best be illustrated using an analogy from an entirely different field. Let us assume that a police recruit is asked in a questionnaire to "indicate body measures". Some would answer by giving only their height and weight. Others may include their shoe size and special measures for jackets and trousers. Few would give their head circumference (required for the uniform hat) and/or their glove size. Possibly even fewer would think to provide information on their sight, i.e., data essential for ordering sunglasses (e.g., whether they are myopic, distance between the eyes, etc.).
In the course of the European research project EU-ToxRisk (Daneshian et al., 2016), compliance of project case study results with GD211 became an important issue. While trying to implement high-quality assay documentation within the project, we realized that hardly any of the senior scientists and none of the junior faculty fully understood the requirements of GD211.
How are situations as exemplified above (police recruit or EU-ToxRisk examples) best avoided? How can it be ensured that all required information (i) is reported, (ii) is presented in a structured way so that the recipients can verify its completeness, and (iii) can be found easily? Two major strategies towards this goal are (a) to sub-divide complex questions into sets of simpler questions, each dealing with a single, defined issue, and (b) to explain the questions by adding notes, comments and guiding questions. These measures help to structure the answer and ensure that all relevant aspects are considered.
Such guidance is provided by the test method template (ToxTemp) shown in a compact version as Box 1 and provided as a printable version including additional notes and examples in the supplementary file 1. Tab. 1 gives a synoptic overview of all items/questions of GD211 and of the respective counterparts in the ToxTemp (see also the supplementary file 1 for the detailed comparison). Moreover, an online methods database, based on the ToxTemp, is under construction. Using Tab. 1 should enable anyone accustomed to the structure of GD211 to retrieve all relevant test method information from the database.

Understanding the test purpose
Any test (toxicological or not) is developed to probe a test hypothesis (e.g., whether a substance is toxic or not). The test design will always reflect that purpose and test parameters will ideally be optimized in order to achieve maximum certainty about whether the hypothesis should be accepted or rejected.
It is a basic scientific principle that test results should, within limits, only be used for the purpose they were designed for. This is not trivial for in vitro or new approach methods (NAM) developed for regulatory purposes.

Box 1: Documentation of a test method and its readiness status, guided by a test method questionnaire
A print version of the complete test method questionnaire including notes and examples can be found in the supplementary file 1.

Descriptive full-text title
Provide a descriptive title using normal language without technical terms or acronyms.

Abstract
Please describe in no more than 200 words the following: Which toxicological target (organ, tissue, physiological/biochemical function, etc.) If the method has undergone some form of validation/evaluation, give its status. (9.4)

Name of test method
Provide the original/published name, as well as the potential tradename.

Assigned data base name
Normal text names often do not uniquely define the method. Therefore, each method should be assigned a clearly and uniquely defined data base name. These are some example data base names generated in the EU-ToxRisk project:
UKN1a_DART_NPC_Diff_6D_02
UKN1b_DART_NPC_Diff_4D_01
UKN2a_DART_NC_Migr_24h_04
The name is assembled (in more generic terms) from the following elements: Axa_B_C_D_E
Axa: mandatory part of the identifier allowing unambiguous identification
A: Abbreviation/acronym of the partner depositing the assay
x: Consecutive number (referring to the partner's assay number)
a: Sub-specifier (for variants, i.e. very similar assays but e.g. different readout or medium); not mandatory, but 'Axa' must be specific (i.e. clearly identifying) for each assay variant.
B: Indication of the main intended use (max. 5 letters), e.g. DART, Neuro, Liver, Lung, Renal, Redox, Stress..
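The naming scheme lends itself to automated checking. The following Python sketch splits a data base name into the parts described above. It is a hypothetical helper, not part of the ToxTemp: the names `AssayName` and `parse_assay_name` are assumptions, only the elements 'Axa' and 'B' are interpreted, and all further fields are returned unmodified because their meaning is not specified here.

```python
import re
from typing import NamedTuple, Tuple


class AssayName(NamedTuple):
    partner: str                # A: acronym of the depositing partner
    number: int                 # x: consecutive assay number of the partner
    variant: str                # a: optional sub-specifier ("" if absent)
    intended_use: str           # B: main intended use, e.g. DART
    remaining: Tuple[str, ...]  # further fields, left uninterpreted


# Assumed shape of 'Axa': letters, then digits, then an optional lowercase letter.
_IDENTIFIER = re.compile(r"([A-Za-z]+)(\d+)([a-z]?)")


def parse_assay_name(name: str) -> AssayName:
    """Split an underscore-separated data base name into its elements."""
    parts = name.split("_")
    if len(parts) < 2:
        raise ValueError(f"expected at least 'Axa_B', got {name!r}")
    match = _IDENTIFIER.fullmatch(parts[0])
    if match is None:
        raise ValueError(f"identifier {parts[0]!r} does not follow the 'Axa' pattern")
    partner, number, variant = match.groups()
    return AssayName(partner, int(number), variant, parts[1], tuple(parts[2:]))


print(parse_assay_name("UKN1a_DART_NPC_Diff_6D_02"))
```

The sketch deliberately does not enforce the length limit for 'B', since the pattern of the identifier is an assumption derived only from the three example names.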

Name and acronym of the test depositor
Include affiliation.

Name and email of contact person
Provide the details of the principal contact person.

Name of further persons involved
For example, the principal investigator (PI) of the lab, the person who conducted the experiments, etc.

Reference to additional files of relevance
Supply number of supporting files. Describe supporting files (e.g. metadata files, instrument settings, calculation template, raw data file, etc.).

Supply of source cells
Describe briefly whether the cells are from a commercial supplier, continuously generated by cell culture, or obtained by isolation from human/animal tissue (or other).

Overview of cell source component(s)
Give a brief overview of your biological source system, i.e. the source or starting cells that you use. Which cell type(s) are used or obtained (e.g. monoculture/co-culture, differentiation state, 2D/3D, etc.)? If relevant, give human donor specifications (e.g. sex, age, pool of 10 donors, from healthy tissue, etc.).

Characterization and definition of source cells
List quantitative and semi-quantitative features that define your cell source/starting cell population. For test methods that are based on differentiation, describe your initial cells, e.g. iPSC, proliferating SH-SY5Y; the differentiated cells are described in section 4. Define cell identity, e.g. by STR signature (where available), karyotype information, sex (where available and relevant), ATCC number, passage number, source (supplier), sub-line (where relevant), source of primary material, purity of the cells, etc. Describe defining biological features you have measured or that are FIRMLY established (use simple listing, limit to max. 0.5 pages), e.g. the cells express specific marker genes, have specific surface antigens, lack certain markers, have or lack a relevant metabolic or transporting capacity, have a doubling time of x hours, etc. Transgenic cell lines have particular requirements concerning the characterization of the genetic manipulation (type of transgene, type of vector, integration/deletion site(s), stability, etc.). Organoids and microphysiological systems (MPS) may need some special/additional considerations as detailed in Pamies et al. (2018) and Marx et al. (2016), e.g. ratio of cell types used, percent of normal cells in tumor spheroids created from resected tissue; derivation of cells for re-aggregating brain cultures.

Acceptance criteria for source cell population
Describe the acceptance criteria (AC) for your initial cells (i.e. the quality criteria for your proliferating cell line, tissue for isolation, organism, etc.). Which specifications do you consider to describe the material, which quality control criteria have to be fulfilled (e.g. pathogen-free)? Which functional parameters (e.g. certain biological responses to reference substances) are important? For iPSC maintenance: How do you control pluripotency? How stable are your cells over several passages? Which passage(s) are valid? For primary cells: Show stability and identity of supply; demonstrate stability of function (e.g. xenobiotic metabolism). Quantitative definitions for AC should be given based on this defining information. Exclusion criteria (features to be absent) are also important. As in 3.3, special/additional requirements apply to genetically-modified cells and microphysiological systems.

Variability and troubleshooting of source cells
Name known causes of variability of the initial cells/source cells. Indicate critical consumables or batch effects (e.g. relevance of the plate format and supplier, batch effects of fetal calf serum (FCS) or serum replacement, critical additives like type of trypsin, apo-transferrin vs. holo-transferrin, etc.). Indicate critical handling steps and influencing factors (e.g. special care needed in pipetting, steps that need to be performed quickly, cell density, washing procedures, etc.). As in 3.3, special/additional requirements apply to genetically-modified cells and microphysiological systems, e.g. dependence on matrix chemistry and geometry, dependence on microfluidics system, consideration of surface cells vs core cells, etc. Give recommendations to increase/ensure reproducibility and performance.

Differentiation towards the final test system
Describe the principles of the selected differentiation protocol, including a scheme and graphical overview, indicating all phases, media, substrates, manipulation steps (medium change/re-plating, medium additives, etc.). Special/additional requirements apply to microphysiological systems and organoids: e.g. cell printing, self-aggregation/self-organisation process, interaction with the matrix, geometrical characterization (size/shape), etc.

Reference/link to maintenance culture protocol
Provide the SOP of the general maintenance procedure as a database link. This should also include the following information: How are the cells maintained outside the experiment (basic cell propagation)? How pure is the cell population (average, e.g. 95% of iPSC cells Oct4-positive)? What are the quality control measures and acceptance criteria for each cell batch? Which passage number(s) can be used in the test? Are Good Cell Culture Practice (GCCP) and/or Good In Vitro Method Practice (GIVIMP) followed? How long can the same cell batches be used? How are frozen stocks and cell banks prepared? For primary cells: How are they obtained in general, what are they characterized for, and what are the inclusion and exclusion criteria?

Definition of the test system as used in the method
4.1 Principles of the culture protocol
Describe the test system as it is used in the test. If the generation of the test system involves differentiation steps or complex technical manipulation (e.g. formation of microtissues), this is described in 3.6. Give details on the general features/principles of the culture protocol (collagen embedding, 3D structuring, addition of mitotic inhibitors, addition of particular hormones/growth factors, etc.) of the cells that are used for the test.

What is the percentage of contaminating cells; in co-cultures what is the percentage of each subpopulation?
Are there subpopulations that are generally more sensitive to cytotoxicity than others, and could this influence viability measures? Is it known whether specific chemicals/chemical classes show differential cytotoxicity for the cell sub-populations used?

4.2 Acceptance criteria for assessing the test system at its start
What are the endpoint(s) that you use to control that your culture(s) is/are as expected at the start of toxicity testing (e.g. gene expression, staining, morphology, responses to reference chemicals, etc.)? Describe the acceptance criteria for your test system, i.e. the quality criteria for your cells/tissues/organoids: Which endpoints do you consider to describe the cells or other source material, which parameters are important? Describe the (analytical) methods that you use to evaluate your culture (PCR, ATP measurement) and to measure the acceptance criteria (AC). Which values (e.g. degree of differentiation or cell density) need to be reached/should not be reached? Historical controls: How does your test system perform with regard to the acceptance criteria (e.g. when differentiation is performed 10 times, what is the average and variation of the values for the acceptance criteria parameters)? Indicate actions if the AC are not met. Examples: cells are > 90% viable, or > 98% of cells express marker x (e.g. AP-2), or > 80% of the cells attach, etc.
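Quantitative acceptance criteria of this kind can be checked mechanically before a test is started. The following Python sketch uses the three example thresholds given above; the names `Criterion`, `EXAMPLE_CRITERIA` and `failed_criteria`, as well as the parameter labels, are assumptions made for illustration, and real criteria have to be defined for each test system.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class Criterion:
    parameter: str          # label of the measured parameter (hypothetical)
    minimum_percent: float  # the measured value must be strictly greater


# Thresholds taken from the examples in the text (> 90%, > 98%, > 80%).
EXAMPLE_CRITERIA = (
    Criterion("viability", 90.0),
    Criterion("marker_positive", 98.0),
    Criterion("attachment", 80.0),
)


def failed_criteria(
    measured: Dict[str, float],
    criteria: Sequence[Criterion] = EXAMPLE_CRITERIA,
) -> List[str]:
    """Return the parameters that are missing or do not exceed their threshold."""
    failures = []
    for criterion in criteria:
        value = measured.get(criterion.parameter)
        if value is None or not value > criterion.minimum_percent:
            failures.append(criterion.parameter)
    return failures


batch = {"viability": 95.0, "marker_positive": 99.1, "attachment": 85.0}
print(failed_criteria(batch))  # an empty list means all criteria are met
```

A missing measurement is treated as a failure, so that an incomplete quality control record cannot be mistaken for an acceptable culture.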

Variability of the test system and troubleshooting
Give known causes of variability for final test system state. Indicate critical consumables or batch effects (e.g. plate format and supplier, batch effects of FCS or serum replacement, additives). Indicate critical handling steps, and/or influencing factors identified (e.g. special care needed in pipetting, steps that need to be performed quickly, cell density). Indicate positive and negative controls and their expected values, and accepted deviation within and between the test repeats. Give recommendations to increase/ensure reproducibility and performance.

Metabolic capacity of the test system
What is known about endogenous metabolic capacity (CYP system (phase I); relevant conjugation reactions (phase II))? What is known about other pathways relevant to xenobiotic metabolism? What specific information is there on transporter activity?

Omics characterization of the test system
Are there transcriptomics data or other omics data available that describe the test system (characterization of cells without compounds)? Briefly list and describe such data. Indicate the type of data available (e.g. RNASeq or proteomics data). Refer to data file, data base or publication.

Features of the test system that reflect the in vivo tissue
Give information on where the test system differs from the mimicked human tissue and which gaps of analogy need to be considered.

Commercial and intellectual property rights aspects of cells
Are there elements of the test system that are protected by patents or any other means?

4.9 Reference/link to the culture protocol
Fill only if section 3 has not been answered. Provide the SOP for the general maintenance procedure as a database link. This should also include the following information: How are the cells maintained outside the experiment (basic cell propagation)? How pure is the cell population (average, e.g. 95% of iPSC cells Oct4-positive)? What are the quality control measures and acceptance criteria for each cell batch? Which passage number(s) can be used in the test? Are Good Cell Culture Practice (GCCP) and/or Good In Vitro Method Practice (GIVIMP) followed? How long can the same cell batches be used? How are frozen stocks and cell banks prepared? For primary cells: How are they obtained in general, what are they characterized for, and what are the inclusion and exclusion criteria?

Exposure scheme for toxicity testing
Provide an exposure scheme (graphically, show timelines, addition of medium supplements and compounds, sampling, etc.), within the context of the overall cell culture scheme (e.g. freshly re-plated cells or confluent cells at start, certain coatings, etc.).
Include medium changes, cell re-plating, whether compounds are re-added in cases of medium change, critical medium supplements, etc.

Endpoint(s) of the test method
Define the specific endpoint(s) of the test system that you use for toxicity testing (e.g. cytotoxicity, cell migration, etc.). Indicate whether cytotoxicity is the primary endpoint. What are secondary/further endpoints? Also describe here potential reference/normalization endpoints (e.g. cytotoxicity, protein content, housekeeping gene expression) that are used for normalization of the primary endpoint.

Overview of analytical method(s) to assess test endpoint(s)
Define and describe the principle(s) of the analytical methods used. Provide here a general overview of the method's key steps (e.g. cells are fixed or not, homogenized sample or not, etc.), sufficient for reviewers/regulators to understand what was done, but not in all detail for direct repetition. If you have two or more endpoints (e.g. viability and neurite outgrowth), do you measure both in the same well, under the same conditions in parallel, or independently of each other? For imaging endpoints: Explain in general how the quantification algorithm works or how semi-quantitative estimates are obtained, and how many cells are imaged (roughly).

Technical details (of e.g. endpoint measurements)
Provide information on machine settings, analytical standards, data processing and normalization procedures. For imaging endpoints: provide detailed algorithm. This information should also be covered in an SOP, preferably in DB-ALM format (see link in 6.6).

Positive controls
What chemicals/manipulations are used as positive controls? Describe the expected data on such controls (signal and its uncertainty). How good are in vivo reference data on the positive controls? Are in vivo relevant threshold concentrations known?

Negative and unspecific controls
What chemicals/manipulations are used as negative controls? Describe the expected data on such controls (signal and its uncertainty). (Such data define the background noise of the test method.) What is the rationale for the concentration setting of negative controls? Do you use unspecific controls? If yes, indicate the compounds and the respective rationale for their use and the concentration selection.

Features relevant for cytotoxicity testing
Does the test system have a particular apoptosis sensitivity or resistance? Is cytotoxicity hard to capture for minor cellular subpopulations? In multicellular systems, which cell population is the most sensitive? Are specific markers known for each cell population? Are there issues with distinguishing slowed proliferation from cell death? For repeated/prolonged dosing: Is early death and compensatory growth considered? For very short-term endpoints (e.g. electrophysiology measured 30 min after toxicant exposure): Is a delayed measure of cytotoxicity provided?

Acceptance criteria for the test method
Which rule do you apply to test whether a test run is within the normal performance frame? How do you document this decision? Indicate actions if the AC are not met.

Throughput estimate
Indicate "real data points per month" (not per week/per quarter, etc.): count three working weeks per month. Each concentration is a data point. Necessary controls that are required for calibration and for acceptability criteria are NOT counted as data points. All technical replicates of one condition are counted as one single data point (see notes for explanation). Indicate possibility/extent of repeated measures (over time) from same dish. Explain your estimate.
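The counting rules can be made explicit with a short calculation. The following Python sketch is one possible way to derive the estimate from a plate layout; the function name, its parameters and the plate numbers in the usage example are assumptions for illustration only.

```python
def data_points_per_month(
    wells_per_plate: int,
    control_wells_per_plate: int,
    technical_replicates: int,
    plates_per_week: int,
    working_weeks_per_month: int = 3,  # counting rule stated in the text
) -> int:
    """Estimate real data points per month.

    Control wells are not counted, and all technical replicates of one
    condition (one concentration of one compound) count as one data point.
    """
    if technical_replicates < 1:
        raise ValueError("technical_replicates must be at least 1")
    if control_wells_per_plate > wells_per_plate:
        raise ValueError("more control wells than wells on the plate")
    test_wells = wells_per_plate - control_wells_per_plate
    conditions_per_plate = test_wells // technical_replicates
    return conditions_per_plate * plates_per_week * working_weeks_per_month


# Hypothetical layout: 96-well plate, 12 control wells, triplicates, 2 plates per week.
print(data_points_per_month(96, 12, 3, 2))  # 28 conditions x 2 plates x 3 weeks = 168
```

Repeated measures over time from the same dish are not covered by this sketch and would have to be added as a separate factor if they are counted.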

Preparation/addition of test compounds
Give an overview of the range of volumes, particular lab ware/instruments for dispensing, temperature/lighting considerations, particular media/buffers for dilution, decision rules for the solvent, tests of solubility as stocks and in culture medium, etc. How are compound stocks prepared (fold concentration, verification, storage, etc.)?

Why data are meaningless without a test method description
There is an intricate relationship between a test method and the data it generates. It is evident that a test method itself has no value in toxicology if it does not generate data. The reverse condition is often overlooked: Is there any value in data if the test method is not sufficiently documented or disclosed? This question is not easily answered. One reason for this is that "naked data", i.e., data without reference to a test method, do not exist for in vivo tests. Most data inherently contain some information on test conditions. For instance, if one talks about data on the LD50 (lethal dose, 50%) test for oral toxicity in rat (example data: 4.5 ± 1 mg/kg bodyweight), then the data contain information on the test system (rat) and the exposure scheme (oral dosing, timing).
The many types of essential information can be grouped as belonging to four partially overlapping packages: (i) the overall test method description, (ii) the technical test procedure (as outlined in a standard operating procedure (SOP) involving, e.g., defined labware, consumables and pipetting steps), (iii) the characterization of test and reference materials/chemicals, and (iv) all issues relating to data processing and archiving. In a wider sense, an additional package (v) addresses the test purpose, the test limitations (i.e., information on its applicability) and the criteria to be used for interpreting test results. Here, the focus is on the overall test method description (packages i + v). Notably, this description also involves some information from the other packages (ii-iv), not in full detail but as far as required to provide an overall understanding of the test method.
(GCCP) are considered and adhered to (Coecke et al., 2005). Yet, many problems have been described in this regard. For instance, many cells have been misidentified or cross-contaminated with other cells (Drexler et al., 2003; Gignac et al., 1993; Horbach and Halffman, 2017; Masters, 2002; Nardone, 2007, 2008; Stacey, 2000; Stacey et al., 1992), or microbiologically contaminated (e.g., with mycoplasma). Even two batches of cells from a single source can differ significantly (Ben-David et al., 2018; Frattini et al., 2015; Kleensang et al., 2016; Liu et al., 2019). Even more importantly, this form of identification only applies to cell cultures that are based on well-defined cell lines. Even in apparently simple cases (e.g., using the HepG2 cell line or primary rodent hepatocytes or neuronal cells), cultures of the same cells have different properties depending on their culture medium, cell density, cell-matrix, etc. (Brigelius-Flohe et al., 1995; Latta et al., 2000; Leist et al., 1999; Ramaiahgari et al., 2014; Delp et al., 2019; Falsig et al., 2006; Gantner et al., 1996; Gerhardt et al., 2001; Zimmer et al., 2012).
2. Determination of the cells' genotype. Sometimes, genotyping, e.g., by short tandem repeat (STR) profiling, by extensive single nucleotide polymorphism (SNP) profiling or by array comparative genome hybridization (aCGH) (Zhang et al., 2017; Matsuda, 2017; Cao et al., 2015) is used for defining a test system. However, the genotype gives no information on the differentiation state or the epigenetic state. This problem is also not circumvented by complete genome sequencing (Gutbier et al., 2018).
3. Reference to the original source. For primary cells, this has in the past been considered as sufficiently defining (e.g., for hepatocytes, neurons, broncho-alveolar cells or blood cell populations) (Kruglikov et al., 1976; Schildknecht et al., 2011, 2013; Gerhardt et al., 2001). However, it is now acknowledged that this approach does not account for the large variability in cell purity, viability, activation/inflammatory state, de-differentiation, history within the organism (e.g., drug treatment), etc. Even the culture setup can make a dramatic difference. This is exemplified by broncho-alveolar cells grown submerged in medium (standard type of culture) or at the air-liquid interface, which promotes baso-lateral polarization. Even more variation is introduced if one considers that the cells may be confluent (with tight junctions) or non-confluent, at an early or late passage number, etc.
4. Reference to a differentiation protocol. Many modern test systems are derived from stem cells that have been differentiated. If they are obtained from commercial sources, the protocol used to generate the cells is mostly confidential, and variability between lots is unknown. Even if all protocol details are known, the issue still remains that one given cell (with a defined genome) can give rise to dozens of differentiated progeny cell types and that those may have multiple health and proliferation states (Stiegler et al., 2011; Zimmer et al., 2011). The differentiation protocol is not sufficiently defining for the final population.
Is it at all possible to unambiguously define a cell culture? At the present state of the art, this may indeed not be possible (Gutbier et al., 2018; Liu et al., 2019) unless the use of the culture is specified. With a specific use scenario in mind (use of the cells in a
In the LD50 example, the data contain information on the test system (rat) and the exposure scheme (oral dosing, timing). Also, information on the endpoint and a normalization of the dose to a test unit can be easily derived from the data. In the above example, this would be death (as endpoint) and dose per body weight unit (as data normalization procedure). Without such inherent test method information, a data point (e.g., 5 mg or 117 g) would be meaningless.
The situation is even more extreme for cell-based tests or new approach methods (NAM) in general. Data may be something like "10%". Depending on the test, this may refer to a 10% reduction in viability (i.e., hardly any effect), 10% remaining viability (a drastic effect), a 10% increase in neurite growth (a moderate effect of unclear relevance) or 10% remaining level of acetylcholine activity (extreme adversity). Even if this was specified, it would not be possible to interpret the data, as they may have been obtained after exposure for 5 min to 10 nM compound or after exposure for one year to 1 mM compound. This example, trivial as it may appear, is intended to explain an important point: Data have no meaning if the respective test method with all its elements is not disclosed fully and transparently (Leist and Hengstler, 2018).
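The ambiguity of a bare "10%" can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the names `Basis`, `DataPoint` and `percent_of_control` are hypothetical and not part of GD211 or the ToxTemp, and the pairing of exposure conditions with the two readings is chosen arbitrarily from the examples above.

```python
from dataclasses import dataclass
from enum import Enum


class Basis(Enum):
    """How a percentage relates to the untreated control (hypothetical labels)."""
    REDUCTION = "reduction relative to control"
    REMAINING = "remaining relative to control"


@dataclass(frozen=True)
class DataPoint:
    """A reported percentage together with the minimal context needed to read it."""
    percent: float
    endpoint: str        # e.g. "viability"
    basis: Basis         # reduction versus remaining level
    exposure_time: str   # free text, e.g. "5 min"
    concentration: str   # free text, e.g. "10 nM"

    def percent_of_control(self) -> float:
        """Express the value uniformly as percent of control remaining."""
        if self.basis is Basis.REDUCTION:
            return 100.0 - self.percent
        return self.percent


# Two data points that would both be reported as "10%" (illustrative values only).
mild = DataPoint(10.0, "viability", Basis.REDUCTION, "5 min", "10 nM")
drastic = DataPoint(10.0, "viability", Basis.REMAINING, "1 year", "1 mM")

print(mild.percent_of_control(), drastic.percent_of_control())
```

Both records carry the same number, yet they describe opposite outcomes once the basis is known; the exposure fields are kept as free text here because the sketch makes no assumption about units or their conversion.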
Large studies (mainly focusing on animal experimentation) have reported that test method descriptions are often poor (Freedman et al., 2015; Hair et al., 2019; Vogt et al., 2016; Ingre-Khans et al., 2019). This lack of test method transparency may contribute significantly to reproducibility issues often reported for biomedical research (Ioannidis, 2012; Begley and Ellis, 2012; Prinz et al., 2011; Hartung et al., 2019). In the toxicological context, data obtained using test method descriptions that are not sufficiently clear cannot be considered "valid", i.e., "suitable for a regulatory context or in a context where major scientific or commercial decisions depend on the data interpretation" (Hartung and Leist, 2008; Leist et al., 2014). Given the experience that many essential details on methods are often not disclosed in scientific publications, it is of prime importance to define meticulously what is required of a method description and also to provide guidance on structuring this information (OECD, 2018; Leist et al., 2010, 2012; Hartung et al., 2019).

How to define a cell
The test system of many in vitro test methods is a cell culture (in two- or three-dimensional format). Therefore, the definition of the cells is an important element of a test method description. This is also clearly stated in GD211. However, the problem of defining a cell appears to have been underestimated, especially with respect to many modern, highly dynamic and complex cell cultures (Pamies et al., 2017, 2018; Balmer et al., 2012; Bal-Price et al., 2015). To appreciate the problem, it is helpful to initially look at some attempts to define a cell and to examine the problems associated with them:
1. Quoting the catalogue number or another apparent cell identifier. Often the ATCC (American Type Culture Collection) number, the colleague who provided the cells, or the tissue from which the (primary) cells were obtained is given. This may work to some extent if all principles of Good Cell Culture Practice
Tab. 1: Synopsis of information found in OECD GD211 and the extensive test method description questionnaire (ToxTemp) compiled here
The left column lists all chapters/items of GD211 (original numbering) (OECD, 2017), while the right column indicates where the corresponding information can be found in ToxTemp. Also, GD211 sometimes lists several important issues without assignment to sub-items (see, e.g., item 3 "Data interpretation and prediction model" of GD211). Such compilations of several aspects are often covered by several distinct chapters of ToxTemp. Two examples for the diverging levels of detail are highlighted in blue: (i) chapter 2.3 of GD211 addresses the test system (cells). This information is covered in 15 sub-chapters of ToxTemp; (ii) in chapter 2.6 of GD211, there are several sub-items (bullets) that are highly diverse. These are therefore covered in ToxTemp in very different chapters. A more detailed comparison is found in Tab. S2.

Acceptance criteria (AC)
The ToxTemp places a strong emphasis on acceptance criteria, not only for the overall test method but also for the test system at different stages. Going one step further, the questions address specification of the test system at the start and at the end of testing. This is necessary in many cases when the cell culture changes significantly during the test (chemical exposure phase). Examples for this are the use of differentiating stem cells (shift of subpopulations) or of primary cells (shift of composition, differentiation state, viability, activation state and other features).
The definition and application of acceptance criteria (AC) ensures that the specifications of test method elements are within a
given test method), a fit-for-purpose definition is feasible and realistic (Lorge et al., 2016).
The ToxTemp includes two sections that help to better define cells. First, a cell culture is not seen as one static and given system. Therefore, the ToxTemp contains a whole suite of questions, notes and guiding questions that address different stages of the cells and are designed to define the entire process leading from a maintenance culture (or an original tissue) to the final test system (Fig. 1). Second, the definition of acceptance criteria (AC) for the test system is explained and requested. This ensures that the state of the cell culture can be defined in a pre-determined range that is fit-for-purpose concerning the use of the test system within the given test method (Blaauboer et al., 2012; Hansson et al., 2000; Hirt et al., 2000).
readiness is defined more broadly, allowing for different readiness stages, depending on the purpose/application of the test method (e.g., screening versus regulatory use) (Bal-Price et al., 2018; Fritsche et al., 2017). Within such a framework, different elements of a test method may be evaluated separately, e.g., in a modular validation process (Hartung et al., 2004), or according to specified readiness criteria and scoring lists (Bal-Price et al., 2018). The ToxTemp gives guidance on how to provide such detailed information.

Conclusions
Interactions with many researchers from academia and industry have shown that the most common roadblock to practical implementation of GD211 is a lack of understanding of the requirements, and that additional notes and guiding questions are necessary to overcome this. Moreover, it was recognized that a comprehensive test method description needs more focus on practical aspects of the testing process (such as details on data handling or of the test system) and on the documentation of method limitations and troubleshooting. We found that the project partners of the EU-ToxRisk project were all able, compliant and motivated to deliver comprehensive method descriptions when guided by the ToxTemp. This shows that there is a positive attitude and a good motivation amongst test developers to describe their assay. A tool/template such as the one presented here is likely to act as support and catalyst towards a culture of fully transparent and complete test method descriptions.
In conclusion, the ToxTemp presented here will make it easier for the many test developers not deeply familiar with regulatory environments to describe their assays in sufficient detail. It will also make it easier to understand and compare different test methods (e.g., for the same endpoint). If the ToxTemp were broadly implemented, it would not only give guidance to test developers but could change the overall culture of method documentation. Scientists fostered in an environment that values sound descriptions of methods and data will not only become better (more reliable) researchers but will also be better trained for entering responsible positions in industry and regulatory agencies.
Arch Toxicol 89, 269-287. doi:10.14573/altex.1402121
Bal-Price, A., Hogberg, H. T., Crofton, K. M. et al. (2018). Recommendation on test readiness criteria for new approach methods in toxicology: Exemplified for developmental neurotoxicity. ALTEX 35, 306-352. doi:10.14573/altex.1712081
Balls, M., Amcoff, P., Bremer, S. et al. (2006). The principles of weight of evidence validation of test methods and
fit-for-purpose state that allows the performance of the test under robust/reproducible conditions. The criteria and methods to control acceptability, including information on historic control data, are an indispensable information requirement for a test that may be used in a regulatory context and/or may need to be transferred from one laboratory to another.

Validation status
"Validation" generally describes the process of establishing that a test method is fit for the test purpose. In chemical risk assessment this often translates into demonstrating the adequacy, relevance and reliability of the test method for the purpose in question (e.g., OECD Manual for the Investigation of High Production Volume Chemicals, chapter 35; Hartung et al., 2004; Balls et al., 2006; Hartung, 2007; Hoffmann and Hartung, 2006; Coecke et al., 2014, 2016).
As described above, an in vitro test method often has a primary purpose, i.e., to investigate the biological activity of a chemical with respect to a cell-based endpoint, but may also be used for secondary or further regulatory purposes. It is well-established that a test method for which principal fitness for the primary purpose has not been actively established should not be used for any regulatory purpose. However, a test that is fit for a primary purpose is not automatically fit for a secondary or other regulatory purpose. For instance, even if an assay has been found "fit" to reliably predict stress gene activation, and there is a hypothesis that such activation is responsible for a certain type of liver toxicity, and therefore the test method may be useful as an indirect way (a model) to predict this type of toxicity in humans, additional steps have to be taken to demonstrate the fitness of the test method for this secondary regulatory purpose, e.g., by comparison with the respective human data or with other, already established models.
While this general notion may be acceptable for most test method developers, the need for "formal validation" (by dedicated institutions, according to standardized and harmonized workflows) is discussed more controversially. As background for such discussions, it may be useful to rationalize that chemical risk assessment relies on a highly defined methodological framework that is harmonized internationally to the extent possible. In this system, mutual acceptance of a test method is achieved most readily if an internationally trusted, independent body (such as the OECD) has confirmed that it is fit-for-purpose. Therefore, the formal validation status is often considered an important element of test method descriptions, and such information is also requested by GD211.
Notably, non-guideline methods, the main subject of GD211, have usually not been formally validated. Many of them are also not pre-validated or in the process of being validated. Thus, answers to the question on the validation status may be relatively information-poor, i.e., taking the form of "non-applicable", "unknown" or "no specific information / no validated state". More information on the status of a test method may be obtained if test

Fig. 1: Documentation of cell culture stages relevant for their use in an in vitro test method
The ToxTemp presented here assesses test system features and the respective acceptance criteria (AC) at different stages. In chapter 3, the original source of cells and their characteristics (green cells) as well as their differentiation/maturation (yellow cells) are covered. Chapter 4 focuses on the test system stages at the beginning of the test, i.e., at the start of chemical exposure, and at the final stage (end of test; red cells). Four examples of cellular test systems and their stages during test preparation and testing are given. In addition, exemplary processes (e.g., proliferation) that can change a test system, which should therefore be documented and taken into account for a comprehensive test method documentation, are indicated.

for the test system at the end of compound exposure
Describe the acceptance criteria for your test system, i.e. the quality criteria for your cells/tissues/organoids: Which endpoints do you consider to describe the cells or other source material, which parameters are important? Which values (e.g. degree of differentiation or cell density) need to be reached/should not be reached? Historical controls: How does your test system perform with regard to the acceptance criteria (e.g. when differentiation is performed 10 times, what is the average and variation of the values for the acceptance criteria parameters)? Indicate actions if the AC are not met. Examples: Usual neurite length is 50 ± 15 µm; experiments with average neurite length below 25 µm in the negative controls (NC) are discarded. Usual nestin induction is 200 ± 40 fold; experiments with inductions below 80-fold for NC are discarded.
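The two example criteria above can be written down as a small, machine-checkable record. The following Python sketch is one possible way to do this and is not prescribed by the ToxTemp or GD211; the class and function names are hypothetical, and the historical values and the run values are invented purely for illustration. Only the discard thresholds (25 µm and 80-fold) are taken from the examples.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass(frozen=True)
class AcceptanceCriterion:
    """Lower limit that the negative control (NC) of a run must reach."""
    parameter: str
    unit: str
    min_value: float  # runs with an NC value below this limit are discarded

    def is_met(self, value: float) -> bool:
        return value >= self.min_value


def summarize_history(values):
    """Average and sample standard deviation of historical control values."""
    return mean(values), stdev(values)


def evaluate_run(nc_values, criteria):
    """Return, per parameter, whether the NC value of a run meets its criterion."""
    return {c.parameter: c.is_met(nc_values[c.parameter]) for c in criteria}


# Thresholds from the examples in the text.
CRITERIA = [
    AcceptanceCriterion("neurite_length", "µm", 25.0),
    AcceptanceCriterion("nestin_induction", "fold", 80.0),
]

# Hypothetical historical NC neurite lengths (µm) from five earlier runs.
history = [48.0, 55.0, 41.0, 62.0, 50.0]
avg, sd = summarize_history(history)

# Hypothetical NC values of a new run.
run = {"neurite_length": 22.0, "nestin_induction": 190.0}
result = evaluate_run(run, CRITERIA)
accepted = all(result.values())

print(f"historical neurite length: {avg:.1f} ± {sd:.1f} µm")
print(result, "accepted" if accepted else "discarded")
```

In this sketch, a run is discarded as soon as one criterion is not met, which mirrors the "Indicate actions if the AC are not met" item; other actions (e.g., repeating only part of the experiment) would need a different rule.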