Courage for Simplification and Imperfection in the 21st Century Assessment of “Endocrine Disruption”



Summary
"Endocrine disruption" is a public and political buzzword that has received, and is still receiving, high media attention. Based on the latter, numerous tiered testing strategies have evolved that should ensure that humans will not run a health risk due to the voluntary or involuntary exposure to endocrine active substances (EAS). An analysis of the currently available knowledge on EAS mediated endocrine disruption in humans demonstrates that there are very few EAS that causally induce endocrine disruptive effects. Conversely, the associations of EAS exposure with increased risk or incidences of endocrine disruptive effects in humans are difficult to reconcile with the results from animal studies. Consequently, the analysis of the traditional and historically grown tiered approach in EAS testing, often at very high doses or concentrations, demonstrates that the likelihood of detecting EAS with true potential for endocrine disruption in humans is very low, primarily due to inherent differences between the surrogate species and the human, and will provide for a high number of false-positives commensurate with low efficiency, high cost, and often violently disputed interpretations of what the data would mean for human risk assessment. It is thus proposed that EAS testing for putative endocrine disruption in humans and qualitative and quantitative evaluation for risk assessment purposes should be entirely focused on human data, and derived from a combination of in silico and in vitro systems, PBPK modeling, metabonomic or genomic profiling of human tissue, realistic human EAS exposure, dose-effect principles and adverse effect scenarios, human patient or exposure cohort datasets, etc. Animal models should be used only where specific pathways in endocrine physiology, and thus development and reproduction, are nearly identical to the situation in the human, thereby guaranteeing that causal exposure and effect relationships in the animals can be extrapolated to the human.
Keywords: risk assessment in humans, integrated testing strategies, combinatorial toxicology

Introduction
In the wake of the early reports on reproductive failure and urogenital and physiological changes in animals (Guillette et al., 1994, 1995; Sumpter, 1995; Sumpter and Jobling, 1995; Guillette and Guillette, 1996; Sumpter, 1998a,b) and the presumed association of xenobiotic exposure with reduced semen quality in men (Sharpe and Skakkebaek, 1993) and the incidence of testicular dysgenesis syndrome (TDS) (Sharpe and Skakkebaek, 2008), an enormous effort was brought forth to understand these observations and, in conjunction, to regulate the potential exposure to these compounds (Gray et al., 2000; Gray et al., 2001; Gray et al., 2006; Gray et al., 2009; Scott et al., 2009; Ryan et al., 2010a; Ryan et al., 2010b). While the early observations certainly motivated toxicologists to better understand the scientific basis for the reported effects, a veritable public media hype evolved (Colborn et al., 1993; Colborn, 1994; Colborn, 1995), echoing endlessly and uncritically over the next 10 years and beyond among politicians, government, NGOs and academics.
While the former resulted in the advancement of our current understanding of reproductive and developmental toxicology and the role of compounds with endocrine activity, the latter, while being instrumental in providing improved assessment and regulation of endocrine active compounds, resulted in a gross overstatement of the actual risks. Indeed, despite it being well known that humans are not rodents, nor amphibians, reptiles, birds or fish, and that exposure scenarios of humans in many cases are qualitatively and quantitatively different from those of the other species, the reported effects of a given compound in a given species are taken almost as proof that this very effect could also take place in humans (vom Saal et al., 2007; Alonso-Magdalena et al., 2010). Some of the toxicologists involved in the endocrine active substances (EAS) field may have underestimated evidence and mechanism based toxicology in the species at risk as the only pathway that could lead to a thorough risk evaluation and, thus, to proper communication of the real risk. Other toxicologists have recognized the power of the new tools available, such as in silico and in vitro methods, clearly understanding the need for evidence and mechanism based toxicology but essentially not daring to place sufficient trust in these new tools to move away from traditionally established in vivo "confirmatory" experiments with rodents (vide infra), whether these be an extended repeated dose study, an extended one-generation study currently discussed at the OECD, or the two-generation study considered unnecessary by many.
One of the main questions that comes to mind is: Why is the current situation with regard to EAS, their testing and interpretation of the results by governmental expert groups and in consequence their regulation in different countries on one hand so similar and on the other hand vehemently questioned by numerous scientists (e.g. the risk of bisphenol A in humans)? There obviously is no single answer but rather a multitude of factors involved. Some of the main ones are listed below:
- Use of studies dominated by effects (descriptive) toxicology (limited mechanistic and evidence based approaches) versus use of a selection of well designed mechanistic studies with maximum insight as to the relevance of the data for the human
- Biased reporting or over-interpretation of study results (Sena et al., 2010), e.g., insufficient consideration of species differences, dose-response principles, experimental design issues, etc., often in conjunction with highly visible and thus politically "in" research areas (e.g. obesity research)
- A penchant for holding onto the "false-safety" of traditional, i.e. historical, approaches and thus repeating past mistakes rather than embracing modern science
- Inadequate and counter-"common sense" use of weight-of-evidence in the interpretation and reporting of risk to humans (Smith, 2001)
- Individuals and groups in society that are willing to interpret and extrapolate toxicological data in a way that causes exaggerated concern
- Experts with extremely disparate expertise involved in the risk assessment process
Indeed, at the outset of EAS research, "endocrine disruption," i.e., the adverse effects secondary to the activity of a given EAS (Jacobs et al., 2008), was described for the environment in a multitude of species as well as in humans. Thus, effects observed in any species have been indiscriminately considered as being relevant for all other species without subsequent proof, or there has been an insufficiently clear distinction between EAS mediated "endocrine disruption" in the environment (ecotoxicology) and EAS mediated "endocrine disruption" in humans (human toxicology).
Moreover, as will be discussed below, even when presumed EAS mediated effects or risks thereof in humans are being discussed, too little distinction is made between true effects in humans (human toxicology) and effects in mammals (primarily rodents, thus rodent toxicology). However, weight of evidence based toxicology would demand that all perceived and real risks be evaluated on the basis of the species at risk, i.e. the human, and not via a multitude of surrogate species with the known (and yet unknown) physiological and endocrine system differences (Scott et al., 2009) that could heavily influence the outcome of the risk assessment (vide infra). In order to achieve a clearer train of thought, EAS mediated effects in the environment (ecotoxicology) are not considered further in this opinion paper.
Thus, the main focus of this paper is the real or perceived effects in humans, how these are being tested, and how a paradigm shift could be achieved by embracing modern science (in silico and in vitro with supportive evidence from epidemiologic studies) in conjunction with real risk calculations and some courage for simplification and imperfection.

Historical cases and the "no failure/no-risk" paradigm
When considering the contention that we are continuously exposed to EAS and thus we and our future generations are potentially at risk for diseases and dysfunctions as a result of endocrine disruption, the question must be raised whether this is really true. EAS may act via a diversity of mechanisms at the level of receptor binding, such as post receptor activation, and at the level of hormone synthesis, storage, release, transport, and clearance, including hormonal homeostasis, at the cellular level. Thus in theory one could imagine that any unwanted interaction with normal hormonal function would indeed bring about the feared adverse endocrine disruptive effects in humans. However, as Steve Safe (1993), in his response to Sharpe and Skakkebaek's assumption that the increasing incidence of reproductive abnormalities in the human male may be related to increased estrogen (EAS) exposure in utero (Sharpe and Skakkebaek, 1993), correctly pointed out: humans are continuously exposed to indigenous compounds and environmental mixtures containing synthetic chemicals as well as natural products, and these mixtures contain compounds with "pro-active (e.g. estrogenic)" and "contra-active (e.g. antiestrogenic)" activities. Thus the resulting "endocrine disruptive" effects in a given human must be seen as the summation of all effects including the individual genetic predilections and habitual preferences that predispose for the observed disease.
Indeed, when considering the high volumes of halogenated compounds (e.g. DDT, methoxychlor, PCBs, etc.) used and deployed into the environment and thus found in relatively high concentrations in food, water, and even air in the 1930s-1970s, it seems rather surprising that high incidences of "endocrine disruption" in humans were not registered (Smith, 2001). On the contrary, when looking at the human growth physiology during human development, Rosenbloom (2008) reported that during the 150 years preceding the mid-20th century, there was a secular trend in the pace of maturation and adult size of individuals in the Western countries. The age of menarche in girls has declined from 17 years to 12.5 years over the past 150 years. The most apparent explanation for this phenomenon is the improvement in nutrition and reduction in childhood disease frequency and duration with attendant salutary effects on the endocrine milieu (e.g. hormones affecting growth (GH), insulin-like growth factors (IGFs), sex steroids, etc.). This secular trend appears to have leveled off in the last 50 years, albeit over-nutrition and lack of exercise with ensuing obesity, increased rates of growth, accelerated skeletal maturation, and advancement of pubertal onset in girls are increasingly observed (Root and Diamond, 2007). In view of the above it appears that exposure to EAS had no overt impact on the development and health of humans: So why is there such a hype and consequently public paranoia and what are we worried about?
Certainly one of the reasons is that there are always individuals and groups in society that are willing to interpret and extrapolate toxicological data in a way that causes exaggerated concern, especially when some indication of an effect has been gleaned from animal studies or, more recently, from gene expression studies. The latter is even more pronouncedly problematic when toxicologists, in their dispute over the interpretation of the data, attack one another at a personal level and reach out to the public in a quest for "being right" rather than aiming for the balanced and professional discussion required. Naturally this distorted view, once released to the public via media, is always linked to the historical background of a very few select cases where EAS exposure causally resulted in overt endocrine disruption in humans (Tab. 1) with ensuing permanent reproductive incapacitation and/or the development of cancer.
However, when considering these historical cases it is a fact that, without exception, all EAS resulting in proven endocrine disruption in humans (urogenital abnormalities and infertility) were highly dosed steroidal pharmaceuticals or the non-steroidal diethylstilbestrol (DES) (Tab. 1), primarily and specifically applied from the late 1940s to the end of the 1960s during pregnancy (Whitelaw et al., 1966; Schardein, 1980; Mittendorf, 1995; Palmer et al., 2009) or, as in the case of the androgens, also in the treatment of tumors, alopecia, nausea and vomiting, hypotension, and pruritus (see Schardein, 1980 for review). For one of the key EAS, DES, the induced urogenital abnormalities in male and female offspring, as well as clear-cell cervicovaginal cancer in female offspring, occurred primarily when DES was applied early (first trimester) in pregnancy and at high doses (total cumulative dose >5000 mg per pregnancy) (Mittendorf, 1995; Veurink et al., 2005; Palmer et al., 2009). Not surprisingly, the reported incidences of malformation in the high dose groups are extremely high.
However, it is also noteworthy that a large proportion of the in utero DES exposed population presents with no adverse effects at all. Moreover, despite realizing that Table 1 is far from providing a complete picture, it is surprising that, although between 5 and 10 million fetuses and their mothers in the USA and Europe alone were exposed to DES during gestation (Giusti et al., 1995), the number of reported overt urogenital abnormalities in male and female offspring as well as clear-cell cervico-vaginal cancer appears relatively low. While <0.1% of the daughters exposed to DES in utero presented with clear-cell cervico-vaginal cancer, the incidence of overt urogenital abnormalities (vaginal adenosis) in female offspring was largely dependent on the time, duration, and dose of DES applied during pregnancy (Swan, 2000).
While 80% of the female offspring presented with vaginal adenosis when exposed to a total dose of ≥12,000 mg prior to nine weeks of gestation, 0% were noted when exposure was ≤700 mg and no exposure occurred before week 22 of gestation. Generally the risk for urogenital abnormalities decreased linearly with increasing week of gestation at first exposure, and, within week of gestation, with decreasing dose. Thus a median total dose of DES of 2530 mg was significantly associated with no overt urogenital abnormalities in female offspring, while conversely, a median dose of 11,025 mg DES was significantly associated with urogenital abnormalities (Jefferies et al., 1984; Swan, 2000).
Unfortunately, the published data by Jefferies et al. (1984) do not allow retrospective determination of a cumulative NOAEL for specified windows of, or the whole, gestational period. With regard to the long-term health risk, exposure to DES has been associated with an increased risk for breast cancer in DES mothers (relative risk, <2.0) and with a lifetime risk of clear-cell cervicovaginal cancer in DES daughters of 1/1000 to 1/10,000 (Giusti et al., 1995). Although the lifetime risk of clear-cell cervicovaginal cancer in DES daughters is unacceptably high from a patient standpoint, it appears rather low when considering that DES binds to human sex hormone binding globulin (SHBG) with an affinity >250-fold lower than that of 17β-estradiol (E2) (Hodgert Jury et al., 2000), and thus would have been available to the fetus, especially when applied at high concentrations throughout gestation.
The endocrine disruptive effect, and possibly to some extent also the carcinogenic effect (Gladek and Liehr, 1989; Cunningham et al., 1996; Block et al., 2000; Ma, 2009), appears to be mediated by the affinity of DES for the estrogen receptor (ERα and possibly β) and the resulting alterations in genetic pathways governing sexual organ differentiation. Indeed, DES has a relative binding affinity (RBA) of 17% to full length human ERα when compared to the 100% of 17β-estradiol (E2) (Freyberger et al., 2010a), and thus the capability of displacing the endogenous E2 at the ERα at high DES concentrations under physiological conditions. Indeed, approximately 47% of E2 can be displaced from plasma SHBG under physiological conditions by high concentrations of DES (Hodgert Jury et al., 2000), suggesting that besides the untimely high concentrations of DES, higher E2 concentrations could also be readily available to the developing fetus during DES exposure.
The example of DES thus elegantly demonstrates that even for the most potent EAS in humans (with regard to endocrine disruption) known to date, it is primarily a "high dose, specific activity, and prolonged time during a critical period principle" that governs the manifestation of "endocrine disruptive effects" in humans.
In view of the overdosing issues during gestation with androgens and progestogens (Tab. 1) and the DES catastrophe, it is understandable that the public wants to prevent and will not tolerate repetition of a similar event. However, in hindsight, DES appears to represent the worst-case scenario. While it can and should be used as a point of orientation for the public, it must be emphasized that this is the exception and not the rule. More importantly, the DES scenario must serve as a point of departure for toxicologists to provide an understanding of how EAS should and can be tested and how dose principles apply in human risk assessment. Indeed, as will be discussed below, the potential for high-dose adverse effects of DES possibly could have been foreseen and prevented if current human based in vitro technology (e.g. human steroid receptor binding, steroid transactivation, steroidogenic enzyme deregulation and/or inhibition, steroidogenesis via the H295R assay) and in silico technology (e.g. physiologically based pharmacokinetic (PBPK) modeling), as well as current knowledge of human sexual development (presence/absence of receptors during different periods of gestation) (Neill et al., 2006a,b), appropriate dose-risk factor calculations, and the demand for proof of pharmacological efficacy and benefit not only for problematic pregnancies (miscarriages and premature birth) but also normal pregnancies (Giusti et al., 1995) had been applied. In contrast, the DES problem of transplacental carcinogenesis could most likely not have been determined using the historical toxicological approach with the routinely applied rodent in vivo tests, i.e., detection would have depended largely on recognizing that the mouse strains used for the in vivo tests are either DES resistant (C57BL/6) or susceptible (CD-1) (Ma, 2009), while determination of a point-of-departure for urogenital changes would have been difficult due to the major strain differences in the mouse strains employed (Greenman et al., 1977). Thus, the DES example also illustrates that the toxicological thinking and approach must change in the 21st century, but more explicitly emphasizes the need for improved toxicological reporting and rapport with the public in the 21st century.

The "association" issue
Again, coming back to the original question why issues such as the risk assessment and especially the communication of health risks of EAS appear to be so confusing, it must be stated that this author is not aware of any EAS presently under discussion (prima facie excluding those EAS specifically applied in high doses to combat cancer or other diseases, e.g. Cushing syndrome in a patient) where causality between exposure to the EAS and endocrine disruptive effects in the exposed or the offspring has been conclusively reported. On the contrary, for glitazones, despite involuntary dosing at 4 mg/d and specific exposure during gestation, no endocrine disruption in the offspring was observed (Yaris et al., 2004; Kalyoncu et al., 2005; Choi et al., 2006; Haddad et al., 2008). Similarly, for ketoconazole used as an oral broad-spectrum antifungal agent, daily doses of 600-1000 mg were insufficient to perturb normal masculinization or provide for increased congenital abnormalities in the offspring (Scott et al., 2009).
For some EAS (phthalates, bisphenol A, PCBs, dioxins, DDT metabolites), however, an "association" of exposure with the increased incidence of specific adverse health effects, incl. those typically listed under endocrine disruption, was reported or proposed (Guo et al., 2004; Swan et al., 2005; Bustamante-Montes et al., 2008; Sharpe and Skakkebaek, 2008; Chou et al., 2009; Alonso-Magdalena et al., 2010; Melzer et al., 2010). An association, in simple terms, is any relationship between two measured quantities that renders them statistically dependent, i.e., they are dependent on one another with some degree of likelihood. An association does not, however, connote that "cause and effect" exist.
Depending on the stringency of the hypothesis(es), the observation that European storks are more frequent in spring can be associated with the higher frequency of human babies also in spring. Consequently, one could conclude (correctly, from the strong association of these two parameters) that storks deliver the babies. The latter example, although obviously silly, demonstrates that in the absence of a credible biological mechanism and an appropriate dose or exposure regimen, associations between the exposure to an EAS and, in most cases, ever so slightly increased incidences of adverse effects in our Western population (Borrell, 2010a) are of little or no value at all beyond creating an atmosphere of insecurity and hysteria in the public.
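The stork argument can be made concrete with a small numerical sketch. The monthly figures below are entirely hypothetical (they are not data from any cited study): both series are constructed to depend on a shared seasonal factor plus unrelated fluctuations, and the names `storks`, `births` and `season` are illustrative only. The point is merely that a strong raw correlation can vanish once the common driver is accounted for.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def residuals(ys, xs):
    """Residuals of ys after a least-squares straight-line fit on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return [y - (mean_y + slope * (x - mean_x)) for x, y in zip(xs, ys)]

# Hypothetical data: a seasonal factor peaking at month index 3 drives both series.
months = range(12)
season = [math.cos(2 * math.pi * (m - 3) / 12) for m in months]
# Each series adds its own fluctuation that is unrelated to the other series.
storks = [50 + 40 * s + 5 * (m % 3 - 1) for m, s in zip(months, season)]
births = [1000 + 100 * s + 20 * (m % 4 - 1.5) for m, s in zip(months, season)]

r_raw = pearson(storks, births)
# Remove the shared seasonal component from both series, then correlate what is left.
r_adjusted = pearson(residuals(storks, season), residuals(births, season))

print(f"raw correlation: {r_raw:.2f}")
print(f"correlation after adjusting for season: {r_adjusted:.2f}")
```

In this constructed case the raw correlation is above 0.9, while the correlation of the season-adjusted residuals is essentially zero. Real cohort data are of course far less tidy, and adjustment only helps for confounders that were actually measured.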
Thus, for example, the association between phthalate exposure during gestation and ensuing endocrine disruption in the human offspring is controversial. Indeed, while some cross-sectional studies see a negative correlation between phthalate metabolites in the mother's urine and the anogenital distance (AGD), and thus penile volume/length and cryptorchidism, in the male babies (Swan et al., 2005; Bustamante-Montes et al., 2008; Swan, 2008), a similar study found the complete opposite, i.e., no association between demonstrated exposure and effect. In contrast, a significant correlation between early onset of puberty and phthalate exposure (via fish consumption and use of plastic cups) was reported in Taiwanese girls. The problem with all of these studies is that other potentially confounding factors were not controlled for, meaning that a number of other factors, e.g., genetic predisposition, immediate environment, nutritional habits, life-style, medical conditions, obesity (not body weight), physical training, etc. of the mothers or the offspring could potentially have influenced the outcome of these studies to a similar or greater extent than the primary parameter of interest (in this case the compound) investigated.
Human in vitro studies investigating the effect of di(nbutyl) phthalate (DBP), respectively its metabolite monobutyl phthalate (MBP) on leydig cell steroidogenesis in human fetal testis explants, demonstrated no adverse effects (Hallmark et al., 2007). However the latter findings were considered inconclusive by the authors as the corresponding in vivo studies with rat offspring exposed to DBP in utero and neonatal male marmosets exposed to DBP as of 4 days post-partum demonstrated inhibition of steroidogenesis (lowered testosterone levels), while the in vitro rat fetal testis explants did not. Other in vitro studies demonstrated lack of DBP binding to the human ERα (Freyberger et al., 2010a) and DBP and MPB to the rat androgen receptor (AR) (Freyberger and Ahr, 2004;Freyberger et al., 2010a), low or inexistent binding to shbg (Hodgert Jury et al., 2000), and lack of DBP in the eR and AR agonistic and antagonistic activity (Freyberger et al., 2010b;Witters et al., 2010), demonstrating that the phthalate metabolites and not the al., 2000; Matthews et al., 2001). IC50 values (e2 competitive mode) of BPA binding to ERα and ERβ were 3.6 x 10 -5 M and 9.6 x 10 -7 M, respectively, which compares to e2 binding of 2.9 x 10 -9 M and 3.6 x 10 -9 M, although slightly different numbers were observed depending on the eR source used (Chapin et al., 2008).
Surprisingly, absent or low activity in MVlN cells constitutively expressing the human ERα were reported (Freyberger and Schmuck, 2005), whereas in other reporter gene assays some degree of transactivation was observed (Fang et al., 2000;Matthews et al., 2001). In the latter case, the eC50 for BPA induced luciferase expression in MCF-7 cells transiently transfected with ERα and ERβ was 7.1 x 10 -7 M and 4.5 x 10 -7 M, respectively, which compares to e2 induced luciferase expression of 5.3 x 10 -11 M and 8.3 x 10 -11 M (Matthews et al., 2001). this suggests that regarding the low binding of BPA to the shbg, it would take at least a 10,000-fold higher BPA than e2 concentration in situ (in utero, plasmas levels, fat body levels, etc.) to bind to the eR to a similar extent as e2 under physiological conditions and thus to evoke similar responses as e2. the latter observation is also corroborated by the co-treatment experiments with the MCF-7 cells transiently transfected with ERα and ERβ, which demonstrated that co-addition of 10 nM e2 and 10 µM BPA resulted in no change in luciferase expression when compared with the effect of 10 nM e2 alone (Matthews et al., 2001).
thus beyond the controversial association of higher risk for disease in humans and the even more controversially discussed results of animal studies, the historical approach of toxicological risk assessment, including allometric scaling and pharmacologically based pharmacokinetic (PBPK) models with interspecies scaling, interaction with the eR, determination of actual human exposure, use of safety factors (NRC, 2000), etc., would not provide for an increased endocrine disruptive health risk for humans via exposure to BPA (Goodman et al., 2006;Chapin et al., 2008;Willhite et al., 2008;Goodman et al., 2009). the latter view is presently shared by all expert teams of national authorities involved in human risk assessment of BPA.
However, the examples of phthalates and BPA clearly demonstrate the problems toxicologists face when using animal studies as either proof of presence or absence of an adverse effect of a given compound and when aiming to extrapolate the findings to a potential human health risk. this is especially true when, despite a plethora of available mechanistic data with demonstrated dose-responses and appropriate use of safety factors (Gray et al., 2000(Gray et al., , 2009(Gray et al., , 2010Ryan et al., 2010a;Ryan et al., 2010b), other scientist doubt the appropriateness of the experimental data used, the data interpretation, and the associated risk assessment (Richter et al., 2007;vom Saal et al., 2007;Myers et al., 2009;Somm et al., 2009;Vandenberg et al., 2010). Moreover, these examples illustrate that the outcomes of the animal studies are difficult to reconcile with reported association of higher risks for endocrine disruptive effects with exposure to these eAS in humans.
Therefore, the question must be asked: why use animals as human surrogates in human safety assessment at all, and is there real safety provided by routine, indiscriminate animal testing? parent compounds are the eAS to be considered. Additional results with higher relevance for the human could potentially be obtained with the H295R steroidogenesis assay (Hecker et al., 2007), which can determine whether or not a parent or metabolite has the capacity for increasing or inhibiting human steroidogenesis (Song et al., 2008).
However, despite rodent-specific effects being highly likely, many toxicologists and endocrinologists typically "hang-on" to the routine rodent model and the observed endocrine disruptive effects. Moreover, marmoset experiments with phthalates do not appear to be of great help either, as only the direct comparison of fetal marmoset explants with human fetal explants and additional steroidogenesis assays with human cells (H295R) would provide any insight as to whether or not the effects observed in marmosets have any meaning relevance for the human, e.g., i.) whether the same enzymes involved in steroidogenesis are inhibited, suppressed, or over-expressed with the concomitant ensuing differences in steroid levels; and ii.) whether these changes occur at the same phthalate concentrations levels of parent compound and/or metabolite in marmosets and human fetal testis explants and the H295R assay.
Similarly, recent reports on the association of higher bisphenol A (BPA) levels in the urine with higher risks of diabetes and heart disease (Lang et al., 2008; Melzer et al., 2010) were based on a large cohort study with urinary measurements adjusted for age, sex, race/ethnicity, education, income, smoking, body mass index, waist circumference, and urinary creatinine concentration. As with the stork and baby delivery paradigm, no mechanistic explanation was put forward as to how BPA could raise the chance of diabetic or cardiovascular morbidity in humans; rather, the argument relies heavily on findings in mice (Alonso-Magdalena et al., 2010) and rats (Somm et al., 2009). In contrast to the latter, another cross-sectional study in fertile men (Mendiola et al., 2010) investigating the relationship between BPA exposure and reproductive parameters, thereby clearly alluding to the demonstrated endocrine activity of BPA in in vitro systems (Matthews et al., 2001) and the controversial endocrine activities of BPA in in vivo animal tests (Nagao et al., 2002; Richter et al., 2007; Tyl et al., 2008; Sharpe, 2010), reported no association between reproductive function and environmental BPA exposure (though an association with a modest reduction of testosterone was observed).
Similarly, Meeker et al. (2009) found in a study with men from an infertility clinic that urinary BPA concentrations may be associated with altered hormone levels in men. The latter findings stand in stark contrast to the findings for BPA in rats (Ryan et al., 2010a), where BPA exposure in utero and during lactation had no effect on the expression of well-characterized sexually dimorphic behaviors, the age of puberty, or reproductive function in the female rat offspring. When using "human" in vitro assays, BPA shows very low binding to SHBG (Hodgert Jury et al., 2000), and a very low RBA (0.05%) to a receptor transcribed from recombinant human ERα complementary DNA (Kuiper et al., 1997). Similarly, binding of BPA to a partial-length ERα (GST-hERαdef fusion protein) and ERβ (recombinant full-length hERβ) demonstrated stronger binding of BPA to ERβ than to the partial-length ERα (Fang et
are extremely high to provide the highest likelihood of detecting EAS. However, when applying the calculations of Hartung (2009) to the EAS problem, it becomes abundantly clear that even if compounds, e.g., phthalates, PCBs, dioxins, and others, are positive in the animal tests, the likelihood that these represent false-positive results, thus having no meaning for humans, could be very high. The latter appears even truer when considering that numerous animal studies were negative for BPA. So, are the negative BPA animal studies false negatives, or, conversely, are the reported positive BPA animal studies false-positive findings? Or could it be theoretically possible, by chance, that phthalates and/or BPA are amongst those very rare EAS that are truly positive with regard to endocrine disruptive effects in humans under realistic exposure scenarios?
Although there is no absolute answer to these questions, only likelihood calculations as with all risk evaluations, the chances that either phthalates or BPA could mediate endocrine disruptive effects in humans under realistic exposure scenarios appear extremely low, especially considering the very small number (possibly <30) of EAS (vide supra, thus excluding non-endocrine reproductive toxicants) with causally proven endocrine disruptive (including reproductive) effects at high cumulative doses in humans.
Indeed, most EAS used today as clear positives in in silico, in vitro and in vivo animal tests are pharmaceuticals tailored to interact with the endocrine system (e.g., ethinylestradiol, amiodarone, tamoxifen, fadrozole, flutamide, etc.) and consequently are also active in the in vivo animal tests at high doses, whereas nearly all "environmental" EAS to which humans are exposed occur at extremely low concentrations in the environment and, again, show effects in animal studies at very high concentrations. Consequently, the likelihood of uniquely identifying, in animal studies, "environmental" EAS that are truly positive with regard to endocrine disruptive effects in humans, whether these studies be based on the subacute test (OECD Test Guideline 407) (Gelbke et al., 2007), the 90-day reproductive test (OECD Test Guideline 408), an extended one-generation (Spielmann and Vogel, 2008) or a two-generation animal test (OECD Test Guideline 416) with one or two species, is nearly nil, i.e., such EAS are obscured within the group of positive compounds in the animal tests or, when species-specific mechanisms are also prevalent, within the negative compounds.
Although, to this author's knowledge, all of the known truly positive EAS with regard to endocrine disruptive effects in humans were also detectable in animal assays, little information is available with regard to the concordance between the human and typical animal assays. This may suggest that moderate and strong human EAS can be detected whereas weak human EAS could be missed. On the other hand, the argument that "negative in the animal test is truly negative in humans" cannot hold, either, especially as determination of "absence of effect" is not possible. Indeed, as stated earlier by Levitt, "Nine out of ten experimental drugs fail in clinical studies because we cannot accurately predict how they will behave in people based on laboratory and animal studies." (Hartung and Daston, 2009; Shanks et al., 2009). Consequently, the indiscriminate use of animals in the traditional assessment of EAS in humans appears
Indeed, any additional rodent testing with BPA, such as the 30 million US$ study program instigated by the NIEHS (Borrell, 2010b), is not expected to bring about new insight with regard to the potential health risks of BPA to humans, beyond the actual health risks of BPA to rodents. The latter study program is therefore more of a political nature, intended to appease the incessant critics of the current BPA risk assessment (Richter et al., 2007; vom Saal et al., 2007; Myers et al., 2009; Somm et al., 2009; Vandenberg et al., 2010), rather than to provide scientific enlightenment. Obviously, the use of rodents has a strong historical component plus the great advantage of ready access, ease of genetic manipulation and the possibility of maintaining high numbers of animals for a single experiment. However, the obvious and often cited downside is the fact that, due to the differences between rodents, other species (incl. subhuman primates) and humans, routine animal studies may identify highly and, to some extent, also moderately potent EAS with potential for human adverse effects, but may prove of limited use in the identification of weak EAS.
Indeed, in an effort to estimate the potential for correct prediction of reproductive toxicity in humans with routine animal studies, Hartung (2009) used an estimate that 138 of the 5500 chemicals to be tested under REACH in the European Union could be true reproductive toxicants in humans (Bremer et al., 2007). When using the reported concordance between species of approximately 60% for reproductive toxicity testing (two-generation study in rats, where toxic effects are followed not only in the F1 generation of exposed rats but also, after further mating, in the F2 generation), only 83 of these 5500 chemicals (1.5%) would be detected as true positives, 2145 (39%) as false positives, 55 (1%) as false negatives, and 3217 (58.5%) as true negatives. When using a second species (mouse or rabbit) to test the negatives, and using again a 60% concordance, another 33 of the 55 true reproductive toxicants missed in the first species would be detected, leaving a total of 1309 (40%) false positives, 22 false negatives, and 1908 true negatives.
The upshot of this is that only 116 of the 138 true human reproductive toxicants (2.1% of the 5500 chemicals) can be identified, while 22 (0.4%) would remain undetected and 3454 (63%) would register as false positives. Beyond the latter, the concordance between animals and humans is most likely even lower, owing to the fact that in prospective animal studies mothers are exposed to maximal tolerated, and thus unrealistically high, doses. The result is that a high number of false-positive and thus putative EAS with no relevance for human endocrine disruption are being characterized with additional testing (in vitro and in vivo) and subsequently regulated despite having no adverse impact on human health, but with a major impact on economics. Even more problematic is the low number of true human EAS with ensuing endocrine disruptive effects (false negatives in the animal tests) that are not detected, as is demonstrated, for example, by the relatively high number of pharmaceuticals that demonstrate severe adverse drug reactions in patients despite intensive testing in surrogate species and thus have to be taken off the market.
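The arithmetic behind these estimates can be retraced with a short sketch. The Python snippet below is only an illustration: it assumes that the reported 60% concordance acts as both the sensitivity and the specificity of the animal test, and the function name `screen` and its parameters are hypothetical and not taken from Hartung (2009).

```python
def screen(n_toxic, n_nontoxic, sensitivity=0.6, specificity=0.6):
    """Split tested chemicals into (TP, FN, FP, TN) counts.

    Assumption: the ~60% inter-species concordance is used as both
    the sensitivity and the specificity of the animal test.
    """
    true_pos = round(n_toxic * sensitivity)
    false_neg = n_toxic - true_pos
    true_neg = round(n_nontoxic * specificity)
    false_pos = n_nontoxic - true_neg
    return true_pos, false_neg, false_pos, true_neg


TOTAL = 5500  # chemicals to be tested under REACH (figure from the text)
TOXIC = 138   # estimated true human reproductive toxicants (from the text)

# First species (rat two-generation study)
tp1, fn1, fp1, tn1 = screen(TOXIC, TOTAL - TOXIC)

# Second species: only the toxicants missed in round one are followed here
tp2, fn2, _, _ = screen(fn1, 0)

# Share of first-round positives that are true human toxicants
ppv1 = tp1 / (tp1 + fp1)

print(tp1, fn1, fp1, tn1)  # 83 55 2145 3217
print(tp1 + tp2, fn2)      # 116 22
print(f"{ppv1:.1%}")       # 3.7%
```

Under these assumptions, fewer than 4 in 100 first-round positives would be true human reproductive toxicants, which is the point the calculation is meant to convey.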
As any failure in detecting EAS with true endocrine disruptive effects in humans appears unacceptable, public and media pressures on toxicologists in regulating agencies and industry
at a fraction of the cost of full-scale animal testing." ToxCast, in using rat and human in vitro systems as well as employing rat in vivo data and human disease classes for comparison, aims to provide for a system that can "achieve a higher predictive power than single assays or chemical structure alone." Since ToxCast is purposely inclusive for many toxicological endpoints and thus may lack some sensitivity and specificity, it certainly is a valuable tool for identifying potential toxicants and thus for prioritizing a reduced number of chemicals for ensuing in-depth evaluation. Obviously, the prioritization power of ToxCast for compounds, e.g., BPA, 2,2-bis-(4-hydroxyphenyl)-1,1,1-trichloroethane (the metabolite of methoxychlor), linuron, vinclozolin, or prochloraz, stems from the relatively high predictive power of some of the in vitro assays (human ER or AR binding) for the in vivo rodent effects (uterotrophic or Hershberger assay). Indeed, in comparing the concordance of recombinant human ERα and AR assays with the uterotrophic (74 chemicals used) or Hershberger assay (80 chemicals used), a group of scientists at the Chemicals Evaluation and Research Institute (CERI), Japan, found a concordance of 81% and 74%, respectively. The false-positive and false-negative rates were 25% and 13%, respectively, for the hERα binding assays, and 50% and 10%, respectively, for the AR binding assays. However, this high concordance was achieved only when a cut-off RBA limit for the detection of estrogenic/anti-estrogenic and androgenic/anti-androgenic responses was introduced.
A recently published part of the latter study (Akahori et al., 2008), comparing the estrogenic/anti-estrogenic response of 65 chemicals in the recombinant hERα binding assay and the in vivo rat uterotrophic assay, also demonstrated 82% concordance, 14% false negatives and 23% false positives. However, using all data for the hERα binding and the uterotrophic response without employing the cut-off RBA = 0.00233%, i.e., the lowest ER binding potency that elicits estrogenic/anti-estrogenic activity in the uterotrophic assay, resulted in a much reduced concordance of 66%, as well as false-negative and false-positive rates of 14% and 57%, respectively. The latter analysis thus emphasizes an important caveat: the human steroid binding assays were compared to the rodent in vivo response and NOT to a putative human in vivo response, thus effectively testing how well one can predict an EAS-mediated response in rodents with human in vitro ER and AR binding assays. The latter, unfortunately, also appears to hold true for the results of ToxCast. Maybe this is one of the reasons, beyond the quest for more in-depth optimization for sensitivity and reproducibility of the most current in vitro assays for EAS detection (steroid receptor binding assays, reporter gene assays, etc.), that even scientists intricately involved in ToxCast feel that the data provided by ToxCast should not, and cannot (yet), be used for any form of qualitative or quantitative risk assessment.
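How a binding cut-off changes concordance and error rates can be illustrated with a small sketch. The records below are hypothetical toy values, not data from Akahori et al. (2008), and the definitions used (false-positive rate as the share of in vivo negatives that bind, false-negative rate as the share of in vivo positives that do not) are assumptions, since the text does not spell them out.

```python
RBA_CUTOFF = 0.00233  # % RBA cut-off quoted in the text

# Hypothetical toy records: (RBA in %, positive in the uterotrophic assay?)
TOY_DATA = [
    (10.0, True), (0.5, True), (0.01, True), (0.0, True),
    (0.05, False), (0.002, False), (0.0005, False), (0.0, False),
]


def compare(records, cutoff=0.0):
    """Compare in vitro binding calls with in vivo outcomes.

    A chemical counts as a binder when its RBA is above zero and
    at or above the cut-off (cutoff=0.0 means any measurable binding).
    """
    tp = fn = fp = tn = 0
    for rba, in_vivo_positive in records:
        binds = rba > 0 and rba >= cutoff
        if in_vivo_positive:
            tp += binds
            fn += not binds
        else:
            fp += binds
            tn += not binds
    return {
        "concordance": (tp + tn) / len(records),
        "false_positive_rate": fp / (fp + tn),
        "false_negative_rate": fn / (fn + tp),
    }


with_cutoff = compare(TOY_DATA, RBA_CUTOFF)
without_cutoff = compare(TOY_DATA)
print(with_cutoff)
print(without_cutoff)
```

In this toy set, dropping the cut-off turns weak binders into additional false positives and lowers concordance while the false-negative rate stays unchanged, which mirrors the pattern reported above.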
Indeed, it must be emphasized that, given the in-depth knowledge of pharmaceuticals tailored to interfere with the human endocrine system and the corresponding human in vivo response data (adverse responses, kinetic data, tissue concentrations, etc.), it is incomprehensible why no datasets have been developed that allow the direct comparison of, for instance, hER/hAR/hTR binding data with corresponding human in vivo responses. The
inadequate for the achievement of its primary purpose, but also problematic as animal studies also leave too much room for interpretation with regard to what they mean in human health risk assessment.
So rather than indiscriminately testing with rodents and other species and then trying to understand and explain why the data can or cannot be extrapolated and used for human health risk assessment, one could envision starting and remaining with human-specific test systems as long as possible. Once sufficient information has been acquired with human-specific test systems, in vitro (e.g., segment I and segment II) and in vivo animal tests, where the underlying mechanisms of EAS interaction for a single or a number of well-defined endpoint(s) have been demonstrated to be nearly identical with those in the human, could be employed in EAS testing (vide infra).

From the "flat-earth principle" to new developments
Although the problems of species- and strain-specific effects (Swenberg et al., 1992; Lynch et al., 1996; Neill et al., 2006a,b; Ma, 2009), the comparison of the usually in-bred rodent species with an out-bred human, and the limited comparability of the rodent and human metabolism, endocrine system, and sexual development have long been recognized (Dohler et al., 1979; Takayama et al., 1986; McClain, 1995; Sharpe et al., 1995; Braverman et al., 2006; Scott et al., 2009; Sharpe, 2010), prima facie all of the toxicological strategies of the USEPA (http://www.epa.gov/endo/pubs/assayvalidation/status.htm), OECD, and European Community (Gelbke et al., 2007) maintain the historically grown approach of routine animal testing (Tab. 2). Moreover, even when presence or absence of a dose-response of an EAS is observed, species- and strain-specific effects (e.g., metabolism, differences in development, presence or absence of SHBG, steroid receptors during the critical window of exposure, levels of hormones, expression and levels of enzymes involved in steroidogenesis, etc.) will have to be taken into consideration, i.e., either by comparing directly to the known respective situation in the human or by accounting for them using risk factors (often a factor of 10 for species differences) during risk extrapolation.
The main underlying problem of all of these strategies is that none of them has established databases that demonstrate how many of the EAS with demonstrated causal endocrine disruptive effects in humans could have been predicted with the routine animal tests. The latter is similar to profusely maintaining that the earth is flat, whilst almost grudgingly recognizing that all indications suggest that the earth is a globe. Thus, in view of the lack of evidence that routine animal experiments can truly predict the potential activity of EAS in humans, alternative approaches with much higher confidence in correct prediction, efficacy, and capacity, and thus public trust, are needed. One of the approaches under study is the ToxCast program of the USA (Judson et al., 2009), which "is a large-scale experiment using a battery of in vitro, high-throughput screening (HTS) assays, applied to a relatively large and diverse chemical space, to develop methods to predict potential toxicity of environmental chemicals
underpassed, combined with a clear description of the potential downstream consequences, i.e., biological meaning and presumed adverse effects if these critical levels depart from normal
- Meaningful high-throughput in silico and in vitro assays using human data and cell systems that can detect these endpoints with sufficient robustness, reproducibility, and sensitivity, whereby international agreement should be found as to how robustness, reproducibility, and sensitivity are defined (cut-off values).
- Use of those in vivo animal assays where specific pathways in endocrine physiology, and thus development and reproduction, are nearly identical to the situation in the human, thereby guaranteeing that causal exposure and effect relationships in the animals can be extrapolated to the human.
The above procedure would hopefully also ensure that better international agreement could be achieved on what the data mean and how these could be used for human risk assessment. Agreement on which pathways and endpoints may or may not be relevant in the human, and the way these should be weighed in the context of human risk assessment, will be a matter of hot dispute among experts. Indiscriminate use of all data in the public domain certainly would prohibit this process. Indeed, criteria about the type of data and the quality thereof should be defined. A first step in this direction was initiated with the Klimisch criteria (Klimisch et al., 1997) and expanded by Schneider et al. (2009), thus providing guidance on how the data could be selected and their quality ensured. A subsequent, more critical analysis of the available data via meta-analysis (as exemplified by CAMARADES (Collaborative Approach to Meta-analysis and Review of Animal Data in Experimental Studies)) would allow determination of whether or not the reported findings were interpreted correctly (Sena et al., 2010). For some, but certainly not a sufficient number, of the endpoints, in silico (Bovee et al., 2008; Breen et al., 2010; Rusyn and Daston, 2010) and in vitro methods have been developed (Akahori et al., 2008; Freyberger et al., 2010a; Witters et al., 2010) and, in some cases, already validated under the auspices of the OECD, ECVAM or the USEPA/ICCVAM. However, as often cited, the validation of these assays, e.g., within the OECD, is taking too long (up to 10 years) (Judson et al., 2009). One of the problems associated with the overly long duration of assay validation is that there is no international consensus on which assays are specifically needed to address a specific end-
latter would allow determination of a) the concordance, incl. true false negatives and false positives; and b) the setting of cut-off limits, e.g., RBA in the hER/hAR/hTR binding assays, below which no adverse effects in humans would be expected.

The radical way forward via reversal of the traditional process
When considering the already existing plethora of data on human endocrine physiology and sexual development (Neill et al., 2006a,b), the decades of experience with pharmaceuticals tailored to interfere with the endocrine system (whether as contraceptives or in order to ameliorate endocrine-related diseases), and, more recently, the huge amounts of data that are available from exposure determinations in humans for pharmaceuticals, DES, BPA and phthalates, it is high time to depart from the traditional routine animal testing approach and directly drive for human risk assessment by embracing modern science (in silico and in vitro with supportive evidence from epidemiologic studies) in conjunction with real risk calculations and some courage for simplification and imperfection.
As already mentioned, current knowledge of endocrine physiology and sexual development in humans, and thus of the effect of pharmaceuticals on the human endocrine system, has greatly advanced over the past decades (Neill et al., 2006a,b; Lin et al., 2009; Patel et al., 2009; Rouiller-Fabre et al., 2009; Schteingart, 2009; Wajner et al., 2009; Luu-The and Labrie, 2010; Taxvig et al., 2010). It is therefore possible to define critical pathways and endpoints, defined by Hartung (2010) as pathways of toxicity (PoT), and by Boekelheide and Campion (2010) as a taxonomy of adverse effects, where and how (qualitatively and quantitatively) EAS could interfere with the human endocrine system. Examples are steroidogenesis, steroid or other nuclear receptor interactions, enzyme inhibition within endocrine homeostasis, increased or inhibited expression of receptors, inhibited or increased thyroid follicular cell function, etc. The major tasks to be achieved are thus to find agreement on:
- The critical pathways and endpoints to be determined in humans (indeed, the majority of these have already been defined for the in vivo animal assays)
- The normal levels of these endpoints at various stages of human development (conception to grave)
- Critical levels of specific parameters (e.g., steroid concentrations and enzyme activities) that should not be surpassed/
Receptor binding; transcriptional activation; steroidogenesis in vitro; QSAR
3. In vivo assays (single endocrine effect): Uterotrophic assay; Hershberger assay/fish vitellogenin assay
4. In vivo assays (multiple endocrine effects): Enhanced OECD 407; rat pubertal assay/fish gonadal histopathology assay
5. In vivo assays (endocrine and other effects): 1-/2-generation mammalian assay/partial or full life cycle assays (fish, birds)
in the near future.
Indeed, the use of human stem cells, specifically induced pluripotent stem cells (iPS), and their specific differentiation into neural, hepatic, vascular, islet, skeletal cells or cardiomyocytes (Chapin and Stedman, 2009), their use as alternatives for developmental toxicity testing (Seiler et al., 2004), as well as the improved global availability of human tissue or recombinant enzymes for metabolic investigations (Jacobs et al., 2008), should allow for the development of in vitro assays with well-defined and characterized endpoints that address pathways critical in the assessment of EAS in the human. Moreover, the current knowledge base on endocrine effects in rodents and other species should allow us to define which of the mechanisms and subsequent endpoints determined are either identical or very similar to the situation in the human. Provided that these "human relevant" in silico and in vitro and the corresponding animal in vivo systems are available, these should be combined to provide a basis for focused integrated assessment systems (such as exemplified by ToxCast). The important difference from ToxCast is that these integrated systems for the assessment of EAS effects in humans are en-
point (vide supra), but rather relies on the submission of already developed assays by member countries. Consequently, there is at best only a limited concerted international effort and financial commitment in developing specific assays from scratch (an encouraging example would be the OECD Molecular Screening Working Group or the OECD-VMG-Non Animal subgroup on metabolism and metabolic enzyme systems (Tan et al., 2007; Jacobs et al., 2008)) that could address the most important pathways and endpoints.
Turning the current process around, namely specifically defining and developing what methods are sought to qualitatively and quantitatively determine the effects of EAS in humans, rather than evaluating and validating what is brought (offered) to the OECD, would speed up the process.
Argumentation that, at present, it is not possible to cover all pathways and endpoints with human cell systems is certainly correct for the time being, but certainly should be proven wrong
Fig. 1: Combined approach using primarily human data to arrive at a risk assessment for EAS
the historically grown traditional approach and would demand that agencies are willing to make a decision whether or not a given EAS will provide for a specific human health risk at a given concentration (the quantitative aspect). To some extent agencies and organizations (OECD, European Community, and USEPA) have already inadvertently or consciously begun to move in this direction, as many of the currently used in vitro and in silico test systems, including ToxCast, are being used for "prioritization" purposes. Prioritization, by definition, means that EAS are placed on a gradient of probability ranging from "unlikely" to "highly likely" that an endocrine disruptive effect in humans may occur. For example, placement into the "unlikely" category could mean that a given EAS will not be tested at a higher tier with high priority or that its "endocrine disruptive" capacity is assumed to be very low. This, in itself, is already a "quantitative" and not just a qualitative decision, as most of these compounds will be tested up to the highest possible soluble concentrations in the in vitro tests. As past experiences have demonstrated, it is not feasible or sensible to categorize EAS into fixed categories of activity. Consistent use of probabilistic risk assessment (Hartung, 2010), defining the likelihood of an adverse effect at a given exposure to or systemic concentration of an EAS, would provide for a more realistic assessment of risk and a higher degree of safety than what is currently achieved with the traditional approach.
tirely focused on human data and derived from a combination of in vitro systems, PBPK modeling, metabonomic or genomic profiling of human tissue, specific human endpoint relevant in vivo animal models, human exposure scenarios, human patient or exposure cohort datasets, etc. (Fig. 1).
Certainly, another of the prerequisites for such computational human prediction systems is that currently available information on EAS-mediated endocrine disruptive effects in humans is assembled in a database employing the pathways and endpoints developed via international consensus. Such a database, e.g., including compounds listed in Table 1 as well as compounds employed today in the treatment of endocrine-mediated diseases (Schteingart, 2009) or employed as contraceptives, would also be able to deliver, per compound, the critical systemic concentrations that causally provide for the increased incidence of deregulation of the respective endpoints determined in humans in vivo and via the in silico/in vitro and human endpoint relevant animal assays.
One key issue in providing for a high predictivity of the proposed human computational systems is that they are sufficiently precise in defining what they will predict. Thus, prediction of "endocrine disruption," even when using human relevant data, is less likely to succeed than, for example, aiming to predict "inhibition of steroidogenesis" or "reduced spermatogenesis."

Courage for simplification and imperfection
Using real exposure scenarios for EAS and the ensuing systemic concentrations (Mittendorf, 1995; Chapin et al., 2008; Clewell et al., 2008) will allow comparison with the lowest effect concentrations determined with the in silico/in vitro and human endpoint relevant in vivo animal assays for a given endpoint (Fig. 1). In combination with PBPK modeling (Clewell et al., 2008), the latter should allow determination of the likelihood that a given single, multiple, or cumulative exposure could result in the impairment of a given pathway (PoT) and subsequently provide for an increased chance of the occurrence of an adverse effect. Needless to say, in this case no additional safety factors are required in the risk assessment, as all of the determinations have been carried out in "human-relevant" systems. Indeed, when considering the current situation, where traditionally routine animal testing is profusely used with the obvious small gain in "real safety for humans," the question must be raised whether it is not worthwhile and timely to start moving in a new direction rather than holding on to historically grown procedures with the customary limitations and insecurity, i.e., literally the "flat earth," with regard to human safety. Moreover, in view of the very few proven EAS in humans to date and the prohibitive economics and time-scales of using the traditional routine animal test systems, common sense would dictate that integrated humanized computational toxicology systems for the assessment of EAS in humans should be developed as soon as possible.
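The comparison of systemic concentrations with a lowest effect concentration can be sketched as a simple likelihood calculation. The snippet below is a minimal illustration only: the lognormal distribution of systemic concentrations, the function name `exceedance_probability`, and all numerical values are assumptions made for the example and are not taken from the cited studies.

```python
from math import log10
from statistics import NormalDist


def exceedance_probability(median_conc, sigma_log10, lowest_effect_conc):
    """Likelihood that a systemic concentration exceeds the lowest
    effect concentration for one endpoint.

    Assumption: systemic concentrations in the exposed population are
    lognormally distributed, described by their median and by the
    standard deviation of log10(concentration).
    """
    log_conc = NormalDist(mu=log10(median_conc), sigma=sigma_log10)
    return 1.0 - log_conc.cdf(log10(lowest_effect_conc))


# Hypothetical example: median systemic concentration of 1 (arbitrary
# units), a spread of half a log unit, and a lowest effect concentration
# ten times above the median.
p = exceedance_probability(median_conc=1.0, sigma_log10=0.5,
                           lowest_effect_conc=10.0)
print(f"{p:.4f}")  # 0.0228
```

Expressing the outcome as a probability rather than a fixed category is in the spirit of the probabilistic approach discussed here; a real assessment would replace the assumed distribution with PBPK-derived concentrations.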
Use of the proposed human computational systems, however, would demand a great deal of trust in the systems employed and courage by the regulating agencies as it means departing from