Hepatotoxicity Prediction by Systems Biology Modeling of Disturbed Metabolic Pathways Using Gene Expression Data

219 Received February 7, 2016; Accepted September 27, 2016; Epub September 30, 2016; https://doi.org/10.14573/altex.1602071 vative biological functions that have often never been seen before in nature (Elowitz and Leibler, 2000). Systems biology and multiscale approaches are not only essential tools for designing novel biological functions in synthetic biology, but they also provide a framework for carrying out simulations and analyses in order to gain greater insights into the mechanisms of diseases, drug action, drug resistance, or drug toxicity, which are major biomedical challenges. Using a systems approach to analyze drug toxicity is to the best of our knowledge a field not yet fully explored and, therefore, it is the focus of the present study. When modeling genotype/phenotype relationships caused by perturbations such as disease states or drug toxicity effects, an important aspect that must be considered is that there is considerable redundancy or “promiscuity” within biological networks


Introduction
The dramatic decrease in the cost of generating high-throughput data combined with an integrative modeling approach that cuts across traditional boundaries between disciplines should lead to a better understanding of complex relationships between genotype and phenotype.These detailed biological models, extending over multiple scales in space and time, will unveil key features of biological systems such as redundancy, modularity and negative feedback that act as sources of robustness against perturbations and internal variability (Dada and Mendes, 2011).
Some recent examples of such kinds of analyses, notably in the context of the nascent field of synthetic biology, have shown the power of biological model analysis harnessed to implement inno-isms like mammals.Multicellular models, however, present a higher degree of complexity as objectives need to be defined for each cell type and tissue, as well as for the interplay between the different compartments and interfaces.Rather than considering a single solution in such complex systems, recent studies have shown that phenotypes would often lie on a surface of possible states corresponding to optimal tradeoff solutions (Pareto front) for competing objectives such as biomass, energy yields, and resource allocation (Schuetz et al., 2012).
In this study, we wanted to apply these modeling techniques for the identification of drug toxicity in the framework of tissue-specific models, such as hepatotoxicity, through their integration with assay data.The primary cause of drug failure due to non-clinical toxicology is about 59% in the candidate nomination phase, 13% in clinical Phase I and a further 8% in Phase II (Waring et al., 2015;Waterbeemd and Gifford, 2003), with adverse drug reactions being estimated to be between the fourth and sixth leading cause of death in the USA (Lazarou et al., 1998).
In order to assess drug toxicity, in vivo testing in animals and clinical studies are nowadays the gold standard assays, although such assays are intensive both in terms of time and cost (Krewski et al., 2010).Growing pressures from major initiatives currently under way such as the European regulation REACH (Gleeson et al., 2012) and the widely-accepted adoption of the 3Rs principles (Replacement, Reduction and Refinement) for toxicity testing (Törnqvist et al., 2014) urge their replacement.
Approaches for predicting hepatotoxicity in silico have been proposed based on chemical structure similarity through the development of quantitative structure-activity relationship (QSAR) models (Chen et al., 2014;Greene et al., 2010).Integration of QSAR models with data from other sources can provide context information, bringing opportunities to improve a model's predictive power, for instance, integrating ontology annotations of toxic pathologies (Myshkin et al., 2012) or data from in vitro based assays in human hepatocytes (O'Brien et al., 2006).Liu et al. (2015) built hepatotoxicity classifiers by combining chemical structure descriptors with bioactivity descriptors from high throughput assays in the ToxCast and Tox21 databases.Zhu et al. (2014) combined chemical descriptors with cell imaging endpoints (human hepatocyte imaging assay technology descriptors).Toxicogenomics-based models combine QSAR models with biomarkers from transcriptomics (Cheng et al., 2011;Low et al., 2011Low et al., , 2013) ) or metabolomics (Ebbels et al., 2007).
Here, we propose a systems approach to model hepatotoxicity that integrates toxicity assays and transcriptomics data with validated genome scale metabolic models (GEMs).Our study was developed in the framework of the eTOX project (Cases et al., 2014).eTOX provided high-quality preclinical data that was used to develop multi-scale models.The Library of Integrated Network-based Cellular Signatures (LINCS) project provided, in turn, information about gene expression profiles in human cells under chemical perturbations (Duan et al., 2014).The multi-scale integration of toxicity assay information with GEMs augmented with regulatory processes and systematic analyses of omics information should eventually provide us with greater insights into the downstream toxicity pathways and might help (Carbonell et al., 2011;Planson et al., 2012a), i.e., there are often several ways to achieve a particular biological phenotype (Payne et al., 2014).Therefore, it is possible that a model will faithfully reproduce an in vivo behavior but will be unable to predict the results of perturbations to this network (Plant, 2015).To tackle the complexity in biological systems, we need to use multi-scale approaches combining small-scale models, centered in a faithful mechanistic description of the mode of action of the perturbation, with large-scale models, being the more common approach for genome-scale metabolic models (GEMs), which essentially capture all enzymatic and transport reactions possible within a cell (O'Brien et al., 2013).This integrative approach is now possible, since advances in the past few years in both computational power and biological understanding have allowed the reconstruction of GEMs for several organisms, from relatively simple microorganisms to large animals like mammals.Such reconstructions may be general, capturing all the possible metabolic reactions that could occur in any cell type (e.g., ReconX, a genome-wide model of human metabolism and physiology (Rolfsson et al., 2011;Thiele et al., 2013)), or can be focused on the representation of some specific target cell (e.g., the hepatocyte-specific Hepatonet1 (Gille et al., 2010)).
GEMs are defined by a list of reactions as well as some constraints on the exchange of metabolites with the medium.The observed phenotype of the organism is then determined by the distribution of reaction fluxes in steady-state.Flux balance analysis (FBA) is a common technique that is applied for steady-state simulation of the cell, based on finding the best distribution of fluxes according to some mathematical expression that parallels the cell's objective (Bordbar et al., 2014).Flux variability analysis (FVA) is another associated technique that returns the boundaries for the fluxes producing the optimal solution (Bordbar et al., 2014).Traditionally, the cell's objective was considered to be maximizing growth rate, and this simplification made it possible to experimentally estimate growth in microorganisms as a linear function of some key metabolites and reactions in the cell measured under different growth conditions.This approach allows the in silico simulation of the state of the cell by computationally solving an optimization problem under some given environmental conditions.Solutions to the stoichiometric model are determined mathematically using linear programming under the assumption of a biochemical objective such as growth rate maximization.For instance, in the case of Escherichia coli, several reconstructed models have appeared in the course of the last 15 years containing an increasing number of genes and reactions (2251 reactions, 1136 metabolites, 1366 genes in the 2011 version).Quantitative structure-activity relationship (QSAR) models of chemical toxicity based on chemical structure descriptors have been proposed for these microorganisms (Planson et al., 2012b).These relatively simple models for microorganisms can be extended to other organisms, including mammals, by an appropriate consideration of the complex interplay between different tissues and the organism's physiology.In a similar manner, provided that microorganism models can serve to identify suboptimal states due to cell perturbations, that approach could be extended to the assessment of potential toxicity states in multicellular organ-ed dose toxicity reports for a large number of drugs and drug candidates.Confidential compounds, with restricted access to their chemical structure, were not considered in the analysis.In this study, we extracted information only from the "clinical chemical" and "histopathological" tables of the database, focusing on a collection of findings that, according to the criteria of toxicology experts, are indicative of liver injury (Tab.S11 ).On the clinical chemistry side, these findings are: increases in transaminases (alanine aminotransferase, aspartate aminotransferase or gamma glutamyl transferase) and bilirubin (bilirubin, direct bilirubin and indirect bilirubin).On the histopathological side, any entry for which the organ affected was the liver was selected.It must be noted that every finding associated with a given compound is labeled as "compound related" or "non-compound related" according to the toxicologist's expert judgment made in the original toxicological report.In this context, compound-related means that the observed effect is considered to be caused by the compound and not by chance or external factors like the health situation of the animals, housing, etc. us to ultimately define a drug and/or particular dose as toxic or non-toxic (D'Alessandro et al., 2015;Godoy et al., 2013).
As an example, we compared the gene expression profiling signatures of three statins, pravastatin, simvastatin and rosuvastatin, which are cholesterol-lowering drugs.Although all three drugs have been linked to drug-induced liver injury (Zhu and Kruhlak, 2014), rosuvastatin appears to be more likely to cause clinically apparent liver injury (Alsheikh-Ali et al., 2005;Famularo et al., 2007;Hoofnagle et al., 2013).The exact mechanism of hepatotoxicity of statins remains unclear (Hwang et al., 2015).

Extraction of hepatotoxicity data from the eTOX database
Hepatotoxicity data were obtained from the eTOX database (Cases et al., 2014) (Fig. 1a).The eTOX consortium has built a large database compiling information extracted from repeat-Fig.1: General data extraction procedures developed in the study a) Lowest reported toxic doses (LRTD) for hepatotoxicity endpoints were defined by toxicologist expert curation of the findings extracted from the clinical chemistry and histopathology tables of the eTOX database and assigned to one of the following categories: <10, <50, <100, and <500 mg/kg.b) Differential gene regulation resulting from exposure to chemical substances of primary human hepatocytes was compiled from the LINCS database.The information employed was exposure time, dose and top 50 upregulated and top 50 downregulated genes.c) A metabolic model for hepatocytes was used to determine upper/lower bounds for reaction fluxes through flux variability analysis.Gene-protein-reaction (GPR) associations in the model provided the information to link this information to gene expression data.
in the LINCS database in order to provide for each chemical perturbation a list of the 50 most highly up/down regulated transcriptomics probes.Each transcriptomic probe is associated with its corresponding expressed genes according to the information available in LINCS.
Each differential response in the LINCS database is identified as a gene expression signature.For instance, signature "CPC020_PC3_24H:BRD-K94173926-001-01-6:10" contains a first part that identifies the experiment (CPC020), the cell line (PC3), and the perturbation time (24 h); the second part identifies the perturbation (BRD-K94173926), which in this case corresponds to the identifier of a small compound, followed by experimental information of location on the plate.The final part of the identifier is the perturbation dose (10 μM).
Gene expression profiling signatures of the three statins shown in our example were identified as follows: pravastatin CHE-BI:63618 CPC015_PHH_24H:BRD-A71816415-236-03-5:10, simvastatin CHEBI:9150 CPC015_PHH_24H:BRD-K22134346-001-11-6:10, and rosuvastatin CHEBI:3854 CPC015_PHH_24H: BRD-K82941592-238-02-9:10) in hepatocytes (PHH) and with a dosage of 10 µM for a period of 24 h.We used the LINCS L1000 API3 to query for available signatures in the database, each one identified by its "sig_id" field, which is a character string that uniquely identifies a signature.It contains metadata that can be used to query the API.For each signature, we retrieved their associated 50 up-and downregulated RNASeq probes, identified by "up50_lm" and "dn50_lm".The signature identifier was later used to query the API using the Siginfo service in order to obtain the meta information of the signatures of interest, including fields such as "pert_dose", "pert_dose"unit", "pert_time", "pert_time_unit" in json format.The full details of the available fields are in the API documentation 3 .
After downloading the full collection of signatures, they were stored locally in an SQLite 3 database for fast access to signatures specific to one perturbation.
The information about the perturbation was retrieved through "PertInfo" service of the LINCS API.Querying the API with the perturbation identifier ("pert_id" field) produces information about the chemical compound such as its chemical structure in SMILES and InChI format, as well as its PubChem id.The perturbation is associated with genes as a set of RNASeq probes that were up-/downregulated.Such RNASeq probe identifiers can be queried using the GeneInfo service of the API, which provides information such as Entrez gene id and gene symbol (full details are given in the API documentation).
Finally, information about the cell types can be retrieved by using the CellInfo service of the API.In our case, we were interested in focusing our study on hepatocytes.According to LINCS, the cell type PHH4 corresponds to primary human hepatocytes isolated from liver supplied from CellzDirect (Thermo Fisher), and therefore the chemical perturbation information associated with PHH cells was used in the study.

Endpoint construction and compound classification
With the aim of building endpoints representing diverse types of liver injury that are amenable for modeling, the previously extracted collection of findings was clustered into four sets: clinical chemical findings associated with hepatocellular (CCHC) or hepatobiliary (CCHB) damage, and morphological (obtained from histopathology) findings associated with hepatocellular (MFHC) or hepatobiliary (MFHB) damage.The assignment of individual findings to these categories was made by toxicological expert judgment.A full list of the considered findings and the assigned categories is included as File S11 .
A compound was considered positive for any of the aforementioned endpoints if at least one of the associated findings was observed.Otherwise, the compound was considered negative since it is expected that all the findings were evaluated for every compound.
In order to obtain quantitative values suitable for modeling, we recorded the lowest dose at which the finding was observed in any study for each compound.This value can be considered to represent the lowest dose at which this effect has been reported for this compound in any of the considered studies (LRTD, lowest reported toxic dose) and describes roughly the potency of the observed toxic effect of the compound.We preferred to avoid the use of the term LOEL (lowest observed effect level) for this value since it was calculated from a mixture of studies and therefore it could not be considered formally correct in all instances.In general, the lower the LRTD, the more potent the compound.Furthermore, to avoid the bias of too low or too high LRTD values in the analysis, we assigned LRTDs to one of the following categories: <10, <50, <100, and <500 mg/kg.
The additional term "preclinical" in the endpoint description is chosen since the same scheme can be applied for speciesspecific endpoints (only rat, only rodents, only dogs, etc.) and for human data.Endpoints can be named accordingly: Human Hepatotoxicity (H-HT), Mouse Hepatotoxicity (M-HT), etc.

Querying the LINCS API for differential gene expression data
We downloaded from the LINCS database differential expression data (known as gene expression signatures) observed in cell lines and primary cell types under chemical perturbations, i.e., gene expression profiling assays of cell lines and primary cell types exposed to compounds.In total, 22119 perturbations were available for compounds with gene expression information based on the L1000 technology (Duan et al., 2014), which is a high throughput method that allows genome-wide estimation of mRNA expression by measuring about 1000 genes in each experiment.The L1000 SOP is now publicly available 2 .The experimental information has been statistically processed These previous relationships can be interpreted as follows: when two genes are associated by an AND operation according to the GPR rule, then downregulation can be caused by any of the two genes (min function).In the AND case, it suffices for one of the genes to be downregulated to potentially alter the reaction; if one of the genes is overexpressed, we cannot assume overexpression as the reaction depends on both genes.In turn, when the genes are associated by an OR rule, then only one of the genes is needed in order to have upregulation (max function).In this case, the downregulation of one of the genes would not be a sufficient condition for expecting negative regulation of the reaction.By applying these criteria systematically as shown in the previous equation to each gene involved in a reaction flux, we obtained the overall effect of the gene regulation δ ijk on that reaction under the given chemical perturbation s i .
In this study, the previous relationships were applied to every reaction in the models to map the regulation information from the 978 genes with hepatotoxicity information from 77 chemical compounds into 540 metabolic reactions that were present in tissue-specific human cell models.The resulting information was stored in an SQLite 3 database for easy retrieval.

Flux variability analysis
We performed a simulation of the steady state of the cell model in order to determine the distribution of fluxes for each reaction.The procedure described in the previous section provides an estimate of the main perturbed reactions in the cell.By performing a systems analysis of the model, we can determine how critical such perturbations may be in terms of disrupting the steady state equilibrium.For that purpose, we used a sensitivity analysis method known as flux variability analysis (FVA), which provides an estimate of the upper and lower bounds of each steady-state reaction flux v i in the cell (Orth et al., 2010).We used for the calculations the COBRApy package (Ebrahim et al., 2013).The method is based on determining the solution, i.e., the distribution of fluxes in the cell that correspond to the optimal state of the cell in the sense of some objective function.This objective function b in Recon 2 was obtained for each cell type through a fitting of growth experimental data with a weighted linear combination of main essential fluxes in the cell (Thiele et al., 2013).In our study, we determined the space of solution fluxes v i that maximize the objective function defined for each cell in Recon 2 by solving the following constraint-based optimization problem: where S is the stoichiometry matrix, where each metabolite has a row and each reaction a column, and stoichiometric coefficients are mapped into each column; α i and β i are the nominal lower and upper bounds for each flux v i , and c T is a vector of coefficients corresponding to the biomass function in the cell model.Once the optimal biomass b 0 has been calculated, in order to determine allowed upper and lower bounds for each flux, v i lb , v i ub , respectively for each reaction flux v i , an optimization

Differential gene expression data extracted from the signature
As shown in Figure 1b, a LINCS signature consists of the 50 most highly differentially expressed up-/downregulated genes in the cell type under this chemical perturbation.In a similar manner, for each chemical perturbation s i and each cell type c j , we can determine the sets of upregulated g + (s i ,c j ) and downregulated genes g -(s i ,c j ).In total, transcription regulation information from LINCS due to chemical perturbation was found for 978 genes in 66 different cell types.We compiled this information in an SQLite 3 database.
In the case of primary hepatocytes (PHH) as the cell type, we found that gene expression information was available for 1845 compounds (File S2 1 ).We matched these compounds against compounds in eTOX based on their chemical structure by using their InChI representation.Potential InChi mismatches due to differences between compounds starting from the hydrogen sub-layer were manually inspected and the ones corresponding to identical substances were kept.In total, we were able by these means to match 161 chemical compounds with information from eTOX.We were able to retrieve hepatotoxicity information from eTOX for 77 of these 161 compounds.These 77 compounds formed the dataset that was used in the subsequent studies.

Cell type-specific metabolic model
A consensus metabolic model for human cell metabolism Recon 2, MODEL1109130000 (Thiele et al., 2013) was downloaded from the Biomodels database (Juty et al., 2015).The model is represented using SBML (Systems Biology Markup Language, Hucka et al., 2003) and contains 65 cell-type specific models, including liver hepatocytes.In the models, each of the 7440 metabolic reactions are annotated with their associated reactants and corresponding stoichiometric coefficients.Furthermore, reactions are annotated with pathway information, enzyme class and gene-protein-reaction (GPR) associations (Files S3 1 , S4 1 ).GPRs are Boolean functions that reflect the requirements in terms of genes for their associated enzymes and reactions f j (g i ), including those requirements for enzyme complexes that are formed by different subunits, the presence of isozymes, enzyme promiscuity, etc. (Thiele and Palsson, 2010).
We used this approach in order to determine the effect of a chemical perturbation from the LINCS database in the reactions contained in the model for each type of cell.More precisely, LINCS provides for a set of cell types c j the up-/downregulated gene sets g + (s i ,c j ) and g -(s i ,c j ) for several chemical perturbations s i .The perturbation on each reaction is determined by the GPR AND/OR Boolean rules defined in the model between genes and reactions.We mapped this information by translating AND/OR into min/max relationships as follows: (1) (2) The level of perturbation shown in previous Equation 4 estimates the maximum change that can take place in a given reaction k under perturbation without affecting the cell steady state equilibrium.If a positive perturbation took place (upregulation), we take the upper bound of the flux; if the perturbation was negative (downregulation), we use the lower bound.In that way, we get an estimate of the maximum perturbation that can take place without affecting the cell equilibrium state.We should expect that more toxic perturbations should be associated with reactions having lower elasticities and therefore with associated perturbation having a higher impact on the cell's state.

Chemical similarity
Chemical similarity between a pair of chemical structures was obtained by computing their Tanimoto similarity coefficient through Open Babel (O' Boyle et al., 2011).The molecular fingerprints FP2 for each chemical were computed, which generates an index vector for small molecule fragments based on linear segments of up to 7 atoms (see OpenBabel documentation (O' Boyle et al., 2011)).Similarity was computed by applying the Tanimoto similarity to the fingerprints, which computes the fraction of shared fragments divided by the total number of fragments.A value of 1 corresponds to perfect similarity while a value of 0 to total dissimilarity (no common fragments).Chemical structures of compounds in the LINCS database are given in InChI format in File S6 1 .

Code availability
In File S7 1 we provide a Jupyter Notebook with the Python scripts for computing network perturbation elasticity as shown in the example for statins.

Computation of hepatotoxicity endpoints
We collected toxicity endpoint data entries from a curated version of the eTOX database.We focused on hepatotoxicity data, as it was the class containing more abundant information.In order to improve data integrity and to avoid inconsistencies, we tried to improve the quality of the data by using standardized terms that were curated based on toxicology expert judgment.
To assess the quality of the extracted refined hepatotoxicity values, we tested the hypothesis that for structurally similar compounds in the dataset we might expect, in general, similar toxicity values.This test is independent of the way hepatotoxicity values were extracted and therefore would provide an initial validation of our data.
The test was performed by looking at the chemical fingerprint-based Tanimoto similarity between compounds in the data set.In this test, we created subsets of chemically similar compounds by varying the similarity threshold and looked systematically at the pairwise correlation of similarity coefficients and toxicity as given by the LRTD (lowest reported toxic dose) values in eTOX.Interestingly, we observed that pairs of compounds with high Tanimoto similarities (above 0.75) showed a high correlation between chemical similarity and LRTD to maximize/minimize the flux is calculated under the additional constraint of setting the biomass to its optimal b 0 : By solving the given equations, we obtained on each flux the allowed levels of variability in the model that will not alter the cell's objectives (File S5 1 ).As show in Figure 1c, we may refer to these bounds as flux elasticities.
Solving constraint-based metabolic models such as the one shown in Equation 2 therefore requires boundaries for each reaction as well as a biomass reaction used as objective function.
In the case of Recon 2, which is the model used in this study, tissue-specific upper and lower bounds were determined for each reaction based on tissue-specific gene and protein expression data.The algorithm that allows such determination of the boundaries for each reaction in the models is described by their authors (Shlomi, et al., 2008;Schellenberger et al., 2011).These bounds are expressed in mmol/gdw/h.The biomass reaction, in turn, accounts for the macromolecular synthesis requirement for proteins, DNA, RNA, lipids, and carbohydrates.How the biomass composition was determined has been described elsewhere (Thiele and Palsson, 2010).This reaction provides an overall estimate of the cell maintenance objective and serves to evaluate the state of the cell.In PHH, we assume that this biomass function represents cellular maintenance so that we can assess the effect of the response to a perturbation in the cellular equilibrium.

Estimation of the perturbation elasticity
Our central assumption is that flux elasticities, i.e., the upper and lower bounds of each reaction, reflect the resilience of this particular reaction in terms of robustness.A reaction with a narrow span between the lower and upper bound will in general correspond to a point of fragility in the network, as a perturbation applied on that reaction is more likely to alter the cell's equilibrium than in the case of reactions with higher elasticity values.Our goal, therefore, is to use flux variability information as a measure of robustness of the system in combination with differential gene expression information to identify the points of perturbation in the network to ultimately characterize cell responses under chemical perturbations.From the previous computations, we can define a coefficient in order to estimate the overall level of network elasticity perturbation of a given reaction flux k due to the chemical perturbation i in cell type j as follows: where H is the Heavyside function: (3) (4) (5) tables of the eTOX database was higher for hepatocellular than for hepatobiliary damage in both cases.

Development of systems approaches for the prediction of hepatotoxicity
Using the hepatotoxicity endpoint values estimated from LRTD values, we investigated if the prediction of hepatotoxicity could be improved by combining structural models with gene expression data and metabolic models using the LINCS database (Duan et al., 2014) as data source.Among the data available at LINCS we focused on the L1000 collection, which provides signatures consisting of the 50 most highly upregulated genes and 50 most downregulated genes identified for each chemical compound perturbation.In some cases, signature data were available for different concentrations of the compound for a given cell type, although in general signature data were only available for a single concentration of the compound.
Figure 4 shows a heatmap of the 50 genes that are most highly upregulated or most strongly downregulated by the three statins (detailed in File S8 1 ).As expected, the three compounds cause differential expression of a partially overlapping set of genes (50 genes were differentially expressed in the response of at least two compounds).In turn, each compound induced differential expression of a significant number of unique genes.These (r > 0.8), while the correlation for hepatotoxicity endpoints was lower when similarity was under a threshold of around 0.75 (Fig. 2a).
This decrease in correlation was significantly weaker in the case of hepatotoxicity endpoints considered specifically for the class of morphological findings or for clinical chemistry.Here, LRTD values were still highly correlated for Tanimoto similarities above 0.5.
When the endpoint values were separated even further into hepatobiliary or hepatocellular injury, the trend was still stronger.We should specially note the case of morphological findings related to hepatobiliary injury, which remained highly correlated down to a remarkably low similarity of 0.3.
Figure 2b shows scatter plots of hepatotoxicity LRTD values between pairs of similar compounds for different similarity thresholds.Toxicity values appear correlated for highly similar compounds (above 0.75).Below this similarity threshold, toxicity values appear less correlated and in some case do not appear to be related at all.LRTD values were assigned to the following categories: <10, <50, <100, and <500 mg/kg.Figure 3 shows the percentage of positives observed for each category in all studies and those that were observed as compound-related.The percentage of positives extracted from "clinical chemical" and "histopathological" a sensitivity analysis of cell growth for each gene.The method consisted of applying flux variability analysis (FVA) in order to determine upper and lower bounds that each reaction flux can attain without compromising cell growth.If a reaction flux can only vary within narrow bounds without affecting growth, we can assume that a chemical perturbation response involving this reaction may indicate a symptom of cytotoxicity.Such effects can be quantified by looking at the way that reaction fluxes are associated with genes.The cell models in Recon 2 incorporate associations between genes and reactions as gene-protein-reactions (GPR) (Thiele and Palsson, 2010), which are Boolean rules that tell us about possible gene expression states that will allow the reaction to be catalyzed by the associated enzymes.large differences between the induced gene sets would make it difficult to compare the toxic effect and to elucidate the statins' mechanism of toxicity.Our assumption followed in this study is that toxicity responses can be better analyzed through a systems biology approach by looking at perturbed pathways rather than at individual genes, see below.
We found 77 substances that had hepatotoxicity information available in the eTOX database and gene expression information in LINCS.In order to analyze the effect of the perturbation in gene expression of cells exposed to each chemical, we considered the metabolic models of cells available in Recon 2 (Thiele et al., 2013).These models link genes and metabolic pathways and can be used to simulate the effect of perturbations.We applied overall estimate of the perturbation.The estimate combines the information on differential gene regulation with network information provided by FVA.
File S9 1 shows the upper and lower bounds obtained by FVA for each perturbed reaction in our example for statins.The corresponding perturbation elasticity values are shown in Figure 4, and Table 2 compares them with their chemical similarity values.Pravastatin and simvastatin are two highly similar statins (similarity of 0.89), while rosuvastatin has a chemical similarity with the other two compounds that is below 0.30.These levels of similarity correspond with the estimated perturbation effect.Pravastatin and simvastatin showed similar perturbation levels, while levels were considerably higher in the case of rosuvastatin.
We systematically applied the described calculation of perturbation elasticity values to all 77 compounds using the hepatocyte model in Recon 2 (Fig. 5a).First, we determined LRTD doses for each histopathological finding and assigned the compounds into toxicity classes.These values were compared with the aggregated regulation levels associated with the perturbation response of the cell.Our hypothesis is that there should be a relationship between the level of toxic effects and the elasticity of perturbations, i.e., how constrained the perturbed fluxes are in the cell.
For instance, for our example of statins, we can search for enzymatic reactions in the hepatocyte model that are potentially perturbed because of the differential expression of genes for an enzyme that catalyzes a reaction.One of these is gene AKR7A2, from the aldo-keto reductase (AKR) superfamily.The gene products of this superfamily of genes are involved in drug metabolism and detoxification (Barski et al., 2008).According to the gene differential expression information from the LINCS L1000 database, AKR7A2 is significantly overexpressed in response to rosuvastatin, but not to pravastatin or simvastatin.The gene encodes an enzyme of broad substrate specificity.Several reactions in the hepatocyte model contain GPR rules involving AKR7A2, as shown in Table 1.By applying the same procedure to the full list of GPRs in the hepatocyte model, we can determine the list of reactions in the metabolic network whose flux may be altered in response to the compound.The list of reactions that appear perturbed in response to pravastatin, simvastatin and rosuvastatin based on the differential expression of genes in the model is shown in File S8 1 .
We used flux variability analysis (FVA) integrated with the up-/downregulation provided by LINCS to determine the level of network perturbation observed in response to each of the chemical compounds.To that end, we computed the network elasticity parameter defined in Equation 4, which provides an b) Receiver operating characteristic (ROC) curve for hepatotoxicity endpoints that assesses discriminant power of network perturbation elasticity parameter.The calculated AUROC was 0.43.c) ROC curve for compound-related hepatotoxicity endpoints that assess discriminant power of network perturbation elasticity parameter for this endpoint.The calculated AUROC was 0.89 Tab.4: Area under the ROC curve (AUROC) obtained for different hepatotoxicity endpoints at 50, 100 and 500 mg/kg Gene expression regulation in primary hepatocytes was used as discriminant.The number of available data points is given in the first column.Hepatotoxicity data from all available assays were considered for the calculation of the AUROC values shown in this three substances.Notably, the number of reactions altered by rosuvastatin is 77, while for pravastatin and simvastatin only 22 and 27 reactions are altered, respectively.In fact, several pathways, such as those for pyruvate metabolism and nucleotide interconversion, appear perturbed only in the case of rosuvastatin.This could be interpreted as a higher perturbation of the cell state in the case of rosuvastatin compared with the other statins.
Figure 6 provides a visualization of the altered reactions and pathways in glycolysis, the citric acid (TCA) cycle and pentose phosphate pathways.Interestingly, we observe in the case of rosuvastatin a route that is clearly differentially altered, i.e. involving aldose reductases.The fact that multiple reactions in the same pathway are altered may indicate a higher perturbation of the cell equilibrium by the compound.
Similarly, looking at perturbed pathways in human liver cells for the compounds in eTOX, we observed the perturbations of pathways associated with liver toxicity.The most predominant pathways were those associated with downregulation of mitochondrial beta-oxidation of fatty acids, which were found in 87% of the cases for critical reactions with low level regulation.Drug-induced hepatotoxicity is often caused by inhibition of fatty acid oxidation (Russmann et al., 2009) and our model notably detected this mechanism.We also observed perturbations in pathways related to vitamin C and B12 deficiency, amino acid metabolism or depletion of ATP by glycolytic inhibition.
This assumption was tested by assessing the ability of perturbation elasticity values to act as a discriminant parameter for hepatotoxicity levels.A parameter that allows testing such an assumption is the area under the ROC (receiver operating characteristic) curve AUROC.We calculated the AUROC for each endpoint (LRTDs <50, <100 and <500 mg/kg) using perturbation elasticity as a discriminant parameter (Fig. 5b,c).Results are shown in Table 3 for compound-related cases and in Table 4 considering all assays.Notably, performance was significantly higher for compound-related cases in comparison with those from all assays (AUROC of 0.89 for compound-related cases and 0.66 for all cases of hepatotoxicity endpoints).

Determination of pathways associated with drug-induced hepatotoxicity
Another useful result that can be obtained from the gene regulation response to chemicals is the determination of the main metabolic pathways that are altered by the chemical perturbation.We may expect that such information should bring some insight into the mechanisms of toxicity.
For the example of the statins, Table 5 shows the list of pathways that were altered in each case and the number of reactions in each pathway.As may be expected for cholesterol-lowering drugs, differential expression for reactions involved in cholesterol and fatty acid metabolism pathways is enriched for the Fig. 6: Perturbed reactions and metabolic routes in the glycolysis, citric acid (TCA) cycle and pentose phosphate pathways in hepatocytes in response to pravastatin, simvastatin and rosuvastatin Red and blue arrows represent down/upregulation respectively according to LINCS.Metabolic maps were generated using the pathway visualization tool Escher (King et al., 2015).
to perform a sensitivity analysis of the fluxes.In the Recon 2 model genes are associated with reactions through rules, and we can perform a simulation of the model in order to determine the upper and lower bounds of the reactions that can be altered without affecting the equilibrium of the cell.Such information, along with pathway information of the reactions, can help us to assess the toxicity response of the cell in response to chemical compounds.
One major requirement of the study was the availability of high quality toxicity data.We used the information contained in the eTOX database as toxicity data source.The eTOX database provides a wealth of information on chemical toxicity assays.However, extracting relevant data from this legacy database to develop organ-specific models of toxicity has proven challenging.Recent efforts within the eTOX consortium have facilitated analyzing these data, leading to a curated version of the database that provides ontology-based annotated histopathology.The advantage of this is that collected toxicity data can

Discussion
In the present study, we have developed a protocol to predict organ-specific toxicity from gene regulation responses in cells.We focused our study on chemical-induced hepatotoxicity, which is one of the most important adverse effects to consider in drug development.Furthermore, histopathology data is more abundant for liver than for other organs.
Our first step consisted of estimating the hepatotoxicity endpoint for chemical compounds from assay data in the eTOX database and we implemented a validation protocol for that purpose.We then compared these values with adverse effects related to chemical compounds observed in hepatocytes based on gene expression data.Differential gene expression data of hepatocytes exposed to the chemical compounds were extracted from the LINCS database.To estimate the effect of the chemical perturbation in the cell, gene expression data were mapped into Recon 2, a genome-wide metabolic model of hepatocytes, Valine, leucine, and isoleucine metabolism 1 0 0 limitation is the fact that we are using a model of the cell that focuses on the metabolic network, which implies that regulatory networks are not explicitly considered.Despite these limitations, the information that we processed proved to be insightful, as we found that on average ~40 metabolic reactions (with cases in excess of 100 reactions) were differentially activated in primary human hepatocytes under chemical perturbations.Our approach thus provides the means for building models based on gene expression data of cells, allowing the prediction of specific pathway perturbation from chemicals.

Conclusions
In this study, we developed a bioinformatics approach to model drug toxicity focused on hepatotoxicity.This novel approach is based on the systems-wide analysis of gene expression data.
Modeling approaches based solely on the structural features of chemical substances can provide initial information about a chemical's toxicity, as shown by the observed correlation between highly similar chemical compounds and their curated lowest reported toxic dose causing toxic effects in hepatocytes.
Here, our goal was to investigate if looking at systems-wide cell responses could be a more general approach for modeling chemical toxicity.Our promising results, obtained by linking gene regulation responses to chemical substances through metabolic models of primary human hepatocytes with their associated measured hepatotoxicity endpoints, showed the suitability of the method to infer organ-specific toxicity responses from gene expression assays.
Our findings open new possibilities in the field of modeling and drug safety testing, as they can be used to develop a modeling strategy.This strategy would use the gene expression response to chemicals in desired cell types in order to develop organ-specific predictive models for chemical toxicity.These new in silico predictive models could help to better identify potential hepatotoxicity and to develop safer drugs.The models could be used at early stages in drug research to optimize and prioritize the chemical lead series, which should consequently help to lower attrition rates later in the drug development process.
Furthermore, the results can be used to elucidate potential mechanisms of toxicity, which can also be used for optimization as well as for risk assessment of observed toxicological effects.As we can perform the study for different types of cells, including cancer cells, the method opens opportunities for designing drugs that target a specific type of cancer, while reducing their toxic effects in other types of cells.Furthermore, the link with pathway information provides the possibility of establishing hypotheses about the mechanisms of action of the toxic compound at the systems level.
The study here was focused on hepatotoxicity, although as more data becomes available for other types of cells, this methodology could be applied to the estimation of toxicity effects in other types of organs, leading ultimately to a full protocol for estimating organ-specific toxicity responses to chemical substances by systems-wide analysis of gene expression data in human cells.be grouped at the desired ontology level, providing a way to build good quality data sets for modeling.We focused our study on hepatotoxicity values that were extracted from eTOX by using standardized terms curated by toxicology expert judgment.Analysis of this high-quality data showed that in general similar chemical structures have similar hepatotoxicity profiles.This result provided evidence of the quality of the toxicity data, a prerequisite for us to investigate whether the prediction of hepatotoxicity could be further improved by combining structural models with systems information.
We tested the proposed systems approach by using omics data from gene expression in cells, although other approaches like proteomics or metabolomics data also could be envisaged following the same type of strategy.Gene expression data provides a systems-level snapshot of the pathways that have been perturbed in the cell in response to the chemical.We applied a systems biology modeling approach to try to understand the severity of a toxic response, not by looking at the dose response curve but rather by assessing how critical for the cell's survival the pathways that are activated by the toxic response are.The main assumption is that the observed changes in metabolic activity are the result of the cell's attempt to restore homeostasis following toxicity arising via various mechanisms, such as direct chemical reactivity, perturbation of critical signaling pathways or even direct effects on the expression of genes that code for proteins which are involved in metabolism.Changes in metabolic pathways may therefore be a symptom of, rather than a direct cause of cytotoxicity.This does not preclude the use of metabolic pathway signatures to classify toxicants or identify points of departure, however interpretation of the causality must be carefully considered as it has potential consequences on the use of particular pathway modulations to prospectively predict outcomes.Results were consistent with this assumption, as we observed that the level of regulation associated with a chemical could be used in order to build predictive models of hepatotoxicity.Furthermore, our systems-wide model provided hints about the pathways involved in the toxicity effect, with perturbed pathways including vitamin C and B12 deficiency, amino acid metabolism or depletion of ATP by glycolytic inhibition.
Our approach differs from other previous approaches in the way that we integrated transcriptomics data into the model.Rather than trying to identify biomarkers, our goal was to estimate the overall effects of the chemical perturbation on the cell.To that end, we used gene regulation data to perform a systems-wide simulation of the metabolism of the cell.Our approach therefore can take into account multiple toxicity mechanisms.
There are, however, several limitations of the approach that need to be taken into account.First, the extent of data that could be matched between toxicity assays and gene expression data was limited.Secondly, the integration of in vitro concentrations of chemicals from LINCS with related in vivo doses seems difficult, as the target organ tissue concentration is not known.Our assessment therefore was focused on testing the discriminant ability of a systems-level estimate of the overall effect of the toxic response in the cell rather than searching for a quantitative comparison between in vitro and in vivo data.Another major

Fig. 2 :
Fig. 2: Correlation between pairs of compounds for hepatotoxicity endpoints as a function of chemical similarity a) Correlation between pairs of similar compounds as a function of the similarity threshold for hepatotoxicity endpoints (HT) quantified as LRTD, representing the lowest dose at which they were positive in any study and grouped by controlled terms: morphological findings (MF); clinical chemistry (CC); hepatobiliary injury (HB); and hepatocellular injury (HC).b) Exemplar scatter plots of toxicity values between pairs of compounds for different Tanimoto similarity thresholds (0.4 to 0.9 by 0.1)

Fig. 4 :
Fig. 4: Stepwise description of the process for the calculation of flux perturbations for the statins example a) Heatmap comparing the top 50 genes upregulated (dark red) or downregulated (light red) in the presence of pravastatin (A), simvastatin (B) or rosuvastatin (C) according to the LINCS database.b) Detail of the map is shown for the example gene AKR7A2, from the aldo-keto reductase (AKR) superfamily, which appears upregulated for rosuvastatin only.The full list of genes is given in File S8 1 .c) GPR associations from the hepatocyte model are used in order to determine the list of reactions that are associated with gene AKR7A2 (grey circles).d) This information is similarly mapped for the rest of the genes in the inset a) to generate a second heatmap for the reactions.Dark blue means upregulated reactions and light blue are downregulated reactions.e) The regulation of reactions associated with the example gene AKR7A2 is shown.f) Fluxes and biomass function in the hepatocyte model are used to determine the upper and lower bounds (UB and LB, respectively) of the reactions through flux variability analysis (FVA).FVA for reactions associated with AKR7A2 are shown.g) This information is then translated into additional columns in the reaction heatmap.Finally, both values about gene regulation and reaction sensitivity are integrated by means of the elasticity parameter, which aggregates the limit values LB or UB for each reaction depending on the sign of the perturbation.h) Finally, the aggregated values are shown on a bar plot for the three statin example compounds.

Fig
Fig. 5: Observed relationships between hepatotoxicity endpoints and network elasticity perturbations a) LRTD values generated from eTOX compared with estimated network elasticity perturbation according to gene regulation data.Compound-related points are shown in red.Thresholds defining LRTD values below 50, 100 and 500 mg/kg endpoints are shown as vertical lines.b) Receiver operating characteristic (ROC) curve for hepatotoxicity endpoints that assesses discriminant power of network perturbation elasticity parameter.The calculated AUROC was 0.43.c) ROC curve for compound-related hepatotoxicity endpoints that assess discriminant power of network perturbation elasticity parameter for this endpoint.The calculated AUROC was 0.89

Tab. 5 :
Pathways perturbed in hepatocytes by pravastatin, simvastatin and rosuvastatin and number of reactions in the

Area under the ROC curve (AUROC) obtained for compound-related hepatotoxicity endpoints at 50, 100 and 500 mg/kg
table.Gene expression regulation in primary hepatocytes was used as discriminant.The number of available data points is given in the first column.Only findings labeled as "compound-related" according to toxicologist expert judgment in the original reports were considered for the calculation of the AUROC values shown in this table."Compound-related" means here that the observed effect was considered to be caused by the compound and not by chance or external factors.

: Reactions in the model containing AKR7A2 in the Gene-Protein-Reaction rules
Reaction IDs are those from Recon 2