The potential to use QSAR to populate ecotoxicity characterisation factors for simplified LCIA and chemical prioritisation

Today's chemical society uses and emits an enormous number of different, potentially ecotoxic, chemicals to the environment. The vast majority of these substances lack characterisation factors describing their ecotoxicity potential. A first-stage, high-throughput screening tool is needed to prioritise which substances require further measures. In this work, USEtox characterisation factors were calculated from data generated by quantitative structure-activity relationship (QSAR) models to expand substance coverage where characterisation factors were missing. Existing QSAR models for physico-chemical data and ecotoxicity were used, and to further fill data gaps, an algae QSAR model was developed. The existing USEtox characterisation factors were used as a reference to evaluate the impact of using QSARs to generate input data to USEtox, with a focus on ecotoxicity data. An inventory of chemicals that make up the Swedish societal stock of plastic additives, and their associated predicted emissions, was used as a case study to rank chemicals according to their ecotoxicity potential. Of the 210 chemicals in the inventory, only 41 had characterisation factors in the USEtox database. With the use of QSAR-generated substance data, an additional 89 characterisation factors could be calculated, substantially improving substance coverage in the ranking. The choice of QSAR model was shown to be important for the reliability of the results, but even with the best correlated model results, the discrepancies between characterisation factors based on estimated data and experimental data were very large. The use of QSAR-estimated data as a basis for calculating characterisation factors, and the further use of those factors for ranking based on ecotoxicity potential, was assessed as a feasible way to gather substance data for large datasets. However, further research and development of the guidance on how to make use of estimated data is needed to improve the accuracy of the results.


Introduction
Every day a wide variety of chemicals are emitted into the environment from a multitude of sources. These emissions and the subsequent pollution of the natural environment and potential exposure of living organisms and humans may pose a risk to the ecosystem and human health (UNEP 2012). To efficiently reduce this risk by implementing reduction measures or substitution, it is necessary to identify chemical emissions of concern (e.g. Egeghy et al. 2011; von der Ohe et al. 2011). The reason for concern should be based on the potential negative effect of the chemicals rather than only the emitted amount. Chemical risk assessment is one way to obtain the relevant information (e.g. van Leeuwen and Vermeire 2007), but that is generally a very time-consuming process. There is a need for a fast and easy-to-use screening tool, based on (eco)toxicity potential but not necessarily a full risk assessment, to be able to do a first prioritisation for large datasets.
Responsible editor: Ralph K. Rosenbaum. Electronic supplementary material: The online version of this article (https://doi.org/10.1007/s11367-018-1452-x) contains supplementary material, which is available to authorized users.
The USEtox model can be used to integrate data on chemical fate and effect into measures of potential impact and should thus have good potential for use in the prioritisation of chemicals. USEtox is a scientifically based consensus model to characterise potential impacts on human health and freshwater aquatic ecosystems from a product (or service) life cycle (Rosenbaum et al. 2008). With the USEtox model, a large number of characterisation factors (CFs) have been published, but the model can also be used to calculate new CFs. Running the USEtox model is fast, but collecting the necessary substance data, in particular the (eco)toxicological data, can be time-consuming, especially if the chemical inventory is large.
The use of (quantitative) structure-activity relationship (QSAR) models to estimate the properties of chemicals for which data are needed can be one way to speed up the screening process, especially for cases where experimental data are lacking. A QSAR model is a relation between chemical structure and a property of the chemical compound (ECHA 2016; OECD 2007b). The features of a chemical structure are captured by a set of chemical descriptors that are used to predict characteristics of the chemical. In comparison to other data collection methods, QSARs are fast and have the potential to cover many substances. The substance coverage depends on the model domain and, in contrast to, e.g. registration data under Regulation (EC) No 1907/2006 of the European Parliament and of the Council on the Registration, Evaluation, Authorisation and Restriction of Chemicals (REACH), is independent of the substance manufacturing volumes. In addition, estimated data, e.g. from QSARs, are likely to increase in importance as authorities such as the European Chemicals Agency promote a reduced use of animal testing (ECHA 2017).
The aim of the present study was to test if the use of substance data predicted by QSARs can be one way to use the USEtox model as part of a fast and easy screening tool, with broad coverage, for ranking within large datasets based on ecotoxicity potential. For this purpose, an emission inventory for a wide range of plastic additives, such as pigments, flame retardants, stabilisers and plasticisers, was used as a case study. The focus of the present study was on ecosystem effects, and thus, toxicity data on human health were not collected or predicted.

The USEtox model
With the USEtox model, (eco)toxicity CFs integrate the fate, exposure and effects of a chemical after emission into the environment. Emission compartment specific CFs are calculated from the product of matrices containing fate factors (FF), freshwater ecosystem exposure factors (XF) and freshwater aquatic ecosystem toxicity effect factors (EF) (Eq. (1)). The CFs provide the means to convert the chemical emissions into impact scores (IS) and thus compare chemical emissions based on (eco)toxicity potential.
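Eq. (1) is not reproduced in this text. As a sketch based on the description above, and assuming the matrix notation commonly used in the USEtox documentation (the ordering of the matrix product is an assumption here), the relationship can be written as:

```latex
\mathbf{CF} = \mathbf{EF} \cdot \mathbf{XF} \cdot \mathbf{FF} \tag{1}
```

With FF expressed in days, XF dimensionless and EF in PAF × m3/kg, the resulting CF has the unit PAF × m3 × day/kg, which corresponds to the comparative toxic unit for ecotoxicity (CTUe) per kg emitted.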
Each of the factors (FF, XF and EF) is calculated from substance-specific data on physico-chemical properties as well as (eco)toxicological effects. The obligatory input parameters for organic substances are listed in Table 1. Remaining parameters can be estimated with model-internal estimation routines, as long as the substance is within the model domain.
USEtox CFs are defined as recommended if the EF is based on data from three trophic levels and as indicative if the ecotoxicity data cover fewer than three different trophic levels (Huijbregts et al. 2015b). The standard ecotoxicological test set for aquatic organisms covers primary producers, invertebrates and fish, often represented by algae, the water flea Daphnia magna and fish such as rainbow trout or fathead minnow.
To calculate CFs in this study, the USEtox model version 2.02 was applied. The focus of the present study was on ecosystem effects and thus only ecotoxicity CFs were calculated. The model user manual (Huijbregts et al. 2015a) and the manual for the organic substance database (Huijbregts et al. 2015b) were used as the basis for the workflow and data collection. The USEtox manuals (Huijbregts et al. 2015a; Huijbregts et al. 2015b) recommend the use of experimental data but also provide guidance on the use of estimated data for the physico-chemical data collection. Indeed, many of the existing USEtox CFs have fate factors based on estimated data. The USEtox manuals do not, however, give any further guidance on the estimation of ecotoxicological effects, and thus, the identification of available models suitable for the purpose was part of the present study.

Chemical inventory
The research programme ChEmiTecs has developed a simple method for an initial approximation of emissions of organic chemicals from products containing plastic materials (Bilitewski et al. 2012). The method has been used to estimate annual emissions of organic chemicals from the accumulated stock of products containing plastic materials in Swedish society. The output from this modelling approach was rough emission estimates for 210 specified organic substances, and those were used as our case study inventory. More recent estimates, based on a rough extrapolation from a much smaller sample of parallel calculations with a more sophisticated model, indicate that the emissions in the first inventory were overestimated by a factor of, on average, about 100, but with very large individual variations among substances and materials (Palm Cousins et al. 2018; Rydberg and Lexén 2016). In the study presented here, however, the original plastic additives inventory serves well as a case study to test the usability of QSAR-based CFs (the inventory is included in the Electronic Supplementary Material). The key feature for this paper is not the absolute magnitude of emissions but the fact that these plastic additives have varying physico-chemical and ecotoxicological properties, which is why it is of interest to assess not only the emission loads but also the chemicals' potential to exert negative impacts on the environment (and on humans, though that was not part of the present study). For the majority of these substances, 169 of 210, USEtox characterisation factors (CFs) were not available.

Estimation of substance data with QSARs
The 210 substances in the plastic additive inventory were each identified by name, Chemical Abstracts Service (CAS) number and simplified molecular-input line-entry system (SMILES) code. QSAR models were run with the aim of generating the relevant physico-chemical properties as well as ecotoxicological data for CF calculation with USEtox for all 210 substances. The USEtox manuals give detailed guidance on how to obtain physico-chemical data by estimation methods if experimental data are lacking. Therefore, the evaluation of model selection focused on the ecotoxicity data, for which detailed guidance on the generation of estimated data is lacking.
Existing QSAR models were used and selected based on the relevance for the purpose, i.e. whether the generated data were relevant as input data to USEtox. The existing QSAR models have already been validated, and model validation was not part of the scope of the present study. It should be noted that a QSAR model is only valid in its applicability domain, and any model used should specify if the prediction of a chemical property is in or out of the domain. The SMILES codes were used as input to the QSAR models.

Physico-chemical properties
Physico-chemical properties were predicted with the U.S. Environmental Protection Agency Estimation Program Interface (EPI) Suite, version 4.11 (US EPA 2012). The EPI Suite model is recommended for use in the USEtox manuals, and the guidelines therein were followed. Substance data for CF calculation were only collected or estimated for the parameters listed as necessary in the USEtox manual (Huijbregts et al. 2015a), see Table 1. The application domain of EPI Suite is organic chemicals; inorganic and organometallic chemicals are outside the model domain. The plastic additives chemical inventory contained some organometallic substances, which were thus outside the model domain. The physico-chemical properties obtained for these substances were not assessed as valid, but they were not removed from the dataset. Whether the EPI Suite training sets contain substances similar to the plastic additives in the case study inventory was not checked, as this was considered out of scope for the fast screening aimed for here. Although the USEtox manual states that preference should be given to experimental values, only estimated values were used. The reason was that differences between estimated and experimental values for water solubility were very large for some substances, and we reasoned that by using only estimated values (for all parameters), we would achieve better comparability between results within the dataset. These differences can be due to measurement errors or variability in the experimental data, or to predictive errors in the QSAR estimates. The SPARC software is recommended in the USEtox manual for prediction of acid and base dissociation constants (pKa/pKb) (Huijbregts et al. 2015b), but it is commercial software available for a fee. Acid and base dissociation constants were not estimated, as the focus here is on the ecotoxicological data, and all substances were entered into the USEtox model as neutral substances, with pKa.gain 0 and pKa.loss 14. Ignoring dissociation properties adds uncertainty to the results, especially for anionic substances (Rybacka and Andersson 2016). This is, however, in line with previous versions of USEtox, where dissociating substances were not modelled differently from neutral substances but flagged as interim. The physico-chemical data used to calculate the CFs are included in the Electronic Supplementary Material.

Aquatic ecotoxicity data
There are several guidelines available for the use of QSAR, describing how to use and report QSAR results. They are all, in various aspects, based on the OECD principles of QSAR. These principles were agreed on in 2004 and published in the Guidance Document on the Validation of (Quantitative) Structure Activity Relationship [(Q)SAR] Models (OECD 2007a). The OECD principles state that, to facilitate the consideration of a QSAR model for regulatory purposes, the model should be associated with the following information:

1. A defined endpoint
2. An unambiguous algorithm
3. A defined domain of applicability
4. Appropriate measures of goodness-of-fit, robustness and predictivity
5. A mechanistic interpretation, if possible

The criteria listed above were applied when choosing QSAR models to be used for calculation of the aquatic ecotoxicity.
Since EPI Suite is recommended for the estimation of physico-chemical data, its Ecological Structure Activity Relationships (ECOSAR) Predictive Model was a natural starting point to generate aquatic ecotoxicity data. ECOSAR model v.1.11, included in the EPI Suite package, was applied for the purpose. The ECOSAR model domain covers organic chemicals; inorganic chemicals, organometallic chemicals and polymers are out of domain. The ECOSAR model domain has further restrictions based on the limitations of the training set (not investigated further herein), and the model also gives results for substances that are not within the model domain, thus potentially introducing large uncertainties. ECOSAR was run for the full dataset, including the organometallics outside the model domain. The endpoints estimated were fish LC50 (concentration where 50% of the population exhibit a response) 96 h, daphnia EC50 48 h and green algae EC50 72 h. When more than one datum was estimated for an endpoint, the lowest datum was selected for the further modelling, since false positives (i.e. overestimated ecotoxicity) are preferable to false negatives (underestimated ecotoxicity) in a screening context.
Given the uncertainty introduced with ECOSAR, aquatic ecotoxicity data were also estimated with the Toxicity Estimation Software Tool (TEST), version 4.2.1, also from the US Environmental Protection Agency (US EPA 2016). The endpoints estimated were fathead minnow LC50 96 h and Daphnia magna LC50 48 h. They were estimated by the consensus model in TEST, as it has been shown that consensus models give a better estimate of the toxicity and also cover a larger applicability domain than the individual models (Zhu et al. 2008). The consensus model is the average of the values predicted by several different methods. The result from the consensus model is based only on the methods for which the prediction is within the applicability domain of the corresponding model, i.e. the uncertainty from using results for substances outside the model domain was removed.
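The consensus averaging described above can be sketched as follows. This is a minimal illustration and not the TEST implementation: the method names, the example values and the data layout (a value paired with an in-domain flag) are hypothetical.

```python
from statistics import mean


def consensus_prediction(predictions):
    """Average the predictions of the methods that are within their domain.

    `predictions` maps a (hypothetical) method name to a tuple
    (predicted_value, in_domain). Returns None if no method is in domain.
    """
    in_domain = [value for value, ok in predictions.values() if ok]
    return mean(in_domain) if in_domain else None


# Made-up predictions on an arbitrary toxicity scale, for illustration only.
example = {
    "method_a": (4.2, True),
    "method_b": (3.8, True),
    "method_c": (6.0, False),  # outside its applicability domain, so ignored
}
result = consensus_prediction(example)
```

Returning None when no method is in domain mirrors the statement that out-of-domain results do not enter the consensus value; such substances would simply lack a prediction.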
A QSAR for algae toxicity (endpoint EC50 Pseudokirchneriella subcapitata (72-96 h)) was developed to complement the TEST model with a third phylum. The QSAR model was built using partial least squares (PLS) regression with molecular descriptors calculated with the Dragon 6.0 software (TALETE 2014) and experimental algal toxicity data on 80 substances (Grönholdt Palm 2014). The model has a defined applicability domain, and model performance was evaluated as satisfactory with respect to training (n = 48, R2 = 0.86, RMSEE = 0.53) and testing (n = 22, Q2CV = 0.72, RMSEP = 0.97).
The procedure described above generated three aquatic ecotoxicity datasets, listed in Table 2. These datasets were used for calculation of QSAR-based CFs with USEtox. The estimated E/LC50 values were used to calculate the concentration at which 50% of species are exposed above their E/LC50 (i.e. the HC50), which is needed in USEtox for the calculation of the ecotoxicological effect factor (EF). Chronic HC50 values are preferable, but in this case, only acute data were estimated. Instead, the chronic equivalent was derived by dividing the acute values by 2 (the acute-to-chronic extrapolation factor), as described in Huijbregts et al. (2015a). The EFs are listed in the Electronic Supplementary Material.
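The step from estimated E/LC50 values to an EF can be sketched as below. This is a sketch under stated assumptions rather than the study's actual calculation: it assumes that the HC50 is taken as the geometric mean of the chronic-equivalent species values and that EF = 0.5/HC50, as commonly described for USEtox. The function name and the input values are illustrative.

```python
import math

ACUTE_TO_CHRONIC = 2.0  # acute-to-chronic extrapolation factor stated in the text


def effect_factor(acute_ec50_mg_per_l):
    """Return (hc50_kg_per_m3, ef) from acute E/LC50 values, one per species.

    Assumption: HC50 is the geometric mean of the chronic-equivalent values,
    and EF = 0.5 / HC50 (in PAF x m3/kg).
    """
    if not acute_ec50_mg_per_l:
        raise ValueError("at least one E/LC50 value is required")
    chronic = [v / ACUTE_TO_CHRONIC for v in acute_ec50_mg_per_l]
    log_hc50 = sum(math.log10(v) for v in chronic) / len(chronic)
    hc50_mg_per_l = 10 ** log_hc50
    hc50_kg_per_m3 = hc50_mg_per_l * 1e-3  # 1 mg/L = 1 g/m3 = 1e-3 kg/m3
    return hc50_kg_per_m3, 0.5 / hc50_kg_per_m3


# Made-up fish and daphnia values in mg/L, for illustration only.
hc50, ef = effect_factor([2.0, 20.0])
```

Because the EF is inversely proportional to the HC50, an error of one order of magnitude in the estimated E/LC50 values carries over directly to the EF and hence to the CF.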
Other ecotoxicity QSAR models and data estimation methods are available, e.g. via the OECD QSAR toolbox (OECD 2011), but not used herein. The ECOSAR QSARs are incorporated into the OECD QSAR Toolbox together with additional QSARs for fathead minnow (Pimephales promelas). The OECD QSAR Toolbox contains other functionalities for data gap filling, in addition to QSARs, but the automated workflows are for fish endpoints only (QSAR Toolbox version 4.1) and the categorisation procedure for read-across is not

Assessing the relevance of QSAR-based CFs
The reliability of the QSAR-based CFs was evaluated by their difference from, and correlation with, the USEtox CFs based on the organic substances database of USEtox 2.02. Possible differences were further evaluated by studying correlations for the factors FF, XF and EF and by between-substance comparisons. The substances in the inventory that had USEtox CFs were not organometallics or polymers and were thus within the broad model domain definitions described above. The strength of the linear relationship between the USEtox CFs and the QSAR-based CFs (datasets 1-3; Table 2) was evaluated by comparing the squared correlation coefficients (R2) from regression analysis in which the QSAR-based CFs were regressed on the USEtox CFs. Pairwise comparisons of the differences between the QSAR-based CFs and the USEtox CFs were made to evaluate which QSAR-based CFs deviated the least from the USEtox CFs. To explore the impact of the modelling choices in the parameterisation of the FF, i.e. the inclusion of only estimated data and the omission of dissociation constants (see Sect. 2.3.1), CFs based on physico-chemical data with preference given to experimental data, as recommended in the USEtox manual (Huijbregts et al. 2015b), were calculated for the limited dataset that already had USEtox CFs (dataset 4; data listed in the Electronic Supplementary Material). For this limited dataset, to further explore the impacts of the deviations from the USEtox manuals, dissociation constants were estimated with Marvin Sketch 17.24 (ChemAxon 2017) and included in the input data (substances with pKa > 14 or < 0 were included as neutrals).
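The regression comparison can be sketched as follows. This is a minimal ordinary least-squares illustration, assuming the CFs are log10-transformed before fitting (as reported in the results); the function name and the example numbers are hypothetical and are not data from the study.

```python
import math


def log_log_fit(reference_cfs, qsar_cfs):
    """Regress log10(QSAR-based CF) on log10(reference CF) by least squares.

    Returns (slope, intercept, r_squared). All inputs must be positive.
    """
    xs = [math.log10(v) for v in reference_cfs]
    ys = [math.log10(v) for v in qsar_cfs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, sxy ** 2 / (sxx * syy)


# Made-up CF pairs, for illustration only.
slope, intercept, r2 = log_log_fit([1.0, 10.0, 100.0], [3.0, 30.0, 300.0])
```

In this made-up example the QSAR-based values are exactly three times the reference values, so the fit has slope 1, intercept log10(3) and R2 of 1. A high R2 can therefore coexist with a systematic offset, which is why the pairwise differences are examined as well.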

Calculation of impact score and ranking of chemicals
Impact scores (IS; Eq. (2)) for the plastic additives were calculated according to the principles in the USEtox manual (Huijbregts et al. 2015a). The emission inventory was based on diffuse emissions from plastic products, and since the majority of the emissions from these products will probably be to air, 90% of emissions were arbitrarily assigned to urban air and the remaining 10% to continental freshwater. The IS was used to rank the plastic additive emissions according to their relative aquatic ecotoxicity potential. The plastic additives were grouped into functional categories to compare the ranking of functional groups based on emitted mass with that based on ecotoxicity potential.
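Eq. (2) is not reproduced in this text. A minimal sketch of the calculation described, assuming that the impact score of a substance is the sum over emission compartments of emitted mass times the compartment-specific CF, is given below. The substance names, emissions and CF values are made up for illustration.

```python
# Emission split stated in the text: 90% to urban air, 10% to continental freshwater.
DEFAULT_SPLIT = {"urban_air": 0.9, "continental_freshwater": 0.1}


def impact_score(emission_kg, cf_by_compartment, split=DEFAULT_SPLIT):
    """Sum of (mass emitted to a compartment) x (CF for that compartment)."""
    return sum(
        emission_kg * share * cf_by_compartment[compartment]
        for compartment, share in split.items()
    )


def rank_by_impact(inventory):
    """Return (substance, impact score) pairs sorted from highest to lowest."""
    scores = {
        name: impact_score(entry["emission_kg"], entry["cf"])
        for name, entry in inventory.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical inventory: emissions in kg, CFs in CTUe per kg emitted.
inventory = {
    "additive_A": {"emission_kg": 10.0,
                   "cf": {"urban_air": 2.0, "continental_freshwater": 100.0}},
    "additive_B": {"emission_kg": 100.0,
                   "cf": {"urban_air": 0.1, "continental_freshwater": 5.0}},
}
ranking = rank_by_impact(inventory)
```

In this made-up case additive_B is emitted in ten times the amount of additive_A but ranks lower on impact score, which illustrates why ranking by emitted mass and ranking by ecotoxicity potential can differ.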

Assessing the relevance of QSAR-based CFs
Of the new QSAR-based CFs, there was an overlap with the 41 USEtox CFs for 38 and 35 substances, respectively, using the ECOSAR model (dataset 1) and the TEST model (dataset 2) as ecotoxicity data generator. The 35 substances in dataset 2 were also included in dataset 1, and the ECOSAR model was able to generate ecotoxicity data for three additional substances. With the addition of algae data from the algae model (dataset 3), the dataset of overlapping CFs was reduced to only 10 substances. These overlapping CF datasets were used for the further analysis to assess the relevance of the QSAR-based CFs. The data were strongly skewed and were log10-transformed to adjust for this skewness. No apparent outliers were noticeable in the log-transformed datasets, and all data were included in the analysis. Figure 1a shows the correlation between the ECOSAR-based CFs and the USEtox CFs, which shows evident large variation; the explained variance (R2) for the regression of the log-transformed data ranged from 15 to 51%, depending on emission compartment. Figure 1b shows that the TEST-based CFs, based on ecotoxicity data for only fish and daphnia, were better correlated with the USEtox CFs; R2 for the regression of the log-transformed data ranged from 78 to 86%, depending on emission compartment. The improvement in CF correlation when the TEST model was used to derive the EF instead of the ECOSAR model cannot be explained by the exclusion of substances outside the model domain, as there was an almost complete overlap between the datasets. With the addition of algae data, the dataset was reduced to only 10 substances, which limits the possibilities for conclusions. Also for this small dataset, correlations between CFs were low; R2 for the regression of the log10-transformed data ranged from 48 to 85%, depending on emission compartment.
Residual errors (RE) and parameter estimates from the linear regression of the log-transformed CFs based on QSAR data (ecotoxicological data from the TEST model) on the log-transformed CFs from USEtox are listed in Table 3. The slope of the line (β1) was not different from 1 (p < 0.05), and the intercept (β0) was between 0.51 and 0.81, depending on emission compartment. Since the slope does not differ significantly from 1, the untransformed relationship between the CFs based on estimated data and the USEtox CFs is a factor equal to 10^β0, approximately 3-6, and the QSAR-based CFs thus on average predict a higher ecotoxicity potential than the USEtox CFs. The 95% confidence interval for the regression (± 2 RE) corresponds, for the untransformed relationship, to multiplication or division by (10^RE)^2, i.e. a factor of 300-500 CTUe, depending on emission compartment.
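The back-transformation from the log10 scale can be checked with a few lines of arithmetic. The sketch below uses the intercept range given in the text; the function names are illustrative, and no RE values from Table 3 are assumed.

```python
import math


def multiplicative_bias(intercept_log10):
    """Factor relating the two CFs when the slope on the log10 scale is 1."""
    return 10 ** intercept_log10


def interval_factor(residual_error_log10):
    """Multiplication/division factor corresponding to +/- 2 RE on the log10 scale."""
    return (10 ** residual_error_log10) ** 2


# Intercepts of 0.51 and 0.81 are the range stated in the text.
low_bias = multiplicative_bias(0.51)   # roughly 3.2
high_bias = multiplicative_bias(0.81)  # roughly 6.5

# Back-calculated for illustration only: RE values that would give
# interval factors of 300 and 500.
implied_re_low = math.log10(300) / 2
implied_re_high = math.log10(500) / 2
```

The back-calculated RE values (about 1.24 and 1.35 on the log10 scale) are derived here from the reported factor of 300-500 and are not taken from Table 3.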
To further illustrate the differences in CFs based on estimated data and USEtox CFs, Fig. 2 shows boxplots for the differences between the QSAR-based CFs and USEtox CFs. The spread around zero, i.e. no deviation between USEtox CF and the CF based on estimated data, is smaller for the TEST based CFs compared to those based on ECOSAR data. Still, the deviations are large for some substances, especially for the freshwater compartment, and the median deviation for CFs based on ecotoxicological data generated by TEST was 0.1, 350 and 0.31 for emissions to urban air, continental freshwater and agricultural soil, respectively.
To try to explain the variance observed between the CFs from USEtox and the QSAR-based CFs, the FFs, XFs and EFs were compared by regression of the QSAR-based factors on the USEtox factors. There was a large variation in the FFs, and when fitting a line to the log-transformed data, the adjusted R2 for the freshwater-compartment FF was 39 and 24% for emissions to urban air and freshwater, respectively. These differences apply to all the QSAR-based CFs, as the same data for physico-chemical parameters were used. Among the obligatory input parameters (Table 1), the data that deviated the most from the USEtox substance data were water solubility and chemical class classification (acid/base/amphoteric/neutral). The median difference between the QSAR-based and USEtox FFs was −0.17 and −0.045 days for the freshwater compartment and emissions to freshwater and urban air, respectively. The XFs showed a good correlation, and the adjusted R2 was 93% for the untransformed data. Three highly lipophilic substances could be identified as possible outliers, and those were predicted to be less bioavailable by the QSAR-based approach compared to the USEtox XF. The correlation between the EF based on TEST-generated data and the USEtox EF was slightly better than for the FFs, with an adjusted R2 of 43% for the log-transformed EFs. The median difference in EF was, however, as large as 1295 PAF × m3/kg between TEST and USEtox and 16,288 PAF × m3/kg between ECOSAR and USEtox. Since the deviation was smaller, the QSAR-based CFs using the TEST model to estimate ecotoxicity data were the CFs most similar to the USEtox CFs in this case. The distribution of deviations also shows that, of the EF and the FF, the EF was the more influential factor, as its deviation was larger and the CF is directly proportional to these factors.
Giving priority to experimental data and accounting for dissociation (additional CFs calculated for the limited dataset; see Sect. 2.3.1) improved the similarity of the CFs to the USEtox CFs, as this procedure was in line with the USEtox procedures, but the improvement was minor. Correlation analysis between the USEtox CFs and the CFs calculated with QSAR data for ecotoxicity (generated by the TEST model) and experimental data when available, but otherwise estimated data, for physico-chemical properties showed an explained variance between 80 and 85% (cf. R2 78-85% when only estimated data were used). For this correlation analysis, one substance (CAS 6683-19-8, tetrakis methylene(3,5-di-t-butyl-4-hydroxyhydrocinnamate)methane) was removed from the dataset, since it was considered an outlier due to the large deviation between the estimated KOW and the experimental KOW.

New characterisation factors for plastic additives
The use of QSAR generated data improved the coverage of CFs for the plastic additives chemical inventory greatly, going from 41 to 170, or 124, depending on model selection (Table 4). The algae model developed within the present study did not cover more than 28 of the substances in the inventory but can be used together with USEtox CFs to improve coverage if CFs based on three trophic levels are needed. All CFs calculated within the present study are made available in the Electronic Supplementary Material.

Uncertainty and precision
The precision, with regard to model uncertainty, of the USEtox CFs is within a factor of 10-100 for freshwater ecotoxicity, and this needs to be considered when assessing contributions to the total toxicity score. Eleven to 15 substances (depending on emission compartment), out of the 210, contributed significantly to the total sum of CFs, considering the model uncertainty (substances with a CF that contributes more than 1% of the sum of CFs).
In addition to the model uncertainty, there is also parameter uncertainty. Parameter uncertainty also exists in the USEtox CFs and is likely increased in the QSAR-based CFs calculated within the present study. To quantify the additional uncertainty introduced by the use of estimated data instead of experimental data, the statistical procedure that Rosenbaum et al. (2008) applied in the model comparison, based on McKone (1993), was also applied here. A 95% confidence interval was used to quantify the orders of magnitude of the added uncertainty. The 95% confidence interval for the regression estimate was division/multiplication by 300-500 CTUe and, in analogy to Rosenbaum et al. (2008), the factor describing the additional uncertainty is approximately 100-1000. Adding this uncertainty to the model uncertainty makes the total uncertainty three to five orders of magnitude and thus impractical for implementation. In this case, with the plastic additives, the CFs range over 18 orders of magnitude or more, and it would be possible to differentiate between substances contributing to the risk score and those with no significant contribution. However, the large bulk of substances would still have an unknown contribution score if uncertainties were accounted for.
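The "three to five orders of magnitude" can be reproduced with simple bookkeeping, assuming that the two uncertainty factors are multiplicative and are simply compounded (a rough illustration, not a formal uncertainty propagation):

```latex
\underbrace{10^{1}\text{--}10^{2}}_{\text{model uncertainty}}
\times
\underbrace{10^{2}\text{--}10^{3}}_{\text{added by QSAR input data}}
= 10^{3}\text{--}10^{5}
```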

Prioritisation and ranking of chemical emissions
The TEST-based CFs (dataset 2) were used to generate IS, since those were the QSAR-based CFs most similar to the USEtox CFs. The IS integrate the emitted amount from the emissions inventory with the fate, exposure and effect as quantified in the CF. Taking the above-mentioned uncertainty into account, 37-86 substances could be identified as the ones with the highest ecotoxicity potential, as those were within three to five orders of magnitude of the total IS. The substances giving the largest contributions to the overall ecotoxicity potential score belong to several different functional categories, e.g. pigments, flame retardants, stabilisers, plasticisers and lubricants. Flame retardants and plasticisers were estimated to be emitted in the largest amounts, contributing 36 and 28% of total emissions, respectively, with each of the remaining additive categories contributing less than 10% of the total emitted amount. The IS, on the other hand, imply that pigments and stabilisers (bio) also pose a risk, as they too contribute more than 10% of the total IS.

Discussion and conclusions
The main aim of this study was to assess the possibilities of using QSAR-based data in the LCIA model USEtox to rank chemical emissions according to their predicted relative aquatic ecotoxicity potential. The results show that QSAR data can indeed be used for a fast calculation of CFs with the USEtox model and that those CFs could be used for prioritisation in large inventories, as substance coverage can be markedly increased compared to when only existing USEtox CFs are used. However, it was also shown that substantial uncertainty was added to the CFs, limiting the current practical use. The choice of QSAR model was shown to be crucial for the relevance and robustness of the outcome. In the present study, two ecotoxicity estimation models were used, and the TEST model was shown to provide the best estimations when the aim was to calculate CFs as similar as possible to the USEtox CFs, even though it only covers species from two trophic levels. Alfonsín et al. (2014), Igos et al. (2014) and Roos et al. (2017) showed that the effect data are by far the most influential parameter when CFs are calculated with the USEtox model, and our results are in line with this finding, as the variation in the EF was the most important contributor to the variation seen in the CF in the comparison between the QSAR-based CFs and the USEtox CFs. Interestingly, we could also see that the FF can be another important contributor to the CF variation, since there was a large difference in property data for some substances and parameters. Recently, Roos et al. (2017) proposed the increased use of QSAR-estimated data to calculate CFs as a way to fill data gaps where experimental data are missing. Roos et al. (2017) assigned the CFs qualitative uncertainty scores, indicating high uncertainty for CFs calculated based on estimated data alone.
The present study, with a large QSAR-based dataset, quantitatively shows how large the uncertainties can be, as the additional uncertainty from the use of QSARs to generate input data was quantified to a factor of 100-1000. In the present study, the USEtox CFs, with FF based on estimated and experimental data, and EF mainly based on experimental data, were considered the baseline, without considering uncertainties in that dataset. The uncertainty quantified herein could therefore be under- or overestimated in relation to the true uncertainty. In any case, since the added uncertainty is large, further studies investigating how QSAR models can be used for calculation of CFs are needed. This also includes population of the input data for calculation of the FF and XF, e.g. dissociation constants, but mainly an expanded overview and comparison of available and relevant models for generating ecotoxicity data.
It can be argued that experimental data made available under the REACH legislation should be the first-hand choice for population of the USEtox model input data. Several authors have indeed tested the usability of European data sources, e.g. Müller et al. (2017), Saouter et al. (2017a, b), and have concluded that the USEtox model and procedures need to be adapted to make it possible to use all available data, e.g. by allowing the use of chronic data expressed in forms other than EC50. Despite the huge increase in the availability of experimental data on physico-chemical properties as well as (eco)toxicity that comes with the REACH registrations, a parallel line of work on the inclusion of estimated (QSAR) data in CF calculation is also warranted, as QSAR data will probably be an important data source in the future in any case, since the reduction of animal testing is one of the EU goals (ECHA 2017).
In addition to the model and parameter uncertainty, there is also uncertainty added by the limitations of model scope, e.g. that USEtox only includes freshwater ecotoxicity and thus excludes effects in the marine or benthic compartments. Other models are available, and depending on model structure and scope, the results can differ. For example, Mattila et al. (2011) compared three models and found that USEtox places considerable focus on metals; metal contributions will thus be given a high weight when using USEtox in cases where the inventory contains metals.
To conclude, the use of QSAR models to generate data for calculation of CFs has potential to fill data gaps and allow for a first screening of large inventories based on ecotoxicity potential. The uncertainty added by using estimated data is however a limiting factor, and further research is needed to develop recommendations on what models to use and possibly also to develop better models.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.