2024 journal article
An in vitro and machine learning framework for quantifying serum albumin binding of per- and polyfluoroalkyl substances
Toxicological Sciences.
Abstract: Per- and polyfluoroalkyl substances (PFAS) are a diverse class of anthropogenic chemicals; many are persistent, bioaccumulative, and mobile in the environment. Worldwide, PFAS bioaccumulation causes serious adverse health impacts, yet the physicochemical determinants of bioaccumulation and toxicity for most PFAS are not well understood, largely because experimental data are lacking. As most PFAS are proteinophilic, protein binding is a critical parameter for predicting PFAS bioaccumulation and toxicity. Among these proteins, human serum albumin (HSA) is the predominant blood transport protein for many PFAS. We previously demonstrated the utility of an in vitro differential scanning fluorimetry assay for determining relative HSA binding affinities for 24 PFAS. Here, we report HSA affinities for 65 structurally diverse PFAS from 20 chemical classes. We leverage these experimental data, together with chemical/molecular descriptors of PFAS, to build 7 machine learning classification algorithms and 9 regression algorithms, and evaluate their performance to identify the best predictive binding models. Evaluation of model accuracy revealed that the top-performing classifier, logistic regression, had an area under the receiver operating characteristic curve (AUROC) of 0.936. The top-performing regression model, support vector regression, had an R² of 0.854. These top-performing models were then used to predict HSA-PFAS binding for chemicals in the EPAPFASINV list of 430 PFAS. Together, these in vitro and in silico methods represent a high-throughput framework for predicting protein-PFAS binding based on empirical data, and they generate directly comparable binding data of potential use in predictive modeling of PFAS bioaccumulation and other toxicokinetic endpoints.
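For readers unfamiliar with the two performance statistics quoted in the abstract, the following is a minimal sketch of how AUROC and the coefficient of determination (R²) can be computed from scratch. It is not the authors' pipeline: the function names `auroc` and `r_squared` and all of the toy labels, scores, and values are assumptions made purely for illustration, and none of the numbers come from the study.

```python
# Illustrative only: toy labels and scores, NOT data from the study.
# Function names are hypothetical and chosen for this sketch.

def auroc(labels, scores):
    """AUROC as the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative one.
    Ties contribute 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical binder (1) / non-binder (0) labels with classifier scores.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]

# Hypothetical observed vs. predicted binding values (arbitrary units).
toy_observed = [1.0, 2.0, 3.0, 4.0]
toy_predicted = [1.1, 1.9, 3.2, 3.8]

print(f"AUROC = {auroc(toy_labels, toy_scores):.3f}")
print(f"R^2   = {r_squared(toy_observed, toy_predicted):.3f}")
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes, while an R² of 1.0 means the predictions reproduce the observed values exactly. In practice a library implementation would normally be used; the pairwise loop here is quadratic in the number of samples and is meant only to make the definition explicit.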