2020 journal article

Incorporating Nearest-Neighbor Site Dependence into Protein Evolution Models

JOURNAL OF COMPUTATIONAL BIOLOGY, 27(3), 361–375.

By: G. Larson*, J. Thorne & S. Schmidler*

author keywords: diffusion process; dynamic programming; evolution; phylogeny; protein structure
MeSH headings: Computational Biology / methods; Evolution, Molecular; Models, Statistical; Proteins / chemistry; Proteins / metabolism; Sequence Alignment; Sequence Analysis, Protein; Structural Homology, Protein
TL;DR: This work develops a simple model of site-dependent sequence evolution, which is used to demonstrate the bias resulting from the application of standard site-independent sequence evolution models and yields a significant reduction of bias in estimated evolutionary distances. (via Semantic Scholar)
UN Sustainable Development Goal Categories
15. Life on Land (Web of Science)
Source: Web of Science
Added: April 20, 2020

Evolutionary models of proteins are widely used for statistical sequence alignment and inference of homology and phylogeny. However, the vast majority of these models rely on an unrealistic assumption of independent evolution between sites. Here we focus on the related problem of protein structure alignment, a classic tool of computational biology that is widely used to identify structural and functional similarity and to infer homology among proteins. A site-independent statistical model for protein structural evolution has previously been introduced and shown to significantly improve alignments and phylogenetic inferences compared with approaches that utilize only amino acid sequence information. Here we extend this model to account for correlated evolutionary drift among neighboring amino acid positions. The result is a spatiotemporal model of protein structure evolution, described by a multivariate diffusion process convolved with a spatial birth-death process. This extended site-dependent model (SDM) comes with little additional computational cost or analytical complexity compared with the site-independent model (SIM). We demonstrate that this SDM yields a significant reduction of bias in estimated evolutionary distances and helps further improve phylogenetic tree reconstruction. We also develop a simple model of site-dependent sequence evolution, which we use to demonstrate the bias resulting from the application of standard site-independent sequence evolution models.