Stony Brook University Master's Thesis ('24-'25)
Mentor: Dr. Joshua Rest, Department of Ecology & Evolution, Stony Brook University
Second Reader: Dr. Jeroen B. Smaers, Department of Anthropology, Stony Brook University
Summary: Analyzing variation in the evolutionary rates of protein structural features.
Results: Completed thesis paper (full paper below); manuscript in preparation.
Thesis Paper
Rates of Protein Structural Evolution in Mammals.pdf
Introduction
Goal:
To measure how rapidly different protein structural features evolve across mammals and to test whether structural evolution mirrors, decouples from, or complements sequence evolution.
Key Questions:
Do different structural features evolve at different rates?
Are sequence evolutionary rates predictive of structural evolutionary rates?
Are specific biological functions associated with unusual structure–sequence rate combinations?
Species phylogeny
Methods
Dataset and Scope
36 mammalian species spanning major placental clades
Single-copy orthogroups only, eliminating confounding effects of gene duplication and loss
~4,500 orthogroups initially considered; ~2,300 retained after conservative filtering
This design prioritizes phylogenetic clarity and minimizes artifacts in rate estimation.
Orthogroup distribution by number of species present
Feature Prediction
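Because experimentally resolved 3D structures are unavailable for most proteins, I used recent structure prediction tools to extract biologically meaningful, two-dimensional structural descriptors, including the features reported in the Results: buried residues, solvent-exposed residues, intrinsically disordered regions, and transmembrane regions.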
These features capture core biophysical regimes—folded cores, flexible regions, membrane association—that strongly influence evolutionary constraint.
Estimating rates
Each structural feature was treated as a continuous quantitative trait evolving along a phylogeny.
For every orthogroup and structural feature, I fit a Brownian motion model using phylogenetic comparative methods.
The estimated variance parameter (σ²) represents the rate of structural evolution: how quickly that feature diverges among lineages after accounting for shared ancestry.
Rates were computed per feature, per orthogroup, enabling direct comparison across structural classes.
This framework reframes protein structure as an evolving phenotype rather than a fixed reference.
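As a minimal sketch of the rate estimate described above, the following Python computes σ² for one trait on a small tree using Felsenstein's independent contrasts. It is an illustration under stated assumptions, not the thesis pipeline: the software used for model fitting is not named here, and the tuple-based tree encoding, the function names (`bm_rate`, `_prune`), and the toy trait values are all hypothetical. The contrasts-based (REML) estimate shown is larger than the maximum-likelihood estimate by a factor of n/(n-1) for n tips.

```python
import math

# Hypothetical tree encoding (an assumption for this sketch):
#   leaf     -> (name, branch_length)
#   internal -> (left_child, right_child, branch_length)

def _prune(node, traits, contrasts):
    """Return (trait value, effective branch length) for a node.

    Appends one standardized contrast to `contrasts` per internal node.
    """
    if len(node) == 2:  # leaf
        name, length = node
        return traits[name], length
    left, right, length = node
    x_l, v_l = _prune(left, traits, contrasts)
    x_r, v_r = _prune(right, traits, contrasts)
    # Standardized contrast: difference scaled by its expected standard deviation.
    contrasts.append((x_l - x_r) / math.sqrt(v_l + v_r))
    # Ancestral value: average of the children weighted by inverse branch length.
    value = (x_l / v_l + x_r / v_r) / (1.0 / v_l + 1.0 / v_r)
    # Lengthen the branch below this node to account for estimation error.
    return value, length + v_l * v_r / (v_l + v_r)

def bm_rate(tree, traits):
    """REML estimate of the Brownian motion rate: mean squared contrast."""
    contrasts = []
    _prune(tree, traits, contrasts)
    return sum(c * c for c in contrasts) / len(contrasts)

# Toy example: ((A:1,B:1):1,C:2) with made-up trait values.
tree = ((("A", 1.0), ("B", 1.0), 1.0), ("C", 2.0), 0.0)
traits = {"A": 1.0, "B": 3.0, "C": 6.0}
sigma2 = bm_rate(tree, traits)
print(round(sigma2, 3))  # 3.286
```

Running `bm_rate` once per structural feature and per orthogroup would yield the kind of feature-by-orthogroup rate table that the comparisons across structural classes rely on.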
Sequence Evolution
To evaluate the relationship between sequence and structure:
Amino-acid alignments were fit to maximum-likelihood phylogenetic models.
Total substitutions per site across the tree were used as a sequence evolutionary rate for each orthogroup.
Structural and sequence rates were normalized to allow meaningful comparison despite differing scales.
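The two normalizations named in the rate-correlation figure caption (min-max and z-score) can be sketched as follows. This is a minimal illustration: the function names and the example rates are hypothetical, and the use of the population standard deviation in the z-score is an assumption, since the exact convention is not stated here.

```python
import statistics

def min_max(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Center values on their mean and scale by the population standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

# Made-up rates on very different scales, one value per orthogroup.
structure_rates = [0.002, 0.004, 0.010]
sequence_rates = [1.5, 3.0, 9.0]

print(min_max(structure_rates))  # values between 0 and 1
print(z_score(sequence_rates))   # values centered on 0
```

After either transformation, structural and sequence rates share a common scale and can be plotted or compared side by side.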
Clusters determined via K-Means, with elbow and silhouette plots used to assess the number of clusters
Clustering and Functional Enrichment
To interpret biological meaning:
Orthogroups were clustered based on their multi-trait structural rate profiles.
Extreme categories (e.g., high-structure/low-sequence rates) were identified using percentile thresholds.
Gene Ontology enrichment analyses tested whether specific functions or cellular localizations were overrepresented among unusual rate combinations.
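The percentile-based categories and the enrichment test described above can be sketched in a few lines of Python. Several things here are assumptions made for illustration: the function names (`tail_labels`, `rate_categories`, `enrichment_p`) are hypothetical, the rank-based cutoff is one of several ways to define a percentile threshold, and the one-sided hypergeometric test is a common choice for Gene Ontology over-representation rather than a confirmed detail of this analysis. Multiple-testing correction is not shown.

```python
from math import comb

def tail_labels(values, fraction=0.2):
    """Label the lowest and highest `fraction` of values as 'low' and 'high'."""
    n = len(values)
    k = max(1, round(fraction * n))
    order = sorted(range(n), key=lambda i: values[i])
    labels = ["mid"] * n
    for i in order[:k]:
        labels[i] = "low"
    for i in order[-k:]:
        labels[i] = "high"
    return labels

def rate_categories(struct_rates, seq_rates, fraction=0.2):
    """Return e.g. 'high-structure/low-sequence', or None if either rate is not extreme."""
    s = tail_labels(struct_rates, fraction)
    q = tail_labels(seq_rates, fraction)
    return [f"{a}-structure/{b}-sequence" if "mid" not in (a, b) else None
            for a, b in zip(s, q)]

def enrichment_p(hits, group_size, annotated, background):
    """One-sided hypergeometric p-value: P(at least `hits` annotated genes in the group)."""
    upper = min(group_size, annotated)
    favorable = sum(comb(annotated, k) * comb(background - annotated, group_size - k)
                    for k in range(hits, upper + 1))
    return favorable / comb(background, group_size)

# Made-up rates for five orthogroups.
structure = [0.1, 0.9, 0.5, 0.4, 0.6]
sequence = [0.8, 0.1, 0.5, 0.4, 0.6]
categories = rate_categories(structure, sequence, fraction=0.2)
print(categories[1])  # high-structure/low-sequence

# Made-up counts: 3 of 3 genes in a category carry a term found in 3 of 10 background genes.
p_value = enrichment_p(hits=3, group_size=3, annotated=3, background=10)
```

Changing `fraction` from 0.2 to 0.1 corresponds to moving from the 20% to the 10% most extreme rates.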
Results
Structural Evolution
Structural features evolve at dramatically different rates:
Slowest-evolving features:
Buried residues
Transmembrane regions
These are consistent with strong thermodynamic and functional constraints.
Fastest-evolving features:
Intrinsically disordered regions
Solvent-exposed residues
These regions tolerate greater flexibility and functional turnover.
This confirms that protein architecture evolves unevenly, even within the same protein.
Structure and Sequence Rates
Overall correlations between sequence substitution rates and mean structural evolutionary rates were weak.
Many proteins show decoupling, evolving rapidly in structure without extreme sequence divergence (or vice versa).
This demonstrates that sequence evolution is not a reliable proxy for structural evolution.
Sequence to structure rate correlations (min-max and z-score normalized)
Structure rates heatmap (red = high, blue = low) with clusters (yellow = middle, green = high, purple = low) and sequence rate for comparison (grayscale heatmap, far left)
Pairwise correlations among the evolutionary rates of structural features
Functional enrichment in the 20% most extreme rates across high and low structure and sequence groups
Functional Enrichment in Extreme Rate Categories
Proteins with high structural but low sequence evolutionary rates were enriched for:
Hydrolase activities, particularly glycosyl bond hydrolysis
Lysosomal and vacuolar components
Membrane-associated metabolic processes
These results suggest that some cellular systems maintain conserved sequences while allowing architectural flexibility—likely reflecting functional robustness paired with structural adaptability.
Functional enrichment in the 10% most extreme rates across high and low structure and sequence groups
Enrichment in the 20% extreme-rate subcategory, restricted to Biological Process (BP) terms, on the manatee_elephant branch of the phylogeny (see tree above)