Modeling Atomic Conformational Ensembles of Proteins via Test-Time Supervision of Boltz-2 on Cryo-EM Density Maps

arXiv, 2026

Abstract

Knowledge of a protein’s atomic conformational ensemble is critical to determining its function, yet state-of-the-art ensemble prediction models are limited by the lack of high-quality conformational data from simulation or experiment. Recent advances in heterogeneous reconstruction for cryo-electron microscopy (cryo-EM) have enabled scientists to visualize ensembles of density maps for larger proteins and complexes not typically accessible through simulation, but building atomic models into these maps remains a challenge. Traditionally, ensemble prediction models are trained via a two-stage process: experimental density maps are converted into atomic structural ensembles through model building, after which these structures are used to train sequence-to-atomic ensemble predictors. In this work, we propose a new principle for fine-tuning pre-trained static structure prediction models such as Boltz-2 directly on raw cryo-EM maps, bypassing the two-stage process. We apply this technique to atomic model building by fine-tuning Boltz-2 to generate atomic conformations from an input ensemble of cryo-EM maps, achieving superior model building accuracy compared to prior work. Beyond overfitting to individual map ensembles, our method, CryoSampler, also shows preliminary evidence of in-domain generalization after fine-tuning, sampling diverse atomic conformations for unseen sequences within the same protein family without requiring cryo-EM data. These capabilities suggest that CryoSampler could be used to train next-generation atomic ensemble prediction models directly on raw cryo-EM measurements.

Results

Comparison to State-of-the-Art Model Building Algorithms

Model building performance comparison on the TRPV3 channel protein, processed via 3D Variability Analysis. The experimental cryo-EM map is shown in transparent gray. Our method is run jointly on four input maps, and we visualize the results on the first of these maps. Among all methods compared, ours achieves the best map–model fit against the observed map, as shown visually in the zoom inset and quantitatively by the cross-correlation metric CC_volume computed in Phenix. ModelAngelo achieves the second-highest map–model fit for this map, but its model is incomplete and contains missing regions, visible at the rightmost extent of the inset.
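To make the idea of a map–model cross-correlation concrete, the sketch below renders an atomic model into a density grid as a sum of isotropic Gaussians (one per atom) and computes a normalized cross-correlation against an observed map inside a mask derived from the model map. This is only an illustration in the spirit of CC_volume, not Phenix's implementation: the unit atom weights, the Gaussian width `sigma`, the (z, y, x) coordinate order, and the simple threshold mask (`threshold_frac`) are all assumptions, and the function names are hypothetical.

```python
import numpy as np


def render_model_map(coords, shape, voxel_size, sigma):
    """Render atoms as unit-weight isotropic 3D Gaussians (hypothetical helper).

    coords are atom positions in the same units as voxel_size, ordered (z, y, x)
    to match the grid axes below.
    """
    axes = [np.arange(n) * voxel_size for n in shape]
    zz, yy, xx = np.meshgrid(*axes, indexing="ij")
    grid = np.stack([zz, yy, xx], axis=-1)  # shape (Z, Y, X, 3)
    density = np.zeros(shape)
    for atom in np.asarray(coords, dtype=float):
        d2 = np.sum((grid - atom) ** 2, axis=-1)  # squared distance to every voxel
        density += np.exp(-d2 / (2.0 * sigma ** 2))
    return density


def masked_cc(model_map, exp_map, threshold_frac=0.1):
    """Pearson correlation restricted to voxels where the model map is significant.

    The threshold mask is a simplification standing in for Phenix's own mask.
    """
    mask = model_map > threshold_frac * model_map.max()
    a = model_map[mask] - model_map[mask].mean()
    b = exp_map[mask] - exp_map[mask].mean()
    return float(np.sum(a * b) / np.sqrt(np.sum(a * a) * np.sum(b * b)))


# Toy usage: a two-atom "model" scored against a copy shifted by one voxel.
coords = np.array([[4.0, 4.0, 4.0], [4.0, 4.0, 6.0]])
model_map = render_model_map(coords, (10, 10, 10), voxel_size=1.0, sigma=1.0)
exp_map = render_model_map(coords + [0.0, 0.0, 1.0], (10, 10, 10), 1.0, 1.0)
cc = masked_cc(model_map, exp_map)
```

Because the correlation is mean-centered and normalized, it is insensitive to an overall scale and offset of the observed map, which is why it is commonly preferred over a raw voxel-wise difference when comparing a simulated model map to experimental density.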

Model Building on Various Experimental Datasets

Model building performance comparison on four systems: TRPV3, P-glycoprotein, integrin αVβ8, and the neurokinin-1 GPCR. The methods are assessed on map–model fit and model geometry in Phenix, which is standard practice in the field. Compared to prior work, our method achieves the highest map–model fit on nearly every cross-correlation metric for every system studied, while maintaining competitive stereochemistry as assessed by MolProbity scores. Please see our paper for more quantitative details and discussion of these results.

Ensemble Prediction Comparison

In this experiment, we train on an ensemble of four maps of the TRPV3 channel protein and evaluate each method’s ability to predict the ensemble for a distinct protein, TRPV5, assessed against a held-out validation set of four cryo-EM maps derived from 3D Variability Analysis. We visualize one of the ground-truth TRPV5 maps in transparent gray, along with the closest generated atomic structure for each method to this map. As shown in the zoom insets, our method produces the atomic conformation that fits the cryo-EM map most faithfully. Quantitatively, we achieve the lowest Wasserstein (W2) distance, which measures ensemble accuracy by comparing every generated atomic conformation against every ground-truth map.
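The W2 distance aggregates pairwise conformation-to-map costs into a single ensemble-level score by finding the cheapest way to match generated conformations to ground-truth maps. The sketch below shows this for the simplest case, assuming equal numbers of generated conformations and maps with uniform weights; the pairwise distance matrix `dist` is taken as given (how each conformation-to-map distance is computed is an assumption left outside the sketch), and `w2_equal_weights` is a hypothetical name.

```python
from itertools import permutations
from math import sqrt


def w2_equal_weights(dist):
    """2-Wasserstein distance between two uniform empirical distributions of size n.

    dist[i][j] is a precomputed distance between generated conformation i and
    ground-truth map j. With equal sizes and uniform weights, an optimal
    transport plan can be chosen to be a permutation, so brute force over all
    n! matchings is exact (practical only for small n, e.g. n = 4).
    """
    n = len(dist)
    if n == 0 or any(len(row) != n for row in dist):
        raise ValueError("dist must be a non-empty square n x n matrix")
    best = min(
        sum(dist[i][perm[i]] ** 2 for i in range(n)) / n
        for perm in permutations(range(n))
    )
    return sqrt(best)


# Toy usage: each conformation is close to exactly one map, but not its "own" index.
example = w2_equal_weights([[3.0, 1.0], [1.0, 3.0]])
```

For larger ensembles, the same matching can be solved in polynomial time with a linear assignment solver instead of enumerating permutations, and unequal ensemble sizes require a general optimal transport solver rather than a permutation search.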

Other Projects From Our Lab

CryoDRGN-AI
Ab initio heterogeneous reconstruction for cryo-EM and cryo-ET.

X-RAI: Scalable Online Reconstruction for X-ray Single Particle Imaging
An autoencoder architecture for amortized pose estimation and reconstruction for X-ray single particle imaging.

BibTeX

@article{shenoy2026cryosampler,
  author    = {Shenoy, Jay and Astore, Miro and Levy, Axel and Poitevin, Frédéric and Hanson, Sonya M. and Wetzstein, Gordon},
  title     = {Modeling atomic conformational ensembles of proteins via test-time supervision of Boltz-2 on cryo-EM density maps},
  journal   = {arXiv},
  year      = {2026},
}