Scalable 3D Reconstruction From Single Particle X-Ray Diffraction Images Based on Online Machine Learning

1 - Stanford University
2 - SLAC National Accelerator Laboratory

Abstract

X-ray free-electron lasers (XFELs) offer unique capabilities for measuring the structure and dynamics of biomolecules, helping us understand the basic building blocks of life. Notably, high-repetition-rate XFELs enable single particle imaging (X-ray SPI), in which individual, weakly scattering biomolecules are imaged under near-physiological conditions, with the opportunity to access fleeting states that cannot be captured in cryogenic or crystallized conditions. Existing X-ray SPI reconstruction algorithms, which estimate both the unknown orientation of the particle in each captured image and its shared 3D structure, are ill-suited to the massive datasets generated by these emerging XFELs. Here, we introduce X-RAI, an online reconstruction framework that estimates the 3D structure of a macromolecule from large X-ray SPI datasets. X-RAI consists of a convolutional encoder, which amortizes pose estimation over large datasets, and a physics-based decoder, which employs an implicit neural representation to enable high-quality 3D reconstruction in an end-to-end, self-supervised manner. We show that X-RAI achieves state-of-the-art performance on small-scale datasets, both in simulation and in challenging experimental settings, and demonstrate its unprecedented ability to process large datasets containing millions of diffraction images in an online fashion. These abilities signify a paradigm shift in X-ray SPI towards real-time capture and reconstruction.

Methods

During image acquisition, a femtosecond X-ray pulse intersects a single hydrated molecule, producing a diffraction image. This process is repeated at high rates to collect millions of diffraction images, each observing the molecule in an unknown orientation, or pose. Our algorithm, X-RAI, uses a CNN-based encoder to efficiently estimate the pose of the molecule in each image. A physically accurate decoder, which operates in Fourier space (illustrated here in real space), uses the molecule's 3D structure, represented as a neural field, to produce a noise-free estimate of the diffraction pattern. A symmetric loss is used to optimize the parameters of the encoder and decoder online through self-supervision. At any point during the experiment, the intensity volume can be phased to obtain an estimate of the electron density.
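The decoder's forward model can be illustrated with a minimal NumPy sketch. It assumes a flat detector centred on the beam, computes each pixel's scattering vector q on the Ewald sphere, rotates the vectors by a pose R, and evaluates a 3D intensity function at R q. In X-RAI that function is a neural field; here it is swapped for the analytic intensity of a hypothetical toy particle made of identical point scatterers, |Σ_j exp(i q·r_j)|². All function names, the detector geometry, and the parameter values are illustrative assumptions, not the authors' code.

```python
import numpy as np


def detector_q_vectors(n_pix, pixel_size, distance, wavelength):
    """Scattering vectors q for an n_pix x n_pix flat detector centred on the beam.

    Detector lengths (pixel_size, distance) may use any unit; q is returned in
    inverse units of `wavelength`. Since |k_out| = |k_in| = 2*pi/wavelength,
    every q = k_out - k_in lies on the Ewald sphere.
    """
    k = 2 * np.pi / wavelength
    coords = (np.arange(n_pix) - (n_pix - 1) / 2) * pixel_size
    x, y = np.meshgrid(coords, coords, indexing="ij")
    pixels = np.stack([x, y, np.full_like(x, distance)], axis=-1)
    k_out = k * pixels / np.linalg.norm(pixels, axis=-1, keepdims=True)
    k_in = np.array([0.0, 0.0, k])  # beam travels along +z
    return k_out - k_in  # shape (n_pix, n_pix, 3)


def toy_intensity(q, atoms):
    """Stand-in for the neural field: |sum_j exp(i q . r_j)|^2 (identical point scatterers)."""
    phases = np.einsum("...d,nd->...n", q, atoms)
    return np.abs(np.exp(1j * phases).sum(axis=-1)) ** 2


def render_pattern(rotation, q_det, atoms):
    """Noise-free pattern for pose `rotation`: evaluate the 3D intensity at R q."""
    q_rot = q_det @ rotation.T  # row-vector form of R @ q for every pixel
    return toy_intensity(q_rot, atoms)


# Usage with a hypothetical 50-scatterer particle (coordinates in nm).
rng = np.random.default_rng(0)
atoms = rng.normal(scale=5.0, size=(50, 3))
q_det = detector_q_vectors(64, pixel_size=0.5, distance=100.0, wavelength=0.25)
pattern = render_pattern(np.eye(3), q_det, atoms)  # shape (64, 64)
```

Rotating the q-vectors by R is equivalent to rotating the particle by the transpose of R, which is why the pose can be applied on the detector side while the 3D representation stays fixed.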

Results

Comparison to State-of-the-Art Offline Reconstruction

Comparison of X-RAI with M-TIP on reconstructing datasets of 50,000 images in an offline setting. Each row corresponds to a separate dataset simulated using Skopi with the Protein Data Bank (PDB) code specified in the leftmost column. For both datasets, X-RAI reconstructs each particle to a resolution of 4 nm or better, exceeding the quality of M-TIP. For the protein 6J5I, M-TIP fails to reconstruct the intensity volume, resulting in the degenerate density volume shown.

Online Reconstruction

Online reconstruction of the 50S ribosomal subunit (PDB: 5O60). Remarkably, X-RAI reconstructs the subunit even when the data are acquired sequentially in batches of 64 images. Two synthetic datasets with different beam fluences (photons per pulse) are shown to illustrate the effect of signal level on convergence speed. With 10^13 photons per pulse, our method converges to a resolution of about 3.6 nm after processing 2.56 million images. For the dataset with a lower fluence of 6 × 10^12 photons per pulse, the reconstruction converges only after around 3.84 million images have been processed, demonstrating the importance of large datasets in the low-signal regime.
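Online processing means each incoming batch triggers an update and is then discarded, so memory use does not grow with the size of the dataset. The sketch below assumes one update per batch through a generic `step` callback (hypothetical; in X-RAI it would correspond to optimizing the encoder and decoder on that batch) and uses a running mean image as a toy stand-in for the model. Under that assumption, at 64 images per batch the 2.56 and 3.84 million images quoted above correspond to 40,000 and 60,000 updates, respectively.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator

import numpy as np

BATCH_SIZE = 64  # batch size used in the online experiment described above


def batched(stream: Iterable[np.ndarray], batch_size: int) -> Iterator[np.ndarray]:
    """Group a (possibly unbounded) stream of images into stacked batches."""
    it = iter(stream)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield np.stack(chunk)


def run_online(stream, step: Callable[[np.ndarray], None], batch_size: int = BATCH_SIZE):
    """Single pass over the stream: one update per batch, no image is revisited."""
    n_steps = n_images = 0
    for batch in batched(stream, batch_size):
        step(batch)  # hypothetical callback, e.g. one optimizer step on the model
        n_steps += 1
        n_images += len(batch)
    return n_steps, n_images


class RunningMean:
    """Toy stand-in for the model: an online estimate of the mean image."""

    def __init__(self):
        self.count = 0
        self.mean = None

    def step(self, batch: np.ndarray) -> None:
        if self.mean is None:
            self.mean = np.zeros(batch.shape[1:])
        self.count += len(batch)
        # Incremental update: mean_new = (count_old * mean_old + sum(batch)) / count_new
        self.mean += (batch.sum(axis=0) - len(batch) * self.mean) / self.count


# Usage: 1,000 hypothetical 8x8 images streamed from a generator.
rng = np.random.default_rng(0)
images = (rng.poisson(2.0, size=(8, 8)).astype(float) for _ in range(1000))
model = RunningMean()
n_steps, n_images = run_online(images, model.step)  # 16 steps; the last batch holds 40 images

# Number of updates implied by the image counts in the text.
print(2_560_000 // BATCH_SIZE, 3_840_000 // BATCH_SIZE)  # 40000 60000
```

The generator is consumed exactly once, which mirrors the setting in which diffraction images arrive from the detector and are not stored for later epochs.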

Experimental Data

Reconstruction of the PR772 virus from experimental data collected at an XFEL facility. To enforce the known icosahedral symmetry of the virus, we augment the encoder's orientation estimates with the rotations of the icosahedral symmetry group, effectively spreading the pose estimates out over SO(3). The resulting intensity and density reconstructions have resolutions of 11.0 nm and 18.2 nm, respectively. The diffraction patterns are displayed with reduced contrast.
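The symmetry augmentation can be sketched as follows. The code builds the 60 proper rotations of the icosahedral group by closure from a 5-fold and a 3-fold generator, then replaces a predicted pose R with G R for a randomly drawn group element G. It assumes the decoder evaluates the intensity volume at R q and that the volume's symmetry axes are aligned with the chosen frame, so that for a volume satisfying V(G x) = V(x) the poses R and G R render the same pattern. The function names and the uniform random draw are illustrative assumptions rather than X-RAI's exact procedure.

```python
import numpy as np

PHI = (1 + np.sqrt(5)) / 2  # golden ratio


def axis_angle(axis, angle):
    """Rotation matrix about `axis` (any nonzero length) by `angle`, via Rodrigues' formula."""
    a = np.asarray(axis, dtype=float)
    a = a / np.linalg.norm(a)
    K = np.array([[0.0, -a[2], a[1]], [a[2], 0.0, -a[0]], [-a[1], a[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1 - np.cos(angle)) * (K @ K)


def icosahedron_vertices():
    """The 12 vertices: cyclic permutations of (0, +-1, +-PHI)."""
    base = [np.array([0.0, s1, s2 * PHI]) for s1 in (1, -1) for s2 in (1, -1)]
    return np.array([np.roll(v, shift) for v in base for shift in range(3)])


def icosahedral_group(tol=1e-8):
    """The 60 proper rotations mapping icosahedron_vertices() onto itself.

    Built by closure from a 5-fold generator (axis through a vertex) and a
    3-fold generator (axis through a face centre); together they generate the
    full rotation group of the icosahedron.
    """
    generators = [axis_angle([0.0, 1.0, PHI], 2 * np.pi / 5),
                  axis_angle([1.0, 1.0, 1.0], 2 * np.pi / 3)]
    group, frontier = [np.eye(3)], [np.eye(3)]
    while frontier:
        new = []
        for g in frontier:
            for h in generators:
                cand = h @ g
                if not any(np.allclose(cand, e, atol=tol) for e in group):
                    group.append(cand)
                    new.append(cand)
        frontier = new
    return np.stack(group)


def augment_pose(rotation, group, rng):
    """Replace a predicted pose R with G @ R for a uniformly drawn symmetry G.

    Assumes the decoder evaluates the volume at R q and that the volume's
    symmetry axes are aligned with this frame, so V(G R q) = V(R q).
    """
    return group[rng.integers(len(group))] @ rotation


# Usage: spread one (hypothetical) encoder estimate over its 60 equivalent poses.
G = icosahedral_group()
R_pred = axis_angle([1.0, 2.0, 3.0], 0.7)
equivalent_poses = G @ R_pred  # shape (60, 3, 3); equivalent for a symmetric volume
R_aug = augment_pose(R_pred, G, np.random.default_rng(0))
```

Because any element of a finite group is a product of its generators, the breadth-first closure needs only left multiplication by the two generators, and it stops once no new rotation appears.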

Related Work

Perspectives on single particle imaging with x rays at the advent of high repetition rate x-ray free electron laser sources

An overview of the capabilities and opportunities of X-ray single particle imaging.

A Glimpse of Structural Biology through X-Ray Crystallography
Single-particle cryo-EM—How did it get here and where will it go

Review articles on X-ray crystallography and cryo-electron microscopy, the leading methods for protein structure determination.

Other Projects From Our Lab

CryoAI: Amortized Inference of Poses for Ab Initio Reconstruction of 3D Molecular Volumes from Real Cryo-EM Images
An autoencoder architecture for amortized pose estimation in homogeneous cryo-EM reconstruction.

Amortized Inference for Heterogeneous Reconstruction in Cryo-EM
An amortized approach to heterogeneous reconstruction in cryo-EM.

BibTeX

@article{shenoy2023xrai,
  author    = {Shenoy, Jay and Levy, Axel and Poitevin, Frédéric and Wetzstein, Gordon},
  title     = {Scalable 3D Reconstruction From Single Particle X-Ray Diffraction Images Based on Online Machine Learning},
  journal   = {arXiv preprint},
  year      = {2023},
}