What question did this study set out to answer?

This research compares the capabilities of CNNs and vision transformers in analyzing fluorescence fluctuation data.

February 21, 2026

BPS2026 – Efficient imaging fluorescence fluctuation spectroscopy analysis through vision transformers

Key Points

Developed CNNs and vision transformers for direct analysis of spatiotemporal traces
Analyzed imaging data at 500–1000 frames per second
Trained and validated networks on simulated data and tested on lipid bilayers and live cells
ViTs predicted diffusion coefficients, particle density, and molecular brightness from 2000 data points
Predictions were consistent with correlation analysis of 50,000 data points
ViTs require significantly less data for model-free analysis, enabling faster imaging applications
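The data savings in the points above can be put in terms of acquisition time. The sketch below assumes that one "data point" corresponds to one camera frame in a pixel's time trace, and that both analyses are run at the same frame rate. Neither assumption is stated explicitly in the summary, although the abstract's "2000 data points collected in 2 seconds" is consistent with the first at 1000 frames per second. The function name is hypothetical.

```python
def acquisition_time_s(n_frames, frames_per_second):
    """Time in seconds needed to record n_frames at a given frame rate.

    Assumes one data point per frame (an assumption, see the note above).
    """
    return n_frames / frames_per_second


# Frame-rate limits quoted in the key points: 500 and 1000 frames per second.
for fps in (500, 1000):
    vit_s = acquisition_time_s(2000, fps)      # data used by the ViT
    corr_s = acquisition_time_s(50_000, fps)   # data used by correlation analysis
    print(f"{fps} fps: ViT {vit_s:g} s, correlation {corr_s:g} s, "
          f"ratio {corr_s / vit_s:g}x")
```

Under these assumptions the ViT needs 2 to 4 seconds of data where correlation analysis needs 50 to 100 seconds, a 25-fold reduction at either frame rate.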

Abstract

Fluorescence fluctuations, when collected sufficiently fast, contain information on molecular properties from mobility to brightness and can provide information on concentrations. Currently, these fluctuations are evaluated by calculating various statistical functions, e.g., auto- or cross-correlation functions. However, these statistics are often biased estimators, require extensive data collection for precise and accurate evaluations, need analytic models for data fitting, and have nonlinear dependencies that can complicate the analysis. We therefore developed convolutional neural networks (CNNs) and vision transformers (ViTs) to analyze the spatiotemporal traces directly, without the intermediate calculation of any evaluating functions, with the aim of providing model-free analysis with significantly reduced data requirements and the possibility of real-time analysis. In this work we compare the performance of CNNs and ViTs on imaging data collected at a frame rate of 500–1000 frames per second in single plane illumination microscopy (SPIM) or total internal reflection fluorescence microscopy (TIRFM). We demonstrate that the ViTs can predict a wider range of parameters, including diffusion coefficients (D), particle density (N) and molecular brightness (B), from as few as 2000 data points collected in 2 seconds. We train and validate both networks on simulated data and test them on a range of different data sets from supported lipid bilayers and live cells, and demonstrate that ViTs can predict D, N and B consistent with values obtained by correlation analysis of 50,000 data points. The reduced data requirements and the model-free, simulation-led approach make ViTs a suitable add-on to imaging applications, providing more information on a sample by simply acquiring the SPIM and TIRFM data faster.
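For orientation, the conventional step that the networks are meant to bypass can be sketched as follows. This is a minimal illustration of the textbook normalized fluctuation autocorrelation, G(tau) = <dF(t) dF(t + tau)> / <F>^2 with dF = F - <F>, computed for a single pixel's intensity trace. It is not the authors' estimator; the function name and the toy trace are hypothetical, and a real analysis would go on to fit G(tau) with an analytic model to extract parameters such as D and N.

```python
from statistics import fmean


def autocorrelation(trace, max_lag):
    """Normalized fluctuation autocorrelation for lags 1..max_lag (in frames).

    G(tau) = <dF(t) * dF(t + tau)> / <F>^2, where dF = F - <F>.
    """
    n = len(trace)
    if not 0 < max_lag < n:
        raise ValueError("max_lag must be between 1 and len(trace) - 1")
    mean = fmean(trace)
    if mean == 0:
        raise ValueError("trace mean is zero; G(tau) is undefined")
    # Fluctuations around the mean intensity.
    fluct = [f - mean for f in trace]
    result = []
    for lag in range(1, max_lag + 1):
        # Average the product of fluctuations separated by `lag` frames.
        products = [fluct[t] * fluct[t + lag] for t in range(n - lag)]
        result.append(fmean(products) / mean**2)
    return result


# Toy trace alternating between 1 and 3 counts (illustrative only).
print(autocorrelation([1, 3] * 4, 2))  # → [-0.25, 0.25]
```

Each lag is averaged over fewer and fewer frame pairs as the lag grows, which is one reason correlation analysis tends to need long traces for precise estimates.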
