Fluorescence fluctuations, when recorded sufficiently fast, contain information on molecular properties ranging from mobility to brightness, and can also provide information on concentrations. Currently, these fluctuations are evaluated by calculating statistical functions, e.g., auto- or cross-correlation functions. However, these statistics are often biased estimators, require extensive data collection for precise and accurate evaluation, need analytic models for data fitting, and have nonlinear dependencies that can complicate the analysis. We therefore developed convolutional neural networks (CNNs) and vision transformers (ViTs) that analyze the spatiotemporal traces directly, without the intermediate calculation of any evaluating functions. Our aim is to provide model-free analysis with significantly reduced data requirements and the possibility of real-time analysis. In this work, we compare the performance of CNNs and ViTs on imaging data collected at 500–1000 frames per second in single plane illumination microscopy (SPIM) or total internal reflection fluorescence microscopy (TIRFM). We demonstrate that ViTs can predict a wider range of parameters, including the diffusion coefficient (D), particle density (N) and molecular brightness (B), from as few as 2000 data points collected in 2 seconds. We train and validate both networks on simulated data, test them on a range of data sets from supported lipid bilayers and live cells, and demonstrate that ViTs predict values of D, N and B consistent with those obtained by correlation analysis of 50,000 data points. The reduced data requirements and the model-free, simulation-led approach make ViTs a suitable add-on to imaging applications, providing more information about a sample simply by acquiring SPIM and TIRFM data faster.
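For context, the conventional route that the networks are meant to bypass is to reduce each pixel's intensity trace to a correlation function and then fit it with an analytic model. The sketch below shows only that intermediate step. It uses a simple linear-lag, time-domain estimator and is not the authors' pipeline; practical imaging FCS software typically uses a multi-tau correlator. The function name `autocorrelation` and the Poisson test trace are illustrative assumptions. Subtracting the sample mean of a finite trace is one source of the estimator bias mentioned above. In the simplest FCS models the correlation amplitude scales inversely with the number of particles in the observation volume, which is how N is usually extracted.

```python
import numpy as np


def autocorrelation(trace, max_lag):
    """Normalized intensity autocorrelation G(tau) = <dF(t) dF(t+tau)> / <F>^2.

    Hypothetical helper: a plain time-domain estimator evaluated at lags
    1..max_lag (in frames). It uses the sample mean of the finite trace,
    so it is a biased estimator of the true correlation function.
    """
    trace = np.asarray(trace, dtype=float)
    mean = trace.mean()
    d = trace - mean  # intensity fluctuations dF(t)
    g = np.empty(max_lag)
    for lag in range(1, max_lag + 1):
        # Average the product of fluctuations separated by `lag` frames.
        g[lag - 1] = np.mean(d[:-lag] * d[lag:]) / mean**2
    return g


# Illustrative usage: 2000 frames at an assumed 1000 frames per second,
# i.e. a 2 s acquisition. The trace is uncorrelated Poisson noise (an
# assumption), so G(tau) should scatter around zero.
fps = 1000
rng = np.random.default_rng(0)
trace = rng.poisson(lam=10, size=2000)
max_lag = 50
lag_times = np.arange(1, max_lag + 1) / fps  # lag times in seconds
g = autocorrelation(trace, max_lag)
```

A fitted model for D and N would then be applied to `g` versus `lag_times`. That model-fitting step, and the data needed to make `g` precise, are what the CNN and ViT approach aims to avoid.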
Wohland et al. (Sun) studied this question.