Abstract
Quantifying molecular similarity is a cornerstone of cheminformatics, underpinning applications from virtual screening and nearest-neighbor search to chemical space visualization and the evaluation of machine-learning predictions. Although Tanimoto comparisons of 2D fingerprints are widely used, the practical behavior of these similarity measures depends strongly on the fingerprint type, the representation (binary vs. count), and whether fingerprints are folded into fixed-length vectors. Here, we systematically benchmark a broad set of common fingerprint types, including dictionary-based, circular (Morgan/FCFP), path-based (RDKit), topological-distance-based (Atom Pair), hybrid distance-encoded (MAP4), torsion, LINGO, and Avalon, across multiple large datasets and complementary evaluation tasks. We quantify fingerprint specificity via duplicate rates and mass discrepancies, characterize score distributions and compound-size dependence, assess top-k ranking agreement, and compare fingerprint similarities to a graph-based reference. Across benchmarks, count (and often log-count) variants generally improve specificity and structural alignment, while folding-induced bit collisions can strongly distort similarities for high-occupancy fingerprints, making unfolded variants particularly important for RDKit and often necessary for MAP4 on heterogeneous datasets. To support reproducible benchmarking and future extensions, we introduce the open-source Python library chemap, providing unified computation of folded, unfolded, and frequency-folded fingerprints and optimized similarity calculations.

Scientific contribution
We introduce a multi-criteria benchmarking framework for molecular fingerprints that goes beyond classical virtual-screening retrieval tests by evaluating specificity (fingerprint duplicates), score behavior (including compound-size dependence), ranking agreement, and neighborhood structure on large, heterogeneous small-molecule datasets.
Our results reveal that common default settings, especially folding for high-occupancy fingerprints, can introduce severe bit-collision artifacts, and that count (often log-count) and unfolded variants substantially improve specificity and agreement with structure-based references. We release the open-source Python library chemap to standardize these fingerprint variants and enable reproducible, extensible benchmarking for future fingerprint development.
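The effect of folding on Tanimoto scores can be illustrated with a minimal, standard-library-only sketch. It does not use chemap or reproduce its API. The helper names (tanimoto_binary, tanimoto_count, fold, log_count), the modulo folding scheme, the min/max form of the count Tanimoto, the log(1 + count) transform, and the toy feature ids are all assumptions chosen for illustration, and the paper's exact definitions may differ.

```python
import math
from collections import Counter


def tanimoto_binary(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity of two sets of 'on' feature ids."""
    union = len(a | b)
    # Convention chosen here (assumption): two empty fingerprints score 0.0.
    return len(a & b) / union if union else 0.0


def tanimoto_count(a: dict, b: dict) -> float:
    """Count-based Tanimoto in its min/max form: sum(min) / sum(max)."""
    keys = a.keys() | b.keys()
    num = sum(min(a.get(k, 0), b.get(k, 0)) for k in keys)
    den = sum(max(a.get(k, 0), b.get(k, 0)) for k in keys)
    return num / den if den else 0.0


def fold(counts: dict, n_bits: int) -> dict:
    """Fold sparse feature ids into n_bits positions via modulo.

    Features that land on the same position collide and their counts add up.
    """
    folded = Counter()
    for feature_id, count in counts.items():
        folded[feature_id % n_bits] += count
    return dict(folded)


def log_count(counts: dict) -> dict:
    """Dampen large counts with log(1 + count) (one possible log-count variant)."""
    return {k: math.log1p(v) for k, v in counts.items()}


# Toy unfolded count fingerprints: {feature id: occurrence count}.
mol_a = {3: 1, 17: 4, 1027: 1}
mol_b = {3: 1, 17: 1, 2051: 1}

# Unfolded: the molecules share 2 of 4 distinct features.
print(tanimoto_binary(set(mol_a), set(mol_b)))  # 0.5
print(tanimoto_count(mol_a, mol_b))             # 2/7, counts penalize the 4-vs-1 mismatch

# Folded to 1024 positions: ids 1027 and 2051 both collide with id 3.
fold_a, fold_b = fold(mol_a, 1024), fold(mol_b, 1024)
print(tanimoto_binary(set(fold_a), set(fold_b)))  # 1.0, the molecules look identical
print(tanimoto_count(fold_a, fold_b))             # 0.5

# Log-count variant of the unfolded comparison.
print(tanimoto_count(log_count(mol_a), log_count(mol_b)))
```

In this toy case the folded binary fingerprints become indistinguishable even though the unfolded ones differ, which is the kind of collision artifact and loss of specificity the abstract describes; real fingerprints would come from a cheminformatics toolkit rather than hand-written dictionaries.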
Huber et al. studied this question.