This study introduces a novel encoding scheme for DNA/RNA sequences that integrates the Komlós and Hadamard transforms. Unlike traditional One-Hot encoding, this approach offers a more informative representation of omics data while significantly reducing computational complexity. Note, however, that the Komlós transform component provides fewer features and does not use sparse codes. By leveraging the inherent properties of these transforms, our method captures complex patterns within the data, leading to improved model accuracy and reduced training times. When combined with an image transformation, the encoding scheme is particularly efficient, achieving superior performance across various predictive tasks with significantly lower computational resource demands than One-Hot encoding. Our findings suggest that the scheme, particularly when integrated with Hilbert curve mapping or sequence-to-image analysis, holds promise for advancing DNA/RNA data analysis through more efficient and effective feature representation.
Kabbani et al. (Mon,) studied this question.
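
To make the general idea concrete, the sketch below shows one way a Hadamard-based code and a Hilbert curve layout could be combined. It is an illustration under stated assumptions, not the construction used in the study: the assignment of nucleotides to rows of a 4 x 4 Sylvester Hadamard matrix, the `ACGT` ordering, and the helper names `hadamard`, `encode`, `hilbert_d2xy`, and `to_image` are all hypothetical. The Komlós transform component is omitted because its definition is not given in this text.

```python
def hadamard(n):
    """Return the n x n Sylvester Hadamard matrix (n must be a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        # Sylvester step: H_2n = [[H, H], [H, -H]]
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


# Assumption: the base order and the base-to-row assignment are illustrative only.
ALPHABET = "ACGT"
H4 = hadamard(4)
CODES = {base: H4[i] for i, base in enumerate(ALPHABET)}


def encode(seq):
    """Map each base to a dense +/-1 code (one Hadamard row); RNA 'U' is read as 'T'."""
    return [list(CODES[b]) for b in seq.upper().replace("U", "T")]


def hilbert_d2xy(order, d):
    """Convert index d along a Hilbert curve to (x, y) on a 2^order x 2^order grid."""
    x = y = 0
    t = d
    s = 1
    n = 1 << order
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                # Reflect the current quadrant
                x = s - 1 - x
                y = s - 1 - y
            # Swap x and y to rotate the quadrant
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def to_image(seq, order):
    """Lay the encoded sequence along a Hilbert curve; unused cells stay all-zero."""
    n = 1 << order
    if len(seq) > n * n:
        raise ValueError("sequence does not fit on the grid")
    image = [[[0, 0, 0, 0] for _ in range(n)] for _ in range(n)]
    for d, code in enumerate(encode(seq)):
        x, y = hilbert_d2xy(order, d)
        image[y][x] = code
    return image


if __name__ == "__main__":
    for row in to_image("ACGUAC", order=2):
        print(row)
```

Unlike One-Hot vectors, the rows used here are dense and mutually orthogonal, and the Hilbert curve keeps bases that are adjacent in the sequence adjacent in the image. In this toy version the code length equals the alphabet size, so it does not by itself demonstrate the feature reduction described above.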