Abstract

Motivation: Inference of candidate upstream regulators via motif enrichment analysis is a common step in the interpretation of genomic data. However, redundancy in motif databases can negatively impact predictive value, especially when relying on regression-based motif enrichment analysis. Although various forms of motif clustering have been used to mitigate problems caused by redundancy, an algorithm optimized for downstream regression-based analysis is needed.

Results: We introduce AmalgaMo, an efficient and flexible command-line tool for merging highly similar motifs. Using publicly available human datasets, we demonstrate that merging motifs with our optimized settings greatly benefits regression-based motif enrichment analysis and provide detailed documentation that can serve as a reference for researchers inferring upstream regulators from genomic data.

Availability: AmalgaMo is available on GitHub at https://github.com/lapohosorsolya/AmalgaMo.
Lapohos et al. studied this question.