Redundancy in electronic health record corpora: analysis, impact on text mining performance and mitigation strategies | Synapse