Structure, function, dynamics and regulation are closely related in protein–DNA complexes. Communication within such systems, i.e. atomic-level, time-resolved interactions, is central to this relation. In this thesis, classical molecular dynamics (MD) simulations and advanced data-driven analytical methods were employed to study communication in protein–DNA complexes. Network-based algorithms provided insight into, for example, communication pathways, while the information-theoretic measure of transfer entropy yielded additional directional information. The systems studied within this thesis are proteins that bind to specific DNA sequences and are involved in various important biological processes. Thymine DNA glycosylase (TDG) is a DNA repair enzyme that recognizes specific types of DNA damage; the CXXC domain of mixed-lineage leukemia 1 (MLL1) and Wilms tumor protein (WT1) are transcription factors that contain structural, tetrahedrally coordinated zinc ions; methyl-CpG-binding domain protein 2 (MBD2) is involved in epigenetic regulation by binding to methylated CpG DNA. For TDG we found that the discrimination between cognate and non-cognate damaged bases is unlikely to take place at the initial complex formation. Instead, we found indications that TDG can discriminate the bases when the damaged base is flipped out of the DNA helix into the active site of the enzyme. For the two zinc-containing proteins, MLL1 and WT1, we studied the effect of modelling the zinc ions with different force fields, ranging from bonded (rigid) through hybrid to non-bonded (flexible) models. The difficulty lies in the versatile coordination and geometry that these ions and their coordinating amino acids can adopt depending on the specific environment. The thesis showed that the effect of different models is mostly local but can potentially cause subtle, long-range structural changes over extended time periods.
The choice of model depends strongly on the properties of interest and should always be evaluated carefully. Bonded models in particular require high-quality experimental structures. For MBD2 we uncovered a previously unknown, stable secondary binding conformation on methylated DNA, establishing a bistable equilibrium that expands our understanding of MBD2’s interaction dynamics. Residue serine 189 of the protein was identified as the allosteric macro-switch between these two equilibrium states, a finding validated experimentally by the group of David C. Williams Jr. through an S189A mutation. Communication analyses significantly enhance the understanding of protein–DNA complexes by providing an impartial, data-centric view of their dynamic behaviour at the atomic level. They reveal intricate, long-range allosteric effects and pinpoint important sites that mediate these interactions, which are often difficult to discern with traditional analysis methods. Transfer entropy measures the temporal directionality of communication, thus complementing instantaneous, non-directional correlations. A robust estimator has to be implemented, and its choice depends on multiple factors such as the available data and the metric of choice. Before the estimator is applied and the results are interpreted, its underlying parameters need to be carefully tested and validated. Overall, this thesis underscores the power of integrating classical MD simulations with advanced data-driven analysis, particularly network analysis and information theory, to gain an unbiased and deep understanding of the dynamic communication mechanisms within protein–DNA complexes, thus offering valuable insights for deciphering molecular mechanisms.
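To illustrate how transfer entropy adds directionality to a correlation analysis, the following is a minimal sketch of a histogram-based (plug-in) estimator for two scalar time series. It is not the estimator used in the thesis: the function name `transfer_entropy`, the equal-width binning, the history length of one frame and the parameters `n_bins` and `lag` are all assumptions made for illustration.

```python
from collections import Counter

import numpy as np


def transfer_entropy(source, target, n_bins=2, lag=1):
    """Plug-in estimate of TE(source -> target) in bits.

    Illustrative assumptions: equal-width binning and a history
    length of one frame for both series.
    """
    def discretise(x):
        x = np.asarray(x, dtype=float)
        # Interior edges of n_bins equal-width bins over the data range
        edges = np.linspace(x.min(), x.max(), n_bins + 1)[1:-1]
        return np.digitize(x, edges).tolist()

    s = discretise(source)
    t = discretise(target)

    # Future of the target, past of the target, past of the source
    y_next, y_past, x_past = t[lag:], t[:-lag], s[:-lag]
    n = len(y_next)

    # Joint and marginal counts needed for the conditional probabilities
    c_full = Counter(zip(y_next, y_past, x_past))
    c_past = Counter(zip(y_past, x_past))
    c_self = Counter(zip(y_next, y_past))
    c_y = Counter(y_past)

    te = 0.0
    for (yn, yp, xp), count in c_full.items():
        p_joint = count / n
        p_given_both = count / c_past[(yp, xp)]       # p(y_next | y_past, x_past)
        p_given_self = c_self[(yn, yp)] / c_y[yp]     # p(y_next | y_past)
        te += p_joint * np.log2(p_given_both / p_given_self)
    return te


# Toy data: y copies x with a delay of one frame, so information flows x -> y
rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=5000)
y = np.roll(x, 1)
y[0] = 0

te_xy = transfer_entropy(x, y)
te_yx = transfer_entropy(y, x)
print(f"TE(x -> y) = {te_xy:.3f} bits, TE(y -> x) = {te_yx:.3f} bits")
```

In this toy case the estimate is large in the driving direction and close to zero in the reverse direction, which is the asymmetry a symmetric correlation measure cannot show. For real MD trajectories the number of bins, the lag and the history length would need the kind of testing and validation described above, since plug-in estimators are biased for short or finely binned data.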
Senta Volkenandt (Thu,) studied this question.