On the Success and Limitations of Auxiliary Network Based Word-Level End-to-End Neural Speaker Diarization