This paper presents a novel knowledge-graph-guided approach for generating synthetic troubleshooting dialogues in network operations environments. Traditional methods for creating training data for conversational AI systems rely on expert annotation, which typically costs 50-200× more than automated approaches. Our technique constructs a Domain Knowledge Graph (DKG) from historical support tickets and technical manuals, encoding symptom-fault-diagnostic-remediation relationships. By combining this structured knowledge with transformer-based generation and explicit control signals, we generate synthetic dialogues that preserve a logical diagnostic flow and contain no content unsupported by the knowledge graph.

KEY FINDINGS:
• Zero hallucinations when validated against the knowledge graph (0% vs. 24.7% for the baseline)
• BLEU score of 0.44 ± 0.02, with factual accuracy of 94.3% ± 1.1%
• 27.1% reduction in Mean Time To Repair (MTTR) in a simulated deployment
• 15.2% improvement in first-call remediation success rate
• 50-200× cost reduction compared with manual expert annotation
• Generation latency of 165 ms per dialogue; throughput of ~6,000 dialogues/hour

TECHNICAL CONTRIBUTIONS: The system integrates:
• named-entity recognition (NER) for extracting network-specific entities;
• a domain knowledge graph with 2,177 nodes (543 symptoms, 218 faults, 1,089 diagnostics, 327 remediations);
• transformer-based neural generation with control signals for domain, severity, and length;
• multi-stage validation to ensure factual correctness.

The system was evaluated across five network domains (RAN, Core, Transport, Access, IP) using 1,000 test dialogues and a 3-month simulated deployment with 50 operators; all operational improvements are statistically significant (p < 0.001).
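The symptom-fault-diagnostic-remediation chain and the knowledge-graph validation step described above can be illustrated with a small Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the graph is a plain adjacency map that only allows edges advancing one stage (symptom to fault, fault to diagnostic, diagnostic to remediation), the RAN entities are hypothetical, the `<domain=...><severity=...><length=...>` control-prefix format is invented for illustration, and a fixed template (`render_dialogue`) stands in for the transformer generator. A chain counts as grounded only if every entity is a DKG node and every consecutive step is a DKG edge, which is one simple way to operationalize "zero hallucinations when validated against the knowledge graph".

```python
from dataclasses import dataclass, field

# Node types follow the DKG description; edges may only advance one stage.
NODE_TYPES = ("symptom", "fault", "diagnostic", "remediation")
NEXT_TYPE = {"symptom": "fault", "fault": "diagnostic", "diagnostic": "remediation"}
DOMAINS = ("RAN", "Core", "Transport", "Access", "IP")  # the five evaluated domains


@dataclass
class DomainKnowledgeGraph:
    node_type: dict[str, str] = field(default_factory=dict)
    edges: dict[str, set[str]] = field(default_factory=dict)

    def add_node(self, name: str, kind: str) -> None:
        if kind not in NODE_TYPES:
            raise ValueError(f"unknown node type: {kind}")
        self.node_type[name] = kind
        self.edges.setdefault(name, set())

    def add_edge(self, src: str, dst: str) -> None:
        # Reject edges that skip or reverse the symptom -> ... -> remediation flow.
        if NEXT_TYPE.get(self.node_type[src]) != self.node_type[dst]:
            raise ValueError(f"edge {src!r} -> {dst!r} breaks the diagnostic flow")
        self.edges[src].add(dst)

    def diagnostic_paths(self, symptom: str) -> list[list[str]]:
        """Every complete symptom -> fault -> diagnostic -> remediation chain."""
        paths = [[symptom]]
        for _ in range(3):  # extend each partial path by one stage; dead ends drop out
            paths = [p + [n] for p in paths for n in sorted(self.edges[p[-1]])]
        return paths

    def is_grounded(self, path: list[str]) -> bool:
        """True only if every entity is a DKG node and every step is a DKG edge."""
        return all(n in self.node_type for n in path) and all(
            b in self.edges[a] for a, b in zip(path, path[1:])
        )


def control_prefix(domain: str, severity: str, length: str) -> str:
    """Hypothetical control-signal prefix that a conditioned generator would receive."""
    if domain not in DOMAINS:
        raise ValueError(f"unknown domain: {domain}")
    return f"<domain={domain}><severity={severity}><length={length}>"


def render_dialogue(path: list[str]) -> list[tuple[str, str]]:
    """Template stand-in for the transformer generator; turns follow the path order."""
    symptom, fault, diagnostic, remediation = path
    return [
        ("operator", f"We are seeing {symptom}."),
        ("assistant", f"That is consistent with {fault}. Please {diagnostic}."),
        ("operator", f"Done; the result points to {fault}."),
        ("assistant", f"Recommended remediation: {remediation}."),
    ]


def build_example_dkg() -> DomainKnowledgeGraph:
    """Tiny hypothetical RAN graph; a real DKG is built from tickets and manuals."""
    dkg = DomainKnowledgeGraph()
    for name, kind in [
        ("high RRC connection failures", "symptom"),
        ("cell overload", "fault"),
        ("faulty antenna feeder", "fault"),
        ("check PRB utilization", "diagnostic"),
        ("measure VSWR", "diagnostic"),
        ("enable load balancing", "remediation"),
        ("replace feeder cable", "remediation"),
    ]:
        dkg.add_node(name, kind)
    for src, dst in [
        ("high RRC connection failures", "cell overload"),
        ("high RRC connection failures", "faulty antenna feeder"),
        ("cell overload", "check PRB utilization"),
        ("faulty antenna feeder", "measure VSWR"),
        ("check PRB utilization", "enable load balancing"),
        ("measure VSWR", "replace feeder cable"),
    ]:
        dkg.add_edge(src, dst)
    return dkg


if __name__ == "__main__":
    dkg = build_example_dkg()
    for path in dkg.diagnostic_paths("high RRC connection failures"):
        assert dkg.is_grounded(path)
        print(control_prefix("RAN", "major", "short"))
        for speaker, text in render_dialogue(path):
            print(f"  {speaker}: {text}")
    # A chain pairing one fault with another fault's diagnostic is rejected.
    bad = ["high RRC connection failures", "cell overload", "measure VSWR", "replace feeder cable"]
    print("mismatched chain grounded?", dkg.is_grounded(bad))
```

Enumerating every complete path from a symptom yields one dialogue skeleton per diagnostic route, so coverage grows with the graph rather than with annotation effort. Restricting edge types at insertion time keeps every skeleton in diagnostic order, and the final check shows the kind of cross-wired chain that a graph-based validation stage would reject.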
APPLICATIONS:
• Training conversational AI for network troubleshooting
• Generating synthetic training data for specialized technical domains
• Reducing dependency on expensive expert annotation
• Improving AI assistant performance in telecom operations

This work establishes a paradigm for generating high-fidelity training data in domains where authentic data remains scarce due to confidentiality constraints or annotation costs.
Naveen et al. (Sun,) studied this question.