Cochlear Implant (CI) users suffer severe speech intelligibility degradation in multi-talker environments (the cocktail party problem), with recognition accuracy dropping by 30%–50% relative to quiet conditions. Existing deep learning-based separation methods require extensive enrollment data (>10 min) and fail to adapt to unseen speakers in real time, which limits their clinical viability. We propose CIGAN, a conditional identity-guided adversarial network that addresses these challenges through domain-space learning. Using only a 1-s wake-word enrollment, CIGAN generates a discriminative speaker domain from keyword-triggered embeddings, enabling few-shot adaptation. The lightweight architecture (∼1.03 M parameters) achieves real-time processing (∼53 ms latency) while maintaining robust separation at 0–10 dB Signal-to-Noise Ratio (SNR) in conditions with 2–20 speakers. Theoretically, CIGAN’s adversarial domain conditioning orthogonalizes target and interfering speakers in latent space, providing stronger generalization than conventional feature matching. Experimentally, CIGAN outperformed five state-of-the-art (SOTA) baselines (DiffSep + SV, NCSN++, DPCCN, diffTSE, FlowTSE) on Scale-Invariant Signal-to-Distortion Ratio Improvement (SI-SDRi; +2.7 dB in the 20-speaker condition), Word Error Rate (WER; –15.0 pp), Perceptual Evaluation of Speech Quality (PESQ), and Extended Short-Time Objective Intelligibility (ESTOI), with all Holm-adjusted p < 0.001. Subjective listening tests (n = 13) yielded a Mean Opinion Score (MOS) of 4.7 ± 0.1 with enhanced consonant clarity. Although validation is currently limited to vocoder simulations, CIGAN presents a clinically viable front-end strategy for CI signal enhancement.
Gong et al. (Mon,) studied this question.
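
For readers unfamiliar with the headline metric, the following is a minimal sketch of how SI-SDRi is commonly computed: the SI-SDR of the separated output minus the SI-SDR of the unprocessed mixture, both measured against the clean target. It uses the standard textbook definition and is not taken from the CIGAN evaluation code; the function names `si_sdr` and `si_sdr_improvement` and the small `eps` stabilizer are assumptions made for illustration.

```python
import math


def si_sdr(estimate, target, eps=1e-12):
    """Scale-invariant SDR in dB (standard definition, illustrative only)."""
    if len(estimate) != len(target):
        raise ValueError("estimate and target must have the same length")
    n = len(target)
    # Remove the mean so a DC offset does not affect the score.
    mean_e = sum(estimate) / n
    mean_t = sum(target) / n
    e = [x - mean_e for x in estimate]
    t = [x - mean_t for x in target]
    # Project the estimate onto the target to find the optimal scaling.
    dot = sum(a * b for a, b in zip(e, t))
    t_energy = sum(b * b for b in t)
    alpha = dot / (t_energy + eps)
    scaled_target = [alpha * b for b in t]
    # Everything not explained by the scaled target counts as distortion.
    distortion = [a - s for a, s in zip(e, scaled_target)]
    signal_energy = sum(s * s for s in scaled_target)
    distortion_energy = sum(d * d for d in distortion)
    return 10.0 * math.log10((signal_energy + eps) / (distortion_energy + eps))


def si_sdr_improvement(estimate, mixture, target):
    """SI-SDRi: gain of the separated estimate over the raw mixture, in dB."""
    return si_sdr(estimate, target) - si_sdr(mixture, target)
```

Because the estimate is projected onto the target before the ratio is taken, rescaling the separated signal leaves the score unchanged, so a system cannot gain SI-SDR simply by adjusting its output level.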