What question did this study set out to answer?

The aim is to improve speech intelligibility for cochlear implant users in multi-talker environments using minimal enrollment data.

March 25, 2026 · Open Access

CIGAN: rehabilitation-oriented few-shot speech separation for cocktail party problem in Cochlear implant users

Key Points

Developed CIGAN, a conditional identity-guided adversarial network for speech separation.
Used a 1-second wake-word enrollment to build speaker-specific embeddings.
Achieved real-time processing (∼53 ms latency) with a lightweight architecture (∼1.03 M parameters).
Evaluated at Signal-to-Noise Ratios (SNR) of 0–10 dB in 2–20 speaker conditions.
Outperformed five state-of-the-art baselines on SI-SDRi (+2.7 dB in the 20-speaker condition), WER, PESQ, and ESTOI.
Reduced Word Error Rate (WER) by 15.0 percentage points.
Subjective listening tests (n = 13) yielded a Mean Opinion Score (MOS) of 4.7 ± 0.1, with enhanced consonant clarity.
Adversarial domain conditioning separates target and interfering speakers in latent space, which the authors argue gives stronger generalization than conventional feature matching.
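
For readers unfamiliar with the SI-SDRi and SNR figures in the key points, the following is a minimal Python sketch of the standard definitions. It is not the authors' evaluation code. The function names, the synthetic sine-plus-noise signals, and the toy "separator" that simply halves the interference are all illustrative assumptions.

```python
import numpy as np


def si_sdr(estimate, target, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB (standard definition)."""
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target to find the optimal scaling factor.
    alpha = np.dot(estimate, target) / (np.dot(target, target) + eps)
    projection = alpha * target
    residual = estimate - projection
    return 10.0 * np.log10((np.sum(projection**2) + eps) / (np.sum(residual**2) + eps))


def mix_at_snr(target, interferer, snr_db):
    """Scale the interferer so that target power / interferer power equals snr_db."""
    p_target = np.mean(target**2)
    p_interf = np.mean(interferer**2)
    gain = np.sqrt(p_target / (p_interf * 10.0 ** (snr_db / 10.0)))
    return target + gain * interferer


def si_sdr_improvement(estimate, mixture, target):
    """SI-SDRi: gain of the separated output over the unprocessed mixture."""
    return si_sdr(estimate, target) - si_sdr(mixture, target)


# Synthetic 1-second example at 16 kHz (assumed sample rate, illustration only).
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
target = np.sin(2 * np.pi * 220 * t)
interferer = rng.standard_normal(16000)
mixture = mix_at_snr(target, interferer, snr_db=0.0)

# Hypothetical "separator" that removes half of the interference amplitude.
estimate = target + 0.5 * (mixture - target)
improvement = si_sdr_improvement(estimate, mixture, target)
print(f"SI-SDRi: {improvement:.2f} dB")
```

Because SI-SDR rescales the target before measuring the residual, multiplying the estimate by any non-zero constant leaves the score unchanged, which is why it is preferred over plain SNR when a separation model does not preserve output level.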

Abstract

Cochlear Implant (CI) users suffer severe speech intelligibility degradation in multi-talker environments (cocktail party problem), with recognition accuracy dropping 30%–50% relative to quiet conditions. Existing deep learning-based separation methods require extensive enrollment data (>10 min) and fail to adapt to unseen speakers in real time, limiting clinical viability. We propose CIGAN, a conditional identity-guided adversarial network that addresses these challenges through domain-space learning. Leveraging only a 1‑s wake-word enrollment, CIGAN generates a discriminative speaker domain via keyword-triggered embeddings, enabling few-shot adaptation. The lightweight architecture (∼1.03 M parameters) achieves real-time processing (∼53 ms latency) while maintaining robust separation under 0–10 dB Signal-to-Noise Ratio (SNR) across 2–20 speaker conditions. Theoretically, CIGAN’s adversarial domain conditioning orthogonalizes target and interfering speakers in latent space, providing stronger generalization than conventional feature matching. Experimentally, CIGAN outperformed five state-of-the-art (SOTA) baselines (DiffSep + SV, NCSN++, DPCCN, diffTSE, FlowTSE) across Scale-Invariant Signal-to-Distortion Ratio Improvement (SI-SDRi; +2.7 dB in the 20-speaker condition), Word Error Rate (WER; –15.0 pp), Perceptual Evaluation of Speech Quality (PESQ), and Extended Short-Time Objective Intelligibility (ESTOI), with all Holm-adjusted p < 0.001. Subjective listening tests (n = 13) yielded Mean Opinion Score (MOS) = 4.7 ± 0.1 with enhanced consonant clarity. While validation is currently limited to vocoder simulations, CIGAN presents a clinically viable front-end strategy for CI signal enhancement.
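
The abstract reports Holm-adjusted p-values. As a minimal sketch of what that adjustment does, assuming the standard Holm step-down procedure, the Python function below adjusts a family of raw p-values. The function name `holm_adjust` and the four example p-values are made up for illustration; the abstract does not state how many comparisons were in each family or what the raw values were.

```python
import numpy as np


def holm_adjust(p_values):
    """Holm step-down adjustment of a family of raw p-values."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Smallest p-value is multiplied by m, the next by m - 1, and so on.
    scaled = p[order] * (m - np.arange(m))
    # Adjusted values must be non-decreasing in sorted order and capped at 1.
    adjusted_sorted = np.minimum(np.maximum.accumulate(scaled), 1.0)
    # Return the adjusted values in the original input order.
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted
    return adjusted


# Hypothetical raw p-values for four comparisons (not taken from the paper).
raw = [0.0001, 0.01, 0.03, 0.04]
adj = holm_adjust(raw)
print(adj)
```

A comparison is declared significant at level 0.001 only if its adjusted value falls below 0.001, so the reported "all Holm-adjusted p < 0.001" is a stricter statement than the same threshold applied to unadjusted p-values.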

Cite This Study

Gong et al. studied this question.

synapsesocial.com/papers/69c37adcb34aaaeb1a67cbc9
https://doi.org/10.1016/j.bspc.2026.110065
