ROSE-CD

We present ROSE-CD (Robust One-step Speech Enhancement via Consistency Distillation), a novel approach for distilling a one-step consistency model for diffusion-based speech enhancement. Specifically, we introduce a randomized learning trajectory to improve the model’s robustness to noise. Furthermore, we jointly optimize the one-step model with two time-domain auxiliary losses, which enables it to recover from teacher-induced errors and surpass the teacher model in overall performance. This is the first pure one-step consistency distillation model for diffusion-based speech enhancement, achieving 54 times faster inference and superior performance compared to its 30-step teacher model. Experiments on the VoiceBank-DEMAND dataset demonstrate that the proposed model achieves state-of-the-art speech quality. Moreover, its generalization ability is validated on both an out-of-domain dataset and real-world noisy recordings.
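The sample labels on this page ("PESQ only", "SI-SDR only") suggest that the two time-domain auxiliary losses target PESQ and SI-SDR. To illustrate the SI-SDR side only, here is a minimal NumPy sketch of the standard scale-invariant SDR and its negative used as a loss. It is not the authors' implementation; the function names `si_sdr` and `neg_si_sdr_loss` are hypothetical, and an actual training loss would be written in an autodiff framework rather than NumPy.

```python
import numpy as np


def si_sdr(estimate: np.ndarray, reference: np.ndarray, eps: float = 1e-8) -> float:
    """Scale-invariant SDR in dB (standard definition on zero-mean signals)."""
    # Remove the DC offset of both signals.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to obtain the scaled target.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    # Everything not explained by the scaled reference counts as distortion.
    distortion = estimate - target
    return float(10.0 * np.log10(
        (np.dot(target, target) + eps) / (np.dot(distortion, distortion) + eps)
    ))


def neg_si_sdr_loss(estimate: np.ndarray, reference: np.ndarray) -> float:
    """Negative SI-SDR, so that minimizing the loss maximizes SI-SDR."""
    return -si_sdr(estimate, reference)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(16000) / 16000.0          # 1 s at 16 kHz
    clean = np.sin(2 * np.pi * 440 * t)     # toy "clean" signal
    noisy = clean + 0.1 * rng.standard_normal(clean.shape)
    print(f"SI-SDR of noisy copy: {si_sdr(noisy, clean):.2f} dB")
    print(f"SI-SDR of rescaled copy: {si_sdr(0.5 * clean, clean):.2f} dB")
```

Because the estimate is first projected onto the reference, rescaling the estimate leaves the score essentially unchanged, so the loss penalizes distortion rather than gain differences.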

Overview of the proposed robust consistency distillation (RCD)

Summary of Results Using ROSE-CD

Summary of results using ROSE-CD on the DNS Challenge 2020 dataset.

VB-DMD Samples

Sample 1

Noisy Input

Clean Reference

SGMSE+ (30 steps)

ROSE-CD (PESQ only)

ROSE-CD (SI-SDR only)

ROSE-CD (Final)

Sample 2

Noisy Input

Clean Reference

SGMSE+ (30 steps)

ROSE-CD (PESQ only)

ROSE-CD (SI-SDR only)

ROSE-CD (Final)

Sample 3

Noisy Input

Clean Reference

SGMSE+ (30 steps)

ROSE-CD (PESQ only)

ROSE-CD (SI-SDR only)

ROSE-CD (Final)

DNS300 Samples

Sample 1

Noisy Input

SGMSE+ (30 steps)

ROSE-CD (PESQ only)

ROSE-CD (SI-SDR only)

ROSE-CD (Final)

Sample 2

Noisy Input

SGMSE+ (30 steps)

ROSE-CD (PESQ only)

ROSE-CD (SI-SDR only)

ROSE-CD (Final)

Sample 3

Noisy Input

SGMSE+ (30 steps)

ROSE-CD (PESQ only)

ROSE-CD (SI-SDR only)

ROSE-CD (Final)

BibTeX

@inproceedings{xu2025rosecd,
  author    = {Xu, Liang and Yan, Longfei Felix and Kleijn, W. Bastiaan},
  title     = {Robust One-step Speech Enhancement via Consistency Distillation},
  booktitle = {Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)},
  year      = {2025},
  month     = oct,
  address   = {Lake Tahoe, CA, USA},
  publisher = {IEEE}
}
