We present ROSE-CD: Robust One-step Speech Enhancement via Consistency Distillation, a novel approach for distilling a one-step consistency model. Specifically, we introduce a randomized learning trajectory to improve the model’s robustness to noise. Furthermore, we jointly optimize the one-step model with two time-domain auxiliary losses, enabling it to recover from teacher-induced errors and surpass the teacher model in overall performance. This is the first pure one-step consistency distillation model for diffusion-based speech enhancement, achieving 54 times faster inference than its 30-step teacher model while also outperforming it. Experiments on the VoiceBank-DEMAND dataset demonstrate that the proposed model achieves state-of-the-art performance in terms of speech quality. Moreover, its generalization ability is validated on both an out-of-domain dataset and real-world noisy recordings.
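The sample labels on this page name SI-SDR as one of the quantities used in training. As a rough illustration of what a time-domain SI-SDR objective measures, the sketch below computes the standard scale-invariant signal-to-distortion ratio between an enhanced waveform and a clean reference, and negates it to obtain a loss. The function names `si_sdr` and `si_sdr_loss`, the `eps` constant, and the mean-removal step are assumptions made for this example; the exact loss formulation and weighting used in ROSE-CD are not given here.

```python
import math


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB between two equal-length waveforms.

    Hypothetical helper for illustration; not taken from the ROSE-CD code.
    """
    if len(estimate) != len(reference):
        raise ValueError("estimate and reference must have the same length")
    n = len(reference)

    # Remove the mean so a constant offset does not affect the score.
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]

    # Project the estimate onto the reference to get the target component.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref)
    scale = dot / (ref_energy + eps)
    target = [scale * r for r in ref]

    # Everything not explained by the scaled reference counts as distortion.
    distortion = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    distortion_energy = sum(d * d for d in distortion)

    return 10.0 * math.log10((target_energy + eps) / (distortion_energy + eps))


def si_sdr_loss(estimate, reference):
    """Negative SI-SDR, so that minimizing the loss maximizes SI-SDR."""
    return -si_sdr(estimate, reference)
```

Because the reference is rescaled to best match the estimate before the ratio is taken, multiplying the enhanced waveform by a constant gain leaves the score essentially unchanged, so the measure reflects distortion rather than loudness.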
[Audio samples. Each example compares: Noisy Input, Clean Reference (where available), SGMSE, ROSE-CD (PESQ only), ROSE-CD (SI-SDR only), and ROSE-CD (Final). Three examples include a clean reference; three do not.]
@inproceedings{xu2025rosecd,
title = {Robust One-step Speech Enhancement via Consistency Distillation},
author = {Liang Xu and Longfei Felix Yan and W. Bastiaan Kleijn},
booktitle = {IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)},
year = {2025},
organization = {IEEE}
}