SSI-Net: A Multi-Stage Speech Signal Improvement System for ICASSP 2023 SSI Challenge

Abstract

The ICASSP 2023 Speech Signal Improvement (SSI) Challenge focuses on improving the speech signal quality of real-time communication (RTC) systems. In this paper, we introduce the speech signal improvement network (SSI-Net), our submission to the ICASSP 2023 SSI Challenge, which satisfies the real-time condition. The proposed SSI-Net has a multi-stage architecture. In the first stage, restoration, we present the time-domain restoration generative adversarial network (TRGAN) for speech restoration. In the second stage, enhancement, we employ MTFAA-Lite, a lightweight version of the multi-scale temporal frequency convolutional network with axial self-attention (MTFAA-Net), to enhance the fullband speech. In the subjective test on the SSI Challenge blind test set, our proposed SSI-Net yields a P.835 overall mean opinion score (MOS) of 3.190 and a P.804 overall MOS of 3.178, placing 3rd in both Track 1 and Track 2.
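
The two-stage structure described in the abstract can be illustrated with a minimal sketch of frame-by-frame stage chaining. This is not the authors' implementation: `restoration_stage` and `enhancement_stage` are hypothetical stand-ins (a simple clipper and a fixed gain) for TRGAN and MTFAA-Lite, and `run_pipeline` is an assumed helper name. It only shows how a restoration stage can feed an enhancement stage in a streaming fashion.

```python
from typing import Callable, Iterable, Iterator, List

Frame = List[float]               # one frame of audio samples
Stage = Callable[[Frame], Frame]  # a stage maps a frame to a processed frame


def restoration_stage(frame: Frame) -> Frame:
    # Hypothetical placeholder for stage 1 (TRGAN in the paper).
    # Here it only clips samples to [-1, 1].
    return [max(-1.0, min(1.0, s)) for s in frame]


def enhancement_stage(frame: Frame) -> Frame:
    # Hypothetical placeholder for stage 2 (MTFAA-Lite in the paper).
    # Here it only applies a fixed gain of 0.5.
    return [0.5 * s for s in frame]


def run_pipeline(frames: Iterable[Frame], stages: List[Stage]) -> Iterator[Frame]:
    # Process frames one at a time, passing each through every stage in order,
    # so that output is produced as soon as each input frame arrives.
    for frame in frames:
        for stage in stages:
            frame = stage(frame)
        yield frame


if __name__ == "__main__":
    stream = [[2.0, -0.4], [0.25, -3.0]]
    for out in run_pipeline(stream, [restoration_stage, enhancement_stage]):
        print(out)
```

Because each frame passes through all stages before the next frame is read, the per-frame latency is the sum of the stage latencies, which is the quantity a real-time budget would constrain.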

Publication
In IEEE International Conference on Acoustics, Speech and Signal Processing 2023