We introduce SSI-Net, our submission to the ICASSP 2023 Speech Signal Improvement (SSI) Challenge, designed for real-time communication systems. SSI-Net features a multi-stage architecture. The first stage uses a time-domain restoration generative adversarial network (TRGAN) for initial speech restoration. The second stage uses a lightweight multi-scale temporal frequency convolutional network with axial self-attention (MTFAA-Lite) for fullband speech enhancement. In subjective tests on the SSI Challenge blind test set, SSI-Net achieved a P.835 mean opinion score (MOS) of 3.190 and a P.804 MOS of 3.178, ranking 3rd in Tracks 1 and 2.
Weixin Zhu,
Zilin Wang,
Jiuxin Lin,
Chang Zeng,
Tao Yu