Chang ZENG 曾畅 曾 暢 (ソウ チョウ)

Ph.D. Candidate

National Institute of Informatics & SOKENDAI

Biography

I am a Ph.D. candidate with six years of experience in speech signal processing, sequence-to-sequence (S2S) modeling, and deep learning. I have worked on speaker recognition in academia and speech recognition in industry. My research interests include speech and speaker recognition and generative speech AI such as text-to-speech (TTS) and voice conversion.

Download my resumé.

Interests
  • Artificial Intelligence
  • Speech Signal Processing
  • Singing Voice / Speech Synthesis
  • Speech Recognition
  • Language Processing
Education
  • PhD in Informatics, 2024

    National Institute of Informatics & SOKENDAI

  • MEng in Electrical Engineering and Information Systems (EEIS), 2020

    The University of Tokyo

  • BSc in Measurement and Control Technology and Instruments, 2016

    Tianjin University

Publications

(2026). A Semantically Consistent Dataset for Data-Efficient Query-Based Universal Sound Separation. In arXiv.

(2026). DrivingScene: A Multi-Task Online Feed-Forward 3D Gaussian Splatting Method for Dynamic Driving Scenes. Accepted by ICASSP 2026.

(2026). PAGS: Priority-Adaptive Gaussian Splatting for Dynamic Driving Scenes. Accepted by ICASSP 2026.

(2025). Towards Interactive Intelligence for Digital Humans. In arXiv.

(2025). Critical Information Only: A Content Privacy-Preserving Framework for Detecting Audio Deepfakes. In IEEE TDSC.

(2025). SonicSim: A Customizable Simulation Platform for Speech Processing in Moving Sound Source Scenarios. Accepted by ICLR 2025.

(2025). A Benchmark for Multi-Speaker Anonymization. In IEEE TIFS.

(2024). InstructSing: High-Fidelity Singing Voice Generation via Instructing Yourself. In SLT 2024.

(2024). Spoofing-Aware Speaker Verification Robust Against Domain and Channel Mismatches. In SLT 2024.

(2024). HAM-TTS: Hierarchical Acoustic Modeling for Token-Based Zero-Shot Text-to-Speech with Model and Data Scaling. In arXiv.

(2024). Joint Speaker Encoder and Neural Back-end Model for Fully End-to-End Automatic Speaker Verification with Multiple Enrollment Utterances. In Computer Speech & Language.

(2023). Cross-Modal Audio-Visual Co-Learning for Text-Independent Speaker Verification. In ICASSP 2023.

(2023). SSI-Net: A Multi-Stage Speech Signal Improvement System for ICASSP 2023 SSI Challenge. In ICASSP 2023.

(2022). Deep Spectro-temporal Artifacts for Detecting Synthesized Speech. In DDAM 2022 Workshop.

(2022). Data Augmentation Using McAdams-Coefficient-Based Speaker Anonymization for Fake Audio Detection. In Interspeech 2022.

(2022). Attention Back-end for Automatic Speaker Verification with Multiple Enrollment Utterances. In ICASSP 2022.

(2021). DeepLip: A Benchmark for Deep Learning-Based Audio-Visual Lip Biometrics. In ASRU 2021.

Skills

  • Python: 100%
  • C++: 80%
  • Audio Signal Processing: 100%
  • Audio Generative AI: 90%
  • Speech Recognition: 90%
  • PyTorch: 100%

Activities

Reviewer Service

  • Conference:
    • NeurIPS, ICLR, ICML, ACL
    • INTERSPEECH, ICASSP, ASRU, SLT
  • Journal:
    • IEEE/ACM TASLP, IEEE OJSP

Experience

Shanda AI Research Tokyo
Speech AI Researcher
Sep 2025 – Present Tokyo

Responsibilities include:

  • Work mode: Hybrid
  • Speech synthesis
  • Speech understanding
  • Full-duplex spoken dialogue

Li Auto
Speech ML Researcher
Apr 2024 – Present Hangzhou

Responsibilities include:

  • Speech signal processing
  • Speech recognition
  • Speech synthesis
  • Generative AI

RevComm Inc
Speech ML Researcher (Intern)
Sep 2023 – Mar 2024 Remote

Responsibilities include:

  • Speech signal processing
  • Speech recognition
  • Speech synthesis
  • Generative AI

Bombax XiaoIce Technology Co., Ltd
Avatar Researcher (Joint Project)
Jul 2022 – Jul 2023 Remote

Responsibilities include:

  • Speech signal processing
  • Singing voice synthesis
  • Speech synthesis

National Institute of Informatics
Research Assistant
Jul 2021 – Aug 2023 Tokyo

Responsibilities include:

  • Speech signal processing
  • Speaker recognition
  • Anti-spoofing

Alibaba
Speech Recognition Researcher
Apr 2020 – Nov 2020 Hangzhou

Responsibilities include:

  • Speech signal processing
  • Speaker recognition
  • Speech recognition
  • Self-supervised learning
  • Spoken term detection

Alibaba
Speech Recognition Researcher (Intern)
Jul 2019 – Oct 2019 Beijing

Responsibilities include:

  • Speech signal processing
  • Speaker recognition