Chang ZENG 曾畅 曾 暢 (ソウ チョウ)

Speech AI Researcher

Shanda AI Research Tokyo

Biography

I received my Ph.D. in Informatics from the National Institute of Informatics (NII) and SOKENDAI. After graduation, I joined Li Auto as a Multimodal Generative AI Researcher, and I am now a Speech AI Researcher at Shanda AI Research Tokyo. My research focuses on speech and audio foundation models, speaker recognition and antispoofing, and generative audio AI, including speech and singing voice synthesis.

Download my résumé.

Interests
  • Artificial Intelligence
  • Speech Signal Processing
  • Singing Voice / Speech Synthesis
  • Speech Recognition
  • Language Processing
Education
  • PhD in Informatics, 2024

    National Institute of Informatics & SOKENDAI

  • MEng in Electrical Engineering and Information Systems (EEIS), 2020

    The University of Tokyo

  • BSc in Measurement and Control Technology and Instruments, 2016

    Tianjin University

Publications

(2026). DrivingScene: A Multi-Task Online Feed-Forward 3D Gaussian Splatting Method for Dynamic Driving Scenes. Accepted by ICASSP 2026.

PDF ArXiv

(2026). PAGS: Priority-Adaptive Gaussian Splatting for Dynamic Driving Scenes. Accepted by ICASSP 2026.

PDF ArXiv

(2025). Towards Interactive Intelligence for Digital Humans. In arXiv.

PDF Project ArXiv Demo

(2025). Critical Information Only: A Content Privacy-Preserving Framework for Detecting Audio Deepfakes. In IEEE TDSC.

PDF

(2025). SonicSim: A Customizable Simulation Platform for Speech Processing in Moving Sound Source Scenarios. Accepted by ICLR 2025.

PDF Code ArXiv

(2025). A Benchmark for Multi-Speaker Anonymization. In IEEE TIFS.

PDF

(2024). InstructSing: High-Fidelity Singing Voice Generation via Instructing Yourself. In SLT 2024.

PDF Cite Project DOI SLT2024

(2024). Spoofing-Aware Speaker Verification Robust Against Domain and Channel Mismatches. In SLT 2024.

PDF

(2024). HAM-TTS: Hierarchical Acoustic Modeling for Token-Based Zero-Shot Text-to-Speech with Model and Data Scaling. In arXiv.

PDF Cite Project DOI ArXiv

(2024). Joint Speaker Encoder and Neural Back-end Model for Fully End-to-End Automatic Speaker Verification with Multiple Enrollment Utterances. In Computer Speech & Language.

PDF Cite Dataset Project DOI CSL

(2023). Cross-Modal Audio-Visual Co-Learning for Text-Independent Speaker Verification. In ICASSP 2023.

PDF Cite Project DOI ICASSP

(2023). SSI-Net: A Multi-Stage Speech Signal Improvement System for ICASSP 2023 SSI Challenge. In ICASSP 2023.

PDF Cite Project DOI ICASSP Link

(2022). Deep Spectro-temporal Artifacts for Detecting Synthesized Speech. In DDAM 2022 Workshop.

PDF Cite Project DOI ACMMM Link

(2022). Data Augmentation Using McAdams-Coefficient-Based Speaker Anonymization for Fake Audio Detection. In Interspeech 2022.

PDF Cite Project DOI INTERSPEECH Link

(2022). Attention Back-end for Automatic Speaker Verification with Multiple Enrollment Utterances. In ICASSP 2022.

PDF Cite Code Dataset Project Video DOI ICASSP

(2021). DeepLip: A Benchmark for Deep Learning-Based Audio-Visual Lip Biometrics. In ASRU 2021.

PDF Cite Project DOI ASRU Link

Skills

  • Python — 100%
  • C++ — 80%
  • Audio Signal Processing — 100%
  • Audio Generative AI — 90%
  • Speech Recognition — 90%
  • PyTorch — 100%

Activities

Reviewer Service

  • Conference:
    • NeurIPS, ICLR, ICML, ACL
    • INTERSPEECH, ICASSP, ASRU, SLT
  • Journal:
    • IEEE/ACM TASLP, IEEE OJSP

Experience

Shanda AI Research Tokyo
Speech AI Researcher
Sep 2025 – Present Tokyo (Hybrid)

Responsibilities include:

  • Speech synthesis
  • Speech understanding
  • Full-duplex spoken dialogue
Li Auto
Multimodal Generative AI Researcher
Apr 2024 – Aug 2025 Hangzhou

Responsibilities include:

  • Foundation model algorithm research for text and speech generation
  • Singing voice generation and controllable singing synthesis
  • Sound/audio generation and understanding
  • Cross-modal alignment and instruction tuning for text, speech, and audio
RevComm Inc
Speech ML Researcher (Intern)
Sep 2023 – Mar 2024 Remote

Responsibilities include:

  • Speech signal processing
  • Speech recognition
  • Speech synthesis
  • Generative AI
Bombax XiaoIce Technology Co., Ltd
Avatar Researcher (Joint Project)
Jul 2022 – Jul 2023 Remote

Responsibilities include:

  • Speech signal processing
  • Singing voice synthesis
  • Speech synthesis
National Institute of Informatics
Research Assistant
Jul 2021 – Aug 2023 Tokyo

Responsibilities include:

  • Speech signal processing
  • Speaker recognition
  • Antispoofing
Alibaba
Speech Recognition Researcher
Apr 2020 – Nov 2020 Hangzhou

Responsibilities include:

  • Speech signal processing
  • Speaker recognition
  • Speech recognition
  • Self-supervised learning
  • Spoken term detection
Alibaba
Speech Recognition Researcher (Intern)
Jul 2019 – Oct 2019 Beijing

Responsibilities include:

  • Speech signal processing
  • Speaker recognition