Chang ZENG 曾畅　曾暢 (ソウチョウ)

Independent Researcher

Biography

Independent researcher in generative audio and voice LLMs with 7+ years of experience from research to production. I have led end-to-end development of expressive TTS systems and contributed to full-duplex speech systems across data curation, codec and tokenization design, large-scale multi-GPU training, and deployment. My work focuses on turning advanced speech and audio methods into product-ready AI systems for voice generation, multimodal interaction, and intelligent audio understanding.

Interests

Generative Audio and Voice LLMs
Multimodal Foundation Models
Speech and Singing Voice Generation
Speaker Recognition and Antispoofing
Audio Separation and Enhancement

Education

PhD in Informatics, 2024
National Institute of Informatics & SOKENDAI
MEng in Electrical Engineering and Information Systems, 2020
The University of Tokyo
BEng in Measurement and Control Technology and Instruments, 2016
Tianjin University

News

2026.06: Speech Codec Probing from Semantic and Phonetic Perspectives accepted by Interspeech 2026.
2026.05: Released StepAudio 2.5 Technical Report on arXiv.
2026.05: A Semantically Consistent Dataset for Data-Efficient Query-Based Universal Sound Separation accepted by ICML 2026.
2026.03: Released BEAVER: A Training-Free Hierarchical Prompt Compression Method via Structure-Aware Page Selection on arXiv.
2026.01: DrivingScene and PAGS accepted by ICASSP 2026.
2025.12: Released Towards Interactive Intelligence for Digital Humans on arXiv.
2025.10: Critical Information Only accepted by IEEE TDSC.
2025.01: SonicSim accepted by ICLR 2025.
2025.01: A Benchmark for Multi-Speaker Anonymization accepted by IEEE TIFS.
2024.12: InstructSing accepted by SLT 2024.
2024.12: Spoofing-Aware Speaker Verification Robust Against Domain and Channel Mismatches accepted by SLT 2024.
2024.03: Released HAM-TTS on arXiv.

Publications

Xuan Shi, Chang Zeng, Tiantian Feng, Shih-Heng Wang, Jianbo Ma, Shrikanth Narayanan (2026). Speech Codec Probing from Semantic and Phonetic Perspectives. Accepted by Interspeech 2026.

Bin Lin, Bo Zhao, Boyong Wu, Chao Yan, Chen Wu, Chang Zeng, et al. (2026). StepAudio 2.5 Technical Report. In arXiv.

Kai Li, Jintao Cheng, Chang Zeng, Zijun Yan, Helin Wang, Zixiong Su, Bo Zheng, Xiaolin Hu (2026). A Semantically Consistent Dataset for Data-Efficient Query-Based Universal Sound Separation. Accepted by ICML 2026.

Zhengpei Hu, Kai Li, Dapeng Fu, Chang Zeng, Yue Li, Yuanhao Tang, Jianqiang Huang (2026). BEAVER: A Training-Free Hierarchical Prompt Compression Method via Structure-Aware Page Selection. In arXiv.

Qirui Hou, Wenzhang Sun, Chang Zeng, Chunfeng Wang, Hao Li, Jianxun Cui (2026). DrivingScene: A Multi-Task Online Feed-Forward 3D Gaussian Splatting Method for Dynamic Driving Scenes. Accepted by ICASSP 2026.

Wenzhang Sun, Chang Zeng, Chunfeng Wang, Hao Li, Jianxun Cui (2026). PAGS: Priority-Adaptive Gaussian Splatting for Dynamic Driving Scenes. Accepted by ICASSP 2026.

Yiyi Cai, Xuangeng Chu, Xiwei Gao, Sitong Gong, Yifei Huang, Caixin Kang, Kunhang Li, Haiyang Liu, Ruicong Liu, Yun Liu, Dianwen Ng, Zixiong Su, Erwin Wu, Yuhan Wu, Dingkun Yan, Tianyu Yan, Chang Zeng, Bo Zheng, You Zhou (2025). Towards Interactive Intelligence for Digital Humans. In arXiv.

Xinfeng Li, Yifan Zheng, Chen Yan, Kai Li, Chang Zeng, Xiaoyu Ji, Wenyuan Xu (2025). Critical Information Only: A Content Privacy-Preserving Framework for Detecting Audio Deepfakes. In IEEE TDSC.

Kai Li, Wendi Sang, Chang Zeng, Runxuan Yang, Guo Chen, Xiaolin Hu (2025). SonicSim: A Customizable Simulation Platform for Speech Processing in Moving Sound Source Scenarios. Accepted by ICLR 2025.

Xiaoxiao Miao, Ruijie Tao, Chang Zeng, Xin Wang (2025). A Benchmark for Multi-Speaker Anonymization. In IEEE TIFS.

Chang Zeng, Chunhui Wang, Xiaoxiao Miao, Jian Zhao, Zhonglin Jiang, Yong Chen (2024). InstructSing: High-Fidelity Singing Voice Generation via Instructing Yourself. In SLT 2024.

Chang Zeng, Xiaoxiao Miao, Xin Wang, Erica Cooper, Junichi Yamagishi (2024). Spoofing-Aware Speaker Verification Robust Against Domain and Channel Mismatches. In SLT 2024.

Chunhui Wang, Chang Zeng, Bowen Zhang, Ziyang Ma, Yefan Zhu, Zifeng Cai, Jian Zhao, Zhonglin Jiang, Yong Chen (2024). HAM-TTS: Hierarchical Acoustic Modeling for Token-Based Zero-Shot Text-to-Speech with Model and Data Scaling. In arXiv.

Chang Zeng, Xiaoxiao Miao, Xin Wang, Erica Cooper, Junichi Yamagishi (2024). Joint Speaker Encoder and Neural Back-end Model for Fully End-to-End Automatic Speaker Verification with Multiple Enrollment Utterances. In Computer Speech & Language.

Xintong Wang, Chang Zeng, Jun Chen, Chunhui Wang (2023). CrossSinger: A Cross-Lingual Multi-Singer High-Fidelity Singing Voice Synthesizer Trained on Monolingual Singers. In ASRU 2023.

Chunhui Wang, Chang Zeng, Jun Chen, Yuhao Wang, Xing He (2023). HiFi-WaveGAN: Generative Adversarial Network with Auxiliary Spectrogram-Phase Loss for High-Fidelity Singing Voice Generation. In ISNN 2024.

Chang Zeng, Xin Wang, Xiaoxiao Miao, Erica Cooper, Junichi Yamagishi (2023). Improving Generalization Ability of Countermeasures for New Mismatch Scenario by Combining Multiple Advanced Regularization Terms. In Interspeech 2023.

Chunhui Wang, Chang Zeng, Xing He (2023). XiaoiceSing 2: A High-Fidelity Singing Voice Synthesizer Based on Generative Adversarial Network. In Interspeech 2023.

Haoyu Tang, Zhaoyi Liu, Chang Zeng, Xinfeng Li (2023). Beyond Universal Transformer: Block Reusing with Adaptor in Transformer for Automatic Speech Recognition. In ISNN 2024.

Meng Liu, Kong Aik Lee, Longbiao Wang, Hanyi Zhang, Chang Zeng, Jianwu Dang (2023). Cross-Modal Audio-Visual Co-Learning for Text-Independent Speaker Verification. In ICASSP 2023.

Weixin Zhu, Zilin Wang, Jiuxin Lin, Chang Zeng, Tao Yu (2023). SSI-Net: A Multi-Stage Speech Signal Improvement System for ICASSP 2023 SSI Challenge. In ICASSP 2023.

Xiaohui Liu, Meng Liu, Lin Zhang, Linjuan Zhang, Chang Zeng, Kai Li, Nan Li, Kong Aik Lee, Longbiao Wang, Jianwu Dang (2022). Deep Spectro-temporal Artifacts for Detecting Synthesized Speech. In DDAM 2022 Workshop.

Kai Li, Sheng Li, Xugang Lu, Masato Akagi, Meng Liu, Lin Zhang, Chang Zeng, Longbiao Wang, Jianwu Dang, Masashi Unoki (2022). Data Augmentation Using McAdams-Coefficient-Based Speaker Anonymization for Fake Audio Detection. In Interspeech 2022.

Chang Zeng, Lin Zhang, Meng Liu, Junichi Yamagishi (2022). Spoofing-Aware Attention based ASV Back-end with Multiple Enrollment Utterances and a Sampling Strategy for the SASV Challenge 2022. In Interspeech 2022.

Chang Zeng, Xin Wang, Erica Cooper, Xiaoxiao Miao, Junichi Yamagishi (2022). Attention Back-end for Automatic Speaker Verification with Multiple Enrollment Utterances. In ICASSP 2022.

Meng Liu, Longbiao Wang, Kong Aik Lee, Hanyi Zhang, Chang Zeng, Jianwu Dang (2021). DeepLip: A Benchmark for Deep Learning-Based Audio-Visual Lip Biometrics. In ASRU 2021.

Skills

Languages and Tools

Python, C++, Shell, Git, MySQL

Deep Learning

PyTorch, PyTorch Lightning, Hugging Face

Speech Toolkits

SpeechBrain, WeNet, WeSpeaker, Kaldi, ESPnet

Generative Audio

Expressive TTS, codec and tokenizer design, voice LLMs

Multimodal AI

Audio-language modeling, full-duplex speech systems

Communication

Chinese, English, Japanese

Activities

Reviewer Service

Conferences: NeurIPS, ICLR, ICML, ACL, ICASSP, ICME, INTERSPEECH
Journals: IEEE OJSP, IEEE TASLP

Academic Activities

Oral presentations at ICASSP 2022 and Interspeech 2022
Organizing Committee, Joint Workshop of VoicePersonae and ASVspoof 2023
Invited talk, SVDD Challenge at SLT 2024: shared insights on singing voice generation as a guest speaker

Open Source

WeSpeaker contributor
ASV-Subtools contributor
HIVE dataset contributor

Experience

Independent Researcher

Independent

May 2026 – Present, Remote

Conduct independent research on generative audio, voice LLMs, multimodal interaction, and intelligent audio understanding.

Explore foundation models and data-centric methods for speech, audio, and multimodal AI
Continue open research collaborations and publication work across generative audio and sound understanding

Senior AI Researcher

Shanda AI Research Tokyo

Sep 2025 – May 2026, Tokyo (Hybrid)

Led R&D of expressive TTS and full-stack voice-agent technologies for avatar and game products.

Built KodamaTTS from scratch on a Qwen-based foundation model for virtual human and gaming applications
Covered data curation, codec design, multi-node multi-GPU training, and evaluation in one pipeline
Achieved strong objective codec quality at sub-1 kbps bitrates and reached top benchmark rankings in three languages
Served as Co-PI on a joint project with Tsinghua University on cocktail-party speech interaction and audio separation

Multimodal Generative AI Researcher

Li Auto

Apr 2024 – Sep 2025, Hangzhou

Developed voice generation systems for Li Auto smart-space products and contributed to the multimodal foundation model MindGPT-4o.

Proposed the GFSQ tokenizer for GPT-SoVITS to improve codebook utilization and decoding quality
Trained a multi-timbre, multi-style voice generation model for in-car voice-blog scenarios in production
Built synthetic-data workflows to scale accents, dialects, languages, emotions, and scenarios
Led data production and audio-head pretraining and post-training for a full-duplex conversational model

Avatar Research Intern

Bombax XiaoIce Technology Co., Ltd

Jul 2022 – Jul 2023, Remote

Focused on high-fidelity 48 kHz singing voice generation in collaboration with research and engineering teams.

Upgraded XiaoiceSing to XiaoiceSing 2 with adversarial training and achieved near-human MOS
Developed HiFi-WaveGAN with a pulse-sequence design for stronger 48 kHz singing synthesis quality
Built CrossSinger for cross-lingual multi-singer SVS in English, Japanese, and Chinese
Improved training efficiency with InstructSing and explored hierarchical acoustic modeling for voice LMs

Speech Recognition Researcher

Alibaba

Apr 2020 – Sep 2020, Hangzhou

Developed speech AI systems for Taobao Live compliance and broadcaster-risk control.

Built a large-scale speaker recognition system for broadcaster identity verification in livestream scenarios
Developed a spoken-term detection pipeline for monitoring policy-sensitive and illegal words
Researched self-supervised speech representations and implemented an ESPnet-based end-to-end ASR system

Posts

Adaptive Granularity Importance Sampling for Policy Optimization

A research note on adaptive granularity importance sampling for policy optimization, focusing on speech token dependencies and segment-level weighting strategies such as MASPO-Fixed, MASPO-LogProb, and MASPO-TokenVal.

Mar 11, 2026, Research Notes
