Senior Research Scientist in generative audio and voice LLMs with 7+ years of experience spanning research to production. I have led end-to-end development of expressive TTS systems and contributed to full-duplex speech systems across data curation, codec and tokenization design, large-scale multi-GPU training, and deployment. My work focuses on turning advanced speech and audio methods into product-ready AI systems for voice generation, multimodal interaction, and intelligent audio understanding.
Download my resumé.
PhD in Informatics, 2024
National Institute of Informatics & SOKENDAI
MEng in Electrical Engineering and Information Systems, 2020
The University of Tokyo
BEng in Measurement and Control Technology and Instruments, 2016
Tianjin University
Python, C++, Shell, Git, MySQL
PyTorch, PyTorch Lightning, Hugging Face
SpeechBrain, WeNet, WeSpeaker, Kaldi, ESPnet
Expressive TTS, codec and tokenizer design, voice LLMs
Audio-language modeling, full-duplex speech systems
Chinese, English, Japanese
Lead R&D of expressive TTS and full-stack voice-agent technologies for avatar and game products.
Developed voice generation systems for Li Auto smart-space products and contributed to the multimodal foundation model MindGPT-4o.
Focused on high-fidelity 48 kHz singing voice generation in collaboration with research and engineering teams.
Developed speech AI systems for Taobao Live compliance and broadcaster-risk control.