Hui Lu (卢辉)
Hi! I am Hui Lu (卢辉), a researcher and engineer working on speech and language technologies. My current research focuses on speech-based language modeling and full-duplex spoken dialogue. My goal is to build seamless voice interaction interfaces for AI agents, making human-agent communication more natural and efficient. I also work on disentangled speech representation learning, text-to-speech synthesis, and voice conversion.
I recently defended my Ph.D. thesis at the Chinese University of Hong Kong (CUHK) under the supervision of Prof. Helen Meng. Before joining CUHK, I received my M.E. from Tsinghua University, advised by Prof. Zhiyong Wu, and my B.E. from Tongji University.
Education
Ph.D. in Information Systems, The Chinese University of Hong Kong
M.E. in Computer Science, Tsinghua University
B.E. in Communication Engineering, Tongji University
Work Experience
SenseTime Research, Research Intern
End-to-end full-duplex spoken dialogue modeling.
Speechify Inc., Senior Applied Scientist
Controllable TTS and multilingual voice conversion.
Meta AI (FAIR), Research Scientist Intern
LLM-based speech-to-speech translation.
Tencent AI Lab, Research Intern
Non-autoregressive TTS with VAEs.
Microsoft, Software Engineer Intern
Challenge Award
Selected Publications
Speech-based Language Modeling & Full-Duplex Spoken Dialogue
- How Should LLMs Listen While Speaking? A Study of User-Stream Routing in Full-Duplex Spoken Dialogue. arXiv preprint, 2026. [paper] [demo]
- Towards Streaming Target Speaker Extraction via Chunk-wise Interleaved Splicing of Autoregressive Language Model. arXiv preprint, 2026. [paper]
Text-to-Speech Synthesis
- SemaVoice: Semantic-Aware Continuous Autoregressive Speech Synthesis. arXiv preprint, 2026. [paper]
- VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis. Interspeech, 2021. [paper] [demo] [code]
Speech Disentanglement & Voice Conversion
- Unifying One-Shot Voice Conversion and Cloning with Disentangled Speech Representations. ICASSP, 2024. [paper] [demo]
- SpeechTripleNet: End-to-End Disentangled Speech Representation Learning for Content, Timbre and Prosody. ACM MM, 2023. [paper] [demo] [code]
- Disentangled Speech Representation Learning for One-Shot Cross-lingual Voice Conversion Using β-VAE. SLT, 2022. [paper] [demo] [code]
- One-shot Voice Conversion with Global Speaker Embeddings. Interspeech, 2019. [paper] [demo]
- A Compact Framework for Voice Conversion Using WaveNet Conditioned on Phonetic Posteriorgrams. ICASSP, 2019. [paper] [demo]
Academic Services
- Reviewing: NeurIPS, ACM Multimedia, ICASSP, Interspeech, COLING, ICME, LREC
- Teaching: ENGG2760 — Probability for Engineers; ENGG1120 — Linear Algebra for Engineers