Artificial Intelligence #emotional speech synthesis #latent representations
New Research Advances Emotional Speech Synthesis with Latent Representations and FastSpeech 2
Researchers have published an empirical study on arXiv describing a method for emotional speech synthesis that integrates a speaker embedding and a prosody bottleneck into the FastSpeech 2 architecture. The approach addresses two sub-tasks: generating emotional speech for a single speaker, and transferring a speaking style from another speaker while retaining the target speaker's identity. The work was submitted to the VLSP 2022 competition.
Jun 16, 2026 1 source