A Demonstration of Next-Generation Speech Synthesis Based on Ito Stochastic Differential Equations

Contact: wu.shoule@protonmail.com, shiziqiang7@gmail.com, 13621160486



This demo introduces ItoTTS and ItoWave, a recent speech synthesis technique based on Ito stochastic differential equations (SDEs).

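ItoTTS and ItoWave address the problem of generating speech from text. We propose using a linear Ito stochastic differential equation, driven by a Wiener process and conditioned on an input such as raw text or acoustic features (for example, a mel spectrogram), to gradually subtract the excess signal from a noise signal until meaningful, realistic speech remains. The process resembles Auguste Rodin carving The Thinker: he starts from a raw block of stone and gradually removes whatever is superfluous.

Our method unifies two central components of speech synthesis, text-to-speech (TTS) and the vocoder, in a single framework; we call the two instances ItoTTS and ItoWave. The framework consists of two stochastic processes: the solution of a linear Ito SDE and the solution of its corresponding reverse-time Ito SDE. The reverse-time process in particular can generate mel features conditioned on text (ItoTTS), or generate the corresponding continuous waveform conditioned on mel features (ItoWave). In our experiments, the subjective listener MOS ratings reach a state-of-the-art level.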
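For reference, the two processes can be written in the standard form used for score-based generative models. This is a sketch under assumed notation: the drift coefficient a(t), the diffusion coefficient g(t), the horizon T, and the condition c (text for ItoTTS, mel features for ItoWave) are symbols introduced here for illustration and are not taken from this page.

```latex
% Forward linear Ito SDE: turns speech X_0 into noise X_T, driven by a Wiener process W_t
\mathrm{d}X_t = a(t)\,X_t\,\mathrm{d}t + g(t)\,\mathrm{d}W_t, \qquad t \in [0, T]

% Corresponding reverse-time SDE: runs from t = T back to t = 0,
% with \bar{W}_t a Wiener process in reverse time
\mathrm{d}X_t = \bigl[\,a(t)\,X_t - g(t)^2\,\nabla_{x}\log p_t(X_t \mid c)\,\bigr]\,\mathrm{d}t
              + g(t)\,\mathrm{d}\bar{W}_t
```

The only unknown in the reverse-time equation is the gradient of the log density, which is why a network that predicts this gradient is enough to drive generation.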

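ItoTTS and ItoWave each have two key modules: a deep neural network that predicts the gradient of the log probability density of speech, and a sampling algorithm based on that gradient and the reverse-time Ito SDE.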
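The interplay of the two modules can be illustrated with a minimal toy sketch. It is not the authors' implementation: the trained network is replaced by an analytic gradient for a 1-D Gaussian "data" distribution, the conditioning input is omitted, and the names `BETA`, `T`, `toy_score`, and `reverse_sde_sample`, the constant-rate linear SDE, and the Euler-Maruyama discretization are all assumptions made for this example.

```python
import numpy as np

BETA = 4.0  # assumed constant noise rate of the toy linear SDE
T = 2.0     # assumed time horizon; at t = T the marginal is close to N(0, 1)


def drift(x, t):
    """Linear drift a(t) * x of the forward SDE, here with a(t) = -BETA / 2."""
    return -0.5 * BETA * x


def diffusion(t):
    """Diffusion coefficient g(t) of the forward SDE, here constant."""
    return np.sqrt(BETA)


def toy_score(x, t, mu=3.0, s0=0.5):
    """Analytic gradient of log p_t(x) when the data distribution is N(mu, s0^2).

    Stands in for the trained prediction network (hypothetical substitute).
    """
    decay = np.exp(-BETA * t)
    mean_t = mu * np.sqrt(decay)            # marginal mean at time t
    var_t = s0**2 * decay + (1.0 - decay)   # marginal variance at time t
    return -(x - mean_t) / var_t


def reverse_sde_sample(score_fn, n_samples=5000, n_steps=500, seed=0):
    """Integrate the reverse-time SDE from t = T down to t = 0 (Euler-Maruyama)."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = rng.standard_normal(n_samples)  # start from Gaussian noise
    for i in range(n_steps):
        t = T - i * dt
        g = diffusion(t)
        # Deterministic part: step backward in time along the reverse drift.
        x = x - (drift(x, t) - g**2 * score_fn(x, t)) * dt
        # Stochastic part: fresh Gaussian noise scaled by g * sqrt(dt).
        x = x + g * np.sqrt(dt) * rng.standard_normal(n_samples)
    return x


samples = reverse_sde_sample(toy_score)
```

With these settings the sample mean and standard deviation of `samples` should come out roughly at the toy data values `mu = 3.0` and `s0 = 0.5`. In the systems described above, `toy_score` would be replaced by the prediction network evaluated on the current noisy signal, the time, and the conditioning text or mel features.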



Deep neural networks for predicting the gradient of the log speech probability density

Prediction network architecture in ItoTTS

[Figure: diagram of the ItoTTS prediction network]

Prediction network architecture in ItoWave

[Figure: diagram of the ItoWave prediction network]





You can listen to some audio samples synthesized by ItoTTS and ItoWave. The corresponding texts are:

1. but they proceeded in all seriousness, and would have shrunk from no outrage or atrocity in furtherance of their foolhardy enterprise.

2. three cars for press photographers, an official party bus for white house staff members and others, and two press buses.

3. a base station at a fixed location in dallas operated a radio network which linked together the lead car,

4. the lifting had been so complete in this case that there was no trace of the print on the rifle itself when it was examined by latona.

5. with the active cooperation of the responsible agencies and with the understanding of the people of the united states in their demands upon their president,



Comparison of ItoTTS with other TTS systems

Original human speech | FastSpeech 2 | Tacotron 2 | ItoTTS


Comparison of ItoWave with other vocoders

Original human speech | WaveNet | WaveGlow | DiffWave | WaveGrad | ItoWave



Longer speech synthesized by ItoTTS

For example, a report on the July 12 Beijing rainstorm (from China Daily): Beijing took multiple measures on Monday to cope with the heaviest rain to hit the capital this year. The downpours, along with strong winds, started on Sunday night and are forecast to last until Tuesday morning. From 6 pm on Sunday to 7 pm on Monday, an average of 100.4 millimeters of rain fell across the capital, according to the city's meteorological bureau. However, by late Monday afternoon there was no deep surface water on major roads in urban areas, after city authorities activated pumping stations. Flood warnings were also issued for residents of high-risk areas. Kindergartens and primary and secondary schools in the city suspended classes on Monday and company employees were encouraged to work from home or alter their travel times.



How ItoTTS and ItoWave turn white noise into meaningful speech

With "to be or not to be, this is a big problem" as the input text, ItoTTS progressively generates the corresponding mel spectrogram from a Gaussian noise signal.

[Figure: sequence of 11 images showing successive stages of the ItoTTS generation process]



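With the spectrogram of LJSpeech sentence LJ032-0167 as input, ItoWave progressively generates the corresponding speech from a Gaussian noise signal. The corresponding text is "he concluded, quote, there is no doubt in my mind that these fibers could have come from this shirt."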

[Figure: sequence of 11 images showing successive stages of the ItoWave generation process]

Related publication

[1] Shoule Wu, Ziqiang Shi. ItoTTS and ItoWave: Linear Stochastic Differential Equation Is All You Need For Audio Generation. https://arxiv.org/abs/2105.07583