Paper: A COMPACT FRAMEWORK FOR VOICE CONVERSION USING WAVENET CONDITIONED ON PHONETIC POSTERIORGRAMS
Hui Lu, Zhiyong Wu, Runnan Li, Shiyin Kang, Jia Jia, Helen Meng
System | Description |
Ablation 1 | WaveNet conditioned on raw PPGs and Log F0 |
Ablation 2 | WaveNet conditioned on PPGs and Log F0 processed by BLSTM |
Baseline | WaveNet conditioned on MCEPs predicted by a BLSTM conversion function from PPGs |
Proposed | WaveNet conditioned on PPGs and Log F0 processed by condition network |
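To make the "raw PPGs and Log F0" conditioning of Ablation 1 concrete, the following is a minimal sketch of how frame-level features might be assembled and brought up to the waveform sample rate before being fed to a WaveNet. This page does not give the frame shift, the upsampling method, or the handling of unvoiced frames, so the function names (`log_f0`, `build_conditioning`), the `hop` parameter, the repetition-based upsampling, and the choice of 0.0 for unvoiced frames are all assumptions made for illustration, not the authors' implementation.

```python
import math


def log_f0(f0_hz):
    """Return log F0 per frame.

    Assumption: unvoiced frames are marked with F0 == 0 and mapped to 0.0.
    """
    return [math.log(f) if f > 0 else 0.0 for f in f0_hz]


def build_conditioning(ppgs, f0_hz, hop):
    """Concatenate each PPG frame with its log F0, then repeat every
    frame `hop` times so there is one feature vector per audio sample.

    ppgs  : list of per-frame posterior vectors (list of lists of floats)
    f0_hz : list of per-frame F0 values in Hz (0 for unvoiced, assumed)
    hop   : hypothetical number of audio samples per frame
    """
    if len(ppgs) != len(f0_hz):
        raise ValueError("PPG and F0 sequences must have the same number of frames")
    if hop < 1:
        raise ValueError("hop must be a positive integer")
    # Frame-level features: [ppg_1, ..., ppg_K, log_f0]
    frames = [list(p) + [lf] for p, lf in zip(ppgs, log_f0(f0_hz))]
    # Naive upsampling by repetition (an assumed stand-in for whatever
    # upsampling the actual systems use).
    return [list(frame) for frame in frames for _ in range(hop)]


# Toy input: two frames, two phonetic classes, second frame unvoiced.
example = build_conditioning([[0.9, 0.1], [0.2, 0.8]], [100.0, 0.0], hop=3)
```

In the Ablation 2 and Proposed systems listed above, frame-level features like these would first pass through a BLSTM or a condition network rather than being used raw; that processing is not sketched here.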
Converted Speech Samples
Male-to-Female
index | Source Speech | Target Speech | Ablation 1 | Ablation 2 | Baseline | Proposed |
1.m2f1 | | | | | | |
I have no idea, replied Philip.
2.m2f2 | | | | | | |
Anyway, no one saw her like that.
Female-to-Female
index | Source Speech | Target Speech | Ablation 1 | Ablation 2 | Baseline | Proposed |
3.f2f1 | | | | | | |
That is the strange part of it.
4.f2f2 | | | | | | |
Her mouth opened, but instead of speaking she drew a long sigh.
Male-to-Male
index | Source Speech | Target Speech | Ablation 1 | Ablation 2 | Baseline | Proposed |
5.m2m1 | | | | | | |
I have no idea, replied Philip.
6.m2m2 | | | | | | |
Anyway, no one saw her like that.
Female-to-Male
index | Source Speech | Target Speech | Ablation 1 | Ablation 2 | Baseline | Proposed |
7.f2m1 | | | | | | |
That is the strange part of it.
8.f2m2 | | | | | | |
Her mouth opened, but instead of speaking she drew a long sigh.