Paper: A COMPACT FRAMEWORK FOR VOICE CONVERSION USING WAVENET CONDITIONED ON PHONETIC POSTERIORGRAMS

Hui Lu, Zhiyong Wu, Runnan Li, Shiyin Kang, Jia Jia, Helen Meng

System Description
Ablation 1: WaveNet conditioned on raw PPGs and log F0
Ablation 2: WaveNet conditioned on PPGs and log F0 processed by a BLSTM
Baseline: WaveNet conditioned on MCEPs predicted from PPGs by a BLSTM conversion function
Proposed: WaveNet conditioned on PPGs and log F0 processed by a condition network
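
As a rough illustration of the Ablation 1 setting above, the sketch below joins each frame's PPG vector with its log F0 value and repeats the result so that every audio sample has a conditioning vector. This page does not say how the features are aligned to the waveform, so the repetition step, the function name `build_local_condition`, and the `hop_length` parameter are all assumptions made for illustration, not the authors' implementation.

```python
def build_local_condition(ppg, log_f0, hop_length):
    """Hypothetical helper: build sample-rate conditioning from frame features.

    ppg        -- list of frames, each a list of phonetic posterior probabilities
    log_f0     -- list with one log F0 value per frame
    hop_length -- audio samples per frame (assumed; not given on this page)
    """
    if len(ppg) != len(log_f0):
        raise ValueError("ppg and log_f0 must have the same number of frames")
    if hop_length < 1:
        raise ValueError("hop_length must be a positive integer")

    # Append log F0 to each PPG frame: feature dimension becomes D + 1.
    frames = [list(p) + [f] for p, f in zip(ppg, log_f0)]

    # Assumed upsampling: repeat each frame hop_length times.
    return [list(frame) for frame in frames for _ in range(hop_length)]


if __name__ == "__main__":
    ppg = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]   # toy 3-class posteriors
    log_f0 = [4.6, 4.7]                        # toy log F0 values
    cond = build_local_condition(ppg, log_f0, hop_length=4)
    print(len(cond), len(cond[0]))
```

The Ablation 2 and Proposed systems would pass the same frame-level features through a BLSTM or a condition network before WaveNet sees them; that processing is not sketched here.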

Converted Speech Samples

Male-to-Female

index Source Speech Target Speech Ablation 1 Ablation 2 Baseline Proposed
1.m2f1

I have no idea, replied Philip.

2.m2f2

Anyway, no one saw her like that.

Female-to-Female

index Source Speech Target Speech Ablation 1 Ablation 2 Baseline Proposed
3.f2f1

That is the strange part of it.

4.f2f2

Her mouth opened, but instead of speaking she drew a long sigh.

Male-to-Male

index Source Speech Target Speech Ablation 1 Ablation 2 Baseline Proposed
5.m2m1

I have no idea, replied Philip.

6.m2m2

Anyway, no one saw her like that.

Female-to-Male

index Source Speech Target Speech Ablation 1 Ablation 2 Baseline Proposed
7.f2m1

That is the strange part of it.

8.f2m2

Her mouth opened, but instead of speaking she drew a long sigh.