FA-GAN: Few-artifacts High-fidelity GAN-based Vocoder

Rubing Shen, Yanzhen Ren, Zongkun Sun
Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education
School of Cyber Science and Engineering, Wuhan University

0. Contents

  1. Abstract

  2. Seen speaker (LJSpeech)

  3. Unseen speakers (VCTK)



1. Abstract

Generative adversarial network (GAN) based vocoders have attracted significant attention in speech synthesis for their high quality and fast inference speed. However, noticeable artifacts still distinguish generated samples from the ground truth, degrading the quality of synthesized speech. In this work, we propose FA-GAN, a novel GAN-based vocoder designed for few artifacts and high fidelity. 1) To suppress the aliasing artifacts caused by non-ideal upsampling layers in high-frequency areas, we introduce the twin deconvolution module in the generator. 2) To alleviate the blurring artifacts and enrich the reconstruction of spectral details, we propose a novel fine-grained multi-resolution real and imaginary loss by inheriting the real and imaginary components of complex spectrograms in the discriminators. The experimental results show that FA-GAN outperforms state-of-the-art approaches in improving audio quality and alleviating spectral artifacts, and that it exhibits superior performance in unseen-speaker scenarios.




2. Seen speaker (LJSpeech)

We train FA-GAN on the LJSpeech dataset, which we randomly divide into a training set, a validation set, and a test set (80%, 10%, and 10%, respectively). Below are demos of the baselines and our proposed FA-GAN in the seen-speaker scenario.
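As an illustration of the 80%/10%/10% random split described above, here is a minimal sketch in Python. It is not the authors' actual split procedure: the function name `split_dataset`, the fixed seed, and the placeholder utterance IDs are all assumptions made for the example.

```python
import random


def split_dataset(items, ratios=(0.8, 0.1, 0.1), seed=1234):
    """Randomly split items into (train, validation, test) lists.

    The seed and the helper itself are hypothetical; the source only
    states that the split is random with 80%/10%/10% proportions.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")

    # Shuffle a copy so the caller's list is left untouched.
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)

    n = len(shuffled)
    n_train = round(n * ratios[0])
    n_val = round(n * ratios[1])

    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Placeholder IDs standing in for real utterance names.
utterances = [f"clip{i:03d}" for i in range(100)]
train, val, test = split_dataset(utterances)
```

Fixing the seed keeps the three subsets reproducible across runs, which matters when the same test utterances must be used for every baseline.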

Demos:

Speaker | Ground Truth | HiFi-GAN | UniVNet-c32 | Avocodo | FA-GAN

LJ001-0028

LJ005-0156

LJ008-0172

LJ016-0156

LJ035-0086

3. Unseen speakers (VCTK)

We evaluate the unseen-speaker scenario on the VCTK Corpus, with all audio samples downsampled to 22050 Hz. The audio demos are as follows.
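To make the downsampling step concrete, the sketch below computes the integer up/down factors a rational (polyphase) resampler would need. It assumes the VCTK source audio is sampled at 48000 Hz, which the text above does not state, and the helper names are hypothetical. The page does not say which resampling tool was actually used.

```python
from math import gcd


def resample_factors(src_rate, dst_rate):
    """Return (up, down) such that dst_rate / src_rate == up / down in lowest terms."""
    g = gcd(src_rate, dst_rate)
    return dst_rate // g, src_rate // g


def resampled_length(num_samples, src_rate, dst_rate):
    """Approximate output length: ceil(num_samples * up / down)."""
    up, down = resample_factors(src_rate, dst_rate)
    return -(-num_samples * up // down)  # integer ceiling division


# Assumed source rate of 48000 Hz; target rate of 22050 Hz is from the text.
up, down = resample_factors(48000, 22050)
one_second = resampled_length(48000, 48000, 22050)
```

Under that assumption the factors are 147 and 320, so one second of 48000 Hz audio maps to 22050 samples. Factors like these could be passed to a polyphase routine such as `scipy.signal.resample_poly(x, up, down)`.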

Demos:

Speaker | Ground Truth | HiFi-GAN | UniVNet-c32 | Avocodo | FA-GAN

p258

p264

p265

p284

p340