
More information about the pretrains' dataset. #1987

Open
codename0og opened this issue Apr 20, 2024 · 1 comment
Labels
documentation 📄文档说明 help wanted 🚸请求协助 question 💬信息不足

Comments

@codename0og

@RVC-Boss We already know that the VCTK corpus (108-speaker version) was used for the dataset, but what about the processing?
Was anything applied?

  • denoising
  • peak / RMS normalization
  • compression

And how was the dynamic range handled?

I am asking because, despite having trained tons of models, I still can't draw any firm conclusions on this from my own trainings.

A) Is it better to limit the dynamic range of the dataset as much as possible (without introducing distortion, of course)?
B) Keep it somewhat natural (slight peak taming plus light compression to even things out, then a general -2 or -3 dB normalization)?
C) Tame the harsher peaks (or peaks in general) but leave the dynamic range alone?

Which approach do you think would suit your pretrains?
I would really benefit from this information, and I am sure other advanced users would too.
Thank you in advance!
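Since the options above hinge on how peak and RMS normalization behave differently, here is a minimal NumPy sketch of the two (my own illustration, not anything from the RVC codebase; the function names and target levels are made up):

```python
import numpy as np

def peak_normalize(x: np.ndarray, target_db: float = -3.0) -> np.ndarray:
    """Scale so the absolute sample peak sits at target_db dBFS."""
    peak = np.max(np.abs(x))
    if peak == 0:
        return x
    return x * (10 ** (target_db / 20) / peak)

def rms_normalize(x: np.ndarray, target_db: float = -20.0) -> np.ndarray:
    """Scale so the RMS level sits at target_db dBFS.
    Note: unlike peak normalization, this can push peaks past 0 dBFS."""
    rms = np.sqrt(np.mean(x ** 2))
    if rms == 0:
        return x
    return x * (10 ** (target_db / 20) / rms)

# toy signal: one second of a 440 Hz sine at 0.5 peak, 16 kHz sample rate
t = np.linspace(0, 1, 16000, endpoint=False)
x = 0.5 * np.sin(2 * np.pi * 440 * t)

y = peak_normalize(x, -3.0)
print(round(float(np.max(np.abs(y))), 3))  # 0.708, i.e. -3 dBFS
```

Peak normalization only guarantees headroom; RMS normalization equalizes perceived loudness across clips, which is usually what matters for training data consistency.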

@fumiama fumiama added documentation 📄文档说明 question 💬信息不足 help wanted 🚸请求协助 labels Apr 23, 2024
@SCRFilms

@RVC-Boss We already know that the VCTK corpus (108-speaker version) was used for the dataset, but what about the processing? Was anything applied?

I just checked some samples in the VCTK dataset and it's really bad:

- tons of mouth clicks
- loud mic noise
- low-frequency rumbling noise (could be a DC offset issue)
- lacks breath sounds
- lacks pitch variation (the speakers' pitch just sits around 110 Hz to 200 Hz)
- lacks higher harmonic detail (which causes flipping-harmonic and static-harmonic artifacting)

I don't think they applied any processing to the audio at all; the dataset is bad in the first place.
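On the DC offset point: a constant offset shows up directly as a nonzero signal mean, so it's easy to check. A minimal sketch (my own, not from any RVC tooling; for slow rumble rather than a pure constant offset, a high-pass filter around 20 Hz would be the more robust fix):

```python
import numpy as np

def remove_dc_offset(x: np.ndarray) -> np.ndarray:
    """Remove a constant DC offset by subtracting the signal mean."""
    return x - np.mean(x)

# toy example: a 220 Hz sine riding on a +0.1 DC offset
t = np.linspace(0, 1, 16000, endpoint=False)
x = 0.3 * np.sin(2 * np.pi * 220 * t) + 0.1

print(round(float(np.mean(x)), 3))                         # 0.1 -> offset present
print(round(abs(float(np.mean(remove_dc_offset(x)))), 3))  # 0.0 -> offset removed
```

Mean subtraction only handles a truly constant offset over the clip; time-varying low-frequency rumble needs the filtering approach instead.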
