
question about training loss and inference performance #61

Open
zzw922cn opened this issue Oct 23, 2020 · 6 comments

Comments

@zzw922cn

Hi, thank you for your very nice work! I have rerun this project for 90K steps, and the loss_id_psnt is around 0.07. I tried feeding in an in-domain speaker's mel-spectrogram together with his speaker embedding as the source embedding, and another speaker's embedding as the target speaker embedding. When I generate the wav with the GL vocoder, the voice still sounds like the source speaker. Is this normal? At what step, or at what loss_id_psnt, should voice conversion start to work? Thank you very much!!

[screenshot attached]
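For reference, the conversion step described above can be sketched as follows. `AutoVcLike`, `encode`, and `decode` are hypothetical stand-ins, not the repo's actual API; the point is only the data flow: the source embedding is used at encode time, while the target embedding conditions the decoder, which is what should change the voice.

```python
import numpy as np

class AutoVcLike:
    """Hypothetical stand-in for an AutoVC-style generator (not the repo's API)."""
    def encode(self, mel_src, emb_src):
        # Content encoder: bottleneck codes from source mel + source speaker embedding.
        return mel_src.mean(axis=1, keepdims=True) + emb_src.mean()

    def decode(self, codes, emb_tgt):
        # Decoder: reconstruct a mel conditioned on the *target* speaker embedding.
        return codes + emb_tgt.mean()

def convert(model, mel_src, emb_src, emb_tgt):
    # The source embedding is used only for encoding; swapping in the
    # target embedding at decode time is what performs the conversion.
    codes = model.encode(mel_src, emb_src)
    return model.decode(codes, emb_tgt)

mel_src = np.random.rand(128, 80)   # (frames, mel bins)
emb_src = np.random.rand(256)       # source speaker embedding
emb_tgt = np.random.rand(256)       # target speaker embedding
mel_out = convert(AutoVcLike(), mel_src, emb_src, emb_tgt)
print(mel_out.shape)  # converted mel, which would then go to a GL vocoder
```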

@auspicious3000
Owner

You probably need to fine-tune your bottleneck dimensions.

@zzw922cn
Author

Do you think I should enlarge the bottleneck dimension or decrease the bottleneck dimension?

@auspicious3000
Owner

There's detailed information in the paper on how to tune the bottleneck.

@zzw922cn
Author

OK, thank you~

@ruclion

ruclion commented Dec 23, 2020

> Do you think I should enlarge the bottleneck dimension or decrease the bottleneck dimension?

The paper says:

> The first model, which we name the "too narrow" model, reduces the dimensions of C1→ and C1← from 32 to 16, and increases the downsampling factor from 32 to 128 (note that a higher downsampling factor means a lower temporal dimension). The second model, which we name the "too wide" model, increases the dimensions of C1→ and C1← to 256, decreases the downsampling factor to 8, and sets λ to 0.

But for a new dataset, how should the hyperparameters be chosen? And should we use the DANN idea?
Hope to communicate with you~
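The three configurations quoted from the paper can be summarized in a small sketch. The field names are illustrative (the repo's actual hyperparameter names may differ); the numbers come directly from the excerpt above, including the default values the "too narrow" model deviates from.

```python
from dataclasses import dataclass

@dataclass
class Bottleneck:
    dim: int         # dimension of the forward/backward codes C1->, C1<-
    downsample: int  # temporal downsampling factor (higher => fewer time steps)

# Values taken from the paper excerpt quoted above; names are illustrative.
DEFAULT    = Bottleneck(dim=32,  downsample=32)
TOO_NARROW = Bottleneck(dim=16,  downsample=128)  # too tight: content degrades
TOO_WIDE   = Bottleneck(dim=256, downsample=8)    # too loose: source timbre leaks

# Rule of thumb implied by this thread: if the output still sounds like the
# source speaker, the bottleneck is likely too wide, so try shrinking dim
# and/or raising the downsampling factor.
```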

@innovator1311


@zzw922cn Can you tell me which dataset you used and the batch size during training? Thanks in advance!!
