
How to split "Accent" information of the speaker? #30

Open
rppravin opened this issue Jan 20, 2021 · 3 comments

Comments

@rppravin

Thanks for the codebase. Good work!

In the paper, speech is split into timbre (via the speaker embedding), pitch, rhythm, and content. If I am not wrong, the accent information of the speaker is not captured by the speaker embedding. (I know this because when I experimented with the AutoVC codebase, the speaker embedding did not capture the accent info: the accent of the source speech always carried over into the voice conversion output.)
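For intuition, the leakage I observed can be sketched as follows: conversion swaps only the timbre code, so if accent is entangled with the content, pitch, or rhythm codes, it travels with the source utterance into the output. This is an illustrative toy, not the actual model or API of this repo; all names below are hypothetical placeholders:

```python
# Toy sketch of factor-swapping voice conversion (hypothetical names,
# not the SpeechSplit API). Each factor would come from its own encoder;
# here a dict stands in for the encoded utterance.

def decompose(speech):
    """Stand-in for the four encoders: returns (content, pitch, rhythm, timbre)."""
    return speech["content"], speech["pitch"], speech["rhythm"], speech["timbre"]

def convert(source, target):
    """Keep source content/pitch/rhythm, take only the target's timbre."""
    content, pitch, rhythm, _ = decompose(source)
    _, _, _, timbre = decompose(target)
    return {"content": content, "pitch": pitch, "rhythm": rhythm, "timbre": timbre}

source = {"content": "hello (accented)", "pitch": "src_f0",
          "rhythm": "src_rhythm", "timbre": "src_speaker"}
target = {"content": "unused", "pitch": "unused",
          "rhythm": "unused", "timbre": "tgt_speaker"}

out = convert(source, target)
# The timbre is the target's, but the accent cues entangled with the
# source's content code ride along into the converted output:
print(out["timbre"])   # tgt_speaker
print(out["content"])  # hello (accented)
```

The point of the sketch: replacing the speaker embedding alone cannot remove accent if accent is not exclusively carried by that embedding.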

Any ideas on how to split the accent information from speech?

Thanks,
Pravin

@auspicious3000
Owner

Good question. AutoVC can disentangle accent to some extent, but not sufficiently. Disentangling accent is another interesting research problem. We do not have a solution for that.

@rppravin
Author

Thanks @auspicious3000

After decomposing speech into timbre, pitch, rhythm, and content, would you expect the accent information to be part of the content embedding?

@auspicious3000
Owner

Accent may be a part of each component.
