
How to split "Accent" information of the speaker? #30

Open
rppravin opened this issue Jan 20, 2021 · 3 comments

Comments

@rppravin

Thanks for the codebase. Good work!

In the paper, speech is split into timbre (via the speaker embedding), pitch, rhythm, and content. If I am not wrong, the accent information of the speaker is not captured by the speaker embedding. (I know this because when I experimented with the AutoVC codebase, the speaker embedding did not capture the accent info: the accent of the source speech always carried over into the voice conversion output.)
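For intuition, the leakage I observed can be sketched as follows: conversion swaps only the timbre code, so if accent is entangled with the content, pitch, or rhythm codes, it travels with the source utterance into the output. This is an illustrative toy, not the actual model or API of this repo; all names below are hypothetical placeholders:

```python
# Toy sketch of factor-swapping voice conversion (hypothetical names,
# not the SpeechSplit API). Each factor would come from its own encoder;
# here a dict stands in for the encoded utterance.

def decompose(speech):
    """Stand-in for the four encoders: returns (content, pitch, rhythm, timbre)."""
    return speech["content"], speech["pitch"], speech["rhythm"], speech["timbre"]

def convert(source, target):
    """Keep source content/pitch/rhythm, take only the target's timbre."""
    content, pitch, rhythm, _ = decompose(source)
    _, _, _, timbre = decompose(target)
    return {"content": content, "pitch": pitch, "rhythm": rhythm, "timbre": timbre}

source = {"content": "hello (accented)", "pitch": "src_f0",
          "rhythm": "src_rhythm", "timbre": "src_speaker"}
target = {"content": "unused", "pitch": "unused",
          "rhythm": "unused", "timbre": "tgt_speaker"}

out = convert(source, target)
# The timbre is the target's, but the accent cues entangled with the
# source's content code ride along into the converted output:
print(out["timbre"])   # tgt_speaker
print(out["content"])  # hello (accented)
```

The point of the sketch: replacing the speaker embedding alone cannot remove accent if accent is not exclusively carried by that embedding.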

Any ideas on how to split the accent information from speech?

Thanks,
Pravin

@auspicious3000
Owner

Good question. AutoVC can disentangle accent to some extent, but not sufficiently. Disentangling accent is another interesting research problem. We do not have a solution for that.

@rppravin
Author

Thanks @auspicious3000

After decomposing speech into timbre, pitch, rhythm, and content, would you expect the accent information to be part of the content embedding?

@auspicious3000
Owner

Accent may be a part of each component.
