In the paper, speech is split into timbre (via the speaker embedding), pitch, rhythm, and content. If I am not mistaken, the accent information of the speaker is not captured by the speaker embedding. (I know this because when I experimented with the AutoVC codebase, the speaker embedding did not capture the accent info: the accent of the source speech was always audible in the voice conversion output.)
Any ideas on how to split the accent information from speech?
Thanks,
Pravin
Good question. AutoVC can disentangle accent to some extent, but not sufficiently. Disentangling accent is another interesting research problem, and we do not have a solution for it.
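One direction that has been explored for removing a nuisance factor (such as accent) from a learned representation is adversarial training with a gradient-reversal layer (GRL): an auxiliary accent classifier is trained on the encoder output, and its gradient is flipped before it reaches the encoder, pushing the encoder to discard accent-predictive information. The sketch below is a generic, minimal illustration of the GRL idea in numpy; it is not part of the SpeechSplit or AutoVC codebases, and the function names are hypothetical.

```python
import numpy as np

def grl_forward(x):
    """Gradient-reversal layer: identity in the forward pass."""
    return x

def grl_backward(grad, lam=1.0):
    """Flip (and scale by lam) the gradient flowing back from the
    accent classifier, so the encoder is updated to *remove*
    accent-predictive information rather than preserve it."""
    return -lam * grad

# Toy check: a gradient direction that would improve the accent
# classifier becomes, after reversal, a direction that degrades it.
g = np.array([0.5, -1.0, 2.0])
print(grl_forward(g))   # unchanged in the forward pass
print(grl_backward(g))  # negated in the backward pass
```

In a full system this layer would sit between the encoder and an accent classifier, with `lam` typically annealed during training; whether this suffices to disentangle accent from content in speech remains an open question.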
Thanks for the codebase. Good work!