Voice sounds like it just got back from the dentist lol #93

Open
cg5666 opened this issue Dec 14, 2023 · 0 comments

cg5666 commented Dec 14, 2023

Hello. Thank you, RVC team, for the great app! The problem I'm running into is that my converted voices sometimes sound like they just came back from the dentist: consonants and vowels lack articulation. This is less of an issue with real-time / microphone recording, but when I'm overdubbing a video or podcast (especially if the original recording was made on a camera microphone or similar), it has a hard time picking up some articulations.

My source samples are clean (so no issues there), and I try to de-reverb and denoise my target audio.

  1. Is there anything I can do to improve this?
  2. Is there any documentation that explains what the settings do? I have a slight idea but it would be great to read some documentation.

Ultimately, I was wondering whether anyone would be interested in building a TTS-style app / add-on on commission. I don't know how much this would cost, but if it's only a couple of hundred bucks, I can cover the expense completely. Here is the vision for the app:

It would be a TTS interface, except that when you upload your target audio, it would process it in the voice you specified. It would then give you a text transcript of your audio with timecodes so you can go in and change words. This way you can edit any gibberish or fix mistakes like "It was the year 1995" when it should be "It was the year 1999".
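To make the idea a bit more concrete, here is a rough Python sketch of the editing step only. Everything in it is hypothetical and made up for illustration (the `Word` record, `format_timecode`, `replace_word`, and the timings); none of it is part of RVC or any existing tool. It assumes some speech-to-text step has already produced word-level timestamps, and it only shows how changing one word could tell the app which slice of audio needs to be re-synthesized.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Word:
    """One transcribed word with its position in the audio (hypothetical format)."""
    text: str
    start: float  # seconds from the beginning of the clip
    end: float    # seconds from the beginning of the clip


def format_timecode(seconds: float) -> str:
    """Render a time in seconds as MM:SS.mmm for display next to each word."""
    total_ms = round(seconds * 1000)
    minutes, rest_ms = divmod(total_ms, 60_000)
    secs, ms = divmod(rest_ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


def replace_word(
    transcript: List[Word], index: int, new_text: str
) -> Tuple[List[Word], Tuple[float, float]]:
    """Return an edited copy of the transcript plus the time span to re-synthesize.

    The original list is left untouched, so an edit can be undone easily.
    """
    old = transcript[index]
    edited = list(transcript)
    edited[index] = replace(old, text=new_text)
    return edited, (old.start, old.end)


# Made-up timings for the example sentence from this issue.
transcript = [
    Word("It", 0.0, 0.15),
    Word("was", 0.15, 0.35),
    Word("the", 0.35, 0.5),
    Word("year", 0.5, 0.8),
    Word("1995", 0.8, 1.6),
]

edited, span = replace_word(transcript, 4, "1999")
for word in edited:
    print(format_timecode(word.start), word.text)
print("re-synthesize from", format_timecode(span[0]), "to", format_timecode(span[1]))
```

The returned span is what a voice model would need to regenerate and splice back in; how well a new word can be made to fit the original timing is the hard part and is not covered here.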

Another feature I would like to see is for the app to completely ignore any accents in the target audio. For example, if the target audio has an accent, the output would instead use the natural cadence and accent of the original voice. I know there is a slider for this, but any improvement in this area would be helpful!

Final question

Does anyone know what play.ht is using under the hood? Their TTS and voice cloning are AMAZING. I would really like to see something like that, except with the ability to overdub voice-overs so they fit the cadence of the target audio and sync with the lips in the video.

Thank you again!
