Differences in Architecture Between Code and Paper #100

Open
taubaaron opened this issue Oct 21, 2021 · 1 comment

@taubaaron

Hey, firstly - thank you very much for sharing your work, it really is interesting.

I have a few questions about the implementation of the paper "AUTOVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss":

  1. In section 3.1, "Problem Formulation", it is explained (and shown in figure 1) that the output of the speaker encoder (whose input is the target speaker's utterance) is fed directly into the decoder, after the bottleneck.
    In the code, on the other hand, it seems that the speaker encoder output is concatenated with the mel-spectrogram and fed into the content encoder, rather than being injected after the bottleneck.

  2. Again in figure 1, it is shown that during training the "style" input comes from the same speaker but from a different file or segment. Is that implemented in the code too? It didn't seem like it, but I might be missing something.

  3. In Table 1 (page 8), you present speaker classification results on the output of the content encoder. Is there a way I can reproduce those results? Could you share that part of the code too?

Thanks very much,
Aaron

@auspicious3000
Owner

  1. The speaker embedding is also concatenated with the encoder output before it is fed into the decoder, so it is used in both places.
  2. Yes, the speaker embedding is extracted from the same speaker, but most likely from a different utterance.
  3. Just train a classifier on the encoder output.
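
To make points 1 and 2 concrete, here is a rough shape-level sketch of the data flow described above. It is not the repository's code: `toy_encoder`, `toy_decoder`, the weight matrices and all dimensions are hypothetical stand-ins (single linear maps in NumPy), and any temporal resampling around the bottleneck is left out. It only shows that the speaker embedding, taken from a different utterance of the same speaker, is concatenated once with the mel-spectrogram at the encoder input and once more with the encoder output at the decoder input.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only (assumptions, not taken from the repository).
N_MELS, EMB_DIM, CODE_DIM, T = 80, 256, 64, 128


def broadcast_over_time(emb, n_frames):
    """Repeat a (EMB_DIM,) speaker embedding for every frame -> (n_frames, EMB_DIM)."""
    return np.tile(emb[None, :], (n_frames, 1))


def toy_encoder(mel, spk_emb, w_enc):
    """Hypothetical content encoder: sees mel frames AND the speaker embedding."""
    x = np.concatenate([mel, broadcast_over_time(spk_emb, mel.shape[0])], axis=1)
    return np.tanh(x @ w_enc)  # (T, CODE_DIM) bottleneck codes


def toy_decoder(codes, spk_emb, w_dec):
    """Hypothetical decoder: the embedding is concatenated again after the bottleneck."""
    z = np.concatenate([codes, broadcast_over_time(spk_emb, codes.shape[0])], axis=1)
    return z @ w_dec  # (T, N_MELS) reconstructed mel frames


# Stand-in weights; a real model would use learned conv/LSTM layers here.
w_enc = rng.normal(scale=0.01, size=(N_MELS + EMB_DIM, CODE_DIM))
w_dec = rng.normal(scale=0.01, size=(CODE_DIM + EMB_DIM, N_MELS))

# Training-time inputs: the mel comes from one utterance, the embedding from
# another utterance of the SAME speaker (random placeholders in this sketch).
mel = rng.normal(size=(T, N_MELS))
spk_emb_other_utt = rng.normal(size=EMB_DIM)

codes = toy_encoder(mel, spk_emb_other_utt, w_enc)
recon = toy_decoder(codes, spk_emb_other_utt, w_dec)

# Self-reconstruction loss against the input mel.
recon_loss = float(np.mean((recon - mel) ** 2))
print(codes.shape, recon.shape)
```

For point 3, the `codes` array is the kind of tensor a speaker classifier would be trained on: collect codes for many utterances, label each with its speaker ID, and fit any standard classifier on top.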
