
How to build the validation data? #62

Open
AShoydokova opened this issue Dec 13, 2021 · 4 comments
AShoydokova commented Dec 13, 2021

Hello,

thank you so much for the code and paper! I'm trying to train the model on the Speech Commands data. I built the train and validation sets with two scripts, make_spect_f0.py and make_metadata.py, but the model fails at the validation step, on this line:
x_identic_val = self.G(x_f0, x_real_pad, emb_org_val)

The error is:
RuntimeError: The expanded size of the tensor (192) must match the existing size (1085) at non-singleton dimension 1. Target sizes: [-1, 192, -1]. Tensor sizes: [1085, 1].

I'm not sure why there is a mismatch, since the call to self.G during training works. Training does include a "G identity mapping loss" step that preprocesses the input before feeding it to self.G. Do I need to do the same with the validation data? Also, 192 is max_len_pad = 192, while 1085 is the number of speakers (dim_spk_emb = 1085). Do I need to change max_len_pad?
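For context, the shape rule behind this error can be reproduced outside the model. A minimal NumPy sketch (standing in for PyTorch's expand, and assuming the embedding is broadcast along the padded time axis of length max_len_pad) shows why a [1085, 1] tensor cannot be expanded to length 192, while a [1, 1085] embedding can:

```python
import numpy as np

max_len_pad = 192   # from hparams
dim_spk_emb = 1085  # number of speakers

# 1-D embedding stored as [1085, 1]: broadcasting along the padded
# time axis fails, analogous to the RuntimeError above.
emb_1d = np.zeros((dim_spk_emb, 1))
try:
    np.broadcast_to(emb_1d, (1, max_len_pad, 1))
except ValueError as err:
    print("broadcast failed:", err)

# 2-D embedding of shape [1, 1085]: broadcasting succeeds.
emb_2d = np.zeros((1, dim_spk_emb))
out = np.broadcast_to(emb_2d[:, None, :], (1, max_len_pad, dim_spk_emb))
print(out.shape)  # (1, 192, 1085)
```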

I'd appreciate any help or direction!

My hparams.py is below:

hparams = HParams(
    # model   
    freq = 8,
    dim_neck = 8,
    freq_2 = 8,
    dim_neck_2 = 1,
    freq_3 = 8,
    dim_neck_3 = 32,
    out_channels = 10 * 3,
    layers = 24,
    stacks = 4,
    residual_channels = 512,
    gate_channels = 512,  # split into 2 groups internally for gated activation
    skip_out_channels = 256,
    cin_channels = 80,
    gin_channels = -1,  # i.e., speaker embedding dim
    weight_normalization = True,
    n_speakers = -1,
    dropout = 1 - 0.95,
    kernel_size = 3,
    upsample_conditional_features = True,
    upsample_scales = [4, 4, 4, 4],
    freq_axis_kernel_size = 3,
    legacy = True,
    
    dim_enc = 512,
    dim_enc_2 = 128,
    dim_enc_3 = 256,
    
    dim_freq = 80,
    dim_spk_emb = 1085,
    dim_f0 = 257,
    dim_dec = 512,
    len_raw = 128,
    chs_grp = 16,
    
    # interp
    min_len_seg = 19,
    max_len_seg = 32,
    # min_len_seq = 64,
    min_len_seq = 0,
    # max_len_seq = 128,
    max_len_seq = 10,
    max_len_pad = 192,
    
    # data loader
    root_dir = 'assets/spmel',
    feat_dir = 'assets/raptf0',
    batch_size = 16,
    mode = 'train',
    shuffle = True,
    num_workers = 0,
    samplier = 8,

    # Convenient model builder
    builder = "wavenet",

    hop_size = 256,
    log_scale_min = float(-32.23619130191664),
    
)
auspicious3000 (Owner) commented:

What is the "G identity mapping loss" step?
I guess one of the tensors needs to be transposed because dim and length mean different things.

AShoydokova (Author) commented:

Thank you so much for the quick response! Let me play around with it. The training part works, but the validation part fails.

The "G identity mapping loss" step is this part of the code that preprocesses the training data in the Solver.train method:

# G identity mapping loss
x_f0 = torch.cat((x_real_org, f0_org), dim=-1)
x_f0_intrp = self.Interp(x_f0, len_org)
f0_org_intrp = quantize_f0_torch(x_f0_intrp[:, :, -1])[0]
x_f0_intrp_org = torch.cat((x_f0_intrp[:, :, :-1], f0_org_intrp), dim=-1)

AShoydokova (Author) commented:

I've fixed my issue. The problem was that I was creating the speaker embeddings as one-dimensional arrays, while the model expects two dimensions. I have 1085 speakers, and for each speaker I created a one-hot vector of shape [1085], whereas the model expects shape [1, 1085].

Thank you again for your help!
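The fix described above can be sketched as follows. This is a hypothetical helper in plain NumPy, not a function from the repo; it just makes the leading axis explicit so the embedding has shape [1, 1085] instead of [1085]:

```python
import numpy as np

def one_hot_speaker(speaker_idx, num_speakers=1085):
    """One-hot speaker embedding with an explicit leading axis.

    Returns shape [1, num_speakers] (what the model expects),
    rather than a 1-D vector of shape [num_speakers].
    """
    emb = np.zeros((1, num_speakers), dtype=np.float32)
    emb[0, speaker_idx] = 1.0
    return emb

emb = one_hot_speaker(7)
print(emb.shape)  # (1, 1085)
```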


9527950 commented Mar 3, 2023

I have trained the network and got extremely poor results. May I ask how your validation set is set up? I used the demo.pkl file from the code directly and found that the loss goes up. Also, the hyperparameters given in the source code don't seem to match yours; for example, dim_spk_emb = 82.
