Loss=nan when training TransformerTrainer #68
Comments
I've only encountered this issue when training on a very small dataset, e.g. one with too few models. So how many different models are you using in the dataset?
I don't think it's about the shapes or the nature of the shapes. The dataset isn't very big, so consider augmenting it at least x50 or x100. I've seen some surprising results from scaling up the dataset size and the model size. The same goes for the training data: comparing a "small" dataset of 800 meshes against 14k, the autoencoder took only a few more hours to reach the same loss on the larger dataset. So consider feeding it more models. I've just started training the transformer with a large dataset, since the transformer takes considerably longer to train, e.g. 218k meshes for 1 epoch = 4.5 hrs vs 1 hr with the autoencoder.

At least the autoencoder can deal with many types of shapes; as you can see from the reconstructed mesh below, it was able to reconstruct the petals of the flower and the diamond-shaped blobs. Check how well the autoencoder can reconstruct the meshes before you re-train it. You can find the render function in my mesh_render.py.
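To illustrate the x50/x100 augmentation idea above, here is a minimal numpy sketch that makes perturbed copies of a vertex array via random scaling, rotation about the vertical axis, and small jitter. This assumes vertices are already normalised into [-1, 1]; `augment_mesh` and its parameter values are illustrative choices, not part of the repo.

```python
import numpy as np

def augment_mesh(vertices, rng, scale_range=(0.75, 1.25), jitter=0.005):
    """Return a randomly perturbed copy of a (V, 3) vertex array."""
    v = np.asarray(vertices, dtype=np.float64)
    # random uniform scale
    v = v * rng.uniform(*scale_range)
    # random rotation about the y (up) axis
    theta = rng.uniform(0.0, 2.0 * np.pi)
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, 0.0, s],
                    [0.0, 1.0, 0.0],
                    [-s, 0.0, c]])
    v = v @ rot.T
    # small per-vertex jitter
    v = v + rng.normal(0.0, jitter, size=v.shape)
    # re-normalise into [-1, 1] so coordinate quantisation stays valid
    v = v / np.abs(v).max()
    return v

rng = np.random.default_rng(0)
base = np.array([[0.0, 0.0, 0.0],
                 [1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0]])
augmented = [augment_mesh(base, rng) for _ in range(50)]  # x50 copies
```

In practice you would apply this per mesh across the whole dataset; the jitter should stay small relative to the quantisation bin size, or the augmented copies will no longer round to sensible tokens.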
Hi again. To avoid retraining from scratch, here is a pre-trained model that will give you a low loss after just a couple of fine-tuning epochs. Use "mesh-autoencoder_encoder_4_decoder_8_0.36", which is in:
Thanks for the advice! After some modification, the encoder and transformer can produce a satisfactory outcome compared to the data in the training dataset. However, when I try to generate meshes from codes, I find that regardless of the percentage of codes I provide, the generated meshes all have roughly the same shape. Is this normal?
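For clarity on what "percentage of codes" means here: the prompt is just a prefix of a mesh's token sequence, and the transformer completes the rest. A tiny sketch of that slicing, with `prompt_codes` being a hypothetical helper (not a function from the repo):

```python
def prompt_codes(codes, fraction):
    """Keep the first `fraction` of a mesh's token codes as the prompt.

    Always keeps at least one token so the prompt is never empty.
    """
    n = max(1, int(len(codes) * fraction))
    return codes[:n]

# e.g. prompting with 25% of a 100-token sequence
full_sequence = list(range(100))
prompt = prompt_codes(full_sequence, 0.25)
```

If very different prompt fractions still yield near-identical shapes, that often suggests the transformer has collapsed onto a few modes of a small training set rather than actually conditioning on the prompt.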
I used the code in the Jupyter notebook provided by @MarcusLoppe in the discussion section, and successfully trained the autoencoder to a loss of 0.6. However, when I tried to proceed to the next section, the training loss remained high, and a few steps later it became nan. Is this due to some problem in the data augmentation, or is the data itself not suitable for this method? I'm using scanned meshes of concrete aggregates instead of artificially built meshes.
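Scanned meshes are a common source of NaN losses, so before blaming the augmentation it is worth auditing the data itself. A minimal numpy sanity check along those lines; the specific checks and the [-1, 1] normalisation assumption are mine, not from the repo:

```python
import numpy as np

def audit_vertices(vertices):
    """Return a list of common problems found in a (V, 3) vertex array."""
    issues = []
    v = np.asarray(vertices, dtype=np.float64)
    if not np.isfinite(v).all():
        # nan/inf coordinates propagate straight into the loss
        issues.append("non-finite vertex coordinates (nan/inf)")
        return issues
    extent = v.max(axis=0) - v.min(axis=0)
    if (extent == 0).any():
        # completely flat along an axis -> degenerate mesh
        issues.append("degenerate extent (flat along an axis)")
    if np.abs(v).max() > 1.0:
        # assumes the pipeline expects vertices normalised to [-1, 1]
        issues.append("vertices outside [-1, 1]; consider normalising")
    return issues

clean = [[0.0, 0.0, 0.0], [0.5, 0.2, 0.1], [-0.3, 0.4, 0.2]]
bad = [[float("nan"), 0.0, 0.0], [1.0, 1.0, 1.0]]
```

Running this over every mesh (including the augmented copies) before training, and lowering the learning rate or adding gradient clipping, are the usual first steps when a loss suddenly turns nan.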