Trying to implement PSMNet following your instructions #51
Comments
Without seeing the code I cannot provide much help. Can you create a pull request with your code?
@AlessioTonioni Sure, I will create a pull request soon.
I have created a pull request. Thanks for your help.
@AlessioTonioni Any updates?
Do the images you are using have 4 channels for some reason, such as RGB + alpha?
@AlessioTonioni Yes, I use the Scene Flow dataset, and the images contain 4 channels.
(2) tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
Errors may have originated from an input operation.
Input Source operations connected to node model/gc-read-pyramid_1/conv1/Conv2D:
Original stack trace for 'model/gc-read-pyramid_1/conv1/Conv2D':
So how can I fix this error?
If you get the same error with MADNet and (I guess) DispNet, then I think you have a problem either with your data or with the version of TensorFlow you are using.
First, I have attached a sample of the stereo pair + gt: sceneflow_samples.tar.gz
Second, I tried to run Stereo_Online_Adaptation.py with the following script:
LIST="/host/nfs/hs/scene_flow/val.list" #the one described at step (1)
python3 Stereo_Online_Adaptation.py
I got the following error, which is not the same as the previous one:
WARNING:tensorflow:From /root/anaconda3/envs/MADNet/lib/python3.7/site-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists(from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
Errors may have originated from an input operation.
Input Source operations connected to node save/Assign_75:
Your images have 4 channels, RGB + alpha. I just pushed a commit that explicitly removes the extra channel after the images are read. Let me know if it fixes your problems. The second error you are getting seems to be related to the operation that restores the weights of the first conv. I think it's still an issue with the number of channels in the input image (4 with your images rather than 3), so it might be fixed as well. If it's still not working, let me know.
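A minimal sketch of this kind of fix, for illustration only (it is not the actual commit). It assumes the image is already loaded as an H x W x C NumPy array; `drop_alpha` is a hypothetical helper name. The same slice, `image[:, :, :3]`, can also be applied to a decoded image tensor inside a TensorFlow input pipeline.

```python
import numpy as np


def drop_alpha(image):
    """Keep only the first three channels of an H x W x C image array.

    Hypothetical helper: an RGBA image becomes RGB, and an image that
    already has three channels is returned with the same content.
    """
    if image.ndim != 3:
        raise ValueError("expected an H x W x C array, got shape %s" % (image.shape,))
    if image.shape[-1] < 3:
        raise ValueError("expected at least 3 channels, got %d" % image.shape[-1])
    return image[..., :3]


# Dummy RGBA image standing in for a Scene Flow frame (made-up size).
rgba = np.zeros((4, 6, 4), dtype=np.uint8)
rgb = drop_alpha(rgba)
print(rgba.shape, "->", rgb.shape)
```

Slicing after decoding keeps the files on disk untouched, so the same dataset can still be used by tools that expect the alpha channel.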
The first error, in Train.py, is fixed. The second error still occurs, and it is the same error as before.
I cannot replicate the error on my side. Which version of TensorFlow are you using? The error seems to be related to how the weights are restored.
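One way to narrow down a restore error like this is to compare the variable shapes stored in the checkpoint with the shapes the graph expects. The sketch below is only an illustration under assumptions: `find_shape_mismatches` is a hypothetical helper, and the variable names and shapes in the example are made up. In TensorFlow 1.x, the checkpoint side could be collected with `tf.train.list_variables(checkpoint_path)`, which returns `(name, shape)` pairs.

```python
def find_shape_mismatches(checkpoint_shapes, graph_shapes):
    """Return (name, checkpoint_shape, graph_shape) for variables whose shapes differ.

    Both arguments are dicts mapping a variable name to its shape.
    Variables present on only one side are ignored here.
    """
    mismatches = []
    for name, ckpt_shape in sorted(checkpoint_shapes.items()):
        graph_shape = graph_shapes.get(name)
        if graph_shape is not None and list(graph_shape) != list(ckpt_shape):
            mismatches.append((name, list(ckpt_shape), list(graph_shape)))
    return mismatches


# Made-up example: a first-layer kernel saved for 3-channel input,
# while the graph was built for 4-channel input.
checkpoint_shapes = {
    "model/conv1/weights": [3, 3, 3, 32],
    "model/conv2/weights": [3, 3, 32, 64],
}
graph_shapes = {
    "model/conv1/weights": [3, 3, 4, 32],
    "model/conv2/weights": [3, 3, 32, 64],
}
for name, saved, expected in find_shape_mismatches(checkpoint_shapes, graph_shapes):
    print("%s: checkpoint %s vs graph %s" % (name, saved, expected))
```

If the only mismatch is in the input-channel dimension of the first convolution, that would point to the input images rather than to the TensorFlow version.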
I installed tensorflow-gpu with pip. I couldn't install TF 1.12, so I installed TF 1.14.0. Additional info:
I'm testing with 1.12, the images you sent, and the weights available online, and everything seems to work. Any other insight into what is happening?
So far, the strange thing about this problem is: (2) when I use 3-channel images to run Stereo_Online_Adaptation.py, everything runs OK. I suspect it is related to the checkpoint having been saved from training with 3-channel images. I will convert the 4-channel images to 3 channels.
Step:22800 Loss:17.37 f/b time:0.603486 Missing time:1 day, 1:53:40.462395
As for 1, which images are you using? The ones you sent me? Because I'm able to use
As for PSMNet, the loss seems quite high after 25K steps. Is it going down? What does the plot in TensorBoard look like?
For 1, I use the same images that you tested. I cannot open TensorBoard directly because the GPU is in the cloud, but I will try to visualize the loss in TensorBoard.
Are you using the reprojection loss to train the network from scratch? In general, from these plots you can see that the loss is not going down at all, so there is definitely some implementation problem in the network, or you still have some trouble with your data. If you train DispNet or MADNet, are you able to see the loss going down?
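When TensorBoard is not reachable, a rough check of the trend can be done from the training output itself. The sketch below assumes the log lines look like the `Step:22800 Loss:17.37 ...` line quoted earlier in this thread; `parse_losses` and `mean_loss_by_half` are hypothetical helper names, and every sample line except the first uses made-up values.

```python
import re

# Matches the "Step:<int> Loss:<float>" prefix of a training log line.
LOG_PATTERN = re.compile(r"Step:(\d+)\s+Loss:(\d+(?:\.\d+)?)")


def parse_losses(lines):
    """Extract (step, loss) pairs from log lines, skipping lines that do not match."""
    points = []
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match:
            points.append((int(match.group(1)), float(match.group(2))))
    return points


def mean_loss_by_half(points):
    """Return the mean loss of the first and second half of the run, ordered by step."""
    losses = [loss for _, loss in sorted(points)]
    if len(losses) < 2:
        raise ValueError("need at least two logged steps")
    mid = len(losses) // 2
    first = sum(losses[:mid]) / mid
    second = sum(losses[mid:]) / (len(losses) - mid)
    return first, second


sample_log = [
    "Step:22800 Loss:17.37 f/b time:0.603486 Missing time:1 day, 1:53:40.462395",
    "Step:22900 Loss:17.10 f/b time:0.60",  # made-up values for illustration
    "Step:23000 Loss:17.52 f/b time:0.60",  # made-up values for illustration
    "Step:23100 Loss:17.29 f/b time:0.60",  # made-up values for illustration
]
first_half, second_half = mean_loss_by_half(parse_losses(sample_log))
print("mean loss, first half: %.2f, second half: %.2f" % (first_half, second_half))
```

If the two means stay close over many thousands of steps, the loss is effectively flat, which matches the symptom described above.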
Does it do depth estimation in real time?
stereo_net = Nets.get_stereo_net(args.modelName, net_args)
The above code runs fine, but when I start training on Scene Flow, I get the following error.
I have tried some debugging:
(1) checked the scene_flow input, and it looks OK
(2) checked the network, but I didn't find any errors
Could you help me find the bug? Thanks very much. If more files are needed, I will upload them.
Traceback (most recent call last):
File "/root/anaconda3/envs/MADNet/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
return fn(*args)
File "/root/anaconda3/envs/MADNet/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/root/anaconda3/envs/MADNet/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: input depth must be evenly divisible by filter depth: 4 vs 3
[[{{node model/CNN_2/conv0/conv0_1/Conv2D}}]]
(1) Invalid argument: input depth must be evenly divisible by filter depth: 4 vs 3
[[{{node model/CNN_2/conv0/conv0_1/Conv2D}}]]
[[validation_error/truediv_1/_23]]
0 successful operations.
0 derived errors ignored.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "Train.py", line 191, in <module>
main(args)
File "Train.py", line 140, in main
fetches = sess.run(tf_fetches,options=run_options)
File "/root/anaconda3/envs/MADNet/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 950, in run
run_metadata_ptr)
File "/root/anaconda3/envs/MADNet/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1173, in _run
feed_dict_tensor, options, run_metadata)
File "/root/anaconda3/envs/MADNet/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
run_metadata)
File "/root/anaconda3/envs/MADNet/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: input depth must be evenly divisible by filter depth: 4 vs 3
[[node model/CNN_2/conv0/conv0_1/Conv2D (defined at /host/nfs/hs/code/research/Real-time-self-adaptive-deep-stereo/Nets/sharedLayers.py:59) ]]
(1) Invalid argument: input depth must be evenly divisible by filter depth: 4 vs 3
[[node model/CNN_2/conv0/conv0_1/Conv2D (defined at /host/nfs/hs/code/research/Real-time-self-adaptive-deep-stereo/Nets/sharedLayers.py:59) ]]
[[validation_error/truediv_1/_23]]
0 successful operations.
0 derived errors ignored.
Errors may have originated from an input operation.
Input Source operations connected to node model/CNN_2/conv0/conv0_1/Conv2D:
model/MirrorPad_2 (defined at /host/nfs/hs/code/research/Real-time-self-adaptive-deep-stereo/Data_utils/preprocessing.py:28)
model/CNN/conv0/conv0_1/weights/read (defined at /host/nfs/hs/code/research/Real-time-self-adaptive-deep-stereo/Nets/sharedLayers.py:57)
Input Source operations connected to node model/CNN_2/conv0/conv0_1/Conv2D:
model/MirrorPad_2 (defined at /host/nfs/hs/code/research/Real-time-self-adaptive-deep-stereo/Data_utils/preprocessing.py:28)
model/CNN/conv0/conv0_1/weights/read (defined at /host/nfs/hs/code/research/Real-time-self-adaptive-deep-stereo/Nets/sharedLayers.py:57)
Original stack trace for 'model/CNN_2/conv0/conv0_1/Conv2D':
File "Train.py", line 191, in <module>
main(args)
File "Train.py", line 71, in main
val_stereo_net = Nets.get_stereo_net(args.modelName, net_args)
File "/host/nfs/hs/code/research/Real-time-self-adaptive-deep-stereo/Nets/__init__.py", line 15, in get_stereo_net
return STEREO_FACTORY[name](**args)
File "/host/nfs/hs/code/research/Real-time-self-adaptive-deep-stereo/Nets/PSMNet.py", line 22, in __init__
super(PSMNet, self).__init__(**kwargs)
File "/host/nfs/hs/code/research/Real-time-self-adaptive-deep-stereo/Nets/Stereo_net.py", line 44, in __init__
self._build_network(args)
File "/host/nfs/hs/code/research/Real-time-self-adaptive-deep-stereo/Nets/PSMNet.py", line 54, in _build_network
conv4_left = self.CNN(self._left_input_batch)
File "/host/nfs/hs/code/research/Real-time-self-adaptive-deep-stereo/Nets/PSMNet.py", line 71, in CNN
activation=tf.nn.leaky_relu, batch_norm=True, apply_relu=True,strides=2, name='conv0_1',reuse=reuse))
File "/host/nfs/hs/code/research/Real-time-self-adaptive-deep-stereo/Nets/sharedLayers.py", line 59, in conv2d
x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding=padding,dilations=dilation_rate)
File "/root/anaconda3/envs/MADNet/lib/python3.7/site-packages/tensorflow/python/ops/nn_ops.py", line 1953, in conv2d
name=name)
File "/root/anaconda3/envs/MADNet/lib/python3.7/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 1071, in conv2d
data_format=data_format, dilations=dilations, name=name)
File "/root/anaconda3/envs/MADNet/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "/root/anaconda3/envs/MADNet/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/root/anaconda3/envs/MADNet/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3616, in create_op
op_def=op_def)
File "/root/anaconda3/envs/MADNet/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 2005, in __init__
self._traceback = tf_stack.extract_stack()