A minor query about the image channel number check using im.shape[0] < 5 #13029

Closed
1 task done
Le0v1n opened this issue May 20, 2024 · 5 comments
Labels
question Further information is requested

Comments

Le0v1n commented May 20, 2024

Search before asking

Question

Today, I attempted to observe the operation process of YOLOv5 step by step. While reviewing the check_amp(model) function, I had some minor doubts. The specific code is in the AutoShape class within the forward method of the models/common.py file:

for i, im in enumerate(ims):
	f = f"image{i}"  # filename
	if isinstance(im, (str, Path)):  # filename or uri
		im, f = Image.open(requests.get(im, stream=True).raw if str(im).startswith("http") else im), im
		im = np.asarray(exif_transpose(im))  
	elif isinstance(im, Image.Image):  # PIL Image
		im, f = np.asarray(exif_transpose(im)), getattr(im, "filename", f) or f
	files.append(Path(f).with_suffix(".jpg").name)
	if im.shape[0] < 5:  # 💡💡💡  image in CHW
		im = im.transpose((1, 2, 0))  # reverse dataloader .transpose(2, 0, 1)

I reviewed the code using debug mode, with the IDE being VSCode, and the DEBUG command as follows:

        {
            "name": "Debug train.py",
            "type": "debugpy",
            "request": "launch",
            "program": "train.py",
            "console": "integratedTerminal",
            "python": "/root/anaconda3/envs/yolo/bin/python",
            "args": [
                "--data", "data/coco128.yaml",
                "--cfg", "models/yolov5s.yaml",
                "--hyp", "data/hyps/hyp.scratch-low.yaml",
                "--weights", "yolov5s.pt",
                "--batch-size", "2",
                "--epochs", "200",
            ]
        }

In this process, the image used is '../yolov5/data/images/bus.jpg' (which is the default), and I'm not sure about the purpose of the code annotated with 💡. The shape of the image im at this time is (1080, 810, 3), so the result of im.shape[0] < 5 is False. I'm not sure if your team members wanted to check the number of channels when writing the code, but I would still be very confused even if it was changed to im.shape[-1] < 5. In my opinion, the code should be im.shape[0] <= 3.

I'm a bit unsure about your original intention here. If you have time, could you please answer my question? Thank you very much! 🤗

Additional

This is not urgent; I hope it doesn't interrupt your regular work. 🥰

@Le0v1n Le0v1n added the question Further information is requested label May 20, 2024
@glenn-jocher
Member

Hello! Thanks for your detailed question and for diving deep into the YOLOv5 code! 🌟

The line if im.shape[0] < 5 that you're referring to is indeed checking the shape of the image tensor. In YOLOv5, images are typically manipulated in the format CHW (Channels, Height, Width) after transformations. This specific check is to determine if the image is in CHW format (common in deep learning frameworks like PyTorch) rather than the conventional HWC format used by OpenCV and PIL. If the first dimension (which would be channels in CHW) is less than 5, it likely indicates that the image is in CHW format and needs to be transposed to HWC for certain operations or visualizations.

The condition im.shape[0] < 5 is used because images typically have at most 4 channels (RGB has 3 channels and RGBA has 4), whereas the height of an image in HWC format is almost never that small. This is a quick way to infer the tensor layout.

Your suggestion im.shape[0] <= 3 would not be appropriate here, as it would fail to transpose 4-channel (RGBA) images that are in CHW format. Note that either threshold would incorrectly transpose images that are already in HWC format but have a very small height, which is rare but could theoretically occur.
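
As a rough illustration of the threshold reasoning above, here is a minimal NumPy sketch. The helper name looks_like_chw and its max_channels parameter are hypothetical and are not part of YOLOv5; the extra ndim check is also an assumption added for safety, since the original line only tests im.shape[0] < 5.

```python
import numpy as np


def looks_like_chw(im: np.ndarray, max_channels: int = 4) -> bool:
    """Hypothetical helper: guess CHW layout when the first axis is small
    enough to be a channel count (equivalent to shape[0] < 5 by default)."""
    return im.ndim == 3 and im.shape[0] <= max_channels


hwc_rgb = np.zeros((1080, 810, 3), dtype=np.uint8)    # same shape as bus.jpg loaded via PIL
chw_rgb = hwc_rgb.transpose(2, 0, 1)                  # (3, 1080, 810)
chw_rgba = np.zeros((4, 1080, 810), dtype=np.uint8)   # 4-channel image in CHW

print(looks_like_chw(hwc_rgb))    # False: first axis is the height, 1080
print(looks_like_chw(chw_rgb))    # True: first axis is 3 channels
print(looks_like_chw(chw_rgba))   # True: first axis is 4 channels
print(chw_rgba.shape[0] <= 3)     # False: a "<= 3" threshold would miss RGBA in CHW
```

The heuristic remains a guess: an HWC image whose height is below 5 pixels would still be misclassified.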

I hope this clears up the confusion! Let me know if you have any more questions. Happy coding! 😊

Le0v1n commented May 21, 2024

Thank you very much for your response! @glenn-jocher

Actually, I didn't think of the RGBA image format, and your explanation has given me inspiration. I have another small question. When I use the default training parameters (python train.py --data coco128.yaml --weights yolov5s.pt --img 640), the shape format of im at this point is [H, W, C] instead of [C, H, W]. Here is a screenshot of my DEBUG:

image

At this point, the conditional statement in the code if im.shape[0] < 5 is actually checking if H < 5 rather than C < 5. I'm wondering if the code can be modified from if im.shape[0] < 5 to if im.ndim < 5?

# Pre-process
n, ims = (len(ims), list(ims)) if isinstance(ims, (list, tuple)) else (1, [ims])  # number, list of images
shape0, shape1, files = [], [], []  # image and inference shapes, filenames
for i, im in enumerate(ims):
	f = f"image{i}"  # filename
	if isinstance(im, (str, Path)):  # filename or uri
		im, f = Image.open(requests.get(im, stream=True).raw if str(im).startswith("http") else im), im
		im = np.asarray(exif_transpose(im))  
	elif isinstance(im, Image.Image):  # PIL Image
		im, f = np.asarray(exif_transpose(im)), getattr(im, "filename", f) or f
	files.append(Path(f).with_suffix(".jpg").name)
	# if im.shape[0] < 5:  # image in CHW
	if im.ndim < 5:  # 💡 This is the modification/change.
		im = im.transpose((1, 2, 0))  # reverse dataloader .transpose(2, 0, 1)
	im = im[..., :3] if im.ndim == 3 else cv2.cvtColor(im, cv2.COLOR_GRAY2BGR)  # enforce 3ch input
	s = im.shape[:2]  # HWC
	shape0.append(s)  # image shape
	g = max(size) / max(s)  # gain
	shape1.append([int(y * g) for y in s])
	ims[i] = im if im.data.contiguous else np.ascontiguousarray(im)  # update
shape1 = [make_divisible(x, self.stride) for x in np.array(shape1).max(0)]  # inf shape
x = [letterbox(im, shape1, auto=False)[0] for im in ims]  # pad
x = np.ascontiguousarray(np.array(x).transpose((0, 3, 1, 2)))  # stack and BHWC to BCHW
x = torch.from_numpy(x).to(p.device).type_as(p) / 255  # uint8 to fp16/32

Thank you very much for your patience and response!

@glenn-jocher
Member

Hello again!

I appreciate your follow-up question and the code snippet you've provided. The suggestion to use if im.ndim < 5 wouldn't quite address the issue you're encountering. The .ndim property returns the number of dimensions in the array, which for color images will typically be 3 regardless of the axis order (HWC or CHW). The condition im.ndim < 5 would therefore always be true, and every image, including those already in HWC format, would be transposed.
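
To make the point about .ndim concrete, the following is a small NumPy sketch under the assumption of a bus.jpg-sized array; it is an illustration only and not code from the repository.

```python
import numpy as np

hwc = np.zeros((1080, 810, 3), dtype=np.uint8)  # HWC, as produced by PIL/OpenCV
chw = hwc.transpose(2, 0, 1)                    # CHW, as produced by the dataloader

# ndim cannot tell the two layouts apart: both arrays have three axes.
print(hwc.ndim, chw.ndim)  # 3 3

# With "if im.ndim < 5", the HWC image would be transposed as well,
# which scrambles its axes instead of leaving it untouched.
wrongly_transposed = hwc.transpose((1, 2, 0))
print(wrongly_transposed.shape)  # (810, 3, 1080)

# A 2-D grayscale array would not survive the 3-axis transpose at all.
gray = np.zeros((1080, 810), dtype=np.uint8)
try:
    gray.transpose((1, 2, 0))
except ValueError as err:
    print("2-D input cannot be transposed with three axes:", err)
```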

The original intent of if im.shape[0] < 5 is to check if the image is in CHW format, assuming that no image height (the first dimension in HWC format) would be less than 5 pixels, which is a reasonable assumption for the datasets typically used. This check is specifically designed to catch cases where the image might be in the format expected by PyTorch (CHW) rather than HWC.

If you're consistently finding that im is in HWC format at this point in the code, it might be worth investigating earlier in the pipeline to ensure that images are being correctly transformed to CHW format where expected, especially before they are passed to model-related functions that expect this format.

For now, the existing check should suffice in most scenarios, but if you're encountering specific issues with image formats, you might need to add additional checks or transformations based on your particular use case or dataset.

Thank you for your keen observations, and feel free to reach out if you have more questions! 😊

Le0v1n commented May 21, 2024

@glenn-jocher Thank you very much for your reply. If we directly use im.ndim < 5, it would be too arbitrary and would overlook the difference between HWC and CHW. I appreciate your reminder.

To be honest, the method you have written is really great and can be applied to the majority of datasets. I suggest adding a comment after this code segment, as without any explanation, others might also find it confusing.
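
One possible shape for such a comment is sketched below. The wrapper function ensure_hwc is hypothetical and exists only to make the snippet runnable on its own; the comment wording is a suggestion, not text from the YOLOv5 source.

```python
import numpy as np


def ensure_hwc(im: np.ndarray) -> np.ndarray:
    """Hypothetical wrapper around the layout check, for illustration only."""
    # Heuristic layout check: images have at most 4 channels (RGB=3, RGBA=4),
    # while real image heights are far larger. A first axis smaller than 5 is
    # therefore taken to be the channel axis (CHW) and moved to the end (HWC).
    if im.shape[0] < 5:  # image in CHW
        im = im.transpose((1, 2, 0))  # reverse dataloader .transpose(2, 0, 1)
    return im


print(ensure_hwc(np.zeros((3, 64, 48), dtype=np.uint8)).shape)  # (64, 48, 3)
print(ensure_hwc(np.zeros((64, 48, 3), dtype=np.uint8)).shape)  # (64, 48, 3)
```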

Overall, thank you very much for your reply! 😊

@Le0v1n Le0v1n closed this as completed May 21, 2024
@glenn-jocher
Member

@Le0v1n hello!

Thank you for your understanding and for the suggestion to add a comment for clarity. It's a great idea to help others who might be reviewing the code in the future. I'll pass this feedback along to the team to consider adding a descriptive comment in the next update.

We appreciate your engagement and thoughtful suggestions! If you have any more ideas or questions, feel free to share. Happy coding! 😊
