Update _read_bytes function in images.py to support URLs #548

Open
wants to merge 1 commit into
base: main

stoensin

Add support for fetching images from URLs in _read_bytes_image

Previously, the function only supported reading image files from local paths. This commit extends its functionality to also fetch images from URLs if the path starts with 'http://' or 'https://'. The function now uses the 'requests' library to download the image content when a URL is provided.

If the URL fetch is successful, the image content is returned as bytes. If the status code is not 200, a ValueError is raised with a message indicating the failure.

The existing file reading functionality remains unchanged for non-URL paths, ensuring backward compatibility.

Changes:

  • Modified _read_bytes_image to check if the path is a URL
  • Added conditional logic to handle URL paths using requests.get
  • Included error handling for failed URL fetch attempts
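The behaviour described above could be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the names `is_url` and `read_bytes_sketch` are hypothetical, and the `timeout` argument is an added assumption that the description does not mention.

```python
from pathlib import Path
from typing import Union


def is_url(path: Union[str, Path]) -> bool:
    """Return True only for strings that start with http:// or https://."""
    return isinstance(path, str) and path.startswith(("http://", "https://"))


def read_bytes_sketch(path: Union[str, Path], timeout: float = 10.0) -> bytes:
    """Hypothetical stand-in for the function discussed in this PR."""
    if is_url(path):
        # Imported lazily so that local-only usage does not require `requests`.
        import requests

        # `timeout` is an assumption added for this sketch.
        response = requests.get(path, timeout=timeout)
        if response.status_code != 200:
            raise ValueError(
                f"Failed to fetch image from {path}: "
                f"status code {response.status_code}"
            )
        return response.content

    # Non-URL paths are read from the local filesystem.
    with open(str(path), "rb") as fin:
        return fin.read()
```

Local paths take the second branch only, so callers that never pass a URL see no change in behaviour.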

For now, we don't push you to follow any predefined schema of issue, but ensure you've already read our contribution guide: https://open-metric-learning.readthedocs.io/en/latest/from_readme/contributing.html.

@AlekseySh added this to In progress in OML-planning via automation Apr 27, 2024
@AlekseySh
Contributor

Hey, @stoensin! Thank you for your interest in OML.

I think we need some tests for this. As input, you can use this URL: https://i.ibb.co/wsmD5r4/photo-2022-06-06-17-40-52.jpg. You can put the tests here: https://github.com/OML-Team/open-metric-learning/tree/main/tests/test_oml/test_datasets
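One possible shape for such a test is sketched below. It is only an illustration: `check_url_reader` and `looks_like_jpeg` are hypothetical helpers, the reader under test is passed in as `read_fn` because the exact import path is not given in this thread, and the check assumes that the URL serves a JPEG file (suggested only by its `.jpg` extension).

```python
from typing import Callable

IMAGE_URL = "https://i.ibb.co/wsmD5r4/photo-2022-06-06-17-40-52.jpg"

# JPEG files begin with the SOI marker (FF D8) followed by another FF marker byte.
JPEG_MAGIC = b"\xff\xd8\xff"


def looks_like_jpeg(data: bytes) -> bool:
    """Cheap sanity check that does not require an image library."""
    return data[:3] == JPEG_MAGIC


def check_url_reader(read_fn: Callable[[str], bytes], url: str = IMAGE_URL) -> None:
    """Assert that `read_fn` returns non-empty JPEG bytes for `url`."""
    data = read_fn(url)
    assert isinstance(data, bytes)
    assert len(data) > 0
    # Assumption: the URL serves a JPEG image.
    assert looks_like_jpeg(data)
```

In a real test module, `check_url_reader` would be called with the function from this PR. Since that call needs network access, the test may need to be marked or skipped accordingly in CI.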

@AlekseySh moved this from In progress to Review in progress in OML-planning Apr 27, 2024
return fin.read()
if isinstance(path, str) and path.startswith(('http://', 'https://')):
    # If the path is a URL, use requests to fetch the content
    response = requests.get(path)
Collaborator

From a performance point of view, it would be more efficient to create a session instance, self.session = self.session or requests.Session(), and make requests using self.session.get(...). Otherwise, each call of __getitem__ has to establish a new connection to the remote server (DNS resolution, certificate exchange, etc.), which can take a significant amount of time.

I'm not sure about the exact implementation, but calling requests.get directly is not a good idea in a dataset scenario, because a session could be reused instead.
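A rough sketch of the session-reuse idea, under stated assumptions: `UrlBytesReader` is a hypothetical helper class (not part of OML), the session is created lazily on first use, and the `timeout` value is an added assumption. How this would be wired into the dataset's `__getitem__` is left open here, as in the comment above.

```python
from typing import Any, Optional


class UrlBytesReader:
    """Hypothetical helper that reuses one HTTP session across fetches."""

    def __init__(self, session: Optional[Any] = None, timeout: float = 10.0) -> None:
        # A session may be injected (e.g. for tests); otherwise it is created lazily.
        self._session = session
        self.timeout = timeout

    @property
    def session(self) -> Any:
        if self._session is None:
            # Imported lazily so the class can be defined without `requests`.
            import requests

            self._session = requests.Session()
        return self._session

    def read(self, url: str) -> bytes:
        # Reusing the session lets the underlying connection pool keep
        # connections alive between calls to the same host.
        response = self.session.get(url, timeout=self.timeout)
        if response.status_code != 200:
            raise ValueError(
                f"Failed to fetch image from {url}: "
                f"status code {response.status_code}"
            )
        return response.content
```

Creating the session lazily rather than in `__init__` is one possible design choice: the session then comes into existence in whichever process first performs a fetch.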

@AlekseySh
Contributor

Hey, @stoensin! Let's move on with this PR. Please see the comment above: #548 (comment)

@AlekseySh removed this from Review in progress in OML-planning Jun 7, 2024