Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: Add timeout to partition html #2416

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

pratiksinghchauhan
Copy link

This PR adds a timeout parameter to requests in partition_html to stop the code from blocking indefinitely if the URL fails to respond withing the given timeout,

@pratiksinghchauhan pratiksinghchauhan changed the title Add timeout to partition html Fix: Add timeout to partition html Jan 17, 2024
@badGarnet
Copy link
Collaborator

Hi @pratiksinghchauhan thanks for the contribution. Could you please add a line to CHANGELOG.md under "Fixes" to describe what this change does? And per standard please increase the dev number for the version number?

@pratiksinghchauhan
Copy link
Author

@badGarnet! I will make the changes and update the PR. Thanks!

@MthwRobinson
Copy link
Contributor

@pratiksinghchauhan - Is this something you'd still like to work on?

@pratiksinghchauhan
Copy link
Author

@MthwRobinson Apologies for the delay! I am still interested in getting this done. I have already completed the feature set just need to get the documentation done. I will get it done this weekend for sure.

@MthwRobinson
Copy link
Contributor

Sounds good - thanks!

@pratiksinghchauhan
Copy link
Author

@MthwRobinson I have updated the PR. Do let me know.

Copy link
Contributor

@MthwRobinson MthwRobinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requesting one small tweak

@@ -39,6 +39,7 @@ def partition_html(
include_metadata: bool = True,
headers: dict[str, str] = {},
ssl_verify: bool = True,
request_timeout: Optional[int] = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small tweak, but since we already have **kwargs in partition_html, could we pass this through with that instead of another explicit parameter?

@@ -131,7 +132,7 @@ def partition_html(
)

elif url is not None:
response = requests.get(url, headers=headers, verify=ssl_verify)
response = requests.get(url, headers=headers, verify=ssl_verify, timeout=request_timeout)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And then this can be kwargs.get("request_timeout") here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants