-
Notifications
You must be signed in to change notification settings - Fork 549
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix: Add timeout to partition html #2416
base: main
Are you sure you want to change the base?
Fix: Add timeout to partition html #2416
Conversation
Hi @pratiksinghchauhan thanks for the contribution. Could you please add a line to |
@badGarnet! I will make the changes and update the PR. Thanks! |
@pratiksinghchauhan - Is this something you'd still like to work on? |
@MthwRobinson Apologies for the delay! I am still interested in getting this done. I have already completed the feature set just need to get the documentation done. I will get it done this weekend for sure. |
Sounds good - thanks! |
@MthwRobinson I have updated the PR. Do let me know. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Requesting one small tweak
@@ -39,6 +39,7 @@ def partition_html( | |||
include_metadata: bool = True, | |||
headers: dict[str, str] = {}, | |||
ssl_verify: bool = True, | |||
request_timeout: Optional[int] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small tweak, but since we already have **kwargs
in partition_html
, could we pass this through with that instead of another explicit parameter?
@@ -131,7 +132,7 @@ def partition_html( | |||
) | |||
|
|||
elif url is not None: | |||
response = requests.get(url, headers=headers, verify=ssl_verify) | |||
response = requests.get(url, headers=headers, verify=ssl_verify, timeout=request_timeout) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And then this can be kwargs.get("request_timeout")
here
This PR adds a timeout parameter to requests in partition_html to stop the code from blocking indefinitely if the URL fails to respond withing the given timeout,