Context
Currently, once a crawl is stopped, that's it! Users cannot pick up where they left off, which results in a few points of friction:
Crawling Large Websites
When crawling a large website (>20,000 pages), users are limited to the first n pages depending on their plan. If the crawler finds more pages than that, there is effectively no way to capture them, because the workflow settings dictate that every crawl must start from the seed URL. In theory one could add every already-crawled URL as an exclusion; in practice this would be ridiculous.
Picking up Next Month
Our customers have a set amount of execution minutes for the month, and while running into that limit might suggest purchasing additional time, simply waiting until the next month is just as valid an option.
Requirements
For Stopped crawls, give users the option to "Resume":
- Resuming inherits the crawl queue of the stopped archived item.
- It creates a new archived item in the workflow containing the newly captured content.
- If the first crawl was stopped and resumed, and the second (resumed) crawl was also stopped and then resumed, the third crawl should not capture any pages already captured by the first or second crawls (see the sketch below).
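To illustrate the intended behavior, here is a minimal sketch of how a resume could inherit the stopped crawl's queue while never re-capturing pages from earlier crawls in the chain. All names here (CrawlState, resume_crawl, the persisted fields) are hypothetical and purely illustrative; they are not Browsertrix's actual API or data model.

```python
from dataclasses import dataclass, field


# Hypothetical sketch: CrawlState and resume_crawl are illustrative names,
# not Browsertrix's actual internals.
@dataclass
class CrawlState:
    pending: list = field(default_factory=list)   # crawl queue left over when the crawl was stopped
    captured: set = field(default_factory=set)    # URLs archived by this crawl and every earlier one


def resume_crawl(previous: CrawlState, page_limit: int) -> CrawlState:
    """Start a new archived item that inherits the stopped crawl's queue.

    Pages captured by any earlier crawl in the chain are skipped, so a third
    resume never re-captures pages from the first or second crawls.
    """
    new_state = CrawlState(captured=set(previous.captured))
    frontier = [url for url in previous.pending if url not in previous.captured]
    pages_this_crawl = 0

    while frontier and pages_this_crawl < page_limit:
        url = frontier.pop(0)
        if url in new_state.captured:
            continue
        new_state.captured.add(url)   # archive the page into the new item
        pages_this_crawl += 1
        # links discovered on the page would be appended to `frontier` here

    new_state.pending = frontier      # whatever is left is what the next resume inherits
    return new_state
```

In this sketch, each call corresponds to one new archived item in the workflow; the leftover pending queue plus the cumulative captured set is exactly the state a subsequent resume would need to inherit.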
I would like to expressly support this feature request.
Our current use case is a literary forum with more than 500,000 articles, i.e. significantly more than 50,000 web pages. At the moment, the only option I can see is switching to the highest-tier plan for the months in which we need such crawls.
Doing a full crawl every time is a huge waste of resources.