Need to speed up the Documentation build in CI #5352

Open
tkoyama010 opened this issue Dec 16, 2023 · 11 comments
Labels
maintenance Low-impact maintenance activity

Comments

@tkoyama010
Member

Describe what maintenance you would like added.

The documentation build time is more of an issue of trying to do too much for each commit. Ideally we'd be able to do incremental builds. Barring that, I feel that waiting 90 minutes for CI checks really inhibits the development experience and we might consider an abbreviated build for PRs and run the full build on main and potentially on doc/* branches.

Originally posted by @akaszynski in #5344 (comment)

Links to source code.

https://github.com/pyvista/pyvista/blob/main/.github/workflows/docs.yml

Pseudocode or Screenshots

None

@tkoyama010 added the maintenance (Low-impact maintenance activity) label on Dec 16, 2023
@tkoyama010
Member Author

tkoyama010 commented Dec 16, 2023

I talked with a friend who is a Sphinx expert. Their project runs a full build once a day on the main branch and does incremental builds on the other branches, based on the artifacts from main. However, this method cannot handle PRs that span more than one day.
Their conclusion: if you want accuracy, you have to sacrifice time, and if you want speed, you have to sacrifice some accuracy.

@MatthewFlamm
Contributor

The v4 release of the upload-artifact action is supposedly much faster. The upload step currently takes more than 30 minutes on some PRs. This is not the majority of the build time, but any decrease would help.

Another thought: the interactive documentation takes extra time to build. We could turn it off by default for PRs and turn it on via a label.

There are probably several quick wins here with little downside.

@tkoyama010
Member Author

Another thought is that the interactive documentation takes an extra amount of time, we could turn off by default for PRs, turn on via label.

Good idea. In addition, we can still try interactive builds in the preview documentation build, so there is no problem with always keeping the PR documentation build in CI static.

@ChristosT
Contributor

Another thought is that the interactive documentation takes an extra amount of time, we could turn off by default for PRs, turn on via label.

Good idea. To add, we can try interactive builds in the preview document build. So there is no problem to keep the PR document build in CI static at all times.

This is a good idea indeed. If you think this would be valuable, I can create a PR where the global behavior (all static vs. interactive) is controlled via an environment variable, which is also easy to set in CI.
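A minimal sketch of what such a switch could look like, assuming a variable named DOCS_INTERACTIVE (a hypothetical name, not an existing PyVista setting). The docs configuration would call the helper once and pick static or interactive output from the result:

```python
import os

# Hypothetical variable name; the real one would be decided in the PR.
ENV_VAR = "DOCS_INTERACTIVE"


def interactive_enabled(environ=None, default=False):
    """Return True when the docs build should produce interactive output."""
    environ = os.environ if environ is None else environ
    raw = environ.get(ENV_VAR)
    if raw is None:
        # Variable not set: fall back to the default (static builds).
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}
```

In CI, the PR workflow would leave the variable unset (static), while the main-branch or label-triggered workflow would set it to "true".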

@akaszynski
Member

Another thought is that the interactive documentation takes an extra amount of time, we could turn off by default for PRs, turn on via label.

Good idea. To add, we can try interactive builds in the preview document build. So there is no problem to keep the PR document build in CI static at all times.

This is a good idea indeed. If you think this would be valuable I can create a PR where the global behavior (all static vs interactive) is controlled via an environmental variable which is easy to set also in CI.

That would be perfect.

@tkoyama010
Member Author

I had a chat with Sphinx's expert friend, they are running a full build once daily on the main branch and doing incremental builds based on the artifacts of main on the other branches. However, this method cannot handle PRs across days. They conclude that if you want accuracy, you have to sacrifice time, and if you want speed, you have to sacrifice some accuracy.

I have come up with a way to handle this. The following command lists the files that have been modified compared with the main branch. Running touch on those files updates their modification times so that the incremental build picks them up again, which solves the correctness problem.

git diff --name-only main...$CURRENT_BRANCH
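The same idea as a small Python sketch, in case it is easier to call from a build script. The function names are made up for illustration, and it assumes the incremental build decides what to rebuild from file modification times:

```python
import subprocess
from pathlib import Path


def changed_files(base="main", head="HEAD", repo="."):
    """List paths changed on `head` since it diverged from `base`."""
    out = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...{head}"],
        cwd=repo, check=True, capture_output=True, text=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def touch_existing(paths, repo="."):
    """Bump the mtime of each listed file that still exists; return those touched."""
    touched = []
    for name in paths:
        path = Path(repo) / name
        # Files deleted on the branch still show up in the diff but are gone.
        if path.is_file():
            path.touch()
            touched.append(name)
    return touched
```

Usage would be touch_existing(changed_files("main", "HEAD")) right after restoring the artifacts from main and before starting the Sphinx build.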

@user27182
Contributor

Has anyone been able to download the docs lately? It takes about 2hrs to download and keeps failing for me part way through.

@user27182
Contributor

Might be worth considering using sphinx-remove-toctrees. Apparently having lots of auto-generated cross-references from toctrees can slow build times considerably as it takes time to resolve all the cross-references.

See also the relevant docs for the pydata theme:
https://pydata-sphinx-theme.readthedocs.io/en/v0.8.1/user_guide/configuring.html#selectively-remove-pages-from-your-sidebar.

For full documentation builds and releases, I don't think we necessarily want to exclude any toctrees, so this won't reduce build times there. But we could possibly add a new workflow to the CI, e.g. "Build Documentation - No toctrees", which makes use of this. This would (hopefully) give a fast-track build that catches basic things like rst formatting errors without having to wait an hour, sort of like an initial docs sanity check. The current docs build workflow would continue as is.
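A rough sketch of how the fast-track variant could be wired into conf.py. The extension name and the remove_from_toctrees option come from sphinx-remove-toctrees; the DOCS_FAST_CHECK variable and the glob are hypothetical placeholders and would need to match the real layout of the generated pages:

```python
import os


def fast_check_settings(environ=None):
    """Return extra Sphinx settings for a quick sanity-check build."""
    environ = os.environ if environ is None else environ
    if environ.get("DOCS_FAST_CHECK") != "1":  # hypothetical variable
        # Full build: change nothing.
        return {}
    return {
        "extensions": ["sphinx_remove_toctrees"],
        # Hypothetical glob; point it at the auto-generated API pages.
        "remove_from_toctrees": ["api/_autosummary/*"],
    }


# In conf.py, the result could be merged in roughly like this:
#   extra = fast_check_settings()
#   extensions += extra.get("extensions", [])
#   remove_from_toctrees = extra.get("remove_from_toctrees", [])
```

The full build never sets the variable, so releases keep every toctree.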

@user27182
Contributor

Another possible solution for this: allow selectively building only part of the docs with CLI options.

E.g. pandas has a mechanism like this, see: https://pandas.pydata.org/docs/development/contributing_documentation.html#building-the-documentation

Link to the pandas make.py source:
https://github.com/pandas-dev/pandas/blob/b8a4691647a8850d681409c5dd35a12726cd94a1/doc/make.py
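To illustrate the general shape (this is not the pandas implementation), a selective build can be reduced to computing which source files to exclude. The --only flag and the function names below are hypothetical; the result would be fed into Sphinx's exclude_patterns:

```python
import argparse
from fnmatch import fnmatch


def parse_args(argv=None):
    """Parse a hypothetical --only option that may be given several times."""
    parser = argparse.ArgumentParser(description="Build part of the docs")
    parser.add_argument(
        "--only", action="append", default=[], metavar="GLOB",
        help="build only documents matching this glob",
    )
    return parser.parse_args(argv)


def exclude_patterns_for(all_docs, only, keep=("index.rst",)):
    """Return the documents to exclude so only `only` (plus `keep`) is built."""
    if not only:
        # No selection given: build everything.
        return []
    return [
        doc for doc in all_docs
        if doc not in keep and not any(fnmatch(doc, pat) for pat in only)
    ]
```

The root document is always kept because Sphinx needs it to build at all. Note that fnmatch lets * match across directory separators, so "api/*" covers nested pages too.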

@tkoyama010
Member Author

I'm considering using sphinx-gallery's mechanism for skipping the build of unchanged examples, which works by recording a hash file for each example.
https://sphinx-gallery.github.io/stable/configuration.html#rerunning-stale-examples
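The principle can be sketched in a few lines. This is only an illustration of hash-based skipping, not sphinx-gallery's actual code; the sidecar file suffix and the function names are assumptions:

```python
import hashlib
from pathlib import Path


def file_hash(path):
    """Return the SHA-256 hex digest of the file contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def is_stale(path):
    """True when no hash was recorded or the file changed since then."""
    sidecar = Path(str(path) + ".sha256")  # hypothetical sidecar name
    return not sidecar.is_file() or sidecar.read_text().strip() != file_hash(path)


def mark_built(path):
    """Record the current hash after a successful build of this file."""
    Path(str(path) + ".sha256").write_text(file_hash(path))
```

For this to help in CI, the recorded hashes would have to be cached between runs together with the generated output.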

@tkoyama010
Member Author

Another possible solution for this: allow selectively building only part of the docs with CLI options.

E.g. pandas has a mechanism like this, see: https://pandas.pydata.org/docs/development/contributing_documentation.html#building-the-documentation

Link to the pandas make.py source: https://github.com/pandas-dev/pandas/blob/b8a4691647a8850d681409c5dd35a12726cd94a1/doc/make.py

This sounds like a very good idea. If we can split the build process, we can also split the GitHub Actions workflow into separate jobs and run them in parallel to save time.


5 participants