PDF conversion is slow when CJK font is used #2120

Open
martijnbrinkers opened this issue Apr 9, 2024 · 4 comments
Labels
performance Too slow renderings

Comments

@martijnbrinkers

If a CJK font is used, PDF conversion is slow.

Steps to reproduce (on AlmaLinux 9, but other Linux systems with a similar font can be used as well):

Install the font:

$ sudo dnf install google-noto-sans-cjk-ttc-fonts

Install WeasyPrint:

$ python3 -m venv weasyprint
$ cd weasyprint
$ source ./bin/activate
$ pip install weasyprint

$ weasyprint --version
WeasyPrint version 61.2

Create an HTML file (test.html) with Chinese characters:

<html>
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
&#27231;&#27083;&#20661;&#21048;&#30332;&#34892;&#35336;&#21123;&#19979;&#25919;&#24220;&#20661;&#21048;&#30340;&#20729;&#26684;&#21450;&#25910;&#30410;&#29575;
</body>
</html>
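The body text above is written as numeric character references. As a quick check (a sketch, not part of the original report), Python's standard `html.unescape` shows how little of the font the PDF actually needs: 20 characters, 18 of them distinct. Only those glyphs have to survive font subsetting, out of the tens of thousands in Noto Sans CJK.

```python
import html

# Body of test.html, exactly as written in the reproduction steps.
body = (
    "&#27231;&#27083;&#20661;&#21048;&#30332;&#34892;&#35336;&#21123;&#19979;&#25919;"
    "&#24220;&#20661;&#21048;&#30340;&#20729;&#26684;&#21450;&#25910;&#30410;&#29575;"
)

text = html.unescape(body)    # decode the numeric character references
distinct = sorted(set(text))  # characters the subset font must keep

# Print code points rather than the characters themselves, so the script
# also works on terminals without a UTF-8 locale.
print(len(text), "characters,", len(distinct), "distinct")
print("code points:", ", ".join(f"U+{ord(c):04X}" for c in distinct))
```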

Convert the HTML to PDF with the font installed:

$ time weasyprint test.html test.pdf

real    0m2.920s
user    0m2.757s
sys     0m0.157s

Now remove the font:

$ sudo dnf remove google-noto-sans-cjk-ttc-fonts

Convert the HTML to PDF without the font installed:

$ time weasyprint test.html test.pdf

real    0m0.480s
user    0m0.391s
sys     0m0.093s

For a simple HTML file, conversion to PDF takes about 6 times longer when a CJK font is used (2.9 s vs 0.5 s).
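The "about 6 times" figure follows directly from the wall-clock (`real`) times reported above; a trivial check of the arithmetic:

```python
# "real" (wall-clock) times from the two `time` runs above, in seconds.
with_font = 2.920     # google-noto-sans-cjk-ttc-fonts installed
without_font = 0.480  # same conversion after removing that package

slowdown = with_font / without_font
extra = with_font - without_font

print(f"slowdown: {slowdown:.1f}x")  # slowdown: 6.1x
print(f"extra time: {extra:.2f} s")  # extra time: 2.44 s
```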

Is there a way to improve this?

@liZe
Member

liZe commented Apr 9, 2024

Hi!

This extra time is spent removing the unused characters from the font (font subsetting). As Noto CJK is huge (~20 MB on my computer) and includes a lot of characters, this can take a lot of time.
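To measure the cost of subsetting in isolation, here is a minimal sketch that subsets a font collection with fontTools' `subset` module, outside WeasyPrint. It is not WeasyPrint's own code path; the `subset_font` and `unicodes_for` helpers, the script name and the sample text are illustrative assumptions, and you pass in the path to wherever your distribution installed the Noto CJK `.ttc` file.

```python
import sys
import time


def unicodes_for(text):
    """Sorted, de-duplicated code points to keep in the subset."""
    return sorted({ord(char) for char in text})


def subset_font(font_path, text, out_path, font_number=0):
    """Keep only the glyphs needed for `text`; return elapsed seconds.

    Hypothetical helper using fontTools' subsetter, roughly what a PDF
    writer has to do before embedding a font.
    """
    from fontTools import subset  # lazy import; needs `pip install fonttools`

    start = time.perf_counter()
    options = subset.Options()
    options.font_number = font_number  # a .ttc collection holds several fonts
    font = subset.load_font(font_path, options)
    subsetter = subset.Subsetter(options=options)
    subsetter.populate(unicodes=unicodes_for(text))
    subsetter.subset(font)
    subset.save_font(font, out_path, options)
    font.close()
    return time.perf_counter() - start


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (assumed script name): python subset_demo.py FONT.ttc OUT.otf
    sample = "\u6a5f\u69cb\u50b5\u5238"  # first four characters of test.html
    elapsed = subset_font(sys.argv[1], sample, sys.argv[2])
    print(f"subsetting took {elapsed:.2f} s")
```

Working from code points rather than glyph names keeps the sketch independent of the font's internal naming; the subsetter maps them to glyphs through the font's `cmap`.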

> Is there a way to improve this?

Most of the time is spent in fonttools, and its subsetter has already been reported to be "slow" in fonttools/fonttools#2147. As proposed in that issue, we could try to use hb-subset instead.
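To compare the two subsetters on your own machine, a rough sketch is to time fontTools' `pyftsubset` command against HarfBuzz's `hb-subset` command on the same font and text. It assumes both tools are on `PATH` (`hb-subset` ships with HarfBuzz's command-line utilities, which some distributions package separately), the `subset_commands` helper and output file names are hypothetical, and the flag names should be checked against each tool's `--help` for your installed versions. It measures the standalone tools, not WeasyPrint itself.

```python
import shutil
import subprocess
import sys
import time


def subset_commands(font_path, text, face_index=0):
    """Build roughly equivalent pyftsubset and hb-subset command lines."""
    return {
        "pyftsubset": [
            "pyftsubset", font_path,
            f"--font-number={face_index}",
            f"--text={text}",
            "--output-file=subset-fonttools.otf",
        ],
        "hb-subset": [
            "hb-subset", font_path,
            f"--face-index={face_index}",
            f"--text={text}",
            "--output-file=subset-harfbuzz.otf",
        ],
    }


def time_command(argv):
    """Run one command and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    return time.perf_counter() - start


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (assumed script name): python compare_subsetters.py FONT.ttc
    sample = "\u6a5f\u69cb\u50b5\u5238"  # first four characters of test.html
    for name, argv in subset_commands(sys.argv[1], sample).items():
        if shutil.which(argv[0]) is None:
            print(f"{name}: not installed, skipped")
            continue
        print(f"{name}: {time_command(argv):.2f} s")
```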

liZe added the performance (Too slow renderings) label Apr 9, 2024
@martijnbrinkers
Author

If I switch to an older release (52.2), conversion is much faster: 0.8 s (version 52.2) vs 2.9 s (version 61.2).

@liZe
Member

liZe commented Apr 9, 2024

> If I switch to an older release (52.2) conversion is much faster. 0.8 sec (version 52.2) vs 2.9 sec (version 61.2).

That’s because font subsetting was then done by Cairo, which we don’t use anymore. We’d probably get equivalent performance by using hb-subset.

liZe added a commit that referenced this issue Apr 30, 2024
@liZe
Member

liZe commented Apr 30, 2024

We’ve opened a highly experimental branch that uses HarfBuzz to subset fonts, which you can try: hb-subset. If your version of HarfBuzz is recent enough (4.1, I think) and you have the harfbuzz-subset library installed (it’s sometimes provided by the harfbuzz package and sometimes by a separate package, depending on the distribution), it should be much faster.
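Before trying the branch, both prerequisites can be checked from Python with a small sketch like the one below. It uses `ctypes` to locate the `harfbuzz` and `harfbuzz-subset` shared libraries and calls HarfBuzz's C function `hb_version_string()`. The 4.1 threshold is the one suggested in the comment (itself hedged with "I think"), the helper names are hypothetical, and `find_library` results depend on how your distribution installs the libraries.

```python
import ctypes
import ctypes.util

MINIMUM = (4, 1)  # threshold suggested in the comment ("4.1 I think")


def parse_version(version):
    """Turn a version string like '8.3.0' into (8, 3, 0)."""
    return tuple(int(part) for part in version.split(".")[:3])


def harfbuzz_status():
    """Return (HarfBuzz version string or None, harfbuzz-subset found?)."""
    has_subset = ctypes.util.find_library("harfbuzz-subset") is not None
    path = ctypes.util.find_library("harfbuzz")
    if path is None:
        return None, has_subset
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        return None, has_subset
    lib.hb_version_string.restype = ctypes.c_char_p  # const char *
    return lib.hb_version_string().decode(), has_subset


if __name__ == "__main__":
    version, has_subset = harfbuzz_status()
    print("harfbuzz version:", version or "not found")
    print("harfbuzz-subset library:", "found" if has_subset else "not found")
    if version is not None:
        print("recent enough:", parse_version(version) >= MINIMUM)
```

Comparing tuples keeps the version check correct for multi-digit components (for example 10.0 vs 4.1), which a plain string comparison would get wrong.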
