BUG Some images fail to load (?) #146

Open
andrewang0001 opened this issue Dec 6, 2022 · 2 comments

andrewang0001 commented Dec 6, 2022

Describe the bug
On loading and immediately dumping certain PDFs, images are lost. I am unsure whether it is because they have failed to load or whether they have failed to dump. I haven't yet figured out what is in common with these PDFs.
Of note, SumatraPDF cannot render PDFs that were produced this way (i.e. by loading and dumping at all). The Firefox PDF reader does render them, but it loses the images. I have not investigated whether other readers can render these.

To Reproduce
A file where this has been produced: fleur-dining-menu-210220.pdf

from borb.pdf import PDF
from borb.toolkit import ImageExtraction

bad_file = "fleur-dining-menu-210220.pdf"
exportname = 'fleur_export.pdf'
def main():
    l : ImageExtraction = ImageExtraction()
    
    with open(bad_file, 'rb') as f:
        pdf = PDF.loads(f, [l])
        
    print(l.extract_images()[0]) # returns a single image, the background. 
    # I wonder if the logo should be printed here?

    with open(exportname, 'wb') as f:
        PDF.dumps(f, pdf) # the logo 'fleur' is lost

if __name__ == "__main__":
    main()

Expected behaviour
The same PDF should be reproduced after loading it and dumping it.

Screenshots
Left - original; Right - after loading and dumping using borb.
SumatraPDF would not render the PDF on the right, so Firefox was used.

Screenshot 2022-12-06 202152

I imagine that I'm missing or doing something wildly incorrect! Please correct me if so.

@jorisschellekens
Owner

You are not doing anything wrong.
However, Canva, the producer of this file, is.
There is a validator for PDF files online. You can find it here.

When I run it against your input PDF, it provides the following errors (taking only those related to colors and images):

  • The following keys, if present in an ExtGState object, shall have the values shown: ca - 1.0 | Failed: 1 occurrences
  • DeviceRGB may be used only if the file has a PDF/A-1 OutputIntent that uses an RGB colour space | Failed: 97 occurrences

The second one is not as severe as it sounds. To qualify as a PDF/A (archivable) document, a file must embed a color profile so that readers can calibrate themselves. This is a requirement only for archivable PDF documents.

The first warning, however, is one I have not seen before.
I'll have a look.
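
For anyone who wants to see which graphics states trigger that first rule, here is a minimal sketch. It is not part of borb and not how the validator works internally; `resolve`, `translucent_gstates` and `scan_pdf` are hypothetical helper names. It assumes the page resources are available as dictionary-like objects, which is what pypdf provides, and it also checks `/CA` (the stroking counterpart of `/ca`) even though the report above only mentions `ca`.

```python
from typing import Any, Iterator, Mapping, Tuple


def resolve(obj: Any) -> Any:
    """Follow an indirect reference if the object offers get_object() (pypdf objects do)."""
    getter = getattr(obj, "get_object", None)
    return getter() if callable(getter) else obj


def translucent_gstates(resources: Mapping) -> Iterator[Tuple[str, str, float]]:
    """Yield (graphics state name, key, value) for every /ca or /CA entry that is not 1.0."""
    gstates = resolve(resources.get("/ExtGState", {}))
    for name, gstate in gstates.items():
        gstate = resolve(gstate)
        for key in ("/ca", "/CA"):
            if key in gstate and float(gstate[key]) != 1.0:
                yield str(name), key, float(gstate[key])


def scan_pdf(path: str) -> None:
    """Print translucent graphics states per page. Assumption: pypdf is installed."""
    # Imported lazily so the helpers above work without any third-party package.
    from pypdf import PdfReader

    for number, page in enumerate(PdfReader(path).pages, start=1):
        # Only resources stored directly on the page are inspected here,
        # not resources inherited from a parent page-tree node.
        resources = resolve(page.get("/Resources", {}))
        for name, key, value in translucent_gstates(resources):
            print(f"page {number}: {name} {key} = {value}")
```

Calling `scan_pdf("fleur-dining-menu-210220.pdf")` should then list the graphics states that set a fill or stroke alpha below 1.0. Whether that transparency is related to the lost logo is not established in this thread.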

Interesting problem! Thank you!

andrewang0001 commented Dec 7, 2022

There is a validator for PDF files online. You can find it here.

Thanks for sharing this.

I have run it against other PDFs that have the same graphical issue. Of note, the PDF shown below does not have the same validation errors as the fleur menu input PDF, but it does show the same 'missing graphic' issue.

As before, left is the original, and right is the result after loading and dumping from borb.

image

This input PDF returned these errors using veraPDF:

  • Properties specified in XMP form shall use either the predefined schemas defined in XMP Specification, or extension schemas that comply with XMP Specification
  • An XObject dictionary shall not contain the SMask key
  • A Group object with an S key with a value of Transparency shall not be included in a form XObject. A Group object with an S key with a value of Transparency shall not be included in a page dictionary
  • The PDF/A version and conformance level of a file shall be specified using the PDF/A Identification extension schema.
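
The SMask and transparency Group findings can also be located by hand. The sketch below is only an illustration under assumptions: `resolve` and `flag_transparency` are hypothetical helper names, the resources are assumed to be dictionary-like (as pypdf exposes them), and nothing here confirms that soft masks are what borb drops.

```python
from typing import Any, List, Mapping, Tuple


def resolve(obj: Any) -> Any:
    """Follow an indirect reference if the object offers get_object() (pypdf objects do)."""
    getter = getattr(obj, "get_object", None)
    return getter() if callable(getter) else obj


def flag_transparency(resources: Mapping, depth: int = 2) -> List[Tuple[str, str]]:
    """Return (XObject name, reason) pairs for soft masks and transparency groups."""
    findings: List[Tuple[str, str]] = []
    xobjects = resolve(resources.get("/XObject", {}))
    for name, xobj in xobjects.items():
        xobj = resolve(xobj)
        if "/SMask" in xobj:
            findings.append((str(name), "has /SMask"))
        group = resolve(xobj.get("/Group", {}))
        if group.get("/S") == "/Transparency":
            findings.append((str(name), "has transparency /Group"))
        # Form XObjects may carry their own resources; descend a limited
        # number of levels so that cyclic references cannot loop forever.
        if depth > 0 and xobj.get("/Subtype") == "/Form" and "/Resources" in xobj:
            nested = resolve(xobj["/Resources"])
            for inner, reason in flag_transparency(nested, depth - 1):
                findings.append((f"{name}{inner}", reason))
    return findings
```

With pypdf installed, something like `flag_transparency(PdfReader(path).pages[0]["/Resources"])` would list the images and forms on the first page that rely on transparency, which could then be compared against the graphics that go missing after the round trip.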

I wonder whether there is something else going on?
