grayscale images are not extracted #789
Seeing a second example of exiting without processing the image or warning here: Lines 239 to 241 in 98cb73b
Ran into this with a different PDF, one containing a DeviceN color space image:
Yeah, that does not surprise me at all. Apple magic...
Could you provide a sample for the DeviceN color space image issue? 🙏🏻
Yes, here's the DeviceN file.
To explain a bit more about the color space of this image: it's "CMYK" plus two spot colors, so 6 channels in all. The case statement here: pdfcpu/pkg/pdfcpu/writeImage.go Lines 775 to 777 in 043541b
probably should have a default case with a warning. I suspect I will need to make a custom …
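The suggestion above amounts to adding a `default` branch that warns instead of silently producing nothing. A minimal sketch in Go, assuming a switch over the number of color components; `outputKind` and its return values are hypothetical and are not the actual code in writeImage.go:

```go
package main

import (
	"fmt"
	"log"
)

// outputKind is a hypothetical helper (not pdfcpu's code). It maps the number
// of color components of an image to the kind of output it can be written as.
// Unsupported counts, such as the 6-channel DeviceN image above, hit the
// default case, which logs a warning and returns an error.
func outputKind(nComps int) (string, error) {
	switch nComps {
	case 1:
		return "gray", nil
	case 3:
		return "rgb", nil
	case 4:
		return "cmyk", nil
	default:
		log.Printf("warning: skipping image with %d color components (unsupported)", nComps)
		return "", fmt.Errorf("unsupported number of color components: %d", nComps)
	}
}

func main() {
	for _, n := range []int{1, 4, 6} {
		kind, err := outputKind(n)
		fmt.Println(n, kind, err)
	}
}
```

Returning an error as well as logging lets the caller decide whether a skipped image should abort extraction or just be reported.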
I think so.
Your first example contains uncompressed images; the latest commit fixes this. The second example is tricky, since it involves some PostScript processing in order to map … The latest commit contains an incomplete fix, in the sense that it at least renders a gray image for your example 2. At some point I need to return to this; right now I am tied up with other issues.
Thanks for handling the uncompressed case. I can tackle DeviceN color spaces with more than 4 components in a new PR; I already have some code for this. The tricky part there is going to be that there are multiple output files per …
I will help out with the overall design of this once you have the rendering part working somehow.
This image parsing code: … properly extracts the 6 grayscale images in the channels of the … But clearly the organization of where to write the files needs to change. Ideas on how to do that? My initial thought was …
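One possible answer to the file-organization question, sketched in Go: write one file per channel and encode the image object number, channel index and colorant name in the file name. The naming pattern and the `channelFileNames` and `sanitize` helpers are hypothetical; the colorant names are assumed to come from the DeviceN names array, and "PANTONE 185 C" below is just an example spot name.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// sanitize replaces characters that are awkward in file names (spaces,
// slashes, etc.) with underscores, since spot color names often contain them.
func sanitize(name string) string {
	return strings.Map(func(r rune) rune {
		if unicode.IsLetter(r) || unicode.IsDigit(r) || r == '-' {
			return r
		}
		return '_'
	}, name)
}

// channelFileNames is a hypothetical naming scheme: one output file per
// channel, grouped by image object number and tagged with the channel index
// and its (sanitized) colorant name.
func channelFileNames(base string, objNr int, colorants []string) []string {
	names := make([]string, len(colorants))
	for i, c := range colorants {
		names[i] = fmt.Sprintf("%s_%d_ch%d_%s.png", base, objNr, i, sanitize(c))
	}
	return names
}

func main() {
	for _, n := range channelFileNames("example", 12, []string{"Cyan", "PANTONE 185 C"}) {
		fmt.Println(n)
	}
}
```

Keeping the channel index in the name preserves the channel order even if two colorant names sanitize to the same string.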
Awesome!
How did you figure out the necessary decoding for this?
Your code works on the assumption that any DeviceN color space using more than 4 components is a … I am unsure if we can commit to this. Can we?
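One way to avoid committing to that assumption blindly is to check the DeviceN colorant names first. A minimal Go sketch, assuming the names array has already been read into a `[]string`; `looksLikeCMYKPlusSpots` is hypothetical, and requiring the process colorants Cyan, Magenta, Yellow and Black to come first and in that order is a simplification, since the order of colorants in a DeviceN names array is not fixed:

```go
package main

import "fmt"

// looksLikeCMYKPlusSpots is a hypothetical check. Instead of treating every
// DeviceN color space with more than 4 components as CMYK plus spot colors,
// it only accepts that interpretation when the first four colorant names are
// Cyan, Magenta, Yellow and Black. It returns the remaining (spot) names.
func looksLikeCMYKPlusSpots(colorants []string) ([]string, bool) {
	process := []string{"Cyan", "Magenta", "Yellow", "Black"}
	if len(colorants) <= len(process) {
		return nil, false
	}
	for i, name := range process {
		if colorants[i] != name {
			return nil, false
		}
	}
	return colorants[len(process):], true
}

func main() {
	spots, ok := looksLikeCMYKPlusSpots([]string{"Cyan", "Magenta", "Yellow", "Black", "Spot1", "Spot2"})
	fmt.Println(spots, ok)
}
```

Images that fail the check could then fall back to a warning rather than being decoded under the wrong interpretation.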
Agree that this can't be committed as written. To get it to a place where we could merge, I think we'd want:
Do ^ those two make sense? As for the encoding, I knew what the gray images should look like, so I tried <x,y,c> ordering options until the outputs looked right. I don't know whether there is a spec for these InDesign-generated PDFs.
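For reference, PDF image data in general is stored row by row starting at the top-left corner, with all components of a pixel stored together. So with 8 bits per component, sample (x, y, c) of a w-pixel-wide, n-component image sits at byte (y*w + x)*n + c. A minimal Go sketch of splitting such data into one grayscale plane per channel, assuming 8 bits per component and already-decoded (unfiltered) data; `splitChannels` is hypothetical:

```go
package main

import (
	"fmt"
	"image"
)

// splitChannels splits pixel-interleaved 8-bit samples into n grayscale
// planes. Sample (x, y, c) is read from data[(y*w+x)*n+c].
func splitChannels(data []byte, w, h, n int) ([]*image.Gray, error) {
	if len(data) < w*h*n {
		return nil, fmt.Errorf("need %d bytes, got %d", w*h*n, len(data))
	}
	planes := make([]*image.Gray, n)
	for c := range planes {
		planes[c] = image.NewGray(image.Rect(0, 0, w, h))
	}
	for y := 0; y < h; y++ {
		for x := 0; x < w; x++ {
			for c := 0; c < n; c++ {
				// image.Gray stores pixel (x, y) at Pix[y*Stride+x].
				planes[c].Pix[y*planes[c].Stride+x] = data[(y*w+x)*n+c]
			}
		}
	}
	return planes, nil
}

func main() {
	// Two pixels of a 3-channel image: (1,2,3) and (4,5,6).
	planes, err := splitChannels([]byte{1, 2, 3, 4, 5, 6}, 2, 1, 3)
	if err != nil {
		fmt.Println(err)
		return
	}
	for c, p := range planes {
		fmt.Println(c, p.Pix)
	}
}
```

DeviceN tint values grow with ink coverage, so a plane written directly as grayscale may show full coverage as white; inverting each byte (255 - v) would give the usual dark-ink look, subject to any /Decode array on the image.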
Grayscale images in a PDF are not extracted. I think the problem may be that the images don't define a filter, and this code:
pdfcpu/pkg/pdfcpu/extract.go
Lines 386 to 388 in 04634d3
skips the image without warning.
This is a low-priority issue for me, but I thought the code above should at least generate a warning when it skips an image.
Here's example.pdf. It was generated by the Adobe suite, which may be part of the problem.
Interestingly, when I open the PDF in macOS Preview, edit it (e.g. delete a page), and save it again, this seems to add filter metadata (and change the color space 🤷), which allows the images to be extracted.
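The missing-filter case in this report could be handled rather than skipped: an image stream with no /Filter entry holds the raw samples, and each row is padded to a whole number of bytes. A minimal Go sketch, assuming the filter names, width, height, component count and bits per component have already been read from the image dictionary; `imageSamples` and `rawRowBytes` are hypothetical and are not the code in extract.go:

```go
package main

import (
	"fmt"
	"log"
)

// rawRowBytes returns the number of bytes in one row of uncompressed image
// samples; each row starts on a byte boundary, so partial bytes round up.
func rawRowBytes(width, nComps, bpc int) int {
	return (width*nComps*bpc + 7) / 8
}

// imageSamples is a hypothetical sketch: with no filter, the stream data are
// the raw samples, so they are used directly after a length check. Every
// path that skips an image logs a warning instead of failing silently.
func imageSamples(filters []string, data []byte, width, height, nComps, bpc int) ([]byte, error) {
	if len(filters) > 0 {
		// Filtered streams are out of scope for this sketch.
		log.Printf("warning: skipping image, filters %v not handled here", filters)
		return nil, fmt.Errorf("unhandled filters %v", filters)
	}
	want := rawRowBytes(width, nComps, bpc) * height
	if len(data) < want {
		log.Printf("warning: skipping unfiltered image: got %d bytes, want %d", len(data), want)
		return nil, fmt.Errorf("short image data: got %d bytes, want %d", len(data), want)
	}
	return data[:want], nil
}

func main() {
	// A 4x2 8-bit grayscale image with no filter: 8 raw bytes.
	data := []byte{0, 64, 128, 255, 255, 128, 64, 0}
	samples, err := imageSamples(nil, data, 4, 2, 1, 8)
	fmt.Println(samples, err)
}
```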