Dereference error with a particular (corrupt?) PDF #859

Open
cjpartridgeb opened this issue Apr 25, 2024 · 1 comment · May be fixed by #860

cjpartridgeb commented Apr 25, 2024

I've recently been running benchmarks on pdfcpu and other PDF tools to try to modernize some of our PDF processes, and I ran across this bug with pdfcpu (Ghostscript and others seem to process the file fine):

dereferenceAndLoad: problem dereferencing object 380: pdfcpu: pdfFilterPipeline: expected decodeParms array corrupt

Here's the full output with -vv:

```
<<<
<X0, (380 0 R)>

READ: 2024/04/25 12:54:48 logStream: no ObjectStreamDict
READ: 2024/04/25 12:54:48 dereferenceObject: begin, dereferencing object 380
READ: 2024/04/25 12:54:48 in use object 380
READ: 2024/04/25 12:54:48 dereferenceAndLoad: dereferencing object 380
READ: 2024/04/25 12:54:48 ParseObject: begin, obj#380, offset:913452
READ: 2024/04/25 12:54:48 newPositionedReader: positioned to offset: 913452
READ: 2024/04/25 12:54:48 buffer: endInd=-1 streamInd=168
READ: 2024/04/25 12:54:48 object: big stream, we parse object until stream
READ: 2024/04/25 12:54:48 pdfFilterPipeline: begin
READ: 2024/04/25 12:54:48 dereferencedObject: dereferencing object 382
READ: 2024/04/25 12:54:48 ParseObject: begin, obj#382, offset:1236490
READ: 2024/04/25 12:54:48 newPositionedReader: positioned to offset: 1236490
READ: 2024/04/25 12:54:48 object: small obj w/o stream, parse until endobj
Fatal: pdfcpu: pdfFilterPipeline: expected decodeParms array corrupt
github.com/pdfcpu/pdfcpu/pkg/pdfcpu.pdfFilterPipeline
```

  • Tested on the latest commit as of yesterday, and also with the version 0.8.0 build that I see was released a few hours ago
  • All of my testing is on 64-bit Linux, across various distros
  • I can't provide the source PDF, for confidentiality reasons

I have managed to download the source and had a tinker with building a fix, which I've done by no longer throwing an error when pdfcpu fails to parse this particular dictionary's contents. That then caused another error further down the pipeline, for which I implemented a second fix: again, not throwing an error when the dictionary is not available.

This seems to work fine: the custom-built binary now processes the document without error, and the output PDF appears correct.

I will shortly submit a PR with the changes I've made, but please note that I'm a Go newbie and not sure if my changes may have any other ramifications.
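
For illustration only, the kind of leniency described above could look roughly like the sketch below. It is not pdfcpu's code and not the contents of the PR: `Dict` and `decodeParmsFor` are hypothetical names made up for this example. It assumes, as I understand the PDF specification, that `DecodeParms` may be absent, a single dictionary, or an array with one dictionary-or-null entry per filter. Anything else is downgraded to a warning and the filter falls back to its defaults, instead of aborting with an error.

```go
package main

import "fmt"

// Dict is a hypothetical stand-in for a parsed PDF dictionary.
type Dict map[string]any

// decodeParmsFor returns one parameter dictionary per filter; a nil entry
// means "use that filter's defaults". Malformed input produces warnings
// rather than an error, so the caller can keep processing the file.
func decodeParmsFor(filterCount int, raw any) ([]Dict, []string) {
	parms := make([]Dict, filterCount) // every entry starts as nil (defaults)
	var warnings []string

	switch v := raw.(type) {
	case nil:
		// No DecodeParms entry at all: defaults apply.
	case Dict:
		// A bare dictionary is only meaningful for a single filter.
		if filterCount == 1 {
			parms[0] = v
		} else {
			warnings = append(warnings,
				fmt.Sprintf("single DecodeParms dict for %d filters, using defaults", filterCount))
		}
	case []any:
		if len(v) != filterCount {
			warnings = append(warnings,
				fmt.Sprintf("DecodeParms has %d entries for %d filters", len(v), filterCount))
		}
		for i := 0; i < filterCount && i < len(v); i++ {
			switch e := v[i].(type) {
			case nil:
				// Null entry: this filter uses its defaults.
			case Dict:
				parms[i] = e
			default:
				warnings = append(warnings,
					fmt.Sprintf("DecodeParms[%d] is %T, not a dict; using defaults", i, e))
			}
		}
	default:
		warnings = append(warnings,
			fmt.Sprintf("DecodeParms is %T, not a dict or array; using defaults", v))
	}
	return parms, warnings
}
```

The trade-off is that silently falling back to defaults can decode a stream incorrectly if the discarded parameters actually mattered (a predictor setting, for example), so a real fix would probably want to surface these warnings in the log or restrict the leniency to a relaxed validation mode.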

cjpartridgeb changed the title from "Deference error with a particular (corrupt?) PDF" to "Dereference error with a particular (corrupt?) PDF" on Apr 25, 2024
Collaborator

hhrutter commented May 3, 2024

Please submit a test file to go along with your patch.
Thank you!
