Dereference error with a particular (corrupt?) PDF #859

cjpartridgeb · 2024-04-25T03:31:20Z

I've recently been running benchmarks on pdfcpu and other PDF tools to try to modernize some of our PDF processes, and have run across this bug with pdfcpu (Ghostscript and others seem to process the file fine):

dereferenceAndLoad: problem dereferencing object 380: pdfcpu: pdfFilterPipeline: expected decodeParms array corrupt

Here's the full output with -vv:

```
<<<
<X0, (380 0 R)>

READ: 2024/04/25 12:54:48 logStream: no ObjectStreamDict
READ: 2024/04/25 12:54:48 dereferenceObject: begin, dereferencing object 380
READ: 2024/04/25 12:54:48 in use object 380
READ: 2024/04/25 12:54:48 dereferenceAndLoad: dereferencing object 380
READ: 2024/04/25 12:54:48 ParseObject: begin, obj#380, offset:913452
READ: 2024/04/25 12:54:48 newPositionedReader: positioned to offset: 913452
READ: 2024/04/25 12:54:48 buffer: endInd=-1 streamInd=168
READ: 2024/04/25 12:54:48 object: big stream, we parse object until stream
READ: 2024/04/25 12:54:48 pdfFilterPipeline: begin
READ: 2024/04/25 12:54:48 dereferencedObject: dereferencing object 382
READ: 2024/04/25 12:54:48 ParseObject: begin, obj#382, offset:1236490
READ: 2024/04/25 12:54:48 newPositionedReader: positioned to offset: 1236490
READ: 2024/04/25 12:54:48 object: small obj w/o stream, parse until endobj
Fatal: pdfcpu: pdfFilterPipeline: expected decodeParms array corrupt
github.com/pdfcpu/pdfcpu/pkg/pdfcpu.pdfFilterPipeline
```

Tested on the latest commit as of yesterday, and also with the version 0.8.0 build that was released a few hours ago.
All of my testing is on 64-bit Linux, across various distros.
I can't provide the source PDF for confidentiality reasons.

I have managed to download the source and had a tinker with building a fix: it no longer throws an error when it fails to parse this particular dictionary's contents. That change then caused another error further down the pipeline, which needed a second fix, again not throwing an error when the dictionary is not available.

This seems to work fine: the custom-built binary now processes the document without error, and the output PDF appears correct.
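
To illustrate the idea behind the workaround, here is a minimal sketch in Go. It is not pdfcpu's code: `Dict`, `Filter`, and `lenientPipeline` are hypothetical stand-ins, and the sketch assumes the filter names have already been parsed. It shows only the general approach of treating an unusable `DecodeParms` value as "no parameters" instead of returning an error.

```go
package main

import "fmt"

// Dict is a hypothetical stand-in for a parsed PDF dictionary.
// It is not pdfcpu's own dictionary type.
type Dict map[string]any

// Filter pairs a filter name with its optional decode parameters.
type Filter struct {
	Name  string
	Parms Dict // nil means "no parameters, use the filter's defaults"
}

// lenientPipeline (hypothetical) builds one Filter per filter name.
// decodeParms is the raw value found under /DecodeParms. If it is
// missing, is not an array, is too short, or holds an entry that is
// not a dictionary, the affected filters simply get nil parameters
// and no error is reported.
func lenientPipeline(names []string, decodeParms any) []Filter {
	// A failed type assertion leaves parms nil, so len(parms) is 0.
	parms, _ := decodeParms.([]any)

	pipeline := make([]Filter, 0, len(names))
	for i, name := range names {
		f := Filter{Name: name}
		if i < len(parms) {
			// A failed assertion leaves f.Parms nil.
			f.Parms, _ = parms[i].(Dict)
		}
		pipeline = append(pipeline, f)
	}
	return pipeline
}

func main() {
	names := []string{"FlateDecode", "DCTDecode"}

	// Well-formed case: one dictionary and one null entry.
	ok := lenientPipeline(names, []any{Dict{"Predictor": 12}, nil})
	fmt.Println(ok[0].Parms, ok[1].Parms == nil)

	// Corrupt case: the value is not an array at all.
	bad := lenientPipeline(names, "corrupt")
	fmt.Println(len(bad), bad[0].Parms == nil)
}
```

One caveat with this kind of leniency: if the ignored parameters were actually needed (for example a predictor setting for `FlateDecode`), the stream may decode to wrong data without any error being raised, so logging a warning in that case is probably worthwhile.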

I will shortly submit a PR with the changes I've made, but please note that I'm a Go newbie and not sure if my changes may have any other ramifications.


hhrutter · 2024-05-03T10:11:58Z

Please submit a test file along with your patch.
Thank you!

cjpartridgeb added the investigate label Apr 25, 2024

cjpartridgeb assigned hhrutter Apr 25, 2024

cjpartridgeb linked a pull request Apr 25, 2024 that will close this issue

Ignore errors when parsing optional array of decode parameter dicts #860

Open

cjpartridgeb changed the title ~~Deference error with a particular (corrupt?) PDF~~ Dereference error with a particular (corrupt?) PDF Apr 25, 2024
