
What is PDF? Part 5 - Metadata

This part of the mini-series on PDF is about metadata. Metadata is not visible when the document is printed; it is useful only to curious human beings and to software that needs a PDF in a special format (like electronic invoices in the ZUGFeRD format). Metadata is always “about this document”, so it changes from document to document and must be applied individually.

from Patrick Gundlach |

Release speedata Publisher 4.18

Last week I released the new stable version 4.18 with some big internal changes, which hopefully don’t affect your documents. The release has been tested intensively, and all test files (currently more than 220, plus some bigger production documents) run fine with the new version.

New XML/XPath parser

This internal change makes the new XML and XPath parser the default parser. It is a complete rewrite and is more robust than the old one.

from Patrick Gundlach |

What is PDF? Part 4 - interactive features

Interaction

PDF allows a lot of different interactive features such as web hyperlinks, jumping to a different location in the document, notes, video playback, JavaScript programs and many more. In this part I cover some of the basic features (bookmarks and annotations).

from Patrick Gundlach |

What is PDF? Part 3 – Vector graphics

Vector graphics

In the third part I cover vector graphics. You can also include PNG and JPEG images in the PDF, which will be covered in a later part of the PDF introduction.

from Patrick Gundlach |

What is PDF? Part 2 – Fonts

This is part 2 of a mini-series on PDF.

Part 1 – PDF syntax and file structure
Part 2 – Fonts
Part 3 - Vector graphics
Part 4 - Interactive features
Part 5 - Metadata

Please note that all of these examples are created manually. If you wish to experiment with the examples, you can do so yourself. For more information, visit https://github.com/speedata/fixxref which provides a small program that supports manual PDF editing.

In the previous article in this series, I introduced the basic structure of a PDF file and how to create a PDF file using a text editor.

My goal in this post is to add some text to the PDF (using the included fonts). I should mention that you can find all the details in the PDF specification. There is one for PDF 1.7 (recommended, very readable) and one for PDF 2.0 (registration required to download; few PDF viewers support 2.0 at the time of writing).

Writing text

For this introduction, I don’t want to complicate things. So I will use one of the PDF viewer’s built-in fonts, a so-called “standard 14” font. These are Courier, Courier-Bold, Courier-BoldOblique, Courier-Oblique, Helvetica, Helvetica-Bold, Helvetica-BoldOblique, Helvetica-Oblique, Symbol, Times-Bold, Times-BoldItalic, Times-Italic, Times-Roman, ZapfDingbats.
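
To make the idea of a built-in font concrete, here is a minimal sketch in Python that assembles a one-page PDF using Helvetica. It is not the example from the article: the function name `build_pdf`, the page size and the text position are assumptions chosen for illustration. The font object only names `/Helvetica` and embeds no font data, which is what makes the standard 14 fonts convenient for experiments.

```python
def build_pdf(text: str = "Hello, world!") -> bytes:
    """Assemble a minimal one-page PDF that shows `text` in Helvetica.

    Illustrative sketch only; `text` must be encodable as Latin-1.
    """
    # Escape the characters that are special inside a PDF literal string.
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    # Content stream: select font /F1 at 12 pt, move to (100, 100), show the text.
    stream = f"BT /F1 12 Tf 100 100 Td ({escaped}) Tj ET".encode("latin-1")

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 300 200] "
        b"/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
        # A standard 14 font: referenced by name, no embedded font file.
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Length %d >>\nstream\n%s\nendstream" % (len(stream), stream),
    ]

    out = bytearray(b"%PDF-1.7\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of this object, needed for the xref table
        out += b"%d 0 obj\n%s\nendobj\n" % (num, body)

    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry for the reserved object 0
    for off in offsets:
        out += b"%010d 00000 n \n" % off  # each entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


if __name__ == "__main__":
    with open("hello.pdf", "wb") as f:
        f.write(build_pdf())
```

Running the script writes `hello.pdf` (a hypothetical file name) to the current directory. The script computes the byte offsets for the cross-reference table itself; when editing a PDF by hand, that is the bookkeeping a helper program such as fixxref is meant to take over.
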

from Patrick Gundlach |

Debugging PDF files

While developing the speedata Publisher, I have to create PDF instructions to draw shapes, create accessibility data structures and embed files, for example. For boxes and glue, I have to create a PDF file from scratch. But once in a while I make mistakes and the PDF file cannot be displayed in the viewer. Then I need to look into the PDF file and check manually where the problem is. For example, Adobe Acrobat shows a message:

from Patrick Gundlach |

Page shuffling

Yesterday I came across a Reddit question:

“I have a pdf where the pages somehow got warped into page 2 then 1 then 4 then 3 then 6 then 5…etc Everything is right except the even pages are a step ahead of the odds.”

This is very easy to do with the speedata Publisher:

<Layout xmlns="urn:speedata.de:2009/publisher/en"
    xmlns:sd="urn:speedata:2009/publisher/functions/en">
    <Record element="data">
        <!-- Source PDF and its total page count -->
        <SetVariable variable="fn" select="'fivepages.pdf'" />
        <SetVariable variable="cp" select="sd:number-of-pages($fn)" />
        <!-- For each pair of pages, place the even page first, then the odd one -->
        <Loop select="$cp div 2" variable="i">
            <PlaceObject row="0mm" column="0mm">
                <Image file="{$fn}" page="{$i * 2}" />
            </PlaceObject>
            <ClearPage />
            <PlaceObject row="0mm" column="0mm">
                <Image file="{$fn}" page="{$i * 2 - 1}" />
            </PlaceObject>
            <ClearPage />
        </Loop>
        <!-- With an odd page count, the last page has no partner and is placed as is -->
        <Switch>
            <Case test="sd:odd($cp)">
                <PlaceObject row="0mm" column="0mm">
                    <Image file="{$fn}" page="{$cp}" />
                </PlaceObject>
            </Case>
        </Switch>
    </Record>
</Layout>
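
The reordering that the layout performs can also be written down as a plain list of page numbers. The following Python sketch is only an illustration of that logic; the function name `swap_page_pairs` is made up for this example and is not part of the speedata Publisher.

```python
def swap_page_pairs(page_count: int) -> list[int]:
    """Return the source page numbers in the order they should be placed.

    Each pair (2i-1, 2i) is swapped; a trailing unpaired page stays last.
    """
    order = []
    for i in range(1, page_count // 2 + 1):
        order.append(2 * i)      # even page of the pair comes first
        order.append(2 * i - 1)  # odd page of the pair comes second
    if page_count % 2 == 1:
        order.append(page_count)  # odd page count: the last page has no partner
    return order


print(swap_page_pairs(5))  # [2, 1, 4, 3, 5]
```

For the five-page file from the layout, the pages are emitted as 2, 1, 4, 3, 5, which undoes the swap described in the question.
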

from Patrick Gundlach |

Reduce PDF file size

The speedata Publisher has a new (pro) feature to reduce the file size of the resulting PDF. This works by setting a maximum DPI value for bitmap images (PNG and JPG).
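
How the Publisher implements this internally is not described here. The following Python sketch only illustrates the arithmetic behind a DPI limit, under the assumption that an image's effective resolution is its pixel width divided by its placed width in inches. The function names and the sample numbers are hypothetical.

```python
MM_PER_INCH = 25.4


def effective_dpi(pixels: int, placed_mm: float) -> float:
    """Resolution of an image that is `pixels` wide when placed `placed_mm` wide."""
    return pixels / (placed_mm / MM_PER_INCH)


def downsampled_size(width_px: int, height_px: int,
                     placed_width_mm: float, max_dpi: float) -> tuple[int, int]:
    """Pixel size after limiting the image to `max_dpi` at its placed width.

    Images at or below the limit are returned unchanged (no upsampling).
    """
    dpi = effective_dpi(width_px, placed_width_mm)
    if dpi <= max_dpi:
        return width_px, height_px
    scale = max_dpi / dpi
    return max(1, round(width_px * scale)), max(1, round(height_px * scale))


# A 3000 x 2000 pixel image placed 100 mm wide has about 762 dpi.
# Limiting it to 300 dpi reduces it to 1181 x 787 pixels.
print(downsampled_size(3000, 2000, 100, 300))  # (1181, 787)
```

The pixel count drops to roughly 15% of the original in this example, which is where the file size saving comes from.
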

from Patrick Gundlach |

Markdown and a layout-quine

The speedata Publisher now has (as of version 4.17.11) basic support for markdown, an easy-to-use markup language.

As an example of markdown formatting, this snippet creates a level 1 heading and a simple bullet list:

# A title

* one
* anotherone
* three

Using markdown with the speedata Publisher is very easy. There is a new layout function called sd:markdown()

from Patrick Gundlach |