What Is a Portable Document Format?
The Portable Document Format (PDF) is a file format that presents documents in a manner independent of application software, hardware, and operating systems. Each PDF file contains a complete description of a fixed-layout flat document, including the text, fonts, graphics, and other information needed to display it. PDF evolved from a system called "Camelot", proposed in 1991 by Adobe Systems co-founder John Warnock.
Portable Document Format File Structure
- A PDF file is a subset of the COS ("Carousel" Object Structure) format, which is also used by FDF files. A COS tree file consists primarily of objects, of which there are eight types:
- Boolean values, representing true or false
- Numbers
- Strings
- Names
- Arrays, ordered collections of objects
- Dictionaries, collections of objects indexed by names
- Streams, usually containing large amounts of data
- The null object
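The object types listed above map naturally onto ordinary data types. The following is a minimal sketch, not a complete serializer: the `Name` class and the `to_pdf` function are hypothetical helpers introduced for illustration, string escaping is simplified, and streams (a dictionary followed by raw bytes) are left out for brevity.

```python
class Name(str):
    """Marker type for PDF names such as /Type (a hypothetical helper)."""


def to_pdf(value) -> str:
    """Serialize a Python value into simplified PDF object syntax."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, float):
        # Use fixed-point notation rather than an exponent form.
        return f"{value:.6f}".rstrip("0").rstrip(".")
    if isinstance(value, Name):  # test Name first: Name is a subclass of str
        return "/" + value
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
        return f"({escaped})"
    if isinstance(value, (list, tuple)):
        return "[" + " ".join(to_pdf(item) for item in value) + "]"
    if isinstance(value, dict):
        pairs = " ".join(f"/{key} {to_pdf(item)}" for key, item in value.items())
        return f"<< {pairs} >>"
    raise TypeError(f"unsupported type: {type(value).__name__}")


example = {"Type": Name("Page"), "Rotate": 90, "Kids": [1, 2.5, None]}
print(to_pdf(example))
```

Checking `bool` before `int` and `Name` before `str` matters because each is a subclass of the other type in Python.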
- Objects may be either direct (embedded in another object) or indirect. Indirect objects are numbered with an object number and a generation number. An index table called the xref table gives the byte offset of each indirect object from the start of the file. This design allows efficient random access to the objects in the file and also allows small changes to be made without rewriting the entire file (incremental update). Beginning with PDF version 1.5, indirect objects may also be located in special streams known as object streams. This technique reduces the size of files that have large numbers of small indirect objects and is especially useful for Tagged PDF.
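The role of the xref table can be illustrated with a small sketch that writes a few indirect objects, records the byte offset of each, and then uses an offset to jump straight to one object. The function names `build_pdf` and `read_object` are hypothetical, and the layout assumes the classic (pre-1.5) cross-reference table with fixed-width 20-byte entries; the result is meant to show the mechanism, not to be a fully conforming file.

```python
def build_pdf(objects: list[bytes]) -> tuple[bytes, list[int]]:
    """Write indirect objects 1..n and a classic xref table; return (file, offsets)."""
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "<number> 0 obj"
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry 0 heads the free list
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset  # offset, generation 0, in use
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out), offsets


def read_object(data: bytes, offset: int) -> bytes:
    """Random access: read one object starting at a known byte offset."""
    end = data.index(b"endobj", offset) + len(b"endobj")
    return data[offset:end]


objects = [
    b"<< /Type /Catalog /Pages 2 0 R >>",
    b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
    b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] >>",
]
data, offsets = build_pdf(objects)
print(read_object(data, offsets[2]).decode("ascii"))
```

Because the reader only needs the offset, it can fetch object 3 without scanning objects 1 and 2, which is the random-access property described above.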
- There are two layouts for PDF files: non-linear (not "optimized") and linear ("optimized"). Non-linear PDF files take up less disk space than their linear counterparts, but they are slower to access because portions of the data required to assemble the pages of the document are scattered throughout the file. Linear PDF files (also called "optimized" or "web-optimized" PDF files) are constructed so that a web browser plug-in can begin reading them without waiting for the entire file to download, since they are written to disk in a linear fashion (that is, in page order). PDF files can be optimized with Adobe Acrobat software or QPDF. [2]
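A rough way to tell the two layouts apart is to look for the linearization marker near the start of the file. The sketch below is a heuristic only: it assumes that a linearized file carries a dictionary with a `/Linearized` key within its first 1024 bytes, and the function name `looks_linearized` is hypothetical. A careful check would parse that first object and compare its recorded file length with the actual length.

```python
def looks_linearized(data: bytes) -> bool:
    """Heuristic: does a /Linearized key appear in the first 1024 bytes?"""
    return b"/Linearized" in data[:1024]


linear = b"%PDF-1.4\n1 0 obj\n<< /Linearized 1 /L 1234 >>\nendobj\n"
plain = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
print(looks_linearized(linear), looks_linearized(plain))
```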
Portable Document Format Imaging Model
- The basic design of how graphics are represented in PDF is very similar to that of PostScript, except for the use of transparency, which was added in PDF 1.4.
- PDF graphics use a device-independent Cartesian coordinate system to describe the surface of a page. A PDF page description can use a matrix to scale, rotate, or skew graphical elements. A key concept in PDF is the graphics state, a collection of graphical parameters that may be changed, saved, and restored by a page description. PDF has (as of version 1.6) 24 graphics state properties, of which some of the most important are:
- The current transformation matrix (CTM) determines the coordinate system
- Clipping path
- Color space
- The alpha constant, a key component of transparency [2]
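How the current transformation matrix scales, rotates, and translates coordinates can be sketched with the six-number matrix form `[a b c d e f]`. The helper names below (`multiply`, `apply`, `translate`, `scale`, `rotate`) are hypothetical; the sketch assumes the row-vector convention, in which a point maps to `x' = a*x + c*y + e` and `y' = b*x + d*y + f`.

```python
import math

Matrix = tuple[float, float, float, float, float, float]


def multiply(m: Matrix, n: Matrix) -> Matrix:
    """Return m x n: the transform that applies m first and then n."""
    a1, b1, c1, d1, e1, f1 = m
    a2, b2, c2, d2, e2, f2 = n
    return (
        a1 * a2 + b1 * c2, a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2, c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2, e1 * b2 + f1 * d2 + f2,
    )


def apply(m: Matrix, x: float, y: float) -> tuple[float, float]:
    """Transform the point (x, y) by matrix m."""
    a, b, c, d, e, f = m
    return (a * x + c * y + e, b * x + d * y + f)


def translate(tx: float, ty: float) -> Matrix:
    return (1.0, 0.0, 0.0, 1.0, tx, ty)


def scale(sx: float, sy: float) -> Matrix:
    return (sx, 0.0, 0.0, sy, 0.0, 0.0)


def rotate(theta: float) -> Matrix:
    """Counterclockwise rotation by theta radians."""
    return (math.cos(theta), math.sin(theta), -math.sin(theta), math.cos(theta), 0.0, 0.0)


# Scale by 2, then move by (10, 20): the point (1, 1) lands on (12, 22).
ctm = multiply(scale(2, 2), translate(10, 20))
print(apply(ctm, 1, 1))
```

Saving and restoring the graphics state then amounts to pushing and popping such a matrix (together with the other parameters) on a stack.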
Portable Document Format Vector Graphics
- As in PostScript, vector graphics in PDF are constructed with paths. Paths are usually composed of straight lines and cubic Bézier curves, but they can also be constructed from the outlines of text. Unlike PostScript, PDF does not allow a single path to mix text outlines with lines and curves. Paths can be stroked, filled, or used for clipping. Strokes and fills can use any color set in the graphics state, including patterns.
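A curved path segment of this kind is defined by four control points. The sketch below evaluates a cubic Bézier curve at a parameter `t` between 0 and 1 using the standard Bernstein form; the function name `cubic_bezier` and the sample points are illustrative only.

```python
Point = tuple[float, float]


def cubic_bezier(p0: Point, p1: Point, p2: Point, p3: Point, t: float) -> Point:
    """Evaluate a cubic Bezier curve with control points p0..p3 at parameter t."""
    u = 1.0 - t
    w0, w1, w2, w3 = u * u * u, 3 * u * u * t, 3 * u * t * t, t * t * t
    x = w0 * p0[0] + w1 * p1[0] + w2 * p2[0] + w3 * p3[0]
    y = w0 * p0[1] + w1 * p1[1] + w2 * p2[1] + w3 * p3[1]
    return (x, y)


# Approximate the curve by a polyline of 9 points, as a renderer might before stroking.
controls = ((0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0))
polyline = [cubic_bezier(*controls, t=i / 8) for i in range(9)]
print(polyline[4])
```

The curve starts at the first control point and ends at the last; the two middle points only pull the curve toward them.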
- PDF supports several types of patterns. The simplest is the tiling pattern, in which a piece of artwork is specified to be drawn repeatedly. This may be a colored tiling pattern, with the colors specified in the pattern object, or an uncolored tiling pattern, which defers the choice of color until the pattern is drawn. Since PDF 1.3 there are also shading patterns, which draw continuously varying colors. Of the seven types of shading pattern, the simplest are the axial shading (Type 2) and the radial shading (Type 3). [2]
Portable Document Format Bitmap
- Raster images in PDF (called Image XObjects) are represented by a dictionary with an associated stream. The dictionary describes the properties of the image, and the stream contains the image data. (Less commonly, a raster image may be embedded directly in a page description as an inline image.) Images are usually filtered for compression purposes. Image filters supported in PDF include the following general-purpose filters:
- ASCII85Decode, a filter used to put the stream into 7-bit ASCII
- ASCIIHexDecode, similar to ASCII85Decode but less compact
- FlateDecode, a commonly used filter based on the zlib/deflate algorithm defined in RFC 1950 and RFC 1951 (the zlib data format, not the gzip or zip container formats); introduced in PDF 1.2; it can use one of two groups of predictor functions for more compact zlib/deflate compression: Predictor 2 from the TIFF 6.0 specification and the predictors (filters) from the PNG specification (RFC 2083)
- LZWDecode, a filter based on LZW compression; it can use one of two groups of predictor functions for more compact LZW compression: Predictor 2 from the TIFF 6.0 specification and the predictors (filters) from the PNG specification
- RunLengthDecode, a simple compression method for streams with repetitive data, using the run-length encoding algorithm
- The image-specific filters are:
- DCTDecode, a lossy filter based on the JPEG standard
- CCITTFaxDecode, a lossless bi-level (black and white) filter based on the Group 3 or Group 4 CCITT (ITU-T) fax compression standards defined in ITU-T T.4 and T.6
- JBIG2Decode, a lossy or lossless bi-level (black and white) filter based on the JBIG2 standard, introduced in PDF 1.4
- JPXDecode, a lossy or lossless filter based on the JPEG 2000 standard, introduced in PDF 1.5
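Several of the general-purpose filters in the list above can be approximated with Python's standard library. This is a sketch under stated assumptions rather than a conforming implementation: predictors are ignored, the function names and the `DECODERS` table are hypothetical, `~>` and `>` are treated as the end-of-data markers for the two ASCII filters, and the run-length rule assumed is "a length byte below 128 copies the next length+1 bytes, above 128 repeats the next byte 257-length times, and 128 ends the data".

```python
import base64
import binascii
import zlib


def ascii_hex_encode(data: bytes) -> bytes:
    return binascii.hexlify(data) + b">"


def ascii_hex_decode(data: bytes) -> bytes:
    digits = b"".join(data.split()).split(b">")[0]  # drop whitespace and marker
    if len(digits) % 2:
        digits += b"0"  # an odd final digit is padded with zero
    return binascii.unhexlify(digits)


def ascii85_encode(data: bytes) -> bytes:
    return base64.a85encode(data) + b"~>"


def ascii85_decode(data: bytes) -> bytes:
    body = data.strip()
    if body.endswith(b"~>"):
        body = body[:-2]
    return base64.a85decode(body)


def flate_encode(data: bytes) -> bytes:
    return zlib.compress(data)


def flate_decode(data: bytes) -> bytes:
    return zlib.decompress(data)


def run_length_decode(data: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(data):
        length = data[i]
        if length == 128:  # end of data
            break
        if length < 128:  # literal run of length + 1 bytes
            out += data[i + 1:i + 2 + length]
            i += length + 2
        else:  # one byte repeated 257 - length times
            out += data[i + 1:i + 2] * (257 - length)
            i += 2
    return bytes(out)


DECODERS = {
    "ASCIIHexDecode": ascii_hex_decode,
    "ASCII85Decode": ascii85_decode,
    "FlateDecode": flate_decode,
    "RunLengthDecode": run_length_decode,
}


def decode_stream(data: bytes, filters: list[str]) -> bytes:
    """Apply the named filters in order, as a filter chain would be applied."""
    for name in filters:
        data = DECODERS[name](data)
    return data


raw = b"PDF " * 50
stored = ascii85_encode(flate_encode(raw))
print(len(raw), len(stored), decode_stream(stored, ["ASCII85Decode", "FlateDecode"]) == raw)
```

The size difference between the two ASCII filters follows from their encodings: hexadecimal uses two characters per byte, while ASCII85 uses five characters per four bytes.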
- Normally all image content in a PDF is embedded in the file, but PDF allows image data to be stored in external files by the use of external streams or alternate images. Standardized subsets of PDF, including PDF/A and PDF/X, prohibit these features. [2]
Portable Document Format Text
- Text in PDF is represented by "text elements" in page content streams. A text element specifies that characters should be drawn at certain positions. The characters are specified using the encoding of a selected font resource. [2]
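A text element in a content stream can be sketched as a short operator sequence: begin text, select a font resource and size, set the position, show a string, end text. The helper `text_element` is hypothetical, the font resource name `F1` is a placeholder that would have to exist in the page's resources, and only the simplest positioning and string escaping are covered.

```python
def text_element(font: str, size: float, x: float, y: float, text: str) -> str:
    """Build a minimal text element for a page content stream."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    # BT/ET bracket the element, Tf selects font and size, Td positions, Tj shows.
    return f"BT /{font} {size:g} Tf {x:g} {y:g} Td ({escaped}) Tj ET"


print(text_element("F1", 12, 72, 720, "Hello (PDF)"))
```

The bytes inside the parentheses are character codes interpreted through the selected font's encoding, which is why the same string can render differently under a different font resource.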
Portable Document Format Font
- A font object in PDF is a description of a digital typeface. It may either describe the characteristics of a typeface or include an embedded font file. The latter is called an embedded font and the former an unembedded font. The font files that may be embedded are based on widely used standard digital font formats: Type 1 (and its compressed variant CFF), TrueType, and (beginning with PDF 1.6) OpenType. In addition, PDF supports the Type 3 variant, in which the components of the font are described by PDF graphics operators. [2]
Portable Document Format Encoding
- Within text strings, characters are shown using character codes (integers) that map to glyphs in the current font by means of an encoding. There are a number of predefined encodings, including WinAnsi, MacRoman, and a large number of encodings for East Asian languages, and a font can also have its own built-in encoding. (Although the WinAnsi and MacRoman encodings are derived from the historical proprietary encodings of the Windows and Macintosh operating systems, fonts using these encodings work equally well on any platform.) A PDF can specify a predefined encoding to use, the font's built-in encoding, or a lookup table of differences from a predefined or built-in encoding (not recommended with TrueType fonts). The encoding mechanisms in PDF were designed for Type 1 fonts, and the rules for applying them to TrueType fonts are complex.
- For large fonts or fonts with non-standard glyphs, the special encodings Identity-H (for horizontal writing) and Identity-V (for vertical writing) are used. With such fonts it is necessary to provide a ToUnicode table if semantic information about the characters is to be preserved. [2]
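The lookup-table mechanism can be sketched as a base mapping from character codes to glyph names that is patched by a list of differences. The format assumed here is a flat list in which an integer sets the next code and each following name is assigned to consecutive codes; `apply_differences` is a hypothetical helper, and the three-entry base table is a tiny illustrative stand-in for a real predefined encoding.

```python
def apply_differences(base: dict[int, str], differences: list) -> dict[int, str]:
    """Return a copy of base with the differences list applied."""
    encoding = dict(base)
    code = None
    for item in differences:
        if isinstance(item, int):
            code = item  # an integer restarts numbering at this code
        else:
            if code is None:
                raise ValueError("differences must start with a character code")
            encoding[code] = item  # a name is assigned to the current code
            code += 1
    return encoding


base = {65: "A", 66: "B", 67: "C"}  # illustrative subset only
custom = apply_differences(base, [66, "Euro", "bullet", 200, "fi"])
print(custom)
```

After this patch, code 65 still selects the glyph named `A`, while codes 66 and 67 select `Euro` and `bullet`.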
Portable Document Format Transparency
- The original imaging model of PDF was, like PostScript's, opaque: each object drawn on the page completely replaced anything previously marked in the same location. In PDF 1.4 the imaging model was extended to allow transparency. When transparency is used, new objects interact with previously marked objects to produce blending effects. Transparency was added to PDF by means of new extensions that were designed to be ignored by products written to the PDF 1.3 and earlier specifications. As a result, files that use a small amount of transparency may display acceptably in older viewers, but files that make extensive use of transparency may be displayed incorrectly, without warning, in an older viewer.
- The transparency extensions are based on the key concepts of transparency groups, blend modes, shape, and alpha. The model closely corresponds to the features of Adobe Illustrator version 9. The blend modes were based on those used by Adobe Photoshop at the time. When the PDF 1.4 specification was published, the formulas for calculating the blend modes were kept secret by Adobe. They have since been published.
- The concept of a transparency group in the PDF specification is independent of the existing notions of "group" or "layer" in applications such as Adobe Illustrator. Those groupings reflect logical relationships among objects that are meaningful when editing those objects, but they are not part of the imaging model. [2]
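How a new object interacts with what is already on the page can be sketched for a single color component. The code below assumes the basic compositing formula from the published specification, with result alpha `ar = ab + as - ab*as` and result color `Cr = (1 - as/ar)*Cb + (as/ar)*((1 - ab)*Cs + ab*B(Cb, Cs))`, where `B` is the blend function. The function names are hypothetical, and shape, groups, and soft masks are not modeled.

```python
from typing import Callable

Blend = Callable[[float, float], float]


def normal(cb: float, cs: float) -> float:
    return cs


def multiply_blend(cb: float, cs: float) -> float:
    return cb * cs


def screen(cb: float, cs: float) -> float:
    return cb + cs - cb * cs


def composite(cb: float, ab: float, cs: float, a_s: float, blend: Blend) -> tuple[float, float]:
    """Composite a source (cs, a_s) over a backdrop (cb, ab); return (color, alpha)."""
    ar = ab + a_s - ab * a_s
    if ar == 0.0:
        return (0.0, 0.0)  # nothing painted: color is undefined, use 0
    ratio = a_s / ar
    cr = (1.0 - ratio) * cb + ratio * ((1.0 - ab) * cs + ab * blend(cb, cs))
    return (cr, ar)


# A half-transparent black source over an opaque white backdrop gives mid gray.
print(composite(1.0, 1.0, 0.0, 0.5, normal))
```

With an opaque source and the `normal` blend, the result is simply the source color, which reproduces the original opaque model.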
Portable Document Format Interactive Elements
- PDF files may contain interactive elements such as annotations, forms, videos, and Flash animations.
- Rich Media PDF is a term used to describe interactive content that can be embedded in or linked from a PDF. This content must be provided in a Flash file format. When Adobe acquired Macromedia, that company's main asset was Flash, and the Flash player was embedded in Adobe Acrobat and Adobe Reader, removing the need for third-party plug-ins such as Flash, QuickTime, or Windows Media. Unfortunately, this caused a rift with Apple, because QuickTime video was prohibited in PDF. Rich media expert Robert Connolly believes this event triggered the conflict between Apple and Adobe over Flash on the iPhone and iPad. Rich Media PDFs do not work on Apple's iOS devices such as the iPad, and interactivity is limited.
- Interactive forms are a mechanism for adding forms to the PDF file format.
- PDF currently supports two different approaches for integrating data and PDF forms. Both formats coexist in the PDF specification today:
- AcroForms (also known as Acrobat forms), introduced in the PDF 1.2 format specification and included in all later PDF specifications.
- Adobe XML Forms Architecture (XFA) forms, introduced in the PDF 1.5 format specification. The XFA specification is not included in the PDF specification; it is only referenced as an optional feature. Adobe XFA forms are not compatible with AcroForms. [2]
Portable Document Format AcroForms
- AcroForms were introduced in the PDF 1.2 format. AcroForms permit the use of objects (such as text boxes and radio buttons) and some code (such as JavaScript).
- In addition to the standard PDF action types, AcroForms support submitting, resetting, and importing data. The submit action transmits the names and values of selected interactive form fields to a specified uniform resource locator (URL). Interactive form field names and values may be submitted in any of the following formats (depending on the settings of the action's ExportFormat, SubmitPDF, and XFDF flags):
- HTML Form format (HTML 4.01 specification since PDF 1.5; HTML 2.0 since PDF 1.2)
- Forms Data Format (FDF)
- XML Forms Data Format (XFDF) (external XML Forms Data Format Specification, version 2.0; supported since PDF 1.5; it replaced the XML form submission format defined in PDF 1.4)
- PDF (the entire document can be submitted rather than individual fields and values; defined in PDF 1.4)
- AcroForms can keep form field values in external stand-alone files containing key:value pairs. The external files may use the Forms Data Format (FDF) or the XML Forms Data Format (XFDF). Usage rights (UR) signatures define the rights to import form data files in FDF, XFDF, and text (CSV/TSV) formats, and to export form data files in FDF and XFDF formats. [2]
Portable Document Format Forms Data Format (FDF)
- The Forms Data Format (FDF) is based on PDF: it uses the same syntax and has essentially the same file structure, but is much simpler than PDF because the body of an FDF document consists of only one required object. The Forms Data Format is defined in the PDF specification (since PDF 1.2). It can be used when submitting form data to a server, receiving the response, and incorporating it into the interactive form. It can also be used to export form data to stand-alone files that can be imported back into the corresponding PDF interactive form. Beginning with PDF 1.3, FDF can be used to define a container for annotations that are separate from the PDF document to which they apply. FDF typically encapsulates information such as X.509 certificates, requests for certificates, directory settings, timestamp server settings, and embedded PDF files for network transmission. FDF uses the MIME content type application/vnd.fdf, the filename extension .fdf, and the file type 'FDF' on Mac OS. Support for importing and exporting stand-alone FDF files is not widely implemented in free or freeware PDF software. For example, Evince, Okular, Poppler, KPDF, and Sumatra PDF have no import or export support. However, Evince, Okular, and Poppler can fill in PDF AcroForms and save the filled-in data inside the PDF file. Import support for stand-alone FDF files is implemented in Adobe Reader; import and export support (including saving FDF data in the PDF) is implemented, for example, in Foxit Reader and PDF-XChange Viewer Free; saving FDF data in a PDF file is also supported by pdftk. [2]
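Because FDF reuses PDF syntax with a single required object, a stand-alone form data file can be sketched in a few lines. The layout below (a catalog object with an `/FDF` dictionary holding a `/Fields` array of `/T` name and `/V` value pairs) is an assumption based on the commonly seen minimal form of such files; `build_fdf` and `pdf_string` are hypothetical helpers, and only flat text fields with Latin-1 values are handled.

```python
def pdf_string(text: str) -> str:
    """Wrap text as a PDF literal string, escaping backslashes and parentheses."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def build_fdf(fields: dict[str, str]) -> bytes:
    """Build a minimal stand-alone FDF file from field name/value pairs."""
    entries = " ".join(
        f"<< /T {pdf_string(name)} /V {pdf_string(value)} >>"
        for name, value in fields.items()
    )
    body = f"<< /FDF << /Fields [ {entries} ] >> >>"
    document = (
        "%FDF-1.2\n"
        f"1 0 obj\n{body}\nendobj\n"
        "trailer\n<< /Root 1 0 R >>\n"
        "%%EOF\n"
    )
    return document.encode("latin-1")


print(build_fdf({"name": "Ada (test)", "city": "London"}).decode("latin-1"))
```

A tool with FDF import support could then merge such a file into the matching interactive form; the field names must match those defined in the PDF.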
Portable Document Format Adobe XML Forms Architecture (XFA)
- In the PDF 1.5 format, Adobe Systems introduced a new, proprietary format for forms called the Adobe XML Forms Architecture (XFA). XFA 2.02 is referenced in the PDF 1.5 specification (and in later versions), but is described separately in the "Adobe XML Forms Architecture (XFA) Specification", which exists in several versions. The XFA specification is not included in ISO 32000-1 (PDF 1.7) and is only referenced there as an external proprietary specification created by Adobe. XFA was deprecated in ISO 32000-2 (PDF 2.0).
- Adobe XFA forms are not compatible with AcroForms. Adobe Reader contains "disabled features" for the use of XFA forms, which are activated only when opening a PDF document created with enabling technology available only from Adobe. XFA forms are not compatible with versions of Adobe Reader prior to 6.
- XFA forms can be created and used as PDF files or as XDP (XML Data Package) files. The format of an XFA resource in a PDF is described by the XML Data Package specification. An XDP may be a stand-alone document or may be carried inside a PDF document. XDP provides a mechanism for packaging form components within a surrounding XML container. An XDP can also package a PDF file together with XML form and template data. A PDF may contain XFA (in XDP format), and XFA may contain PDF. When the XFA (XML Forms Architecture) grammars are moved from one application to another, they must be packaged as an XML Data Package.
- When PDF and XFA are combined, the result is a form in which each page of the XFA form overlays a PDF background. This architecture is sometimes referred to as XFAF (XFA Foreground). The alternative is to express the entire form, including boilerplate, directly in XFA (without using PDF, or using only a "shell PDF", which is a container for XFA with a minimal skeleton of PDF markup, or using a pre-rendered depiction of a static XFA form as PDF pages). This is sometimes called full XFA.
- Starting with PDF 1.5, the text contents of variable text form fields and of markup annotations may include formatting information (style information). These rich text strings are XML documents that conform to the rich text conventions defined in the XML Forms Architecture (XFA) Specification 2.02, which is itself a subset of the XHTML 1.0 specification augmented with a restricted set of CSS2 style attributes. In PDF 1.6, PDF supports the rich text elements and attributes defined in the XML Forms Architecture (XFA) Specification 2.2. In PDF 1.7, PDF supports the rich text elements and attributes defined in the XML Forms Architecture (XFA) Specification 2.4.
- Most PDF processors do not handle XFA content. When generating a shell PDF, it is recommended to include in the PDF markup a simple one-page PDF image that displays a warning message (for example, "To view the full contents of this document, you need a later version of the PDF viewer"). PDF processors that can render XFA content should either not display the supplied warning page image or replace it quickly with the dynamic form content. Examples of PDF software with some support for XFA rendering include Adobe Reader for Windows, Linux, and Mac OS X (but not Adobe Reader Mobile for Android / iOS) and Nuance PDF Reader. [2]
Portable Document Format Logical Structure and Accessibility
- A "tagged" PDF (ISO 32000-1:2008, clause 14.8) includes document structure and semantic information to enable reliable text extraction and accessibility. Technically, Tagged PDF is a stylized use of the format that builds on the logical structure framework introduced in PDF 1.3. Tagged PDF defines a set of standard structure types and attributes that allow page content (text, graphics, and images) to be extracted and reused for other purposes.
- Tagged PDF is not required when a PDF file is intended only for printing. Since the feature is optional, and since the rules for Tagged PDF specified in ISO 32000-1 are relatively vague, support for Tagged PDF among consuming devices, including assistive technology (AT), has been uneven.
- An AIIM project to develop an ISO-standardized subset of PDF aimed specifically at accessibility began in 2004 and eventually became PDF/UA. [2]
Portable Document Format Security and Signing
- A PDF file may be encrypted for security, or digitally signed for authentication.
- The standard security provided by Acrobat PDF consists of two different methods and two different passwords: a user password, which encrypts the file and prevents it from being opened, and an owner password, which specifies operations that should be restricted even when the document is decrypted. These operations can include printing, copying text and graphics out of the document, modifying the document, and adding or modifying text notes and AcroForm fields. The user password (controlling opening) encrypts the file and can only be defeated by password cracking, whose difficulty depends on the strength of the password and the encryption method; it is potentially very secure (assuming a good password and an encryption method with no known attacks). The owner password (controlling operations) does not encrypt the file; it relies on client software to respect the restrictions, and is therefore not secure. An owner password can be removed by many commonly available "PDF cracking" programs, including some free online services. As a result, the use restrictions that a document author places on a PDF document are not secure and cannot be assured once the file is distributed; Adobe Acrobat displays a warning to this effect when such restrictions are applied while creating or editing PDF files.
- Even without removing the password, many freeware or open source PDF readers ignore the permission "protections" and allow the user to print or copy excerpts of the text as if the document were not restricted by password protection.
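Why these restrictions depend on the cooperation of the reader becomes clearer when they are viewed as plain flag bits. The sketch below assumes the bit layout of the standard security handler's permissions value (a signed 32-bit integer in which bit 3 allows printing, bit 4 modifying, bit 5 copying, and bit 6 annotating, counting from 1); the `Permission` enum and `allowed` function are hypothetical names, and the later, finer-grained bits are left out.

```python
from enum import IntFlag


class Permission(IntFlag):
    PRINT = 1 << 2      # bit 3
    MODIFY = 1 << 3     # bit 4
    COPY = 1 << 4       # bit 5
    ANNOTATE = 1 << 5   # bit 6


def allowed(p: int) -> Permission:
    """Decode the permission bits from a signed 32-bit permissions value."""
    unsigned = p & 0xFFFFFFFF  # reinterpret the signed value as unsigned
    return Permission(unsigned & 0b111100)


perms = allowed(-44)
print(Permission.PRINT in perms, Permission.MODIFY in perms)
```

Nothing in the file enforces these bits: a reader that never calls something like `allowed` simply behaves as if every operation were permitted.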
- Some solutions, such as Adobe's LiveCycle Rights Management, are more robust means of information rights management, which can not only restrict who can open a document but also reliably enforce permissions in ways that the standard security handler cannot. [2]
Portable Document Format Usage Rights
- Beginning with PDF 1.5, usage rights (UR) signatures are used to enable additional interactive features that are not available by default in a particular PDF viewer application. The signature is used to validate that the permissions have been granted by a bona fide granting authority. For example, it can be used to allow a user to:
- Save the PDF document along with modified form and/or annotation data
- Import form data files in FDF, XFDF, and text (CSV/TSV) formats
- Export form data files in FDF and XFDF formats
- Submit form data
- Instantiate new pages from named page templates
- Apply a digital signature to an existing digital signature form field
- Create, delete, modify, copy, import, and export annotations
- For example, Adobe Systems grants permissions to enable additional features in Adobe Reader, using public-key cryptography. Adobe Reader verifies that the signature uses a certificate from an Adobe-authorized certificate authority. The PDF 1.5 specification declares that other PDF viewer applications are free to use this same mechanism for their own purposes. [2]
Portable Document Format File Attachment
- PDF files can have document-level and page-level file attachments, which the reader can access and open or save to the local file system. Attachments can be added to existing PDF files, for example with pdftk. Adobe Reader provides support for attachments, and Poppler-based readers such as Evince or Okular also have some support for document-level attachments. [2]
Portable Document Format Metadata
- PDF files can contain two types of metadata. The first is the document information dictionary, a set of key/value fields such as author, title, subject, and creation and update dates. This is stored in the optional Info trailer of the file. A small set of fields is defined, and it can be extended with additional text values if required.
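A document information dictionary can be sketched as a flat dictionary of text strings, with dates written in the `D:YYYYMMDDHHmmSS` style followed by a time zone designator. The helpers `pdf_date` and `info_dictionary` are hypothetical, values are assumed to be plain text without special escaping beyond parentheses and backslashes, and dates are normalized to UTC for simplicity.

```python
from datetime import datetime, timezone


def pdf_date(moment: datetime) -> str:
    """Format a timezone-aware datetime as a PDF date string in UTC."""
    return moment.astimezone(timezone.utc).strftime("D:%Y%m%d%H%M%SZ")


def info_dictionary(fields: dict[str, str]) -> str:
    """Serialize key/value metadata as a PDF dictionary of literal strings."""
    parts = []
    for key, value in fields.items():
        escaped = value.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
        parts.append(f"/{key} ({escaped})")
    return "<< " + " ".join(parts) + " >>"


created = datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
print(info_dictionary({
    "Title": "Annual Report",
    "Author": "A. Writer",
    "CreationDate": pdf_date(created),
}))
```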
- Later, in PDF 1.4, support was added for metadata streams, using the Extensible Metadata Platform (XMP) to add XML-based extensible metadata as used in other file formats. This allows metadata to be attached to any stream in the document, such as information about embedded illustrations, as well as to the document as a whole (by attaching it to the document catalog), using an extensible schema. [2]
- The PDF file format was developed in the early 1990s. It is used to share documents, including their text formatting and embedded media, and it works across platforms. Even if the computer platforms are completely different, the recipient does not need to adapt the related or shared
- PDF consists of three technologies: