X4 Product Documentation

PDF to Text Converter

This adapter reads a PDF document (including password-protected documents), extracts the text content of a specific page range or of the entire document, and outputs the extracted strings as an XML or text document with a freely selectable character encoding.

Properties

Operation

Describes which operation the adapter performs.

Possible values:

  • Extract: Extract text from the input PDF document

Parameters

password

Password for opening a protected PDF document

Possible values: Any string

startPage

Number of the first page from which text is extracted

Possible values:

  • Any positive integer or 0

  • 0: Start from the first page (default)

endPage

Number of the last page up to which text is extracted

Possible values:

  • Any positive integer or 0

  • 0: Extract text up to the last page (default)
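The interplay of startPage and endPage, with 0 standing for "first page" and "last page" respectively, can be illustrated with a minimal Python sketch. It is not the adapter's implementation: the function name resolve_page_range is hypothetical, and the sketch assumes that pages are numbered from 1, that the range is inclusive, and that a range outside the document is rejected. None of these assumptions is confirmed by this page.

```python
def resolve_page_range(start_page: int, end_page: int, page_count: int) -> range:
    """Return the page numbers selected by startPage/endPage.

    Assumptions (not taken from the adapter documentation): pages are
    numbered from 1, the range is inclusive, and reversed or
    out-of-document ranges are rejected.
    """
    if start_page < 0 or end_page < 0:
        raise ValueError("startPage and endPage must be 0 or a positive integer")

    # 0 is the documented default: first page / last page of the document.
    first = 1 if start_page == 0 else start_page
    last = page_count if end_page == 0 else end_page

    if first > last or last > page_count:
        raise ValueError(f"invalid page range {first}-{last} for {page_count} pages")
    return range(first, last + 1)
```

Under these assumptions, leaving both parameters at 0 selects every page, while setting only startPage selects everything from that page to the end of the document.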

encoding

Character encoding of the result document

Possible values: Any valid character encoding name (e.g. UTF-8)

force

Also attempt to extract text from invalid PDF pages

Possible values:

  • yes: Process invalid PDF pages

  • no: Ignore invalid PDF pages (default)

toXML

Output the text content as an XML document

Possible values:

  • yes: Output an XML document

  • no: Output a text document (default)
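The parameters above, together with their documented defaults, can be summarized in a short Python sketch that checks a parameter set before it is handed to the adapter. This is only an illustration: normalize_parameters and DEFAULTS are hypothetical names, the dictionary representation is an assumption, and no default is filled in for password or encoding because this page does not state one.

```python
# Defaults as documented above; password and encoding have no stated default.
DEFAULTS = {"startPage": "0", "endPage": "0", "force": "no", "toXML": "no"}
OPTIONAL = {"password", "encoding"}


def normalize_parameters(params: dict) -> dict:
    """Merge user-supplied parameters with the documented defaults and validate them."""
    unknown = set(params) - set(DEFAULTS) - OPTIONAL
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")

    merged = {**DEFAULTS, **params}

    # startPage and endPage accept 0 or any positive integer.
    for name in ("startPage", "endPage"):
        try:
            number = int(str(merged[name]))
        except ValueError:
            raise ValueError(f"{name} must be an integer") from None
        if number < 0:
            raise ValueError(f"{name} must be 0 or a positive integer")
        merged[name] = number

    # force and toXML are yes/no switches.
    for name in ("force", "toXML"):
        if merged[name] not in ("yes", "no"):
            raise ValueError(f"{name} must be 'yes' or 'no'")

    return merged
```

For example, normalize_parameters({"startPage": "2", "toXML": "yes"}) keeps the two supplied values and fills in endPage 0 and force "no" from the defaults.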

Status values

1

The operation was successful.

-1

The operation failed due to a technical error.