PDF to Text Converter

This adapter reads a PDF document (including password-protected documents), extracts the text of a specified page range or of the entire document, and outputs the extracted text as an XML or plain-text document in a freely selectable character encoding.

Properties

Operation

Determines which operation the adapter executes.

Possible values: `Extract`: Extracts the text from the input PDF document.

Parameters

Adapter	Main class of the adapter (do not change!). Possible values: `en.softproject.integration.adapter.pdf.PDF2Text` (default).
password	Password for a protected PDF document. Possible values: any string.
startPage	Number of the first page from which text is extracted. Possible values: any positive integer, or `0` to start from the first page (default).
endPage	Number of the last page up to which text is extracted. Possible values: any positive integer, or `0` to extract up to the last page (default).
encoding	Character encoding of the result document. Possible values: any valid character encoding (e.g. `UTF-8`).
force	Attempts to extract text even from invalid PDF pages. Possible values: `yes` (process invalid PDF pages), `no` (ignore invalid PDF pages; default).
toXML	Outputs the text contents as an XML document. Possible values: `yes` (output an XML document), `no` (output a text document; default).
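
The following Python sketch illustrates how the parameters in the table interact, in particular the `0` defaults of startPage and endPage. It is not the adapter's implementation. The class `Pdf2TextParams` and the functions `validate` and `resolve_page_range` are hypothetical names introduced for illustration only. It assumes that an endPage beyond the end of the document is clamped to the last page and that a start page after the end page is rejected. The adapter's actual behavior in these cases is not documented here. Python's codec registry stands in for the set of valid character encodings.

```python
import codecs
from dataclasses import dataclass


@dataclass
class Pdf2TextParams:
    """Hypothetical container mirroring the parameter table."""
    encoding: str          # e.g. "UTF-8"; no default is documented
    password: str = ""     # only needed for protected PDF documents
    start_page: int = 0    # 0 = start from the first page
    end_page: int = 0      # 0 = extract up to the last page
    force: str = "no"      # "yes" = also process invalid PDF pages
    to_xml: str = "no"     # "yes" = XML output, "no" = text output


def validate(params: Pdf2TextParams) -> None:
    """Raise ValueError if a value lies outside the documented ranges."""
    if params.start_page < 0 or params.end_page < 0:
        raise ValueError("startPage and endPage must be 0 or a positive integer")
    for name in ("force", "to_xml"):
        if getattr(params, name) not in ("yes", "no"):
            raise ValueError(f"{name} must be 'yes' or 'no'")
    try:
        # Stand-in check: Python's codec registry, not the adapter's own list.
        codecs.lookup(params.encoding)
    except LookupError as exc:
        raise ValueError(f"unknown encoding: {params.encoding}") from exc


def resolve_page_range(params: Pdf2TextParams, page_count: int) -> tuple[int, int]:
    """Return the 1-based, inclusive (first, last) pages to extract."""
    first = params.start_page or 1          # 0 means "first page"
    last = params.end_page or page_count    # 0 means "last page"
    last = min(last, page_count)            # assumption: clamp to the document
    if first > last:
        # Assumption: an empty or inverted range is treated as an error.
        raise ValueError("startPage lies after endPage or after the last page")
    return first, last
```

For a 10-page document, the default parameters resolve to pages 1 through 10, and `start_page=3, end_page=5` resolves to pages 3 through 5.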

Status values

`1`	The operation was executed successfully.
`-1`	The operation failed due to a technical error.