# Tutorial: Intelligent Document Recognizer

This page describes the steps needed to recognize purchase invoices in PDF, JPEG, PNG or TIFF format and convert them to e-invoices with the Intelligent Document Recognizer (IDR), using the RecognizePurchaseInvoice endpoint. The tutorial also applies to sales invoices. The only difference is that you use the corresponding SalesInvoice endpoints and topics instead.

# Set up hooks

To start recognizing purchase invoices, first subscribe to the "RecognizePurchaseInvoice" topic via the Subscribe endpoint. More information about hooks is available in the hooks documentation.

The hook's action consists of several parts: the IDR quality, the priority, the features you want to use, and your party details.

# IDR quality

The IDR quality is an optional parameter that specifies the required conversion quality. The default value is 'default', which performs a standard-quality conversion. Use this parameter if you need higher-quality conversions or want to allow low-quality conversions. The possible conversion modes are:

  • default: Default mode, which uses standard-quality conversions. It does not need to be included in the hook's action.
  • hq: Uses high-quality conversions, which require a higher confidence level to pass.
  • lq: Allows low-quality conversions to pass.

# IDR priority

The IDR priority determines which conversion queue a document is placed in, and therefore when it is processed. The available priorities are:

  • high: High-priority conversions are always processed and verified first, in order of arrival.
  • medium: The medium-priority queue is processed once all high-priority conversions are completed.
  • low: Low-priority conversions are processed when no high- or medium-priority conversions are waiting.
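The ordering rules above behave like a priority queue keyed on (priority, arrival order). The Python sketch below is a toy model of those documented rules, not eConnect's implementation; the `ConversionQueue` class and its methods are hypothetical names used only for illustration.

```python
from __future__ import annotations

import heapq
from itertools import count

# Documented priorities; a lower rank is served first.
PRIORITY_RANK = {"high": 0, "medium": 1, "low": 2}


class ConversionQueue:
    """Toy model: serve by priority first, then by order of arrival."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str]] = []
        self._arrivals = count()  # increasing counter that records arrival order

    def submit(self, document_id: str, priority: str) -> None:
        if priority not in PRIORITY_RANK:
            raise ValueError(f"unknown priority: {priority!r}")
        entry = (PRIORITY_RANK[priority], next(self._arrivals), document_id)
        heapq.heappush(self._heap, entry)

    def next_document(self) -> str | None:
        """Return the next document to process, or None when all queues are empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```

Submitting doc-1 (low), doc-2 (high), doc-3 (medium) and doc-4 (high) in that order yields doc-2, doc-4, doc-3, doc-1: both high-priority documents first in arrival order, then medium, then low.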

# Features

The IDR offers additional features that enhance the way documents are converted. To enable them, add them to the features query parameter in the hook's action as a comma-separated string. The available features are:

  • iban: Optional. Extracts the IBAN. The confidence depends on the quality parameter.
  • g-account: Optional, and includes IBAN extraction. Forces the use of the g-account feature (EN 16931 extension for UBL). The invoice is only split over a g-account if the document contains relevant signal keywords such as G-rekening or G-account.
  • order-reference: Optional. Extracts the order number. The confidence depends on the quality set*.
  • project-reference: Optional. Extracts the project number. The confidence depends on the quality set*.
  • contract-reference: Optional. Extracts the contract number. The confidence depends on the quality set*.

*The quality set is maintained by eConnect. To add or remove values, contact support.
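Because the features value is a plain comma-separated string, it is easy to build and check before it goes into the hook's action. A minimal Python sketch, assuming you want to reject any name not listed above; `features_param` is a hypothetical helper, not part of the eConnect API:

```python
# Feature names as listed in this tutorial.
KNOWN_FEATURES = frozenset(
    {"iban", "g-account", "order-reference", "project-reference", "contract-reference"}
)


def features_param(*features: str) -> str:
    """Return the comma-separated value for the hook action's features parameter."""
    unknown = [name for name in features if name not in KNOWN_FEATURES]
    if unknown:
        raise ValueError(f"unknown IDR feature(s): {', '.join(unknown)}")
    # dict.fromkeys drops duplicates while keeping the caller's order.
    return ",".join(dict.fromkeys(features))
```

For example, `features_param("iban", "g-account", "order-reference")` returns `"iban,g-account,order-reference"`, the same value used in the example hook action in this tutorial.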

# Hook action

The action should have the following format: `"action":"recognize://idr?quality={quality}&priority={priority}&features={features}&data={PartyDetails}"`.

The party details should contain the names you might use on your invoices, all identifiers your company uses (e.g. KVK and OIN), your email address, and the addresses that appear on your invoices. Combine these details into one JSON object. If the recognize hook is only used for purchase invoices, you can also use an array of multiple party details JSON objects; this is not supported for sales invoices. The party details object looks like this:

```json
{
    "names": [
        "eVerbinding",
        "eConnect"
    ],
    "identifiers": [
        "0106:123",
        "0190:456"
    ],
    "emailAddress": "techsupport@econnect.eu",
    "addresses": [
        {
            "street": "Pelmolenlaan 16A",
            "postalZone": "3447GW",
            "city": "Woerden",
            "country": "NL"
        }
    ]
}
```

These party details must be encoded as a base64 string and should be added to the data parameter in the hook's action. For example:

```json
{
    "id": "1",
    "name": "idr hook",
    "action": "recognize://idr?quality=hq&priority=high&features=iban,g-account,order-reference&data=ew0KICAgICJuYW1lcyI6IFsNCiAgICAgICAgImVWZXJiaW5kaW5nIiwNCiAgICAgICAgImVDb25uZWN0Ig0KICAgIF0sDQogICAgImlkZW50aWZpZXJzIjogWw0KICAgICAgICAiMDEwNjoxMjMiLA0KICAgICAgICAiMDE5MDo0NTYiDQogICAgXSwNCiAgICAiZW1haWxBZGRyZXNzIjogInRlY2hzdXBwb3J0QGVjb25uZWN0LmV1IiwNCiAgICAiYWRkcmVzc2VzIjogWw0KICAgICAgICB7ICAgICAgICAgICANCiAgICAgICAgICAgICJzdHJlZXQiOiAiSG91dHR1aW5sYWFuIDQiLA0KICAgICAgICAgICAgInBvc3RhbFpvbmUiOiAiMzQ0N0dNIiwNCiAgICAgICAgICAgICJjaXR5IjogIldvZXJkZW4iLA0KICAgICAgICAgICAgImNvdW50cnkiOiAiTkwiDQogICAgICAgIH0NCiAgICBdDQp9",
    "topics": [
        "RecognizePurchaseInvoice"
    ],
    "isActive": true,
    "createdOn": "2021-01-26T20:40:31"
}
```
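The encoding step can be scripted. The Python sketch below serializes the party details, base64-encodes them and assembles the action string in the documented parameter order. It rests on assumptions: `encode_party_details` and `build_action` are hypothetical helpers, and because this page does not say whether the `data` value must also be URL-encoded, the sketch percent-encodes query values with `urlencode` (alphanumeric base64 passes through unchanged; `+`, `/` and `=` do not). Compact JSON from `json.dumps` encodes the same object as the indented example.

```python
import base64
import json
from urllib.parse import urlencode


def encode_party_details(details) -> str:
    """JSON-serialize party details and return them as a base64 string.

    `details` is one party-details dict, or a list of them (purchase invoices only).
    """
    raw = json.dumps(details, ensure_ascii=False).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def build_action(details, priority: str, features: str = "", quality: str = "default") -> str:
    """Assemble recognize://idr?quality=...&priority=...&features=...&data=..."""
    params = {}
    if quality != "default":  # the default quality does not need to be submitted
        params["quality"] = quality
    params["priority"] = priority
    if features:
        params["features"] = features
    params["data"] = encode_party_details(details)
    # Assumption: query values are percent-encoded; safe="," keeps the feature list readable.
    return "recognize://idr?" + urlencode(params, safe=",")


if __name__ == "__main__":
    party = {
        "names": ["eVerbinding", "eConnect"],
        "identifiers": ["0106:123", "0190:456"],
        "emailAddress": "techsupport@econnect.eu",
        "addresses": [
            {"street": "Pelmolenlaan 16A", "postalZone": "3447GW", "city": "Woerden", "country": "NL"}
        ],
    }
    hook = {
        "name": "idr hook",
        "action": build_action(party, "high", "iban,g-account,order-reference", quality="hq"),
        "topics": ["RecognizePurchaseInvoice"],
        "isActive": True,
    }
    print(json.dumps(hook, indent=4))
```

The printed object has the same shape as the example hook above and can be sent to the Subscribe endpoint.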

# Callback hook

To find out when the recognition process has finished, you need a second hook that listens to the callback topics. This can be either a mail hook or a webhook. The callback topics are:

| Topic | Function |
| --- | --- |
| PurchaseInvoiceRecognized | Triggered when a document is successfully recognized by the IDR. |
| PurchaseInvoiceRecognizedPending | Triggered when a document is waiting for quality control. The hook's message contains more information about the pending verification(s). |
| PurchaseInvoiceRecognizedRejected | Triggered when a document is rejected after quality control. The hook's message contains more information about why the document was rejected. |
| PurchaseInvoiceRecognizedError | Triggered when a document is rejected by the IDR because of an error. The hook's message contains more information about what went wrong. |

A mail hook example looks like this:

```json
{
    "id": "2",
    "action": "mailto:techsupport@econnect.eu",
    "name": "mail hook",
    "topics": [
        "PurchaseInvoiceRecognized",
        "PurchaseInvoiceRecognizedPending",
        "PurchaseInvoiceRecognizedRejected",
        "PurchaseInvoiceRecognizedError"
    ],
    "isActive": true
}
```
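Whether you use a mail hook or a webhook, your handler has to decide what to do for each callback topic. This page does not describe the webhook payload, so the Python sketch below assumes the topic name has already been extracted from it. `Outcome` and `next_step` are hypothetical names, and the mapping simply follows the topic descriptions in the table above.

```python
from enum import Enum


class Outcome(Enum):
    DOWNLOAD = "download the UBL invoice"
    WAIT = "quality control is pending"
    INSPECT = "read the hook message for the reason"


# Callback topics from the table above, each mapped to a follow-up step.
CALLBACK_OUTCOMES = {
    "PurchaseInvoiceRecognized": Outcome.DOWNLOAD,
    "PurchaseInvoiceRecognizedPending": Outcome.WAIT,
    "PurchaseInvoiceRecognizedRejected": Outcome.INSPECT,
    "PurchaseInvoiceRecognizedError": Outcome.INSPECT,
}


def next_step(topic: str) -> Outcome:
    """Map an IDR callback topic to the follow-up step for that document."""
    try:
        return CALLBACK_OUTCOMES[topic]
    except KeyError:
        raise ValueError(f"not an IDR callback topic: {topic!r}") from None
```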

# Possible error codes

  • [IDR422.IDR_OCR_INVALID_DOC_CONTENT] "Input file {documentid}.pdf was not processed of a format error." This error is returned when a password-protected or corrupted PDF is uploaded for conversion.
  • [IDR422.IDR_422_INVALID_FILE_CONTENT] "Invalid PDF Content." This error is returned when a file other than a PDF is uploaded for conversion.
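If you log or display these errors, the code in square brackets is the easiest part to match on. A minimal Python sketch, assuming error messages keep the `[CODE] text` shape shown above; `explain_idr_error` is a hypothetical helper and the cause strings are paraphrased from this list.

```python
from __future__ import annotations

import re

# Likely causes, paraphrased from the error list above.
IDR_ERRORS = {
    "IDR422.IDR_OCR_INVALID_DOC_CONTENT": "password-protected or corrupted PDF",
    "IDR422.IDR_422_INVALID_FILE_CONTENT": "file other than PDF uploaded",
}

# Assumption: messages start with the error code in square brackets.
ERROR_PATTERN = re.compile(r"\[([A-Z0-9_.]+)\]\s*(.*)", re.DOTALL)


def explain_idr_error(message: str) -> tuple[str, str]:
    """Return (code, likely cause); unknown codes fall back to the message text."""
    text = message.strip()
    match = ERROR_PATTERN.match(text)
    if match is None:
        return ("UNKNOWN", text)
    code, rest = match.groups()
    return (code, IDR_ERRORS.get(code, rest))
```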

# Upload document

Once the hooks are set up, you can upload PDF purchase invoices for recognition. Use the POST /api/v1/{partyId}/purchaseInvoice/recognize endpoint to upload documents, where {partyId} is the receiver's partyId. Add the document to the request body as a binary file, using the content type multipart/form-data.

The response contains a documentId, which is used as a reference in the subsequent events sent to your subscribed hooks.
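For illustration, the upload can be assembled with Python's standard library. This sketch builds the request but does not send it. Several details are assumptions not stated on this page: the base URL, the bearer-token `Authorization` header and the form field name `file` are hypothetical placeholders to replace with the values from your eConnect API documentation.

```python
import urllib.request
import uuid
from pathlib import Path

BASE_URL = "https://api.example.com"  # hypothetical: use your eConnect environment's base URL


def build_recognize_request(party_id: str, file_path: str, token: str) -> urllib.request.Request:
    """Build (but do not send) POST /api/v1/{partyId}/purchaseInvoice/recognize."""
    path = Path(file_path)
    boundary = uuid.uuid4().hex
    # One multipart/form-data part holding the binary document.
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{path.name}"\r\n'  # field name is hypothetical
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8") + path.read_bytes() + f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/{party_id}/purchaseInvoice/recognize",
        data=body,
        method="POST",
        headers={
            "Content-Type": f"multipart/form-data; boundary={boundary}",
            "Authorization": f"Bearer {token}",  # hypothetical auth scheme
        },
    )
```

You would send it with `urllib.request.urlopen(req)` and read the documentId from the response body; the exact shape of the response JSON is not shown on this page.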

# Download the result

When the recognition process is complete and the document has been converted, the "PurchaseInvoiceRecognized" hook is triggered. Its message contains the same documentId that was returned in the response to the upload request. With this documentId you can call GET /api/v1/{partyId}/purchaseInvoice/{documentId}/download to download the UBL invoice. Use the same partyId that you used to upload the document.
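The download call can also be sketched with Python's standard library. Only the path comes from this tutorial; the base URL and the bearer-token header are hypothetical placeholders, and `build_download_request` and `save_ubl` are illustrative names.

```python
import urllib.request

BASE_URL = "https://api.example.com"  # hypothetical: use your eConnect environment's base URL


def build_download_request(party_id: str, document_id: str, token: str) -> urllib.request.Request:
    """Build GET /api/v1/{partyId}/purchaseInvoice/{documentId}/download."""
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/{party_id}/purchaseInvoice/{document_id}/download",
        method="GET",
        headers={"Authorization": f"Bearer {token}"},  # hypothetical auth scheme
    )


def save_ubl(request: urllib.request.Request, target: str) -> None:
    """Send the request and write the response body (the UBL invoice) to disk."""
    with urllib.request.urlopen(request) as response, open(target, "wb") as out:
        out.write(response.read())
```

Pass the partyId used for the upload and the documentId from the "PurchaseInvoiceRecognized" hook.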

© 2024 eConnect International B.V.