DocProcessor is an AI-powered document understanding engine that uses Large Language Models (LLMs) and Vision-Language Models (VLMs) to read, interpret, and extract structured data from text-based documents across multiple languages and formats.

Key Value Propositions

  • Flexibility: Choose precisely which data to extract via configuration, with no model training required
  • Speed: Integrate new document types without model retraining, reducing integration time from weeks to days
  • Robustness: High-quality extraction powered by LLM/VLM technology
  • Multilingual: Native support for five major European languages
  • Easy Integration: RESTful API for seamless system integration

Technical Specifications

Supported Languages (V1)

DocProcessor supports the following languages for document text:

  • French
  • English
  • German
  • Italian
  • Spanish

Supported Document Types (V1)

| Document Type | Description | Key Fields Extracted |
|---|---|---|
| Bank Account | Bank account document containing bank and account holder details | IBAN, BIC, Bank Name, Account Owner Name |
| Energy Invoice | Simple energy consumption invoice without payment schedule | Address, Customer Name, Issue Date |
| Energy Schedule | Monthly energy invoice payment schedule | Address, Customer Name, Issue Date |
| Family Allowance | Family allowance document | Name, Address, Beneficiary Number, Payment Date, Benefit Amount, Family Quotient |
| Insurance Attestation | Insurance attestation document containing personal information | Name, Address, Issue Date |
| Payslip | Payslip document containing details about the employer, employee, and payment | Employer Name, Employer Address, Employee Name, Employee Address, Payslip Date, Net Salary, Gross Monthly Income, Monthly Income, Annual Income, Entry Date, Company Office ID, Code NAF, NIR |
| Phone Invoice | Phone invoice document | Name, Address, Issue Date, Phone Number |
| Provider Attestation | Subscription or plan attestation for a residence | Name, Address, Issue Date |
| Retirement Pension | Retirement pension containing personal information with address | Name, Address, Issue Date |
| Tax Notice | Tax notice document containing fiscal and personal information | Tax Year, Fiscal Number 1, Fiscal Number 2, Tax Reference, Address, Date, IBAN, BIC, Taxable Income, Tax Reference Income, Global Gross Income |
| Vehicle Registration Certificate | The French vehicle registration certificate (carte grise) issued by the Agence Nationale des Titres Sécurisés (ANTS) or authorized professionals. Contains details about the vehicle, its owner, and technical specifications | Registration Number, VIN, Vehicle Type, Vehicle Make, First Registration Date, Issue Date, Formula Number, Legal Entity |

Document Format (V1)

  • Formats: PDF, JPEG, PNG
  • Pages: Multi-page support
  • Document Type: Text document
  • Maximum Size: 20 MiB

Performance & SLA

  • Target Response Time: 10 seconds
  • Availability: 100%
  • API Standard: REST
  • Output Format: Structured JSON

How It Works

DocProcessor follows a simple workflow to process your documents and extract structured data:

Step 1: Submit Your Document

Upload your document in one of the supported formats:

  • PDF
  • JPEG
  • PNG

The system accepts documents up to 20 MB in size.
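
A client can check the format and size constraints before submitting a document. The following is a minimal sketch, not part of the DocProcessor API: the function name `check_upload`, the extension list, and the interpretation of the limit as 20 × 1024 × 1024 bytes are all assumptions made for illustration.

```python
from pathlib import Path
from typing import List

# Extensions assumed to correspond to the supported formats (PDF, JPEG, PNG).
ALLOWED_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png"}

# Assumption: the limit is 20 * 1024 * 1024 bytes.
# Adjust if the service counts the limit as 20,000,000 bytes.
MAX_SIZE_BYTES = 20 * 1024 * 1024


def check_upload(filename: str, size_bytes: int, max_size: int = MAX_SIZE_BYTES) -> List[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    extension = Path(filename).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or 'no extension'}")
    if size_bytes > max_size:
        problems.append(f"file too large: {size_bytes} bytes (limit {max_size})")
    return problems
```

This only inspects the file name and size. It does not verify that the content is text-based or of reasonable image quality.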

Step 2: Automatic Processing

Once submitted, DocProcessor automatically:

  • Reads and interprets the document content
  • Identifies the document type
  • Processes the document in its original language (French, Italian, German, English, or Spanish)
  • Extracts the specified data fields according to the configuration

Step 3: Receive Structured Data

The system returns extracted data as structured JSON, with typed data structures:

  • TEXT: Simple textual values
  • ADDRESS: Structured address with street, zip code, and city
  • NAMES: First names and last names separated
  • DATE: ISO-formatted dates (YYYY-MM-DD)

Key Capabilities

Multilingual Support: DocProcessor automatically detects and processes documents in French, Italian, German, English, or Spanish without requiring language specification.

New Document Types: The system can adapt to new document formats without model retraining, ensuring no downtime when introducing new document types.

Data Types Supported

DocProcessor extracts fields with the following data types:

  • TEXT: Simple textual values (e.g., company name, fiscal number)
  • ADDRESS: Structured address with street, zip code, and city
  • NAMES: First names and last names separated
  • DATE: ISO-formatted dates (YYYY-MM-DD)
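
On the client side, each data type can be mapped to a native value. The sketch below assumes the JSON key names used in the sample output in this document (`data_type`, `value`, `address`, `zipCode`, `city`, `firstNames`, `lastName`). The class and function names are hypothetical and are not part of any DocProcessor SDK.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Address:
    address: str
    zip_code: str
    city: str


@dataclass
class Names:
    first_names: str
    last_name: str


def parse_field(field: dict):
    """Convert one extracted field into a Python value based on its data_type."""
    data_type = field["data_type"]
    if data_type == "TEXT":
        return field["value"]
    if data_type == "DATE":
        # DATE values are ISO-formatted (YYYY-MM-DD).
        return date.fromisoformat(field["value"])
    if data_type == "ADDRESS":
        return Address(field["address"], field["zipCode"], field["city"])
    if data_type == "NAMES":
        return Names(field["firstNames"], field["lastName"])
    raise ValueError(f"unknown data_type: {data_type}")
```

Raising on an unknown `data_type` is a design choice for this sketch. A production client might instead keep the raw field so that new types do not break processing.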

Use Cases

Example Use Case: Payslip Data Extraction

A customer needs to extract specific data from employee payslips across multiple countries and formats.

Traditional Approach:

  • Requires approximately 8 weeks to train models for each document variant
  • Limited flexibility for field customization
  • Significant development effort for new formats

DocProcessor Approach:

  • Extract exactly the fields needed (e.g., last name, first name, gross salary)
  • Integration in days instead of weeks
  • Easy modification via prompt-based configuration
  • Support for documents from different countries without retraining

Example Use Case: KYC Onboarding

Financial institutions can automate customer data extraction from:

  • Tax notices for income verification
  • Bank account statements for IBAN validation
  • KBIS documents for company verification
  • Multiple document types in a single workflow

API Integration

Sample JSON Output

{
  "textDocumentInfo": {
    "documentTypeDetail": "PAYSLIP",
    "fields": {
      "EMPLOYER_NAME": {
        "data_type": "TEXT",
        "value": "NETHEOS"
      },
      "NET_SALARY": {
        "data_type": "TEXT",
        "value": "250,76"
      },
      "GROSS_SALARY": {
        "data_type": "TEXT",
        "value": "151,67"
      },
      "EMPLOYER_ADDRESS": {
        "data_type": "ADDRESS",
        "address": "avenue bernard claude Parc Club du Millenaire",
        "zipCode": "34000",
        "city": "MONTPELLIER"
      },
      "EMPLOYEE_NAME": {
        "data_type": "NAMES",
        "firstNames": "JOHN",
        "lastName": "CENA"
      },
      "EMPLOYEE_ADDRESS": {
        "data_type": "ADDRESS",
        "address": "Les impasses de la Mer Appt 34 70 rue de Pivert",
        "zipCode": "34000",
        "city": "MONTPELLIER"
      },
      "PAYSLIP_DATE": {
        "data_type": "DATE",
        "value": "2015-01-01"
      }
    }
  }
}
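
In the sample above, amounts such as NET_SALARY are returned as TEXT with a decimal comma. The following sketch shows one way a client might read the output and convert such a value to a number. It assumes that the comma is the decimal mark and that spaces, if present, are thousands separators. The helper `parse_amount` is hypothetical.

```python
import json
from decimal import Decimal

# A shortened copy of the sample output, keeping two fields.
sample = json.loads("""
{
  "textDocumentInfo": {
    "documentTypeDetail": "PAYSLIP",
    "fields": {
      "NET_SALARY": {"data_type": "TEXT", "value": "250,76"},
      "PAYSLIP_DATE": {"data_type": "DATE", "value": "2015-01-01"}
    }
  }
}
""")


def parse_amount(text: str) -> Decimal:
    """Parse an amount written with a decimal comma, e.g. "250,76"."""
    # Assumption: spaces are thousands separators and the comma is the decimal mark.
    return Decimal(text.replace(" ", "").replace(",", "."))


info = sample["textDocumentInfo"]
fields = info["fields"]
net_salary = parse_amount(fields["NET_SALARY"]["value"])
print(info["documentTypeDetail"], net_salary)
```

`Decimal` is used rather than `float` so that monetary values are not subject to binary rounding.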

Constraints & Limitations

Technical Constraints

  • Documents must be text-based (not handwritten)
  • Maximum file size: 20 MB
  • Documents must be in A4 format or a similar standard size
  • Requires reasonable image quality for accurate extraction

Benefits & Competitive Advantages

Speed to Market

  • New document type integration in days, compared with approximately 8 weeks for the traditional approach
  • No model training required for new formats
  • Rapid adaptation to customer-specific requirements

Flexibility & Scalability

  • Customizable field extraction via configuration
  • Prompt-based approach allows easy modifications
  • Extensible architecture for future document types

Quality & Accuracy

  • Powered by state-of-the-art LLM/VLM technology
  • High extraction quality across multiple languages
  • Robust handling of varied document structures

Integration with Namirial OnBoarding

Overview

DocProcessor integrates with Namirial OnBoarding (NOB) to enable automated document processing within customer onboarding workflows.

Current Status: Demo integration - simplified workflow for evaluation purposes.

Current Integration (Demo)

Available Features

The current integration offers a single, pre-configured workflow designed for demonstration and evaluation purposes:

  • Single document upload per request
  • Pre-configured document processing (no customization available)
  • No input parameters required
  • Processing through DocProcessor backend
  • Results available in NOB backoffice

Limitations

  • Configuration options (parameters and settings) are not yet available
  • Document cannot be passed directly via API (upload link only)
  • Single workflow configuration
  • Limited to demonstration scenarios

Integration Methods

1. Request Creation from Backoffice

Setup

The integration uses a Request Type based on a specific model.

Note: The name references the legacy Text Engine system and will be updated to reflect the DocProcessor integration.

Process

Step 1: Create Request

  1. Access the Namirial OnBoarding back office
  2. Select the Request Type described under Setup
  3. Click "Create"

No parameters need to be configured; the system uses a predefined configuration.

Step 2: Upload Document

  1. The system generates a unique link
  2. Share the link with the end user
  3. User accesses the link and uploads a single document via the web interface

Step 3: Processing

  • Document is sent to DocProcessor
  • Automatic processing and data extraction
  • Results available in the NOB back office

2. Request Creation via API

Endpoint

POST https://test-eu-ie1-api.namirialonboarding.com/api/v2/requests

Headers

Authorization: Bearer {YOUR_ACCESS_TOKEN}
Accept: application/json
Content-Type: application/json

Request Body

{
  "requestTypeId": "8870fa7a-2e51-4af4-9724-2ac4230163db",
  "parameters": {},
  "settings": {}
}

Note: The parameters and settings fields are currently empty and not configurable. They are reserved for future enhancements.

cURL Example

curl 'https://test-eu-ie1-api.namirialonboarding.com/api/v2/requests?language=en' \
  -H 'Authorization: Bearer {YOUR_ACCESS_TOKEN}' \
  -H 'Accept: application/json' \
  -H 'Content-Type: application/json' \
  --data-raw '{"requestTypeId":"8870fa7a-2e51-4af4-9724-2ac4230163db","parameters":{},"settings":{}}'
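
The same call can be prepared from Python. This is a minimal sketch using only the standard library. It targets the endpoint, headers, and body listed above. The function name `build_create_request` is an assumption for illustration, and the request is built but not sent.

```python
import json
import urllib.request

API_URL = "https://test-eu-ie1-api.namirialonboarding.com/api/v2/requests"


def build_create_request(access_token: str, request_type_id: str) -> urllib.request.Request:
    """Build (but do not send) the POST request described above."""
    # parameters and settings are sent empty, as they are not configurable yet.
    body = {"requestTypeId": request_type_id, "parameters": {}, "settings": {}}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Accept": "application/json",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Sending is left to the caller, for example:
#   with urllib.request.urlopen(build_create_request(token, type_id)) as resp:
#       created = json.load(resp)
```

Separating construction from sending keeps the request easy to inspect and test without network access.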

Response

{
  "requestId": "unique-request-id",
  "link": "https://...",
  "status": "created"
}

The response contains:

  • requestId: Unique identifier for tracking
  • link: URL to share with end user for document upload
  • status: Current request status
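
A client may want to validate the response before sharing the link. The sketch below assumes the response contains exactly the three fields shown above. The `CreatedRequest` class and `parse_created_request` function are hypothetical names.

```python
import json
from dataclasses import dataclass


@dataclass
class CreatedRequest:
    request_id: str
    link: str
    status: str


def parse_created_request(raw: str) -> CreatedRequest:
    """Parse the creation response and fail early if a documented field is missing."""
    payload = json.loads(raw)
    missing = [key for key in ("requestId", "link", "status") if key not in payload]
    if missing:
        raise ValueError(f"response is missing fields: {missing}")
    return CreatedRequest(payload["requestId"], payload["link"], payload["status"])
```

The returned `link` is the value to pass on to the end user for document upload.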

Important Notes

  • The requestTypeId is specific to each integration and environment
  • Currently, the document cannot be passed in the API call body
  • Users must use the generated link to upload documents

Planned Enhancements

1. Internal Operator Review Step

Enable manual review and validation after automatic processing.

Workflow:

  1. Document is processed automatically by DocProcessor
  2. Request enters "Pending Review" status
  3. Internal NOB operator reviews:
    • Original document
    • Extracted data
    • Processing results
  4. Operator can:
    • Approve the request
    • Reject the request
    • Correct extracted data if needed
  5. Request proceeds to next workflow step

Features:

  • Automatic or manual assignment of requests to operators
  • Workload monitoring dashboard
  • Review history and audit trail

2. Document Upload via API

Ability to pass the document directly in the request creation call, eliminating the upload link step.

Benefits:

  • Fully automated workflow (no user interaction required)
  • Direct integration with external systems
  • Faster processing time

3. Configurable Parameters

Enable configuration of document processing parameters per request.

Expected Parameters:

  • documentType: Specify expected document type for optimized processing
  • language: Override language detection
  • extractionFields: Customize which fields to extract
  • validationRules: Apply custom validation logic

Expected Settings:

  • confidenceThreshold: Minimum confidence score for extracted data
  • manualReviewRequired: Force manual review step
  • webhookUrl: URL for asynchronous notifications
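
To make the planned options concrete, the sketch below shows what a request body might look like once they are available. Only the key names come from the lists above. Every value, value type, and the placement of keys under `parameters` and `settings` is an assumption, since the final format is not yet defined. `validationRules` is omitted because its shape is unknown.

```python
import json

# Hypothetical request body: key names come from the planned-enhancements lists,
# but all values and value types below are assumptions, not a documented format.
planned_body = {
    "requestTypeId": "8870fa7a-2e51-4af4-9724-2ac4230163db",
    "parameters": {
        "documentType": "PAYSLIP",                            # assumed identifier style
        "language": "fr",                                     # assumed language code
        "extractionFields": ["EMPLOYEE_NAME", "NET_SALARY"],  # assumed list of field names
    },
    "settings": {
        "confidenceThreshold": 0.8,     # assumed 0..1 scale
        "manualReviewRequired": True,
        "webhookUrl": "https://example.com/docprocessor-webhook",  # placeholder URL
    },
}

print(json.dumps(planned_body, indent=2))
```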