## DocProcessor Summary

DocProcessor is an AI-powered document understanding engine leveraging **Large Language Models (LLM)** and **Vision-Language Models (VLM)** to read, interpret, and extract structured data from diverse text-based documents across multiple languages and formats.


### Key Value Propositions

* **Flexibility**: Choose precisely which data to extract via configuration—no model training required
* **Speed**: Integrate new document types quickly without model retraining, reducing integration time from weeks to days
* **Robustness**: High-quality extraction powered by LLM/VLM technology
* **Multilingual**: Native support for 5 major European languages
* **Easy Integration**: RESTful API for seamless system integration


## Technical Specifications

### Supported Languages (V1)

DocProcessor supports the following languages for document text:

* French
* English
* German
* Italian
* Spanish


### Supported Document Types (V1)

| Document Type | Description | Key Fields Extracted |
|  --- | --- | --- |
| Bank Account | Bank account document containing bank and account holder details | IBAN, BIC, Bank Name, Account Owner Name |
| Energy Invoice | Simple energy consumption invoice without payment schedule | Address, Customer Name, Issue Date |
| Energy Schedule | Monthly energy invoice payment schedule | Address, Customer Name, Issue Date |
| Family Allowance | Family allowance document | Name, Address, Beneficiary Number, Payment Date, Benefit Amount, Family Quotient |
| Insurance Attestation | Insurance attestation document containing personal information | Name, Address, Issue Date |
| Payslip | Payslip document containing details about the employer, employee, and payment | Employer Name, Employer Address, Employee Name, Employee Address, Payslip Date, Net Salary, Gross Monthly Income, Monthly Income, Annual Income, Entry Date, Company Office ID, Code NAF, NIR |
| Phone Invoice | Phone invoice document | Name, Address, Issue Date, Phone Number |
| Provider Attestation | Subscription or plan attestation for a residence | Name, Address, Issue Date |
| Retirement Pension | Retirement pension containing personal information with address | Name, Address, Issue Date |
| Tax Notice | Tax notice document containing fiscal and personal information | Tax Year, Fiscal Number 1, Fiscal Number 2, Tax Reference, Address, Date, IBAN, BIC, Taxable Income, Tax Reference Income, Global Gross Income |
| Vehicle Registration Certificate | The French vehicle registration certificate (carte grise) issued by the Agence Nationale des Titres Sécurisés (ANTS) or authorized professionals. Contains details about the vehicle, its owner, and technical specifications | Registration Number, VIN, Vehicle Type, Vehicle Make, First Registration Date, Issue Date, Formula Number, Legal Entity |


### Document Format (V1)

* **Formats**: PDF, JPEG, PNG
* **Pages**: Multi-page support
* **Content**: Text-based documents (printed text, not handwritten)
* **Maximum Size**: 20 MiB
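Before uploading, a client can reject files that would fail these limits. The sketch below is a hypothetical client-side pre-check, not part of DocProcessor: it assumes the file type can be recognised from its leading bytes (the standard PDF, JPEG and PNG signatures) and applies the 20 MiB limit above. The function names are illustrative only.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical client-side pre-check; limits taken from the V1 specification above.
MAX_SIZE_BYTES = 20 * 1024 * 1024  # 20 MiB

# Standard leading bytes ("magic numbers") of the accepted formats
SIGNATURES = {
    "PDF": b"%PDF-",
    "JPEG": b"\xff\xd8\xff",
    "PNG": b"\x89PNG\r\n\x1a\n",
}


def detect_format(data: bytes) -> str | None:
    """Return "PDF", "JPEG" or "PNG" from the file header, or None if unrecognised."""
    for name, signature in SIGNATURES.items():
        if data.startswith(signature):
            return name
    return None


def check_document(data: bytes) -> list[str]:
    """Return the problems found; an empty list means the file passes the pre-check."""
    problems = []
    if len(data) > MAX_SIZE_BYTES:
        problems.append(f"file is {len(data)} bytes, limit is {MAX_SIZE_BYTES}")
    if detect_format(data) is None:
        problems.append("not a PDF, JPEG or PNG file")
    return problems


def check_file(path: str | Path) -> list[str]:
    """Read a file from disk and run the same pre-check on its contents."""
    return check_document(Path(path).read_bytes())
```

Checking the header rather than the file extension catches renamed files; it does not guarantee that DocProcessor will be able to read the content (image quality and handwriting still matter).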


### Performance & SLA

* **Target Response Time**: 10 seconds
* **Availability**: 100%
* **API Standard**: REST
* **Output Format**: Structured JSON


## How It Works

DocProcessor follows a simple workflow to process your documents and extract structured data:

### Step 1: Submit Your Document

Upload your document in one of the supported formats:

* PDF
* JPEG
* PNG


The system accepts documents up to 20 MiB in size.

### Step 2: Automatic Processing

Once submitted, DocProcessor automatically:

* Reads and interprets the document content
* Identifies the document type
* Processes the document in its original language (French, Italian, German, English, or Spanish)
* Extracts the specified data fields according to the configuration


### Step 3: Receive Structured Data

The system returns the extracted data as structured JSON. Each field is typed as TEXT, ADDRESS, NAMES, or DATE; dates use the ISO format YYYY-MM-DD.


### Key Capabilities

**Multilingual Support**: DocProcessor automatically detects and processes documents in French, Italian, German, English, or Spanish; the caller does not need to specify the language.

**New Document Types**: The system can adapt to new document formats without model retraining, ensuring no downtime when introducing new document types.

### Data Types Supported

DocProcessor extracts fields with the following data types:

* **TEXT**: Simple textual values (e.g., company name, fiscal number)
* **ADDRESS**: Structured address with street, zip code, and city
* **NAMES**: First names and last name returned as separate values
* **DATE**: ISO-formatted dates (YYYY-MM-DD)
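A client typically maps each typed field to a native value. The following sketch assumes the field layout of the sample JSON output shown under API Integration (`data_type`, `value`, `address`, `zipCode`, `city`, `firstNames`, `lastName`); the `Address` and `Names` classes and the `decode_*` functions are hypothetical helpers, not part of the API.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from datetime import date


@dataclass
class Address:
    address: str
    zip_code: str
    city: str


@dataclass
class Names:
    first_names: str
    last_name: str


def decode_field(field: dict) -> str | Address | Names | date:
    """Convert one entry of `fields` into a Python value according to its data_type."""
    kind = field["data_type"]
    if kind == "TEXT":
        return field["value"]
    if kind == "ADDRESS":
        return Address(field["address"], field["zipCode"], field["city"])
    if kind == "NAMES":
        return Names(field["firstNames"], field["lastName"])
    if kind == "DATE":
        return date.fromisoformat(field["value"])  # ISO YYYY-MM-DD
    raise ValueError(f"unsupported data_type: {kind!r}")


def decode_document(payload: str) -> dict:
    """Decode every field of a textDocumentInfo response, keyed by field name."""
    info = json.loads(payload)["textDocumentInfo"]
    return {name: decode_field(field) for name, field in info["fields"].items()}
```

Raising on an unknown `data_type` makes the client fail visibly if a future API version adds a type, instead of silently passing through raw dictionaries.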


## Use Cases

### Example Use Case: Payslip Data Extraction

A customer needs to extract specific data from employee payslips across multiple countries and formats.

**Traditional Approach**:

* Requires approximately 8 weeks to train models for each document variant
* Limited flexibility for field customization
* Significant development effort for new formats


**DocProcessor Approach**:

* Extract exactly the fields needed (e.g., last name, first name, gross salary)
* Integration in days instead of weeks
* Easy modification via prompt-based configuration
* Support for documents from different countries without retraining


### Example Use Case: KYC Onboarding

Financial institutions can automate customer data extraction from:

* Tax notices for income verification
* Bank account statements for IBAN validation
* Kbis extracts for company verification (Kbis is not among the V1 supported document types and would first need to be added as a new type)
* Multiple document types in a single workflow
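For the IBAN validation step, an extracted IBAN can be checked locally with the ISO 13616 mod-97 check-digit algorithm before any further verification. This is a minimal sketch of that standard check, independent of DocProcessor; it validates the checksum and general shape only, not country-specific lengths or whether the account exists.

```python
def is_valid_iban(iban: str) -> bool:
    """Check an IBAN's structure and ISO 13616 mod-97 check digits.

    Country-specific lengths and bank-level existence are NOT verified.
    """
    compact = iban.replace(" ", "").upper()
    if not (15 <= len(compact) <= 34):
        return False
    if not (compact.isascii() and compact.isalnum()):
        return False
    # Two-letter country code followed by two check digits
    if not (compact[:2].isalpha() and compact[2:4].isdigit()):
        return False
    # Move country code and check digits to the end, then map A=10 ... Z=35
    rearranged = compact[4:] + compact[:4]
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1
```

A single mistyped or misread character changes the remainder, so this catches most extraction errors on the IBAN field before the value reaches a payment system.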


## API Integration

### Sample JSON Output


```json
{
  "textDocumentInfo": {
    "documentTypeDetail": "PAYSLIP",
    "fields": {
      "EMPLOYER_NAME": {
        "data_type": "TEXT",
        "value": "NETHEOS"
      },
      "NET_SALARY": {
        "data_type": "TEXT",
        "value": "250,76"
      },
      "GROSS_SALARY": {
        "data_type": "TEXT",
        "value": "151,67"
      },
      "EMPLOYER_ADDRESS": {
        "data_type": "ADDRESS",
        "address": "avenue bernard claude Parc Club du Millenaire",
        "zipCode": "34000",
        "city": "MONTPELLIER"
      },
      "EMPLOYEE_NAME": {
        "data_type": "NAMES",
        "firstNames": "JOHN",
        "lastName": "CENA"
      },
      "EMPLOYEE_ADDRESS": {
        "data_type": "ADDRESS",
        "address": "Les impasses de la Mer Appt 34 70 rue de Pivert",
        "zipCode": "34000",
        "city": "MONTPELLIER"
      },
      "PAYSLIP_DATE": {
        "data_type": "DATE",
        "value": "2015-01-01"
      }
    }
  }
}
```
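In this sample, monetary fields such as `NET_SALARY` are returned as `TEXT` in the document's own notation (`"250,76"`, with a decimal comma). A client that needs numbers has to normalise them itself; the sketch below is one hypothetical way to do so with `decimal.Decimal`, under the separator assumptions stated in its docstring.

```python
from decimal import Decimal


def parse_amount(text: str) -> Decimal:
    """Convert an amount such as "250,76", "1 234,56" or "1.234,56" to a Decimal.

    Assumptions (not documented by the API): whitespace, including no-break
    spaces, is a thousands separator; if both "." and "," occur, the rightmost
    one is the decimal separator; a lone "," is a decimal comma, as in the sample.
    """
    cleaned = "".join(ch for ch in text if not ch.isspace())
    if "," in cleaned and "." in cleaned:
        decimal_sep = "," if cleaned.rfind(",") > cleaned.rfind(".") else "."
        thousands_sep = "." if decimal_sep == "," else ","
        cleaned = cleaned.replace(thousands_sep, "").replace(decimal_sep, ".")
    else:
        cleaned = cleaned.replace(",", ".")
    return Decimal(cleaned)
```

For example, `parse_amount("250,76")` returns `Decimal('250.76')`. An English-style `"1,234"` would be read as 1.234 under these assumptions, so the rule should be adapted to the document's language where that matters.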

## Constraints & Limitations

### Technical Constraints

* Documents must be text-based (not handwritten)
* Maximum file size: 20 MiB
* Documents must be in A4 format or a similar standard size
* Requires reasonable image quality for accurate extraction


## Benefits & Competitive Advantages

### Speed to Market

* New document types integrated in days rather than the roughly 8 weeks of the traditional approach
* No model training required for new formats
* Rapid adaptation to customer-specific requirements


### Flexibility & Scalability

* Customizable field extraction via configuration
* Prompt-based approach allows easy modifications
* Extensible architecture for future document types


### Quality & Accuracy

* Powered by state-of-the-art LLM/VLM technology
* High extraction quality across multiple languages
* Robust handling of varied document structures


# Integration with Namirial OnBoarding

## Overview

DocProcessor integrates with Namirial OnBoarding (NOB) to enable automated document processing within customer onboarding workflows.

**Current Status**: Demo integration, a simplified workflow for evaluation purposes.

## Current Integration (Demo)

### Available Features

The current integration offers a single, pre-configured workflow designed for demonstration and evaluation purposes:

* Single document upload per request
* Pre-configured document processing (no customization available)
* No input parameters required
* Processing by the DocProcessor backend
* Results available in the NOB back office


### Limitations

* Configuration options (`parameters` and `settings`) are not yet available
* The document cannot be passed directly via the API (upload link only)
* Single workflow configuration
* Limited to demonstration scenarios


## Integration Methods

### 1. Request Creation from the Back Office

#### Setup

The integration uses a Request Type based on a specific model.

**Note**: The Request Type's name still references the legacy Text Engine system and will be updated to reflect the DocProcessor integration.

#### Process

**Step 1: Create Request**

1. Access the Namirial OnBoarding back office
2. Select the Request Type configured for this integration
3. Click "Create"


No parameters need to be configured; the system uses a pre-defined configuration.

**Step 2: Upload Document**

1. The system generates a unique link
2. Share the link with the end user
3. User accesses the link and uploads a single document via the web interface


**Step 3: Processing**

* Document is sent to DocProcessor
* Automatic processing and data extraction
* Results available in the NOB back office


### 2. Request Creation via API

#### Endpoint


```
POST https://test-eu-ie1-api.namirialonboarding.com/api/v2/requests
```

#### Headers


```
Authorization: Bearer {YOUR_ACCESS_TOKEN}
Accept: application/json
Content-Type: application/json
```

#### Request Body


```json
{
  "requestTypeId": "8870fa7a-2e51-4af4-9724-2ac4230163db",
  "parameters": {},
  "settings": {}
}
```

**Note**: The `parameters` and `settings` fields are currently empty and not configurable. They are reserved for future enhancements.

#### cURL Example


```shell
curl 'https://test-eu-ie1-api.namirialonboarding.com/api/v2/requests?language=en' \
  -H 'Authorization: Bearer {YOUR_ACCESS_TOKEN}' \
  -H 'Accept: application/json' \
  -H 'Content-Type: application/json' \
  --data-raw '{"requestTypeId":"8870fa7a-2e51-4af4-9724-2ac4230163db","parameters":{},"settings":{}}'
```
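The same call can be prepared from Python with only the standard library. This sketch reuses the test endpoint and the demo `requestTypeId` from this page (both differ per integration and environment); `build_create_request` is a hypothetical helper name, and the access token is a placeholder you must supply.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Test endpoint and demo requestTypeId from this page; adjust per environment.
API_URL = "https://test-eu-ie1-api.namirialonboarding.com/api/v2/requests"
DEMO_REQUEST_TYPE_ID = "8870fa7a-2e51-4af4-9724-2ac4230163db"


def build_create_request(access_token: str,
                         request_type_id: str = DEMO_REQUEST_TYPE_ID,
                         language: str = "en") -> Request:
    """Build (but do not send) the POST that creates an onboarding request."""
    body = {
        "requestTypeId": request_type_id,
        "parameters": {},  # currently empty, reserved for future use
        "settings": {},    # currently empty, reserved for future use
    }
    return Request(
        f"{API_URL}?{urlencode({'language': language})}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Accept": "application/json",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the object to `urllib.request.urlopen(req, timeout=30)` and decode the body with `json.load`; the 30-second timeout is an arbitrary choice, not a documented value.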

#### Response


```json
{
  "requestId": "unique-request-id",
  "link": "https://...",
  "status": "created"
}
```

The response contains:

* `requestId`: Unique identifier for tracking
* `link`: URL to share with end user for document upload
* `status`: Current request status
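A client can load these three fields into a typed object and fail early if one is missing. The sketch assumes the response shape shown above and ignores any additional fields the API may return; `CreatedRequest` and `parse_create_response` are hypothetical names.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass(frozen=True)
class CreatedRequest:
    request_id: str  # used for tracking
    link: str        # upload URL to share with the end user
    status: str      # e.g. "created"


def parse_create_response(body: bytes | str) -> CreatedRequest:
    """Extract the documented fields; raise ValueError if any is missing."""
    data = json.loads(body)
    missing = [key for key in ("requestId", "link", "status") if key not in data]
    if missing:
        raise ValueError(f"response is missing: {', '.join(missing)}")
    return CreatedRequest(data["requestId"], data["link"], data["status"])
```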


#### Important Notes

* The `requestTypeId` is specific to each integration and environment
* Currently, the document cannot be passed in the API call body
* Users must use the generated link to upload documents


## Planned Enhancements

### 1. Internal Operator Review Step

Enable manual review and validation after automatic processing.

**Workflow**:

1. The document is processed automatically by DocProcessor
2. The request enters "Pending Review" status
3. An internal NOB operator reviews:
   * The original document
   * The extracted data
   * The processing results
4. The operator can:
   * Approve the request
   * Reject the request
   * Correct extracted data if needed
5. The request proceeds to the next workflow step
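Since this step is still planned, the following is only a hypothetical model of the review workflow as a small state machine, useful for reasoning about which operator actions are valid in which state. Apart from `created` (from the API response) and "Pending Review", the status names are assumptions, and corrections are assumed to be allowed only while a request is under review; this is not the NOB implementation.

```python
from enum import Enum


class Status(Enum):
    # Hypothetical status names, except "created" (API response) and pending review
    CREATED = "created"
    PENDING_REVIEW = "pending_review"
    APPROVED = "approved"
    REJECTED = "rejected"


# Allowed transitions in the planned review step (sketch only)
TRANSITIONS = {
    Status.CREATED: {Status.PENDING_REVIEW},  # after automatic DocProcessor processing
    Status.PENDING_REVIEW: {Status.APPROVED, Status.REJECTED},
    Status.APPROVED: set(),
    Status.REJECTED: set(),
}


class ReviewRequest:
    def __init__(self, extracted: dict):
        self.status = Status.CREATED
        self.extracted = dict(extracted)
        self.history = []  # audit trail of (action, detail) tuples

    def _move(self, target: Status) -> None:
        if target not in TRANSITIONS[self.status]:
            raise ValueError(f"cannot go from {self.status.value} to {target.value}")
        self.history.append(("status", f"{self.status.value}->{target.value}"))
        self.status = target

    def processed(self) -> None:
        self._move(Status.PENDING_REVIEW)

    def correct(self, field: str, value: str) -> None:
        # Operators may only correct extracted data while the request is under review
        if self.status is not Status.PENDING_REVIEW:
            raise ValueError("corrections are only allowed during review")
        old = self.extracted.get(field)
        self.history.append(("correct", f"{field}: {old!r} -> {value!r}"))
        self.extracted[field] = value

    def approve(self) -> None:
        self._move(Status.APPROVED)

    def reject(self) -> None:
        self._move(Status.REJECTED)
```

Recording every transition and correction in `history` corresponds to the planned review history and audit trail feature.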


**Features**:

* Automatic or manual assignment of requests to operators
* Workload monitoring dashboard
* Review history and audit trail


### 2. Document Upload via API

Ability to pass the document directly in the request creation call, eliminating the upload link step.

**Benefits**:

* Fully automated workflow (no user interaction required)
* Direct integration with external systems
* Faster processing time


### 3. Configurable Parameters

Enable configuration of document processing parameters per request.

**Expected Parameters**:

* `documentType`: Specify expected document type for optimized processing
* `language`: Override language detection
* `extractionFields`: Customize which fields to extract
* `validationRules`: Apply custom validation logic


**Expected Settings**:

* `confidenceThreshold`: Minimum confidence score for extracted data
* `manualReviewRequired`: Force manual review step
* `webhookUrl`: URL for asynchronous notifications