DocProcessor Summary

DocProcessor is an AI-powered document understanding engine that uses Large Language Models (LLMs) and Vision-Language Models (VLMs) to read, interpret, and extract structured data from text-based documents in multiple languages and formats.


Key Value Propositions

Flexibility: Choose precisely which data to extract via configuration—no model training required
Speed: Integrate new document types quickly without model retraining, reducing time from weeks to days
Robustness: High-quality extraction powered by LLM/VLM technology
Multilingual: Native support for 5 major European languages
Easy Integration: RESTful API that plugs into existing systems

Technical Specifications

Supported Languages (V1)

DocProcessor supports the following languages for document text:

French
English
German
Italian
Spanish

Supported Document Types (V1)

Document Type	Description	Key Fields Extracted
Bank Account	Bank account document containing bank and account holder details	IBAN, BIC, Bank Name, Account Owner Name
Energy Invoice	Simple energy consumption invoice without payment schedule	Address, Customer Name, Issue Date
Energy Schedule	Monthly energy invoice payment schedule	Address, Customer Name, Issue Date
Family Allowance	Family allowance document	Name, Address, Beneficiary Number, Payment Date, Benefit Amount, Family Quotient
Insurance Attestation	Insurance attestation document containing personal information	Name, Address, Issue Date
Payslip	Payslip document containing details about the employer, employee, and payment	Employer Name, Employer Address, Employee Name, Employee Address, Payslip Date, Net Salary, Gross Monthly Income, Monthly Income, Annual Income, Entry Date, Company Office ID, Code NAF, NIR
Phone Invoice	Phone invoice document	Name, Address, Issue Date, Phone Number
Provider Attestation	Subscription or plan attestation for a residence	Name, Address, Issue Date
Retirement Pension	Retirement pension document containing personal information, including address	Name, Address, Issue Date
Tax Notice	Tax notice document containing fiscal and personal information	Tax Year, Fiscal Number 1, Fiscal Number 2, Tax Reference, Address, Date, IBAN, BIC, Taxable Income, Tax Reference Income, Global Gross Income
Vehicle Registration Certificate	The French vehicle registration certificate (carte grise) issued by the Agence Nationale des Titres Sécurisés (ANTS) or authorized professionals. Contains details about the vehicle, its owner, and technical specifications	Registration Number, VIN, Vehicle Type, Vehicle Make, First Registration Date, Issue Date, Formula Number, Legal Entity

Document Format (V1)

Formats: PDF, JPEG, PNG
Pages: Multi-page support
Content: Text-based documents (handwriting is not supported)
Maximum Size: 20 MiB

Performance & SLA

Target Response Time: 10 seconds
Availability: 100%
API Standard: REST
Output Format: Structured JSON

How It Works

DocProcessor follows a simple workflow to process your documents and extract structured data:

Step 1: Submit Your Document

Upload your document in one of the supported formats:

PDF
JPEG
PNG

The system accepts documents up to 20 MiB in size.
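A client can reject files that would fail these checks before uploading them. The minimal Python sketch below makes two choices. First, it assumes the 20 MiB limit means 20 * 1024 * 1024 bytes. Second, it identifies the format from the file's leading bytes (the standard PDF, PNG and JPEG signatures) rather than from its extension. `check_upload` and `detect_format` are illustrative helper names and are not part of the DocProcessor API.

```python
from typing import List, Optional

# Assumption: "20 MiB" in the specification means 20 * 1024 * 1024 bytes.
MAX_SIZE_BYTES = 20 * 1024 * 1024

# Standard file signatures ("magic numbers") for the three accepted formats.
SIGNATURES = {
    b"%PDF-": "PDF",
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"\xff\xd8\xff": "JPEG",
}


def detect_format(data: bytes) -> Optional[str]:
    """Return "PDF", "PNG" or "JPEG" based on the leading bytes, else None."""
    for signature, name in SIGNATURES.items():
        if data.startswith(signature):
            return name
    return None


def check_upload(data: bytes) -> List[str]:
    """Collect reasons the file would be rejected; an empty list means it looks fine."""
    problems = []
    if detect_format(data) is None:
        problems.append("unsupported format (expected PDF, JPEG or PNG)")
    if len(data) > MAX_SIZE_BYTES:
        problems.append(f"file too large: {len(data)} bytes > {MAX_SIZE_BYTES} bytes")
    return problems
```

A signature check cannot tell a printed document from a photo of handwriting, so the text-based and image-quality constraints still have to be respected by the sender.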

Step 2: Automatic Processing

Once submitted, DocProcessor automatically:

Reads and interprets the document content
Identifies the document type
Processes the document in its original language (French, Italian, German, English, or Spanish)
Extracts the specified data fields according to the configuration

Step 3: Receive Structured Data

The system returns the extracted data as structured JSON. Each field carries one of the following data types:

TEXT: Simple textual values (e.g., company name, fiscal number)
ADDRESS: Structured address with street, zip code, and city
NAMES: First names and last name returned as separate values
DATE: ISO 8601 dates (YYYY-MM-DD)
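On the client side, these four types map naturally onto small data classes. The Python sketch below assumes the key names shown in the sample JSON output on this page: `data_type`, `value`, `address`, `zipCode`, `city`, `firstNames` and `lastName`. The class and function names are illustrative only.

```python
from dataclasses import dataclass
from datetime import date
from typing import Union


@dataclass
class Address:
    address: str   # street part, as extracted
    zip_code: str
    city: str


@dataclass
class Names:
    first_names: str
    last_name: str


FieldValue = Union[str, Address, Names, date]


def decode_field(field: dict) -> FieldValue:
    """Convert one entry of the "fields" object into a typed Python value."""
    kind = field["data_type"]
    if kind == "TEXT":
        return field["value"]
    if kind == "DATE":
        return date.fromisoformat(field["value"])  # expects YYYY-MM-DD
    if kind == "ADDRESS":
        return Address(field["address"], field["zipCode"], field["city"])
    if kind == "NAMES":
        return Names(field["firstNames"], field["lastName"])
    raise ValueError(f"unknown data_type: {kind!r}")
```

For example, `decode_field({"data_type": "DATE", "value": "2015-01-01"})` returns `datetime.date(2015, 1, 1)`. Raising on an unknown `data_type` makes a new type surface immediately instead of being silently misread.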

Key Capabilities

Multilingual Support: DocProcessor automatically detects and processes documents in French, Italian, German, English, or Spanish; the caller does not need to specify the language.

New Document Types: The system can adapt to new document formats without model retraining, ensuring no downtime when introducing new document types.

Use Cases

Example Use Case: Payslip Data Extraction

A customer needs to extract specific data from employee payslips across multiple countries and formats.

Traditional Approach:

Requires approximately 8 weeks to train models for each document variant
Limited flexibility for field customization
Significant development effort for new formats

DocProcessor Approach:

Extract exactly the fields needed (e.g., last name, first name, gross salary)
Integration in days instead of weeks
Easy modification via prompt-based configuration
Support for documents from different countries without retraining

Example Use Case: KYC Onboarding

Financial institutions can automate customer data extraction from:

Tax notices for income verification
Bank account statements for IBAN validation
KBIS documents for company verification
Multiple document types in a single workflow

API Integration

Sample JSON Output

{
  "textDocumentInfo": {
    "documentTypeDetail": "PAYSLIP",
    "fields": {
      "EMPLOYER_NAME": {
        "data_type": "TEXT",
        "value": "NETHEOS"
      },
      "NET_SALARY": {
        "data_type": "TEXT",
        "value": "250,76"
      },
      "GROSS_SALARY": {
        "data_type": "TEXT",
        "value": "151,67"
      },
      "EMPLOYER_ADDRESS": {
        "data_type": "ADDRESS",
        "address": "avenue bernard claude Parc Club du Millenaire",
        "zipCode": "34000",
        "city": "MONTPELLIER"
      },
      "EMPLOYEE_NAME": {
        "data_type": "NAMES",
        "firstNames": "JOHN",
        "lastName": "CENA"
      },
      "EMPLOYEE_ADDRESS": {
        "data_type": "ADDRESS",
        "address": "Les impasses de la Mer Appt 34 70 rue de Pivert",
        "zipCode": "34000",
        "city": "MONTPELLIER"
      },
      "PAYSLIP_DATE": {
        "data_type": "DATE",
        "value": "2015-01-01"
      }
    }
  }
}
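In the sample, amounts such as NET_SALARY are returned with data_type TEXT, exactly as printed on the document (here with a comma as the decimal separator). A client that needs numbers therefore has to normalize them itself. The Python sketch below reads a trimmed copy of the sample and converts the amount using a simple heuristic: whichever of "," or "." appears last is treated as the decimal separator. That heuristic is an assumption and misreads values such as "1,234" written with an English thousands separator.

```python
import json
from decimal import Decimal

# Trimmed copy of the sample response above.
SAMPLE = '''{"textDocumentInfo": {"documentTypeDetail": "PAYSLIP", "fields": {
  "NET_SALARY": {"data_type": "TEXT", "value": "250,76"},
  "PAYSLIP_DATE": {"data_type": "DATE", "value": "2015-01-01"}}}}'''


def parse_amount(raw: str) -> Decimal:
    """Convert a printed amount such as "250,76" or "1 234,56" to a Decimal.

    Heuristic (an assumption): whichever of "," or "." occurs last is the
    decimal separator; the other one is treated as a thousands separator.
    """
    text = raw.strip()
    for space in (" ", "\u00a0", "\u202f"):  # regular, no-break, narrow no-break
        text = text.replace(space, "")
    decimal_sep = "," if text.rfind(",") > text.rfind(".") else "."
    thousands_sep = "." if decimal_sep == "," else ","
    return Decimal(text.replace(thousands_sep, "").replace(decimal_sep, "."))


info = json.loads(SAMPLE)["textDocumentInfo"]
net_salary = parse_amount(info["fields"]["NET_SALARY"]["value"])
print(info["documentTypeDetail"], net_salary)  # PAYSLIP 250.76
```

Decimal is used instead of float so that monetary values keep their exact printed digits.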

Constraints & Limitations

Technical Constraints

Documents must be text-based (handwritten documents are not supported)
Maximum file size: 20 MiB
Documents must be in A4 format or a similar standard size
Scans and photos must be of reasonable image quality for accurate extraction

Benefits & Competitive Advantages

Speed to Market

New document types integrated in days instead of the traditional 8 weeks
No model training required for new formats
Rapid adaptation to customer-specific requirements

Flexibility & Scalability

Customizable field extraction via configuration
Prompt-based approach allows easy modifications
Extensible architecture for future document types

Quality & Accuracy

Powered by state-of-the-art LLM/VLM technology
High extraction quality across multiple languages
Robust handling of varied document structures

Integration with Namirial OnBoarding

Overview

DocProcessor integrates with Namirial OnBoarding (NOB) to enable automated document processing within customer onboarding workflows.

Current Status: Demo integration, a simplified workflow for evaluation purposes.

Current Integration (Demo)

Available Features

The current integration offers a single, pre-configured workflow designed for demonstration and evaluation purposes:

Single document upload per request
Pre-configured document processing (no customization available)
No input parameters required
Processing through the DocProcessor backend
Results available in the NOB back office

Limitations

Configuration options (parameters and settings) are not yet available
The document cannot be passed directly via the API (upload link only)
Single workflow configuration
Limited to demonstration scenarios

Integration Methods

1. Request Creation from Backoffice

Setup

The integration uses a Request Type based on a specific model.

Note: The Request Type name references the legacy Text Engine system and will be updated to reflect the DocProcessor integration.

Process

Step 1: Create Request

Access the Namirial OnBoarding back office
Select the Request Type configured for DocProcessor (see Setup)
Click "Create"

No parameters need to be configured; the system uses a predefined configuration.

Step 2: Upload Document

The system generates a unique link
Share the link with the end user
The end user opens the link and uploads a single document through the web interface

Step 3: Processing

Document is sent to DocProcessor
Automatic processing and data extraction
Results available in the NOB back office

2. Request Creation via API

Endpoint

POST https://test-eu-ie1-api.namirialonboarding.com/api/v2/requests

Headers

Authorization: Bearer {YOUR_ACCESS_TOKEN}
Accept: application/json
Content-Type: application/json

Request Body

{
  "requestTypeId": "8870fa7a-2e51-4af4-9724-2ac4230163db",
  "parameters": {},
  "settings": {}
}

Note: The parameters and settings fields are currently empty and not configurable. They are reserved for future enhancements.

cURL Example

curl 'https://test-eu-ie1-api.namirialonboarding.com/api/v2/requests?language=en' \
  -H 'Authorization: Bearer {YOUR_ACCESS_TOKEN}' \
  -H 'Accept: application/json' \
  -H 'Content-Type: application/json' \
  --data-raw '{"requestTypeId":"8870fa7a-2e51-4af4-9724-2ac4230163db","parameters":{},"settings":{}}'
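The same call can be made from Python with only the standard library. This is a minimal sketch with two caveats. The `requestTypeId` is the demo value from the request body above and must be replaced with the one for your integration and environment. The 30-second timeout is an arbitrary choice. Building the request separately from sending it makes the headers and body easy to inspect without network access.

```python
import json
import urllib.request

BASE_URL = "https://test-eu-ie1-api.namirialonboarding.com/api/v2/requests"
# Demo value from this page; it differs per integration and environment.
REQUEST_TYPE_ID = "8870fa7a-2e51-4af4-9724-2ac4230163db"


def build_create_request(access_token: str, language: str = "en") -> urllib.request.Request:
    """Build (but do not send) the POST request that creates a NOB request."""
    body = {"requestTypeId": REQUEST_TYPE_ID, "parameters": {}, "settings": {}}
    return urllib.request.Request(
        f"{BASE_URL}?language={language}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Accept": "application/json",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_request(access_token: str, timeout: float = 30.0) -> dict:
    """Send the request and return the decoded JSON response.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    with urllib.request.urlopen(build_create_request(access_token), timeout=timeout) as resp:
        return json.load(resp)
```

Usage: `created = create_request("YOUR_ACCESS_TOKEN")`, after which `created["link"]` holds the upload URL to share with the end user.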

Response

{
  "requestId": "unique-request-id",
  "link": "https://...",
  "status": "created"
}

The response contains:

requestId: Unique identifier for tracking
link: URL to share with end user for document upload
status: Current request status
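Client code may want to confirm that all three fields are present before storing the link. The Python sketch below is illustrative: the `CreatedRequest` class and `parse_created_request` function are hypothetical names, and the example payload reuses the placeholder values from the response above.

```python
import json
from dataclasses import dataclass

REQUIRED_KEYS = ("requestId", "link", "status")


@dataclass(frozen=True)
class CreatedRequest:
    request_id: str  # identifier used to track the request
    link: str        # upload URL to share with the end user
    status: str      # e.g. "created" right after creation


def parse_created_request(payload: str) -> CreatedRequest:
    """Parse the JSON body returned by POST /api/v2/requests."""
    data = json.loads(payload)
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"response is missing fields: {missing}")
    return CreatedRequest(data["requestId"], data["link"], data["status"])


created = parse_created_request(
    '{"requestId": "unique-request-id", "link": "https://...", "status": "created"}'
)
print(created.status)  # created
```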

Important Notes

The requestTypeId is specific to each integration and environment
Currently, the document cannot be passed in the API call body
Users must use the generated link to upload documents

Planned Enhancements

1. Internal Operator Review Step

Enable manual review and validation after automatic processing

Workflow:

Document is processed automatically by DocProcessor
Request enters "Pending Review" status
Internal NOB operator reviews:
- Original document
- Extracted data
- Processing results
Operator can:
- Approve the request
- Reject the request
- Correct extracted data if needed
Request proceeds to next workflow step
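The planned review step amounts to a small state machine. The Python sketch below is purely illustrative. Apart from "created" and "Pending Review", the status names are hypothetical, and the real NOB implementation may model statuses, operator assignment and the audit trail differently.

```python
from enum import Enum
from typing import Dict, List, Set


class Status(Enum):
    # Hypothetical names; only "created" and "Pending Review" appear on this page.
    CREATED = "created"
    PROCESSING = "processing"
    PENDING_REVIEW = "pending_review"
    APPROVED = "approved"
    REJECTED = "rejected"


# Which status may follow which one in the planned workflow.
TRANSITIONS: Dict[Status, Set[Status]] = {
    Status.CREATED: {Status.PROCESSING},
    Status.PROCESSING: {Status.PENDING_REVIEW},
    Status.PENDING_REVIEW: {Status.APPROVED, Status.REJECTED},
    Status.APPROVED: set(),
    Status.REJECTED: set(),
}


class ReviewedRequest:
    def __init__(self, extracted: Dict[str, str]) -> None:
        self.status = Status.CREATED
        self.extracted = dict(extracted)
        self.history: List[str] = []  # minimal audit trail

    def move_to(self, new_status: Status) -> None:
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError(f"cannot go from {self.status.value} to {new_status.value}")
        self.history.append(f"{self.status.value} -> {new_status.value}")
        self.status = new_status

    def correct(self, field: str, value: str) -> None:
        # Operators may only correct extracted data while the request awaits review.
        if self.status is not Status.PENDING_REVIEW:
            raise ValueError("corrections are only allowed during review")
        self.history.append(f"corrected {field}: {self.extracted.get(field)!r} -> {value!r}")
        self.extracted[field] = value


req = ReviewedRequest({"NET_SALARY": "250,76"})
req.move_to(Status.PROCESSING)      # automatic processing by DocProcessor
req.move_to(Status.PENDING_REVIEW)  # waits for an internal operator
req.correct("NET_SALARY", "250.76")
req.move_to(Status.APPROVED)
```

Keeping the allowed transitions in one table makes invalid moves (for example, approving a request that was never reviewed) fail loudly.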

Features:

Automatic or manual assignment of requests to operators
Workload monitoring dashboard
Review history and audit trail

2. Document Upload via API

Ability to pass the document directly in the request creation call, eliminating the upload link step.

Benefits:

Fully automated workflow (no user interaction required)
Direct integration with external systems
Faster processing time

3. Configurable Parameters

Enable configuration of document processing parameters per request.

Expected Parameters:

documentType: Specify expected document type for optimized processing
language: Override language detection
extractionFields: Customize which fields to extract
validationRules: Apply custom validation logic

Expected Settings:

confidenceThreshold: Minimum confidence score for extracted data
manualReviewRequired: Force manual review step
webhookUrl: URL for asynchronous notifications
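Once these options exist, a client could assemble the request body as shown below. Everything here is an assumption about a feature that is not yet available. The field names come from the lists above, but the value formats are guesses: two-letter language codes, a confidence threshold between 0 and 1, and an HTTPS-only webhook URL. `validationRules` is left out because its format is not described.

```python
import json
from typing import List, Optional
from urllib.parse import urlparse

# Hypothetical: two-letter ISO 639-1 codes for the five V1 languages.
SUPPORTED_LANGUAGES = {"fr", "en", "de", "it", "es"}


def build_planned_body(
    request_type_id: str,
    document_type: Optional[str] = None,
    language: Optional[str] = None,
    extraction_fields: Optional[List[str]] = None,
    confidence_threshold: Optional[float] = None,
    manual_review_required: bool = False,
    webhook_url: Optional[str] = None,
) -> dict:
    """Assemble a request body using the planned (not yet available) options."""
    parameters: dict = {}
    if document_type is not None:
        parameters["documentType"] = document_type
    if language is not None:
        if language not in SUPPORTED_LANGUAGES:
            raise ValueError(f"unsupported language: {language!r}")
        parameters["language"] = language
    if extraction_fields:
        parameters["extractionFields"] = list(extraction_fields)

    settings: dict = {"manualReviewRequired": manual_review_required}
    if confidence_threshold is not None:
        # Assumed scale: a probability-like score between 0 and 1.
        if not 0.0 <= confidence_threshold <= 1.0:
            raise ValueError("confidenceThreshold must be between 0 and 1")
        settings["confidenceThreshold"] = confidence_threshold
    if webhook_url is not None:
        if urlparse(webhook_url).scheme != "https":
            raise ValueError("webhookUrl must use https")
        settings["webhookUrl"] = webhook_url

    return {"requestTypeId": request_type_id, "parameters": parameters, "settings": settings}


body = build_planned_body(
    "8870fa7a-2e51-4af4-9724-2ac4230163db",
    document_type="PAYSLIP",
    language="fr",
    extraction_fields=["EMPLOYEE_NAME", "NET_SALARY"],
    confidence_threshold=0.8,
    webhook_url="https://example.com/hooks/docprocessor",  # hypothetical endpoint
)
print(json.dumps(body, indent=2))
```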
