DocProcessor is an AI-powered document understanding engine that uses Large Language Models (LLMs) and Vision-Language Models (VLMs) to read, interpret, and extract structured data from text-based documents in multiple languages and formats.
- Flexibility: Choose precisely which data to extract via configuration, with no model training required
- Speed: Integrate new document types quickly without model retraining, reducing integration time from weeks to days
- Robustness: High-quality extraction powered by LLM/VLM technology
- Multilingual: Native support for 5 major European languages
- Easy Integration: RESTful API for seamless system integration
DocProcessor supports the following languages for document text:
- French
- English
- German
- Italian
- Spanish
| Document Type | Description | Key Fields Extracted |
|---|---|---|
| Bank Account | Bank account document containing bank and account holder details | IBAN, BIC, Bank Name, Account Owner Name |
| Energy Invoice | Simple energy consumption invoice without payment schedule | Address, Customer Name, Issue Date |
| Energy Schedule | Monthly energy invoice payment schedule | Address, Customer Name, Issue Date |
| Family Allowance | Family allowance document | Name, Address, Beneficiary Number, Payment Date, Benefit Amount, Family Quotient |
| Insurance Attestation | Insurance attestation document containing personal information | Name, Address, Issue Date |
| Payslip | Payslip document containing details about the employer, employee, and payment | Employer Name, Employer Address, Employee Name, Employee Address, Payslip Date, Net Salary, Gross Monthly Income, Monthly Income, Annual Income, Entry Date, Company Office ID, Code NAF, NIR |
| Phone Invoice | Phone invoice document | Name, Address, Issue Date, Phone Number |
| Provider Attestation | Subscription or plan attestation for a residence | Name, Address, Issue Date |
| Retirement Pension | Retirement pension containing personal information with address | Name, Address, Issue Date |
| Tax Notice | Tax notice document containing fiscal and personal information | Tax Year, Fiscal Number 1, Fiscal Number 2, Tax Reference, Address, Date, IBAN, BIC, Taxable Income, Tax Reference Income, Global Gross Income |
| Vehicle Registration Certificate | The French vehicle registration certificate (carte grise) issued by the Agence Nationale des Titres Sécurisés (ANTS) or authorized professionals. Contains details about the vehicle, its owner, and technical specifications | Registration Number, VIN, Vehicle Type, Vehicle Make, First Registration Date, Issue Date, Formula Number, Legal Entity |
- Formats: PDF, JPEG, PNG
- Pages: Multi-page support
- Document Type: Text document
- Maximum Size: 20 MiB
- Target Response Time: 10 seconds
- Availability: 100%
- API Standard: REST
- Output Format: Structured JSON
DocProcessor follows a simple workflow to process your documents and extract structured data:
Upload your document in one of the supported formats:
- PDF
- JPEG
- PNG
The system accepts documents up to 20 MB in size.
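A client can check these limits before uploading. The following is a minimal sketch, not part of DocProcessor: the function name `check_upload` is hypothetical, the accepted extensions are inferred from the documented formats (PDF, JPEG, PNG), and the size limit is interpreted as 20 × 1024 × 1024 bytes, which is an assumption about the unit.

```python
from pathlib import Path

# Limits taken from the documented specifications. Reading "20 MB" as
# 20 * 1024 * 1024 bytes is an assumption about the unit.
MAX_BYTES = 20 * 1024 * 1024
ALLOWED_SUFFIXES = {".pdf", ".jpg", ".jpeg", ".png"}


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    suffix = Path(filename).suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        problems.append(f"unsupported format: {suffix or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file too large: {size_bytes} bytes")
    return problems
```

The check only looks at the file name and size. It does not inspect the file content, so the service may still reject a file that passes it.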
Once submitted, DocProcessor automatically:
- Reads and interprets the document content
- Identifies the document type
- Processes the document in its original language (French, Italian, German, English, or Spanish)
- Extracts the specified data fields according to the configuration
The system returns extracted data as structured JSON, with typed data structures:
- TEXT: Simple textual values
- ADDRESS: Structured address with street, zip code, and city
- NAMES: First names and last names separated
- DATE: ISO-formatted dates (YYYY-MM-DD)
Multilingual Support: DocProcessor automatically detects and processes documents in French, Italian, German, English, or Spanish without requiring language specification.
New Document Types: The system can adapt to new document formats without model retraining, ensuring no downtime when introducing new document types.
DocProcessor extracts fields with the following data types:
- TEXT: Simple textual values (e.g., company name, fiscal number)
- ADDRESS: Structured address with street, zip code, and city
- NAMES: First names and last names separated
- DATE: ISO-formatted dates (YYYY-MM-DD)
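On the consuming side, each field can be converted according to its `data_type`. The sketch below is illustrative only: the class and function names are hypothetical, and the JSON keys (`value`, `address`, `zipCode`, `city`, `firstNames`, `lastName`) are taken from the sample payslip response in this document rather than from a formal schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Address:
    address: str
    zip_code: str
    city: str


@dataclass
class Names:
    first_names: str
    last_name: str


def parse_field(field: dict):
    """Convert one extracted field into a Python value based on its data_type."""
    data_type = field["data_type"]
    if data_type == "TEXT":
        return field["value"]
    if data_type == "DATE":
        # Dates are documented as ISO-formatted (YYYY-MM-DD).
        return date.fromisoformat(field["value"])
    if data_type == "ADDRESS":
        return Address(field["address"], field["zipCode"], field["city"])
    if data_type == "NAMES":
        return Names(field["firstNames"], field["lastName"])
    raise ValueError(f"unknown data_type: {data_type}")
```

Raising on an unknown `data_type` is a design choice for the sketch: it makes new types visible to the client instead of passing them through silently.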
A customer needs to extract specific data from employee payslips across multiple countries and formats.
Traditional Approach:
- Requires approximately 8 weeks to train models for each document variant
- Limited flexibility for field customization
- Significant development effort for new formats
DocProcessor Approach:
- Extract exactly the fields needed (e.g., last name, first name, gross salary)
- Integration in days instead of weeks
- Easy modification via prompt-based configuration
- Support for documents from different countries without retraining
Financial institutions can automate customer data extraction from:
- Tax notices for income verification
- Bank account statements for IBAN validation
- KBIS documents for company verification
- Multiple document types in a single workflow
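As an illustration of the IBAN validation use case, an extracted IBAN can be checked with the standard mod-97 checksum before it is used downstream. This is a client-side sketch with a hypothetical function name, not something DocProcessor is documented to do itself, and it verifies only the checksum, not that the account exists.

```python
def iban_checksum_ok(iban: str) -> bool:
    """Check the mod-97 checksum of an IBAN (checksum only, not existence)."""
    compact = iban.replace(" ", "").upper()
    if len(compact) < 5 or not (compact.isascii() and compact.isalnum()):
        return False
    # Move the country code and check digits to the end, then map
    # letters to numbers (A=10 ... Z=35) and test the remainder.
    rearranged = compact[4:] + compact[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1
```

A failed check on an extracted value is a reasonable trigger for re-scanning the document or routing it to a human.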
{
  "textDocumentInfo": {
    "documentTypeDetail": "PAYSLIP",
    "fields": {
      "EMPLOYER_NAME": {
        "data_type": "TEXT",
        "value": "NETHEOS"
      },
      "NET_SALARY": {
        "data_type": "TEXT",
        "value": "250,76"
      },
      "GROSS_SALARY": {
        "data_type": "TEXT",
        "value": "151,67"
      },
      "EMPLOYER_ADDRESS": {
        "data_type": "ADDRESS",
        "address": "avenue bernard claude Parc Club du Millenaire",
        "zipCode": "34000",
        "city": "MONTPELLIER"
      },
      "EMPLOYEE_NAME": {
        "data_type": "NAMES",
        "firstNames": "JOHN",
        "lastName": "CENA"
      },
      "EMPLOYEE_ADDRESS": {
        "data_type": "ADDRESS",
        "address": "Les impasses de la Mer Appt 34 70 rue de Pivert",
        "zipCode": "34000",
        "city": "MONTPELLIER"
      },
      "PAYSLIP_DATE": {
        "data_type": "DATE",
        "value": "2015-01-01"
      }
    }
  }
}

- Documents must be text-based (not handwritten)
- Maximum file size: 20 MB
- Documents must be in A4 format or a similar standard size
- Requires reasonable image quality for accurate extraction
- New document type integration in days, versus approximately 8 weeks with the traditional approach
- No model training required for new formats
- Rapid adaptation to customer-specific requirements
- Customizable field extraction via configuration
- Prompt-based approach allows easy modifications
- Extensible architecture for future document types
- Powered by state-of-the-art LLM/VLM technology
- High extraction quality across multiple languages
- Robust handling of varied document structures
DocProcessor integrates with Namirial OnBoarding (NOB) to enable automated document processing within customer onboarding workflows.
Current Status: Demo integration - simplified workflow for evaluation purposes.
The current integration offers a single, pre-configured workflow designed for demonstration and evaluation purposes:
- Single document upload per request
- Pre-configured document processing (no customization available)
- No input parameters required
- Processing through DocProcessor backend
- Results available in NOB backoffice
- Configuration options (`parameters` and `settings`) are not yet available
- Document cannot be passed directly via API (upload link only)
- Single workflow configuration
- Limited to demonstration scenarios
The integration uses a Request Type based on a specific model.
Note: The name references the legacy Text Engine system and will be updated to reflect the DocProcessor integration.
Step 1: Create Request
- Access the Namirial OnBoarding back office
- Select the Request Type that covers
- Click "Create"
No parameters need to be configured - the system uses a pre-defined configuration.
Step 2: Upload Document
- The system generates a unique link
- Share the link with the end user
- User accesses the link and uploads a single document via the web interface
Step 3: Processing
- Document is sent to DocProcessor
- Automatic processing and data extraction
- Results available in the NOB back office
POST https://test-eu-ie1-api.namirialonboarding.com/api/v2/requests

Authorization: Bearer {YOUR_ACCESS_TOKEN}
Accept: application/json
Content-Type: application/json

{
  "requestTypeId": "8870fa7a-2e51-4af4-9724-2ac4230163db",
  "parameters": {},
  "settings": {}
}

Note: The parameters and settings fields are currently empty and not configurable. They are reserved for future enhancements.

curl 'https://test-eu-ie1-api.namirialonboarding.com/api/v2/requests?language=en' \
-H 'Authorization: Bearer {YOUR_ACCESS_TOKEN}' \
-H 'Accept: application/json' \
-H 'Content-Type: application/json' \
--data-raw '{"requestTypeId":"8870fa7a-2e51-4af4-9724-2ac4230163db","parameters":{},"settings":{}}'

{
  "requestId": "unique-request-id",
  "link": "https://...",
  "status": "created"
}

The response contains:
- `requestId`: Unique identifier for tracking
- `link`: URL to share with end user for document upload
- `status`: Current request status

- The `requestTypeId` is specific to each integration and environment
- Currently, the document cannot be passed in the API call body
- Users must use the generated link to upload documents
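The same request can be prepared from Python using only the standard library. This is a minimal sketch: the helper names `build_create_request` and `upload_link` are hypothetical, the URL and `requestTypeId` are copied from the example in this document (both are environment-specific), and the response keys are those shown in the sample response.

```python
import json
import urllib.request

# Values copied from the example request; the test host and the
# requestTypeId will differ for other integrations and environments.
API_URL = "https://test-eu-ie1-api.namirialonboarding.com/api/v2/requests"
REQUEST_TYPE_ID = "8870fa7a-2e51-4af4-9724-2ac4230163db"


def build_create_request(access_token: str) -> urllib.request.Request:
    """Prepare (but do not send) the POST that creates a request."""
    body = {"requestTypeId": REQUEST_TYPE_ID, "parameters": {}, "settings": {}}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Accept": "application/json",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def upload_link(response_body: str) -> str:
    """Extract the upload link from the JSON response body."""
    return json.loads(response_body)["link"]
```

To send the request, pass the prepared object to `urllib.request.urlopen`, read the response body, and hand it to `upload_link`. The resulting URL is the one to share with the end user.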
Enable manual review and validation after automatic processing
Workflow:
1. Document is processed automatically by DocProcessor
2. Request enters "Pending Review" status
3. Internal NOB operator reviews:
   - Original document
   - Extracted data
   - Processing results
4. Operator can:
   - Approve the request
   - Reject the request
   - Correct extracted data if needed
5. Request proceeds to next workflow step
Features:
- Automatic or manual assignment of requests to operators
- Workload monitoring dashboard
- Review history and audit trail
Ability to pass the document directly in the request creation call, eliminating the upload link step.
Benefits:
- Fully automated workflow (no user interaction required)
- Direct integration with external systems
- Faster processing time
Enable configuration of document processing parameters per request.
Expected Parameters:
- `documentType`: Specify expected document type for optimized processing
- `language`: Override language detection
- `extractionFields`: Customize which fields to extract
- `validationRules`: Apply custom validation logic
Expected Settings:
- `confidenceThreshold`: Minimum confidence score for extracted data
- `manualReviewRequired`: Force manual review step
- `webhookUrl`: URL for asynchronous notifications
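Once these options exist, a request body might fill the currently empty `parameters` and `settings` objects. The sketch below is speculative: only the option names come from the lists above, while the function name, the value formats, and the placement of options inside the two objects are assumptions that the final API may not match.

```python
import json

# Option names from the planned lists; nothing else here is confirmed.
PLANNED_PARAMETERS = {"documentType", "language", "extractionFields", "validationRules"}
PLANNED_SETTINGS = {"confidenceThreshold", "manualReviewRequired", "webhookUrl"}


def build_planned_body(request_type_id: str, parameters: dict, settings: dict) -> str:
    """Serialize a hypothetical future request body, rejecting unlisted options."""
    unknown = (set(parameters) - PLANNED_PARAMETERS) | (set(settings) - PLANNED_SETTINGS)
    if unknown:
        raise KeyError(f"not in the planned option list: {sorted(unknown)}")
    return json.dumps(
        {"requestTypeId": request_type_id, "parameters": parameters, "settings": settings}
    )
```

For example, `build_planned_body(request_type_id, {"documentType": "PAYSLIP"}, {"manualReviewRequired": True})` would produce a body that requests payslip processing with a forced review step, with both values being placeholders.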