Azure Document Intelligence

AI & Machine Learning automation

v1.0.0

Last updated Oct 9, 2025

n8n community node for Azure Document Intelligence (Form Recognizer)


Included Nodes

Azure Document Intelligence

Description

n8n-nodes-azure-document-intelligence

This is an n8n community node that integrates Azure Document Intelligence (formerly Form Recognizer) into your n8n workflows.

Azure Document Intelligence is a cloud-based service that uses machine learning models to extract text, key-value pairs, tables, and structure from documents. It is suited to automated document processing, form recognition, invoice extraction, and OCR tasks.

n8n is a fair-code licensed workflow automation platform.

Disclaimer

This is an unofficial community node and is not affiliated with, endorsed by, or supported by Microsoft Corporation or n8n GmbH.

Azure, Azure Document Intelligence, Form Recognizer, and related trademarks are property of Microsoft Corporation. Users must comply with Microsoft's Azure AI Services terms and conditions.

This package is provided "as is" under the MIT License without warranty of any kind.

Table of Contents

Installation
Features
Credentials
Usage
Supported Models
Parameters
Multiple Outputs
Examples
Resources
Version History

Installation

Follow the installation guide in the n8n community nodes documentation.

npm

npm install n8n-nodes-azure-document-intelligence

Manual Installation (Development)

# Clone this repository
git clone https://github.com/mlangcode/n8n-nodes-azure-document-intelligence.git
cd n8n-nodes-azure-document-intelligence

# Install dependencies and build
npm install
npm run build

# Link to your local n8n
npm link
cd ~/.n8n
npm link n8n-nodes-azure-document-intelligence

# Restart n8n

Features

✅ Multiple Prebuilt Models: Support for 9 prebuilt models (read, layout, invoice, receipt, ID, business card, etc.)
✅ Flexible Input: Binary data, URL, or base64-encoded content
✅ Three Outputs: Separate outputs for content, structured data, and tables
✅ Markdown Support: Extract documents in markdown or plain text format
✅ Table Processing: Automatically identifies headers and converts tables to structured data
✅ Page Selection: Analyze specific pages from multi-page documents
✅ Locale Support: Specify language hints for better recognition
✅ Long-Running Operations: Automatic polling for document analysis completion
✅ Error Handling: Comprehensive error messages and validation
✅ Binary Data Support: Seamlessly integrate with n8n's binary data field

Credentials

This node uses Azure Document Intelligence credentials with the following fields:

Endpoint: Your Azure Document Intelligence endpoint URL (e.g., https://your-resource.cognitiveservices.azure.com)
API Key: Your Azure Document Intelligence subscription key
API Version: The API version to use (default: 2024-11-30)
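
As a rough illustration of how these three credential fields fit together, the sketch below assembles an analyze request URL and authentication header for the Azure REST API. The function name `build_analyze_request` is hypothetical, and whether the node builds its requests exactly this way is an assumption; the `documentintelligence` path segment applies to the 2024-11-30 API version and may differ for older versions.

```python
from typing import Dict, Tuple
from urllib.parse import quote, urlencode


def build_analyze_request(
    endpoint: str,
    api_key: str,
    model_id: str,
    api_version: str = "2024-11-30",
) -> Tuple[str, Dict[str, str]]:
    """Return (url, headers) for an analyze call. Hypothetical helper, not the node's code."""
    base = endpoint.rstrip("/")  # tolerate a trailing slash in the credential
    if not base.startswith("https://"):
        raise ValueError("endpoint must start with https://")
    query = urlencode({"api-version": api_version})
    url = (
        f"{base}/documentintelligence/documentModels/"
        f"{quote(model_id, safe='')}:analyze?{query}"
    )
    # The subscription key travels in this header rather than in the URL.
    headers = {"Ocp-Apim-Subscription-Key": api_key}
    return url, headers
```

Stripping the trailing slash and checking for `https://` up front catches the two endpoint mistakes that most often surface later as authentication errors.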

Setting Up Credentials

In n8n, go to Credentials → New
Search for "Azure Document Intelligence"
Fill in your endpoint URL and API key
Click Save

Usage

Basic Usage

Add the "Azure Document Intelligence" node to your workflow
Configure your Azure Document Intelligence credentials
Select the appropriate prebuilt model for your document type
Choose input source (binary data, URL, or base64)
Configure additional options as needed

The node subtitle will display the selected model for easy identification.

Supported Models

The node supports the following prebuilt models:

Text Extraction

Read (OCR): Basic optical character recognition for extracting printed and handwritten text
Layout: Extract text, tables, selection marks, and document structure

General Documents

General Document: Extract key-value pairs, entities, and general structure from any document type

Specialized Forms

Invoice: Extract vendor name, invoice date, total, line items, and other invoice fields
Receipt: Extract merchant name, transaction date, total, and line items from receipts
ID Document: Extract information from passports, driver's licenses, and identity cards
Business Card: Extract contact information including names, companies, emails, and phone numbers

US-Specific Forms

Health Insurance Card (US): Extract member information, group numbers, and insurance details
W-2 Tax Form (US): Extract employer information, wages, and tax withholding data

Parameters

Required Parameters

Model: Select the prebuilt model appropriate for your document type
Input Source: Choose how to provide the document:
- Binary Data: Use document from a previous node's binary field
- URL: Provide a public URL to the document
- Base64: Provide base64-encoded document content

Input Source Specific

Binary Data

Binary Property: Name of the binary property (default: data)

URL

Document URL: Public URL to the document

Base64

Base64 Content: Base64-encoded string of the document
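
The three input sources map naturally onto two request shapes: raw bytes for binary data, and a small JSON body for URL or base64 input. The sketch below shows one way to make that choice, assuming the `urlSource` and `base64Source` body fields of the Azure REST API. The helper name `build_payload` and its source labels (`"binary"`, `"url"`, `"base64"`) are hypothetical and not taken from the node.

```python
import base64
from typing import Any, Dict, Optional, Tuple, Union


def build_payload(
    source: str,
    data: Optional[bytes] = None,
    url: Optional[str] = None,
    base64_content: Optional[str] = None,
    content_type: str = "application/pdf",
) -> Tuple[Union[bytes, Dict[str, Any]], str]:
    """Return (body, Content-Type header) for the chosen input source."""
    if source == "binary":
        if not data:
            raise ValueError("no binary data found")
        # Raw bytes are sent as-is with the document's own MIME type.
        return data, content_type
    if source == "url":
        if not url:
            raise ValueError("document URL is required")
        return {"urlSource": url}, "application/json"
    if source == "base64":
        if not base64_content:
            raise ValueError("base64 content is required")
        # validate=True rejects characters outside the base64 alphabet.
        base64.b64decode(base64_content, validate=True)
        return {"base64Source": base64_content}, "application/json"
    raise ValueError(f"unknown input source: {source!r}")
```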

Additional Options

Content Type: Specify the document MIME type (PDF, JPEG, PNG, TIFF, BMP, HEIF)
Output Content Format: Choose between text or markdown for extracted content (for read/layout models)
Pages: Specify which pages to analyze (e.g., 1-3,5 or 1,3,5-7)
Locale: Language hint for text recognition (e.g., en-US, de-DE, fr-FR)
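
To make the Pages syntax concrete, here is a minimal sketch that validates a page specification such as 1-3,5 and turns the additional options into query parameters. The parameter names `pages`, `locale`, and `outputContentFormat` are those of the Azure REST API; the helper names `expand_pages` and `build_query` are hypothetical, and the node may validate its input differently.

```python
import re
from typing import Dict, List, Optional

# One or more comma-separated items, each a page number or a "start-end" range.
_PAGES_RE = re.compile(r"\d+(-\d+)?(,\d+(-\d+)?)*")


def expand_pages(spec: str) -> List[int]:
    """Expand a spec like '1-3,5' into a sorted list of page numbers."""
    cleaned = spec.replace(" ", "")
    if not _PAGES_RE.fullmatch(cleaned):
        raise ValueError(f"invalid page specification: {spec!r}")
    pages = set()
    for part in cleaned.split(","):
        start, _, end = part.partition("-")
        first = int(start)
        last = int(end) if end else first
        if first < 1 or last < first:
            raise ValueError(f"invalid page range: {part!r}")
        pages.update(range(first, last + 1))
    return sorted(pages)


def build_query(
    api_version: str,
    pages: Optional[str] = None,
    locale: Optional[str] = None,
    output_format: Optional[str] = None,
) -> Dict[str, str]:
    """Collect only the options that were actually set."""
    query = {"api-version": api_version}
    if pages:
        expand_pages(pages)  # validate before sending
        query["pages"] = pages.replace(" ", "")
    if locale:
        query["locale"] = locale
    if output_format:
        query["outputContentFormat"] = output_format
    return query
```

For example, `expand_pages("1,3,5-7")` yields `[1, 3, 5, 6, 7]`, while a reversed range such as `3-1` raises a `ValueError` before any request is made.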

Multiple Outputs

The node has three outputs for flexible workflow routing:

Output 0: Content 📄

Contains: Raw text or markdown content extracted from the document

{
  "content": "# Invoice\n\nVendor: Acme Corp...",
  "contentLength": 1234,
  "model": "prebuilt-layout"
}

Use this for:

Text extraction and OCR workflows
Full document content for further processing
Feeding to LLMs or text analysis nodes

Output 1: Structured Data 📊

Contains: Extracted fields, key-value pairs, and structured information

{
  "model": "prebuilt-invoice",
  "pageCount": 2,
  "documents": [
    {
      "docType": "invoice",
      "fields": {
        "VendorName": { "content": "Acme Corp", "confidence": 0.99 },
        "InvoiceTotal": { "content": "1,234.56", "confidence": 0.98 },
        "InvoiceDate": { "content": "2024-01-15", "confidence": 0.97 }
      }
    }
  ],
  "pages": [...]
}

Use this for:

Extracting specific fields (invoice data, receipt information)
Key-value pair extraction
Document field validation and processing
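
For field validation, a downstream step might keep only the fields whose confidence clears a threshold and flag the rest for review. The sketch below works on the Structured Data shape shown above; the function name `flatten_fields` and the threshold are illustrative choices, not part of the node.

```python
from typing import Any, Dict, List, Tuple


def flatten_fields(
    structured: Dict[str, Any], min_confidence: float = 0.0
) -> Tuple[Dict[str, Any], List[str]]:
    """Return (accepted field values, names of low-confidence fields) for the first document."""
    documents = structured.get("documents") or []
    if not documents:
        return {}, []
    accepted: Dict[str, Any] = {}
    needs_review: List[str] = []
    for name, field in documents[0].get("fields", {}).items():
        # A field without a confidence score is treated as 0.0 here.
        if field.get("confidence", 0.0) >= min_confidence:
            accepted[name] = field.get("content")
        else:
            needs_review.append(name)
    return accepted, needs_review
```

With the sample above and `min_confidence=0.98`, `VendorName` and `InvoiceTotal` are accepted and `InvoiceDate` is listed for review.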

Output 2: Tables 📋

Contains: Processed tables with identified headers and structured row data

{
  "tableCount": 2,
  "tables": [
    {
      "headers": ["Item", "Quantity", "Price", "Total"],
      "dataRows": [
        { "Item": "Widget A", "Quantity": "5", "Price": "$10.00", "Total": "$50.00" },
        { "Item": "Widget B", "Quantity": "3", "Price": "$15.00", "Total": "$45.00" }
      ]
    }
  ],
  "model": "prebuilt-layout"
}

Use this for:

Extracting tabular data from documents
Processing invoice line items
Converting document tables to structured data for databases
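
To show what header identification can look like, the sketch below converts a raw table from an Azure analyze result (a list of cells carrying `rowIndex`, `columnIndex`, `content`, and an optional `kind`) into the `headers` and `dataRows` shape shown above. It treats cells marked `columnHeader` as the header row and falls back to the first row when none are marked. This is an assumption about one reasonable approach; the node's actual algorithm may differ, and `table_to_rows` is a hypothetical name.

```python
from typing import Any, Dict, List


def table_to_rows(table: Dict[str, Any]) -> Dict[str, Any]:
    """Turn a cell list into {'headers': [...], 'dataRows': [{header: value}, ...]}."""
    grid: Dict[tuple, str] = {}
    header_rows = set()
    for cell in table["cells"]:
        grid[(cell["rowIndex"], cell["columnIndex"])] = cell.get("content", "")
        if cell.get("kind") == "columnHeader":
            header_rows.add(cell["rowIndex"])
    # Fall back to the first row when no cell is flagged as a header.
    header_row = min(header_rows) if header_rows else 0
    columns = range(table["columnCount"])
    headers: List[str] = [
        grid.get((header_row, c), "") or f"Column {c + 1}" for c in columns
    ]
    data_rows = []
    for r in range(table["rowCount"]):
        if r == header_row or r in header_rows:
            continue  # skip every header row
        data_rows.append({headers[c]: grid.get((r, c), "") for c in columns})
    return {"headers": headers, "dataRows": data_rows}
```

Empty header cells get a positional name such as `Column 2` so that every value still has a key.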

Error Handling

When errors occur (and "Continue on Fail" is enabled):

Error details are sent to all three outputs
Includes HTTP status codes and error messages
Workflow continues instead of stopping

Examples

Example 1: Extract Text from PDF

Workflow:

HTTP Request (download PDF)
  → Azure Document Intelligence
      Model: Read (OCR)
      Input Source: Binary Data
      Binary Property: data
  → [Content Output] → Process extracted text

Example 2: Extract Invoice Fields

Workflow:

HTTP Request (get invoice PDF)
  → Azure Document Intelligence
      Model: Invoice
      Input Source: Binary Data
  → [Structured Data Output]
      → Code Node: Extract $.documents[0].fields
      → Store in database

Extracted Fields:

VendorName
CustomerName
InvoiceDate
InvoiceTotal
DueDate
Line items

Example 3: Process Tables from Documents

Workflow:

Read Binary File (read document)
  → Azure Document Intelligence
      Model: Layout
      Input Source: Binary Data
      Output Format: Markdown
  → [Tables Output]
      → Code Node: Process table rows
      → Send to Google Sheets
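
The "Process table rows" step could be sketched as follows: it reshapes one table from the Tables output into a header row followed by value rows, which is the layout spreadsheet nodes typically expect. The function name `rows_for_sheet` is hypothetical, and the input shape is assumed to match the Tables output sample shown earlier.

```python
from typing import Any, Dict, List


def rows_for_sheet(tables_output: Dict[str, Any], table_index: int = 0) -> List[List[str]]:
    """Return [headers, row1, row2, ...] for one table, or [] if it does not exist."""
    tables = tables_output.get("tables") or []
    if table_index >= len(tables):
        return []
    table = tables[table_index]
    headers = table["headers"]
    rows = [list(headers)]
    for row in table["dataRows"]:
        # Keep column order stable by following the header list.
        rows.append([str(row.get(header, "")) for header in headers])
    return rows
```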

Example 4: OCR from URL

Workflow:

Azure Document Intelligence
  Model: Read (OCR)
  Input Source: URL
  Document URL: https://example.com/document.pdf
  → [Content Output]
      → Send extracted text to analysis

Example 5: Extract Business Card Info

Workflow:

Webhook (receive uploaded image)
  → Azure Document Intelligence
      Model: Business Card
      Input Source: Binary Data
  → [Structured Data Output]
      → Extract contact fields:
          - Name
          - Company
          - Email
          - Phone
      → Add to CRM

Example 6: Multi-Page Document with Page Selection

Workflow:

Azure Document Intelligence
  Model: Layout
  Input Source: Binary Data
  Pages: 1-5,10
  Output Format: Markdown
  → Process only specified pages

Example 7: Receipt Processing

Workflow:

Email Trigger (receipt attachments)
  → Azure Document Intelligence
      Model: Receipt
      Input Source: Binary Data
  → [Structured Data Output]
      → Extract:
          - MerchantName
          - TransactionDate
          - Total
          - Items
      → Log to expense tracking system

Resources

Compatibility

Requires n8n version 1.60.0 or later
Compatible with Azure Document Intelligence API version 2024-11-30 (GA)
Supports the nine prebuilt models listed under Supported Models

Supported Document Types

PDF (application/pdf)
JPEG (image/jpeg)
PNG (image/png)
TIFF (image/tiff)
BMP (image/bmp)
HEIF (image/heif)
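
When the Content Type option has to be set from a file name, a small lookup over the types listed above is enough. The extension-to-type mapping below is an illustrative assumption (for example, treating both .jpg and .jpeg as image/jpeg), and `content_type_for` is a hypothetical helper.

```python
import os

# Extensions mapped to the MIME types listed above.
SUPPORTED_TYPES = {
    ".pdf": "application/pdf",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".tif": "image/tiff",
    ".tiff": "image/tiff",
    ".bmp": "image/bmp",
    ".heif": "image/heif",
}


def content_type_for(filename: str) -> str:
    """Return the MIME type for a supported file name, or raise ValueError."""
    extension = os.path.splitext(filename)[1].lower()
    try:
        return SUPPORTED_TYPES[extension]
    except KeyError:
        raise ValueError(f"unsupported document type: {extension or filename!r}") from None
```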

Troubleshooting

"Authentication failed" error

Verify your API key is correct
Ensure the endpoint URL is correct and includes https://

"Model not found" error

Check that the model name is spelled correctly
Verify that your Azure region and the configured API version support the selected prebuilt model

"No binary data found" error

Ensure the previous node outputs binary data
Verify the binary property name matches (default: "data")

"analyzeResult is missing" error

The document may be corrupted or in an unsupported format
Try converting the document to PDF first

Long processing times

Document analysis can take 10-60 seconds depending on document size
Multi-page documents take longer to process
The node automatically polls until completion
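
The polling behavior can be pictured with the sketch below. It repeatedly checks an operation's status until the service reports `succeeded` or `failed`, the terminal status values used by Azure's asynchronous analyze operations. The interval, the timeout, and the `poll_until_done` helper itself are assumptions for illustration; the node's real timings are not documented here.

```python
import time
from typing import Any, Callable, Dict


def poll_until_done(
    get_status: Callable[[], Dict[str, Any]],
    interval_s: float = 2.0,
    timeout_s: float = 120.0,
    sleep: Callable[[float], Any] = time.sleep,
) -> Dict[str, Any]:
    """Call get_status() until it reports success, failure, or the timeout is reached."""
    waited = 0.0
    while True:
        result = get_status()
        status = result.get("status")
        if status == "succeeded":
            return result
        if status == "failed":
            raise RuntimeError(f"analysis failed: {result.get('error')}")
        if waited >= timeout_s:
            raise TimeoutError(f"analysis still {status!r} after {waited:.0f}s")
        # Any other status (for example notStarted or running) means keep waiting.
        sleep(interval_s)
        waited += interval_s
```

Passing `get_status` and `sleep` as arguments keeps the loop independent of any HTTP client, so it can be exercised with canned responses instead of a live service.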

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT

Version History

1.0.0

Initial release with Azure Document Intelligence support
Support for 9 prebuilt models (read, layout, document, invoice, receipt, ID, business card, health insurance, W-2)
Three outputs: Content, Structured Data, and Tables
Flexible input methods: Binary data, URL, and base64
Automatic table processing with header identification
Markdown and text output formats
Page selection and locale support
Long-running operation polling
Comprehensive error handling

Author

mlangcode

Support

For issues, questions, or contributions, please visit the GitHub repository.
