Document Nodes

Process PDFs and documents with text extraction, OCR, splitting, merging, and conversion.

Required Plugins

Available Plugins

Plugin	Package	Operations	Requirements
documentPlugin	`@uploadista/flow-documents-plugin`	Split, merge, metadata, text extraction	None
pdfLibDocumentPlugin	`@uploadista/flow-documents-pdflib`	Split, merge, metadata only	None
unpdfDocumentPlugin	`@uploadista/flow-documents-unpdf`	Text extraction only	None
documentAiPlugin	`@uploadista/flow-documents-replicate`	OCR, AI extraction	Replicate API token

Installation

Combined plugin (documentPlugin)

npm install @uploadista/flow-documents-nodes @uploadista/flow-documents-plugin

pnpm add @uploadista/flow-documents-nodes @uploadista/flow-documents-plugin

yarn add @uploadista/flow-documents-nodes @uploadista/flow-documents-plugin

import { createUploadistaServer } from "@uploadista/server";
import { documentPlugin } from "@uploadista/flow-documents-plugin";

const uploadista = await createUploadistaServer({
  // ...
  plugins: [
    documentPlugin,
  ],
});

Features: Split PDF, merge PDFs, extract metadata, text extraction

Requirements: None (pure JavaScript)

Note: This is the recommended plugin. It combines the functionality of the PDFLib and Unpdf plugins.

PDFLib plugin (pdfLibDocumentPlugin)

npm install @uploadista/flow-documents-nodes @uploadista/flow-documents-pdflib

pnpm add @uploadista/flow-documents-nodes @uploadista/flow-documents-pdflib

yarn add @uploadista/flow-documents-nodes @uploadista/flow-documents-pdflib

import { createUploadistaServer } from "@uploadista/server";
import { pdfLibDocumentPlugin } from "@uploadista/flow-documents-pdflib";

const uploadista = await createUploadistaServer({
  // ...
  plugins: [
    pdfLibDocumentPlugin,
  ],
});

Features: Split PDF, merge PDFs, extract metadata

Requirements: None (pure JavaScript)

Unpdf plugin (unpdfDocumentPlugin)

npm install @uploadista/flow-documents-nodes @uploadista/flow-documents-unpdf

pnpm add @uploadista/flow-documents-nodes @uploadista/flow-documents-unpdf

yarn add @uploadista/flow-documents-nodes @uploadista/flow-documents-unpdf

import { createUploadistaServer } from "@uploadista/server";
import { unpdfDocumentPlugin } from "@uploadista/flow-documents-unpdf";

const uploadista = await createUploadistaServer({
  // ...
  plugins: [
    unpdfDocumentPlugin,
  ],
});

Features: Fast text extraction from searchable PDFs

Requirements: None

Note: Only works with searchable PDFs. For scanned documents, use OCR through the Replicate plugin.

Replicate plugin (documentAiPlugin)

npm install @uploadista/flow-documents-nodes @uploadista/flow-documents-replicate

pnpm add @uploadista/flow-documents-nodes @uploadista/flow-documents-replicate

yarn add @uploadista/flow-documents-nodes @uploadista/flow-documents-replicate

import { createUploadistaServer } from "@uploadista/server";
import { documentAiPlugin } from "@uploadista/flow-documents-replicate";

const uploadista = await createUploadistaServer({
  // ...
  plugins: [
    documentAiPlugin({ apiToken: process.env.REPLICATE_API_TOKEN }),
  ],
});

Features: OCR, AI text extraction, markdown conversion

Requirements: Replicate API token

Cost: ~$0.005 per document

Plugin Comparison

Feature	Combined	PDFLib	Unpdf	Replicate
Split PDF	Yes	Yes	No	No
Merge PDFs	Yes	Yes	No	No
Extract metadata	Yes	Yes	No	No
Extract text (searchable)	Yes	No	Yes	Yes
OCR (scanned)	No	No	No	Yes
Convert to markdown	No	No	No	Yes
Cost	Free	Free	Free	Per-request
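
The comparison above can also be read as a lookup: given the operations a flow needs, which plugins cover all of them? The sketch below transcribes the table into a plain object and filters it. The names `Capability`, `capabilityMatrix`, and `pluginsSupporting` are illustrative only and are not exported by any Uploadista package.

```typescript
// Capability matrix transcribed from the Plugin Comparison table.
// All identifiers here are hypothetical helpers, not Uploadista APIs.
type Capability = "split" | "merge" | "metadata" | "extractText" | "ocr" | "markdown";

type PluginName =
  | "documentPlugin"
  | "pdfLibDocumentPlugin"
  | "unpdfDocumentPlugin"
  | "documentAiPlugin";

const capabilityMatrix: Record<PluginName, readonly Capability[]> = {
  documentPlugin: ["split", "merge", "metadata", "extractText"],
  pdfLibDocumentPlugin: ["split", "merge", "metadata"],
  unpdfDocumentPlugin: ["extractText"],
  documentAiPlugin: ["extractText", "ocr", "markdown"],
};

// Return every plugin that covers all of the requested capabilities.
function pluginsSupporting(required: readonly Capability[]): PluginName[] {
  return (Object.keys(capabilityMatrix) as PluginName[]).filter((name) =>
    required.every((cap) => capabilityMatrix[name].indexOf(cap) !== -1),
  );
}

// Splitting plus text extraction is only covered by the combined plugin.
console.log(pluginsSupporting(["split", "extractText"]));
// OCR is only covered by the Replicate plugin.
console.log(pluginsSupporting(["ocr"]));
```

An empty result (for example, when asking for both `split` and `ocr`) means no single plugin is enough and two plugins have to be registered together.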

OCR Node

AI-powered text extraction from scanned documents.

Package: @uploadista/flow-documents-nodes

import { createOcrNode } from "@uploadista/flow-documents-nodes";

// Note: `yield*` is only valid inside a generator function.
// Convert scanned document to markdown
const ocrNode = yield* createOcrNode("ocr-1", {
  taskType: "convertToMarkdown",
  resolution: "gundam",
  credentialId: "my-replicate-credential",
});

// Free-form OCR for plain text extraction
const freeOcrNode = yield* createOcrNode("ocr-2", {
  taskType: "freeOcr",
  resolution: "base",
});

// Locate specific content in document
const locateNode = yield* createOcrNode("ocr-3", {
  taskType: "locateObject",
  referenceText: "Invoice Total",
  resolution: "small",
});

Parameters

Parameter	Type	Required	Default	Description
`taskType`	`"convertToMarkdown" | "freeOcr" | "parseFigure" | "locateObject"`	Yes	-	OCR task type
`resolution`	`"tiny" | "small" | "base" | "gundam" | "large"`	No	-	Model resolution (higher = better quality)
`credentialId`	`string`	No	-	AI service credential ID
`referenceText`	`string`	No	-	Text to locate (for `locateObject` task)
`keepOutput`	`boolean`	No	`false`	Keep output in flow results

Task Types

Task Type	Description
`convertToMarkdown`	Structured markdown output with headings, lists
`freeOcr`	Unstructured plain text extraction
`parseFigure`	Analyze charts and diagrams
`locateObject`	Find specific content using reference text
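
Because `referenceText` only matters for the `locateObject` task, it can help to check a configuration before building the node. The following is a minimal sketch of such a check. The `OcrNodeParams` interface mirrors the Parameters table, but `checkOcrParams` is a hypothetical helper, and treating a missing `referenceText` as a problem is an assumption: the table lists the parameter as optional.

```typescript
type OcrTaskType = "convertToMarkdown" | "freeOcr" | "parseFigure" | "locateObject";
type OcrResolution = "tiny" | "small" | "base" | "gundam" | "large";

// Shape taken from the Parameters table; not imported from the library.
interface OcrNodeParams {
  taskType: OcrTaskType;
  resolution?: OcrResolution;
  credentialId?: string;
  referenceText?: string;
  keepOutput?: boolean;
}

// Hypothetical pre-flight check: returns a list of problems, empty when none are found.
function checkOcrParams(params: OcrNodeParams): string[] {
  const problems: string[] = [];
  const hasReference = (params.referenceText ?? "").trim().length > 0;

  // Assumption: locateObject cannot do useful work without text to look for.
  if (params.taskType === "locateObject" && !hasReference) {
    problems.push("locateObject needs a non-empty referenceText");
  }
  // A reference text on any other task is most likely a configuration mistake.
  if (params.taskType !== "locateObject" && hasReference) {
    problems.push("referenceText is only used by locateObject");
  }
  return problems;
}

console.log(checkOcrParams({ taskType: "locateObject", referenceText: "Invoice Total" }));
console.log(checkOcrParams({ taskType: "locateObject" }));
```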

Performance: ~5-15s | Cost: ~$0.005 per document

Extract Text Node

Fast text extraction from searchable PDFs.

Package: @uploadista/flow-documents-nodes

import { createExtractTextNode } from "@uploadista/flow-documents-nodes";

// Extract text from searchable PDF
const extractNode = yield* createExtractTextNode("extract-1");

// With keepOutput option
const keepOutputNode = yield* createExtractTextNode("extract-2", {
  keepOutput: true,
});

Parameters

Parameter	Type	Required	Default	Description
`keepOutput`	`boolean`	No	`false`	Keep output in flow results

Output: Adds `extractedText` to the file metadata.

Note: For scanned PDFs without selectable text, use the OCR Node instead.

Performance: Less than 1s | Cost: Free

Split PDF Node

Split PDFs by page range or into individual pages.

Package: @uploadista/flow-documents-nodes

import { createSplitPdfNode } from "@uploadista/flow-documents-nodes";

// Extract pages 3-5 as single PDF
const rangeNode = yield* createSplitPdfNode("split-1", {
  mode: "range",
  startPage: 3,
  endPage: 5,
});

// Split each page into separate PDF
const individualNode = yield* createSplitPdfNode("split-2", {
  mode: "individual",
});

// With custom naming
const namedNode = yield* createSplitPdfNode("split-3", {
  mode: "range",
  startPage: 1,
  endPage: 10,
  naming: { mode: "auto" },
});

Parameters

Parameter	Type	Required	Default	Description
`mode`	`"range" | "individual"`	Yes	-	Split mode
`startPage`	`number`	No	-	Start page (for range mode)
`endPage`	`number`	No	-	End page (for range mode)
`keepOutput`	`boolean`	No	`false`	Keep output in flow results
`naming`	`FileNamingConfig`	No	-	File naming configuration

Split Modes

Mode	Description
`range`	Extract pages as single PDF
`individual`	Split each page into separate PDF
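
To make the difference between the two modes concrete, the sketch below computes which source pages would land in each output file. It is an illustration of the behavior described above, not the library's implementation. It assumes page numbers are 1-based and `endPage` is inclusive (consistent with "Extract pages 3-5"), and that an omitted bound falls back to the first or last page. `SplitConfig` and `planSplit` are hypothetical names.

```typescript
// Mirrors the split parameters; not imported from the library.
type SplitConfig =
  | { mode: "range"; startPage?: number; endPage?: number }
  | { mode: "individual" };

const isWholeNumber = (n: number): boolean => Math.floor(n) === n;

// Returns one array of page numbers per output PDF.
function planSplit(totalPages: number, config: SplitConfig): number[][] {
  if (totalPages < 1 || !isWholeNumber(totalPages)) {
    throw new RangeError("totalPages must be a positive integer");
  }

  if (config.mode === "individual") {
    // One output file per page.
    const outputs: number[][] = [];
    for (let page = 1; page <= totalPages; page++) outputs.push([page]);
    return outputs;
  }

  // Assumed defaults: a missing bound means "from the first" or "to the last" page.
  const start = config.startPage ?? 1;
  const end = config.endPage ?? totalPages;
  if (!isWholeNumber(start) || !isWholeNumber(end) || start < 1 || end > totalPages || start > end) {
    throw new RangeError(`invalid range ${start}-${end} for a ${totalPages}-page document`);
  }

  // A single output file containing the inclusive range.
  const pages: number[] = [];
  for (let page = start; page <= end; page++) pages.push(page);
  return [pages];
}

console.log(planSplit(10, { mode: "range", startPage: 3, endPage: 5 }));
console.log(planSplit(3, { mode: "individual" }));
```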

Performance: Less than 2s | Cost: Free

Merge PDF Node

Combine multiple PDFs into a single document.

Package: @uploadista/flow-documents-nodes

import { createMergePdfNode } from "@uploadista/flow-documents-nodes";

// Merge PDFs with default settings
const mergeNode = yield* createMergePdfNode("merge-1");

// With custom naming
const namedMergeNode = yield* createMergePdfNode("merge-2", {
  naming: { mode: "auto" },
});

Parameters

Parameter	Type	Required	Default	Description
`inputCount`	`number`	No	-	Expected number of input files
`keepOutput`	`boolean`	No	`false`	Keep output in flow results
`naming`	`FileNamingConfig`	No	-	File naming (auto suffix: `merged`)

Note: Requires a Merge utility node upstream to provide multiple files.

Performance: Less than 2s | Cost: Free

Describe Document Node

Extract PDF metadata (page count, author, title, etc.).

Package: @uploadista/flow-documents-nodes

import { createDescribeDocumentNode } from "@uploadista/flow-documents-nodes";

// Extract document metadata
const describeNode = yield* createDescribeDocumentNode("describe-1");

// With keepOutput option
const keepOutputNode = yield* createDescribeDocumentNode("describe-2", {
  keepOutput: true,
});

Parameters

Parameter	Type	Required	Default	Description
`keepOutput`	`boolean`	No	`false`	Keep output in flow results

Output Metadata:

{
  "pageCount": 10,
  "format": "pdf",
  "author": "John Doe",
  "title": "Document Title",
  "subject": "Document Subject",
  "creator": "Adobe Acrobat",
  "creationDate": "2023-01-01T00:00:00Z",
  "modifiedDate": "2023-01-02T00:00:00Z",
  "fileSize": 1024000
}
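
For downstream code, the metadata above can be given a type. The interface below is inferred from the sample output only. Which fields are optional, and `fileSize` being measured in bytes, are assumptions rather than documented guarantees. `DocumentMetadata` and `summarize` are illustrative names.

```typescript
// Field names come from the sample output; optionality is an assumption.
interface DocumentMetadata {
  pageCount: number;
  format: string;
  author?: string;
  title?: string;
  subject?: string;
  creator?: string;
  creationDate?: string; // ISO 8601 timestamp in the sample
  modifiedDate?: string; // ISO 8601 timestamp in the sample
  fileSize: number;      // assumed to be bytes
}

// Build a one-line, human-readable summary of a document.
function summarize(meta: DocumentMetadata): string {
  const title = meta.title ?? "(untitled)";
  const author = meta.author ?? "unknown author";
  const pages = meta.pageCount === 1 ? "1 page" : `${meta.pageCount} pages`;
  const sizeKiB = (meta.fileSize / 1024).toFixed(1);
  return `${title} by ${author}: ${pages}, ${sizeKiB} KiB`;
}

const sample: DocumentMetadata = JSON.parse(`{
  "pageCount": 10,
  "format": "pdf",
  "author": "John Doe",
  "title": "Document Title",
  "fileSize": 1024000
}`);

console.log(summarize(sample));
```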

Performance: Less than 1s | Cost: Free

Convert to Markdown Node

Converts a document to markdown, choosing between text extraction and OCR automatically.

Package: @uploadista/flow-documents-nodes

import { createConvertToMarkdownNode } from "@uploadista/flow-documents-nodes";

// Convert with default settings
const convertNode = yield* createConvertToMarkdownNode("convert-1");

// With custom resolution and credential
const customNode = yield* createConvertToMarkdownNode("convert-2", {
  resolution: "gundam",
  credentialId: "my-ai-credential",
});

Parameters

Parameter	Type	Required	Default	Description
`resolution`	`"tiny" | "small" | "base" | "gundam" | "large"`	No	`"gundam"`	OCR model resolution
`credentialId`	`string`	No	-	AI service credential ID
`keepOutput`	`boolean`	No	`false`	Keep output in flow results

How it Works:

1. Tries text extraction first (fast, for searchable PDFs)
2. Falls back to OCR if no text is found (for scanned PDFs)
3. Returns structured markdown in `metadata.markdown`
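
The extract-first, OCR-as-fallback strategy described above can be sketched in a few lines. This is not the node's actual implementation. The two callbacks stand in for the text-extraction and OCR plugins, they are synchronous for brevity (real calls would be asynchronous), and treating whitespace-only output as "no text found" is an assumption. All names are hypothetical.

```typescript
interface MarkdownResult {
  markdown: string;
  source: "text-extraction" | "ocr";
}

// Stand-in for a plugin call that turns file bytes into text.
type TextSource = (file: Uint8Array) => string;

function convertWithFallback(
  file: Uint8Array,
  extractText: TextSource,
  runOcr: TextSource,
): MarkdownResult {
  // Step 1: try the fast, free path first.
  const extracted = extractText(file).trim();
  if (extracted.length > 0) {
    return { markdown: extracted, source: "text-extraction" };
  }
  // Step 2: nothing usable was found, so fall back to OCR.
  return { markdown: runOcr(file).trim(), source: "ocr" };
}

const pdfBytes = new Uint8Array(0); // placeholder input

// Searchable PDF: the extractor returns text, so OCR is never called.
const searchable = convertWithFallback(pdfBytes, () => "# Invoice\n", () => "unused");
console.log(searchable.source); // text-extraction

// Scanned PDF: the extractor finds nothing, so OCR supplies the markdown.
const scanned = convertWithFallback(pdfBytes, () => "   ", () => "# Scanned invoice");
console.log(scanned.source); // ocr
```

Ordering the steps this way keeps searchable PDFs on the free path and only incurs the per-document OCR cost when extraction comes back empty.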

Performance Summary

Node	Time	Cost	Plugin Required
Extract Text	<1s	Free	Unpdf
Split PDF	<2s	Free	PDFLib
Merge PDF	<2s	Free	PDFLib
Describe Document	<1s	Free	PDFLib
OCR	5-15s	~$0.005	Replicate
Convert to Markdown	1-15s	Free or ~$0.005	Unpdf + Replicate
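
The figures in this table allow a rough budget for a batch job. The sketch below applies them directly: only documents that go through OCR cost money, at the approximate ~$0.005 rate, and timings use the 5-15s OCR range and the under-1s extraction bound. It assumes documents are processed one after another with no fixed overhead, which is a simplification. `estimateBatch` is a hypothetical helper.

```typescript
// Approximate figures taken from the Performance Summary table.
const OCR_COST_PER_DOCUMENT_USD = 0.005;
const OCR_SECONDS_MIN = 5;
const OCR_SECONDS_MAX = 15;
const EXTRACT_SECONDS_MAX = 1;

interface BatchEstimate {
  costUsd: number;
  minSeconds: number;
  maxSeconds: number;
}

// Assumes sequential processing; parallel execution would shorten the times.
function estimateBatch(searchableDocs: number, scannedDocs: number): BatchEstimate {
  return {
    costUsd: scannedDocs * OCR_COST_PER_DOCUMENT_USD,
    minSeconds: scannedDocs * OCR_SECONDS_MIN,
    maxSeconds: searchableDocs * EXTRACT_SECONDS_MAX + scannedDocs * OCR_SECONDS_MAX,
  };
}

// 800 searchable PDFs and 200 scanned PDFs.
const estimate = estimateBatch(800, 200);
console.log(`about $${estimate.costUsd.toFixed(2)}, ${estimate.minSeconds}-${estimate.maxSeconds}s`);
```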

Related

Plugins Concept - How plugins work
Flow Nodes Overview - All available nodes
