Document Nodes
Process PDFs and documents with text extraction, OCR, splitting, merging, and conversion.
Required Plugins
Available Plugins
| Plugin | Package | Operations | Requirements |
|---|---|---|---|
| documentPlugin | @uploadista/flow-documents-plugin | Split, merge, metadata, text extraction | None |
| pdfLibDocumentPlugin | @uploadista/flow-documents-pdflib | Split, merge, metadata only | None |
| unpdfDocumentPlugin | @uploadista/flow-documents-unpdf | Text extraction only | None |
| documentAiPlugin | @uploadista/flow-documents-replicate | OCR, AI extraction | Replicate API token |
Installation
Combined (documentPlugin)

Install with npm, pnpm, or yarn:

    npm install @uploadista/flow-documents-nodes @uploadista/flow-documents-plugin
    pnpm add @uploadista/flow-documents-nodes @uploadista/flow-documents-plugin
    yarn add @uploadista/flow-documents-nodes @uploadista/flow-documents-plugin

Register the plugin with the server:

    import { createUploadistaServer } from "@uploadista/server";
    import { documentPlugin } from "@uploadista/flow-documents-plugin";

    const uploadista = await createUploadistaServer({
      // ...
      plugins: [
        documentPlugin,
      ],
    });

Features: Split PDF, merge PDFs, extract metadata, text extraction
Requirements: None (pure JavaScript)
Note: This is the recommended plugin. It combines PDFLib and Unpdf functionality.
PDFLib (pdfLibDocumentPlugin)

Install with npm, pnpm, or yarn:

    npm install @uploadista/flow-documents-nodes @uploadista/flow-documents-pdflib
    pnpm add @uploadista/flow-documents-nodes @uploadista/flow-documents-pdflib
    yarn add @uploadista/flow-documents-nodes @uploadista/flow-documents-pdflib

Register the plugin with the server:

    import { createUploadistaServer } from "@uploadista/server";
    import { pdfLibDocumentPlugin } from "@uploadista/flow-documents-pdflib";

    const uploadista = await createUploadistaServer({
      // ...
      plugins: [
        pdfLibDocumentPlugin,
      ],
    });

Features: Split PDF, merge PDFs, extract metadata
Requirements: None (pure JavaScript)
Unpdf (unpdfDocumentPlugin)

Install with npm, pnpm, or yarn:

    npm install @uploadista/flow-documents-nodes @uploadista/flow-documents-unpdf
    pnpm add @uploadista/flow-documents-nodes @uploadista/flow-documents-unpdf
    yarn add @uploadista/flow-documents-nodes @uploadista/flow-documents-unpdf

Register the plugin with the server:

    import { createUploadistaServer } from "@uploadista/server";
    import { unpdfDocumentPlugin } from "@uploadista/flow-documents-unpdf";

    const uploadista = await createUploadistaServer({
      // ...
      plugins: [
        unpdfDocumentPlugin,
      ],
    });

Features: Fast text extraction from searchable PDFs
Requirements: None
Note: Only works with searchable PDFs. For scanned documents, use OCR.
Replicate (documentAiPlugin)

Install with npm, pnpm, or yarn:

    npm install @uploadista/flow-documents-nodes @uploadista/flow-documents-replicate
    pnpm add @uploadista/flow-documents-nodes @uploadista/flow-documents-replicate
    yarn add @uploadista/flow-documents-nodes @uploadista/flow-documents-replicate

Register the plugin with the server:

    import { createUploadistaServer } from "@uploadista/server";
    import { documentAiPlugin } from "@uploadista/flow-documents-replicate";

    const uploadista = await createUploadistaServer({
      // ...
      plugins: [
        documentAiPlugin({ apiToken: process.env.REPLICATE_API_TOKEN }),
      ],
    });

Features: OCR, AI text extraction, markdown conversion
Requirements: Replicate API token
Cost: ~$0.005 per document
Plugin Comparison
| Feature | Combined | PDFLib | Unpdf | Replicate |
|---|---|---|---|---|
| Split PDF | Yes | Yes | No | No |
| Merge PDFs | Yes | Yes | No | No |
| Extract metadata | Yes | Yes | No | No |
| Extract text (searchable) | Yes | No | Yes | Yes |
| OCR (scanned) | No | No | No | Yes |
| Convert to markdown | No | No | No | Yes |
| Cost | Free | Free | Free | Per-request |
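
The comparison above can also be encoded as data when a plugin has to be chosen programmatically. The following is a minimal sketch and not part of the uploadista API: the `Feature` names, the `capabilities` map, and `pluginsSupporting` are hypothetical helpers that simply mirror the table.

```typescript
// Hypothetical helper mirroring the comparison table; not part of uploadista.
type Feature = "split" | "merge" | "metadata" | "extractText" | "ocr" | "markdown";
type PluginName =
  | "documentPlugin"
  | "pdfLibDocumentPlugin"
  | "unpdfDocumentPlugin"
  | "documentAiPlugin";

// One entry per column of the table (Combined, PDFLib, Unpdf, Replicate).
const capabilities: Record<PluginName, readonly Feature[]> = {
  documentPlugin: ["split", "merge", "metadata", "extractText"],
  pdfLibDocumentPlugin: ["split", "merge", "metadata"],
  unpdfDocumentPlugin: ["extractText"],
  documentAiPlugin: ["extractText", "ocr", "markdown"],
};

// Return every plugin that covers all of the requested features.
function pluginsSupporting(required: readonly Feature[]): PluginName[] {
  return (Object.keys(capabilities) as PluginName[]).filter((name) =>
    required.every((feature) => capabilities[name].includes(feature)),
  );
}
```

Under these assumptions, `pluginsSupporting(["split", "ocr"])` returns an empty array, which signals that a flow needing both operations has to register two plugins, for instance the combined plugin together with the Replicate plugin.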
OCR Node
AI-powered text extraction from scanned documents.
Package: @uploadista/flow-documents-nodes
    import { createOcrNode } from "@uploadista/flow-documents-nodes";

    // Note: yield* is only valid inside a generator function,
    // so these calls belong inside the generator that builds your flow.

    // Convert scanned document to markdown
    const ocrNode = yield* createOcrNode("ocr-1", {
      taskType: "convertToMarkdown",
      resolution: "gundam",
      credentialId: "my-replicate-credential",
    });

    // Free-form OCR for plain text extraction
    const freeOcrNode = yield* createOcrNode("ocr-2", {
      taskType: "freeOcr",
      resolution: "base",
    });

    // Locate specific content in document
    const locateNode = yield* createOcrNode("ocr-3", {
      taskType: "locateObject",
      referenceText: "Invoice Total",
      resolution: "small",
    });

Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `taskType` | `"convertToMarkdown" \| "freeOcr" \| "parseFigure" \| "locateObject"` | Yes | - | OCR task type |
| `resolution` | `"tiny" \| "small" \| "base" \| "gundam" \| "large"` | No | - | Model resolution (higher = better quality) |
| `credentialId` | `string` | No | - | AI service credential ID |
| `referenceText` | `string` | No | - | Text to locate (for the `locateObject` task) |
| `keepOutput` | `boolean` | No | `false` | Keep output in flow results |
Task Types
| Task Type | Description |
|---|---|
| `convertToMarkdown` | Structured markdown output with headings and lists |
| `freeOcr` | Unstructured plain text extraction |
| `parseFigure` | Analyze charts and diagrams |
| `locateObject` | Find specific content using reference text |
Performance: ~5-15s | Cost: ~$0.005 per document
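
Because `referenceText` only applies to the `locateObject` task, it can help to check a configuration before building the node. The sketch below is illustrative only: `OcrParams` and `validateOcrParams` are hypothetical names that mirror the parameter table, and the rule that `locateObject` needs a non-empty `referenceText` is an assumption rather than documented behavior.

```typescript
// Hypothetical types mirroring the parameter table; the real package may differ.
type OcrTaskType = "convertToMarkdown" | "freeOcr" | "parseFigure" | "locateObject";
type OcrResolution = "tiny" | "small" | "base" | "gundam" | "large";

interface OcrParams {
  taskType: OcrTaskType;
  resolution?: OcrResolution;
  credentialId?: string;
  referenceText?: string;
  keepOutput?: boolean;
}

// Assumption: locateObject is only meaningful with a non-empty referenceText,
// and referenceText has no effect on the other task types.
function validateOcrParams(params: OcrParams): string[] {
  const problems: string[] = [];
  const hasReference = (params.referenceText ?? "").trim().length > 0;
  if (params.taskType === "locateObject" && !hasReference) {
    problems.push("locateObject requires referenceText");
  }
  if (params.taskType !== "locateObject" && hasReference) {
    problems.push("referenceText is only used by locateObject");
  }
  return problems;
}
```

An empty result means the configuration is consistent with the table; any returned strings can be logged or thrown before the flow is created.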
Extract Text Node
Fast text extraction from searchable PDFs.
Package: @uploadista/flow-documents-nodes
    import { createExtractTextNode } from "@uploadista/flow-documents-nodes";

    // Extract text from searchable PDF
    const extractNode = yield* createExtractTextNode("extract-1");

    // With keepOutput option
    const keepOutputNode = yield* createExtractTextNode("extract-2", {
      keepOutput: true,
    });

Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `keepOutput` | `boolean` | No | `false` | Keep output in flow results |
Output: Adds extractedText to file metadata.
Note: For scanned PDFs without selectable text, use OCR Node instead.
Performance: Less than 1s | Cost: Free
Split PDF Node
Split PDFs by page range or into individual pages.
Package: @uploadista/flow-documents-nodes
    import { createSplitPdfNode } from "@uploadista/flow-documents-nodes";

    // Extract pages 3-5 as single PDF
    const rangeNode = yield* createSplitPdfNode("split-1", {
      mode: "range",
      startPage: 3,
      endPage: 5,
    });

    // Split each page into separate PDF
    const individualNode = yield* createSplitPdfNode("split-2", {
      mode: "individual",
    });

    // With custom naming
    const namedNode = yield* createSplitPdfNode("split-3", {
      mode: "range",
      startPage: 1,
      endPage: 10,
      naming: { mode: "auto" },
    });

Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `mode` | `"range" \| "individual"` | Yes | - | Split mode |
| `startPage` | `number` | No | - | Start page (for range mode) |
| `endPage` | `number` | No | - | End page (for range mode) |
| `keepOutput` | `boolean` | No | `false` | Keep output in flow results |
| `naming` | `FileNamingConfig` | No | - | File naming configuration |
Split Modes
| Mode | Description |
|---|---|
| `range` | Extract pages as a single PDF |
| `individual` | Split each page into a separate PDF |
Performance: Less than 2s | Cost: Free
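
To make the two split modes concrete, the sketch below computes which pages would end up in each output file. It is a hypothetical planner, not the node's implementation: `planSplit` and `SplitParams` are invented for illustration, page numbers are assumed to be 1-based and inclusive (as the "pages 3-5" example suggests), and range mode is assumed to need both `startPage` and `endPage`.

```typescript
// Hypothetical planner; assumes 1-based, inclusive page numbers.
type SplitParams =
  | { mode: "range"; startPage: number; endPage: number }
  | { mode: "individual" };

// Returns one array of page numbers per output PDF.
function planSplit(pageCount: number, params: SplitParams): number[][] {
  const allPages = Array.from({ length: pageCount }, (_, i) => i + 1);

  if (params.mode === "individual") {
    // One output file per page.
    return allPages.map((page) => [page]);
  }

  const { startPage, endPage } = params;
  const valid =
    Number.isInteger(startPage) &&
    Number.isInteger(endPage) &&
    startPage >= 1 &&
    endPage >= startPage &&
    endPage <= pageCount;
  if (!valid) {
    throw new RangeError(
      `Invalid range ${startPage}-${endPage} for a ${pageCount}-page document`,
    );
  }

  // A single output file holding the requested pages.
  return [allPages.slice(startPage - 1, endPage)];
}
```

For a 10-page document, range mode with pages 3 to 5 yields one output containing pages 3, 4, and 5, while individual mode yields ten single-page outputs.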
Merge PDF Node
Combine multiple PDFs into a single document.
Package: @uploadista/flow-documents-nodes
    import { createMergePdfNode } from "@uploadista/flow-documents-nodes";

    // Merge PDFs with default settings
    const mergeNode = yield* createMergePdfNode("merge-1");

    // With custom naming
    const namedMergeNode = yield* createMergePdfNode("merge-2", {
      naming: { mode: "auto" },
    });

Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `inputCount` | `number` | No | - | Expected number of input files |
| `keepOutput` | `boolean` | No | `false` | Keep output in flow results |
| `naming` | `FileNamingConfig` | No | - | File naming (auto suffix: `merged`) |
Note: Requires a Merge utility node upstream to provide multiple files.
Performance: Less than 2s | Cost: Free
Describe Document Node
Extract PDF metadata (page count, author, title, etc.).
Package: @uploadista/flow-documents-nodes
    import { createDescribeDocumentNode } from "@uploadista/flow-documents-nodes";

    // Extract document metadata
    const describeNode = yield* createDescribeDocumentNode("describe-1");

    // With keepOutput option
    const keepOutputNode = yield* createDescribeDocumentNode("describe-2", {
      keepOutput: true,
    });

Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `keepOutput` | `boolean` | No | `false` | Keep output in flow results |
Output Metadata:
    {
      "pageCount": 10,
      "format": "pdf",
      "author": "John Doe",
      "title": "Document Title",
      "subject": "Document Subject",
      "creator": "Adobe Acrobat",
      "creationDate": "2023-01-01T00:00:00Z",
      "modifiedDate": "2023-01-02T00:00:00Z",
      "fileSize": 1024000
    }

Performance: Less than 1s | Cost: Free
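
When consuming this metadata downstream, a typed shape makes missing fields explicit. The interface below is inferred from the example output only, so treat it as a sketch: `DocumentMetadata` and `summarize` are hypothetical names, the choice of which fields are optional is an assumption, and `fileSize` is assumed to be in bytes.

```typescript
// Hypothetical shape inferred from the example output; optionality is an assumption.
interface DocumentMetadata {
  pageCount: number;
  format: string;
  fileSize: number; // assumed to be bytes
  author?: string;
  title?: string;
  subject?: string;
  creator?: string;
  creationDate?: string; // ISO 8601 timestamp, as in the example
  modifiedDate?: string; // ISO 8601 timestamp, as in the example
}

// Build a one-line summary, tolerating missing optional fields.
function summarize(meta: DocumentMetadata): string {
  const title = meta.title ?? "(untitled)";
  const author = meta.author ?? "unknown author";
  const sizeKb = Math.round(meta.fileSize / 1024);
  return `${title} by ${author}: ${meta.pageCount} pages, ${sizeKb} KB ${meta.format.toUpperCase()}`;
}
```

PDFs frequently omit author or title information, which is why the sketch falls back to placeholder text instead of assuming those fields are present.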
Convert to Markdown Node
Intelligent document-to-markdown conversion.
Package: @uploadista/flow-documents-nodes
    import { createConvertToMarkdownNode } from "@uploadista/flow-documents-nodes";

    // Convert with default settings
    const convertNode = yield* createConvertToMarkdownNode("convert-1");

    // With custom resolution and credential
    const customNode = yield* createConvertToMarkdownNode("convert-2", {
      resolution: "gundam",
      credentialId: "my-ai-credential",
    });

Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `resolution` | `"tiny" \| "small" \| "base" \| "gundam" \| "large"` | No | `"gundam"` | OCR model resolution |
| `credentialId` | `string` | No | - | AI service credential ID |
| `keepOutput` | `boolean` | No | `false` | Keep output in flow results |
How it Works:
- Tries text extraction first (fast, for searchable PDFs)
- Falls back to OCR if no text found (for scanned PDFs)
- Returns structured markdown in metadata.markdown
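
The fallback described above can be sketched as a small decision function. This is not the node's actual implementation: `planConversion` and `ConversionPlan` are hypothetical, "no text found" is assumed to mean empty or whitespace-only output, and the OCR cost is the approximate per-document figure quoted on this page.

```typescript
// Hypothetical sketch of the fallback decision; not the node's actual code.
interface ConversionPlan {
  path: "text-extraction" | "ocr";
  estimatedCostUsd: number;
}

// Approximate per-document OCR cost quoted on this page.
const OCR_COST_USD = 0.005;

// Decide which path applies, given whatever the fast text extraction returned.
// Assumption: empty or whitespace-only text counts as "no text found".
function planConversion(extractedText: string | undefined): ConversionPlan {
  const hasText = (extractedText ?? "").trim().length > 0;
  return hasText
    ? { path: "text-extraction", estimatedCostUsd: 0 }
    : { path: "ocr", estimatedCostUsd: OCR_COST_USD };
}
```

This also shows why the node's cost is listed as "Free or ~$0.005": searchable PDFs stay on the free path, and only documents without extractable text incur the OCR charge.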
Performance Summary
| Node | Time | Cost | Plugin Required |
|---|---|---|---|
| Extract Text | <1s | Free | Unpdf |
| Split PDF | <2s | Free | PDFLib |
| Merge PDF | <2s | Free | PDFLib |
| Describe Document | <1s | Free | PDFLib |
| OCR | 5-15s | ~$0.005 | Replicate |
| Convert to Markdown | 1-15s | Free or ~$0.005 | Unpdf + Replicate |
Related
- Plugins Concept - How plugins work
- Flow Nodes Overview - All available nodes