
Document Nodes

Process PDFs and documents with text extraction, OCR, splitting, merging, and conversion.

| Plugin | Package | Operations | Requirements |
| --- | --- | --- | --- |
| documentPlugin | @uploadista/flow-documents-plugin | Split, merge, metadata, text extraction | None |
| pdfLibDocumentPlugin | @uploadista/flow-documents-pdflib | Split, merge, metadata only | None |
| unpdfDocumentPlugin | @uploadista/flow-documents-unpdf | Text extraction only | None |
| documentAiPlugin | @uploadista/flow-documents-replicate | OCR, AI extraction | Replicate API token |

npm install @uploadista/flow-documents-nodes @uploadista/flow-documents-plugin

import { createUploadistaServer } from "@uploadista/server";
import { documentPlugin } from "@uploadista/flow-documents-plugin";

const uploadista = await createUploadistaServer({
  // ...
  plugins: [
    documentPlugin,
  ],
});

Features: Split PDF, merge PDFs, extract metadata, text extraction

Requirements: None (pure JavaScript)

Note: This is the recommended plugin - it combines PDFLib and Unpdf functionality.

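| Feature | Combined | PDFLib | Unpdf | Replicate |
| --- | --- | --- | --- | --- |
| Split PDF | Yes | Yes | No | No |
| Merge PDFs | Yes | Yes | No | No |
| Extract metadata | Yes | Yes | No | No |
| Extract text (searchable) | Yes | No | Yes | Yes |
| OCR (scanned) | No | No | No | Yes |
| Convert to markdown | No | No | No | Yes |
| Cost | Free | Free | Free | Per-request |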
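
The comparison above can be read as a lookup from feature to plugin. The sketch below restates it as data and picks plugins for a list of required features, preferring the free combined plugin. The helper names (`supports`, `pluginsFor`) and the `Feature` labels are hypothetical and are not part of any Uploadista package.

```typescript
// Hypothetical helper, not part of any Uploadista package.
// The capability lists restate the plugin tables above.
type Feature = "split" | "merge" | "metadata" | "extractText" | "ocr" | "markdown";
type PluginName =
  | "documentPlugin"
  | "pdfLibDocumentPlugin"
  | "unpdfDocumentPlugin"
  | "documentAiPlugin";

const capabilities: Record<PluginName, Feature[]> = {
  documentPlugin: ["split", "merge", "metadata", "extractText"],
  pdfLibDocumentPlugin: ["split", "merge", "metadata"],
  unpdfDocumentPlugin: ["extractText"],
  documentAiPlugin: ["extractText", "ocr", "markdown"],
};

function supports(plugin: PluginName, feature: Feature): boolean {
  return capabilities[plugin].indexOf(feature) !== -1;
}

// Prefer the free combined plugin; add the paid AI plugin only when a
// feature is not covered by it.
function pluginsFor(features: Feature[]): PluginName[] {
  const chosen: PluginName[] = [];
  for (const feature of features) {
    const plugin: PluginName = supports("documentPlugin", feature)
      ? "documentPlugin"
      : "documentAiPlugin";
    if (chosen.indexOf(plugin) === -1) chosen.push(plugin);
  }
  return chosen;
}

console.log(pluginsFor(["split", "ocr"])); // → ["documentPlugin", "documentAiPlugin"]
```

A flow that only splits and extracts text resolves to `documentPlugin` alone, so no Replicate token is needed until OCR or markdown conversion is added.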

OCR Node

AI-powered text extraction from scanned documents.

Package: @uploadista/flow-documents-nodes

import { createOcrNode } from "@uploadista/flow-documents-nodes";

// Convert scanned document to markdown
const ocrNode = yield* createOcrNode("ocr-1", {
  taskType: "convertToMarkdown",
  resolution: "gundam",
  credentialId: "my-replicate-credential",
});

// Free-form OCR for plain text extraction
const freeOcrNode = yield* createOcrNode("ocr-2", {
  taskType: "freeOcr",
  resolution: "base",
});

// Locate specific content in document
const locateNode = yield* createOcrNode("ocr-3", {
  taskType: "locateObject",
  referenceText: "Invoice Total",
  resolution: "small",
});

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| taskType | "convertToMarkdown" \| "freeOcr" \| "parseFigure" \| "locateObject" | Yes | - | OCR task type |
| resolution | "tiny" \| "small" \| "base" \| "gundam" \| "large" | No | - | Model resolution (higher = better quality) |
| credentialId | string | No | - | AI service credential ID |
| referenceText | string | No | - | Text to locate (for locateObject task) |
| keepOutput | boolean | No | false | Keep output in flow results |

| Task Type | Description |
| --- | --- |
| convertToMarkdown | Structured markdown output with headings, lists |
| freeOcr | Unstructured plain text extraction |
| parseFigure | Analyze charts and diagrams |
| locateObject | Find specific content using reference text |
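
Because `referenceText` only applies to the `locateObject` task, it can help to check a parameter object before building the node. The following is a minimal sketch under stated assumptions: the `OcrParams` interface mirrors the parameter table, while `checkOcrParams` is a hypothetical helper and the rule that `locateObject` needs a non-empty `referenceText` is an assumption, not documented behavior of `createOcrNode`.

```typescript
type TaskType = "convertToMarkdown" | "freeOcr" | "parseFigure" | "locateObject";
type Resolution = "tiny" | "small" | "base" | "gundam" | "large";

// Mirrors the parameter table; not imported from the library.
interface OcrParams {
  taskType: TaskType;
  resolution?: Resolution;
  credentialId?: string;
  referenceText?: string;
  keepOutput?: boolean;
}

// Hypothetical pre-flight check. Returns a list of problems;
// an empty list means the parameters look consistent.
function checkOcrParams(params: OcrParams): string[] {
  const problems: string[] = [];
  const hasReference =
    params.referenceText !== undefined && params.referenceText.trim() !== "";

  // Assumption: locateObject is not useful without text to look for.
  if (params.taskType === "locateObject" && !hasReference) {
    problems.push("locateObject needs a non-empty referenceText");
  }
  // referenceText is documented only for the locateObject task.
  if (params.taskType !== "locateObject" && params.referenceText !== undefined) {
    problems.push("referenceText is only used by locateObject");
  }
  return problems;
}

console.log(checkOcrParams({ taskType: "locateObject", resolution: "small" }));
// → ["locateObject needs a non-empty referenceText"]
```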

Performance: ~5-15s | Cost: ~$0.005 per document


Extract Text Node

Fast text extraction from searchable PDFs.

Package: @uploadista/flow-documents-nodes

import { createExtractTextNode } from "@uploadista/flow-documents-nodes";

// Extract text from searchable PDF
const extractNode = yield* createExtractTextNode("extract-1");

// With keepOutput option
const keepOutputNode = yield* createExtractTextNode("extract-2", {
  keepOutput: true,
});

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| keepOutput | boolean | No | false | Keep output in flow results |

Output: Adds extractedText to file metadata.

Note: For scanned PDFs without selectable text, use OCR Node instead.

Performance: Less than 1s | Cost: Free


Split PDF Node

Split PDFs by page range or into individual pages.

Package: @uploadista/flow-documents-nodes

import { createSplitPdfNode } from "@uploadista/flow-documents-nodes";

// Extract pages 3-5 as single PDF
const rangeNode = yield* createSplitPdfNode("split-1", {
  mode: "range",
  startPage: 3,
  endPage: 5,
});

// Split each page into separate PDF
const individualNode = yield* createSplitPdfNode("split-2", {
  mode: "individual",
});

// With custom naming
const namedNode = yield* createSplitPdfNode("split-3", {
  mode: "range",
  startPage: 1,
  endPage: 10,
  naming: { mode: "auto" },
});

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| mode | "range" \| "individual" | Yes | - | Split mode |
| startPage | number | No | - | Start page (for range mode) |
| endPage | number | No | - | End page (for range mode) |
| keepOutput | boolean | No | false | Keep output in flow results |
| naming | FileNamingConfig | No | - | File naming configuration |

| Mode | Description |
| --- | --- |
| range | Extract pages as single PDF |
| individual | Split each page into separate PDF |
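
To make the difference between the two modes concrete, the sketch below computes which pages end up in each output PDF. It is an illustration only: `planSplit` is a hypothetical function, and it assumes page numbers are 1-based and inclusive (as the "pages 3-5" example suggests) and that missing bounds default to the first and last page. The actual node may handle these cases differently.

```typescript
type SplitMode = "range" | "individual";

interface SplitParams {
  mode: SplitMode;
  startPage?: number;
  endPage?: number;
}

// Hypothetical planner: returns one array of page numbers per output PDF.
function planSplit(params: SplitParams, pageCount: number): number[][] {
  if (params.mode === "individual") {
    // One single-page output per page of the source document.
    const outputs: number[][] = [];
    for (let page = 1; page <= pageCount; page++) outputs.push([page]);
    return outputs;
  }

  // Assumption: pages are 1-based and inclusive; missing bounds
  // fall back to the first and last page.
  const start = params.startPage ?? 1;
  const end = params.endPage ?? pageCount;
  if (start < 1 || end > pageCount || start > end) {
    throw new RangeError(`Invalid page range ${start}-${end} for ${pageCount} pages`);
  }

  // Range mode yields a single output containing the whole range.
  const pages: number[] = [];
  for (let page = start; page <= end; page++) pages.push(page);
  return [pages];
}

console.log(planSplit({ mode: "range", startPage: 3, endPage: 5 }, 10)); // → [[3, 4, 5]]
console.log(planSplit({ mode: "individual" }, 3)); // → [[1], [2], [3]]
```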

Performance: Less than 2s | Cost: Free


Merge PDF Node

Combine multiple PDFs into a single document.

Package: @uploadista/flow-documents-nodes

import { createMergePdfNode } from "@uploadista/flow-documents-nodes";

// Merge PDFs with default settings
const mergeNode = yield* createMergePdfNode("merge-1");

// With custom naming
const namedMergeNode = yield* createMergePdfNode("merge-2", {
  naming: { mode: "auto" },
});

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| inputCount | number | No | - | Expected number of input files |
| keepOutput | boolean | No | false | Keep output in flow results |
| naming | FileNamingConfig | No | - | File naming (auto suffix: merged) |

Note: Requires a Merge utility node upstream to provide multiple files.

Performance: Less than 2s | Cost: Free


Describe Document Node

Extract PDF metadata (page count, author, title, etc.).

Package: @uploadista/flow-documents-nodes

import { createDescribeDocumentNode } from "@uploadista/flow-documents-nodes";

// Extract document metadata
const describeNode = yield* createDescribeDocumentNode("describe-1");

// With keepOutput option
const keepOutputNode = yield* createDescribeDocumentNode("describe-2", {
  keepOutput: true,
});

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| keepOutput | boolean | No | false | Keep output in flow results |

Output Metadata:

{
  "pageCount": 10,
  "format": "pdf",
  "author": "John Doe",
  "title": "Document Title",
  "subject": "Document Subject",
  "creator": "Adobe Acrobat",
  "creationDate": "2023-01-01T00:00:00Z",
  "modifiedDate": "2023-01-02T00:00:00Z",
  "fileSize": 1024000
}
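
When consuming this metadata in TypeScript, a typed shape and a runtime guard can be useful. The sketch below is one possible way to model it. The interface name `DocumentMetadata`, the guard, and the `summarize` helper are hypothetical, and treating the descriptive fields (author, title, and so on) as optional is an assumption, since many PDFs omit them.

```typescript
// Hypothetical shape based on the sample output above.
// Assumption: only pageCount, format, and fileSize are always present.
interface DocumentMetadata {
  pageCount: number;
  format: string;
  author?: string;
  title?: string;
  subject?: string;
  creator?: string;
  creationDate?: string; // ISO 8601 timestamp in the sample
  modifiedDate?: string; // ISO 8601 timestamp in the sample
  fileSize: number; // taken to be bytes
}

// Runtime guard for values of unknown origin (for example parsed JSON).
function isDocumentMetadata(value: unknown): value is DocumentMetadata {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return (
    typeof record.pageCount === "number" &&
    typeof record.format === "string" &&
    typeof record.fileSize === "number"
  );
}

// Builds a one-line summary, tolerating a missing title.
function summarize(meta: DocumentMetadata): string {
  const title = meta.title ?? "(untitled)";
  const sizeKb = Math.round(meta.fileSize / 1000);
  return `${title}: ${meta.pageCount} pages, ${sizeKb} kB`;
}

const sample: unknown = JSON.parse(
  '{"pageCount": 10, "format": "pdf", "title": "Document Title", "fileSize": 1024000}'
);
if (isDocumentMetadata(sample)) {
  console.log(summarize(sample)); // → "Document Title: 10 pages, 1024 kB"
}
```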

Performance: Less than 1s | Cost: Free


Convert to Markdown Node

Intelligent document-to-markdown conversion.

Package: @uploadista/flow-documents-nodes

import { createConvertToMarkdownNode } from "@uploadista/flow-documents-nodes";

// Convert with default settings
const convertNode = yield* createConvertToMarkdownNode("convert-1");

// With custom resolution and credential
const customNode = yield* createConvertToMarkdownNode("convert-2", {
  resolution: "gundam",
  credentialId: "my-ai-credential",
});

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| resolution | "tiny" \| "small" \| "base" \| "gundam" \| "large" | No | "gundam" | OCR model resolution |
| credentialId | string | No | - | AI service credential ID |
| keepOutput | boolean | No | false | Keep output in flow results |

How it Works:

  1. Tries text extraction first (fast, for searchable PDFs)
  2. Falls back to OCR if no text found (for scanned PDFs)
  3. Returns structured markdown in metadata.markdown
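
The three steps above amount to a try-then-fall-back pattern, which can be sketched roughly as follows. This is not the node's actual implementation: `convertWithFallback`, `hasUsableText`, and the injected `extractText` and `runOcr` functions are hypothetical stand-ins, and treating whitespace-only output as "no text found" is an assumption.

```typescript
// Hypothetical function types standing in for the two strategies.
type TextExtractor = (file: Uint8Array) => Promise<string>;
type OcrRunner = (file: Uint8Array) => Promise<string>;

interface MarkdownResult {
  markdown: string;
  source: "text" | "ocr"; // which strategy produced the output
}

// Assumption: whitespace-only output counts as "no text found".
function hasUsableText(text: string): boolean {
  return text.trim().length > 0;
}

async function convertWithFallback(
  file: Uint8Array,
  extractText: TextExtractor,
  runOcr: OcrRunner
): Promise<MarkdownResult> {
  // Step 1: try the fast path for searchable PDFs.
  const text = await extractText(file);
  if (hasUsableText(text)) {
    return { markdown: text, source: "text" };
  }
  // Step 2: no text layer found, so fall back to OCR.
  const markdown = await runOcr(file);
  // Step 3: return the markdown along with where it came from.
  return { markdown, source: "ocr" };
}

// Usage with stub strategies: an empty text layer triggers the OCR path.
convertWithFallback(
  new Uint8Array(0),
  async () => "",
  async () => "# Scanned invoice"
).then((result) => console.log(result.source)); // → "ocr"
```

Injecting the two strategies keeps the decision logic testable without a PDF parser or network access.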

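| Node | Time | Cost | Plugin Required |
| --- | --- | --- | --- |
| Extract Text | <1s | Free | Unpdf |
| Split PDF | <2s | Free | PDFLib |
| Merge PDF | <2s | Free | PDFLib |
| Describe Document | <1s | Free | PDFLib |
| OCR | 5-15s | ~$0.005 | Replicate |
| Convert to Markdown | 1-15s | Free or ~$0.005 | Unpdf + Replicate |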
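
The "Free or ~$0.005" entry for Convert to Markdown means the cost of a batch depends on how many documents need OCR. A rough estimate can be sketched as below. The function name and the assumption that each scanned document triggers exactly one OCR call at the approximate price above are illustrative only; actual billing is determined by the AI provider.

```typescript
// Approximate per-document OCR price from the table above.
const OCR_COST_USD = 0.005;

interface BatchEstimate {
  documents: number;
  ocrCalls: number;
  costUsd: number;
}

// Assumption: searchable PDFs are handled by free text extraction and
// every scanned document triggers exactly one OCR call.
function estimateMarkdownBatch(searchable: number, scanned: number): BatchEstimate {
  if (searchable < 0 || scanned < 0) {
    throw new RangeError("document counts must be non-negative");
  }
  return {
    documents: searchable + scanned,
    ocrCalls: scanned,
    costUsd: scanned * OCR_COST_USD,
  };
}

const estimate = estimateMarkdownBatch(800, 200);
console.log(`${estimate.ocrCalls} OCR calls, about $${estimate.costUsd.toFixed(2)}`);
// → "200 OCR calls, about $1.00"
```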