Medical Form Designer

The Challenge

A clinical EMR accumulates dozens of paper-derived forms — consent, anamnesis, referral, regulatory filings, lab orders. In the legacy system each one is a hand-coded form component, so every new or changed form goes through the engineering queue: a developer writes code, it gets reviewed, built, and deployed. Non-developers can't author forms, competitors already ship visual designers, and the hand-coded components are brittle and hard to maintain.

Medical forms are printed and filed, so output must be 1:1 on A4 with legally-mandated layouts that don't tolerate drift. Authoring has to be accessible to product and clinical staff, not just engineers. A dedicated operator was re-drawing forms from PDF by hand at roughly 8-10 forms per day, with complex forms taking about an hour each. The goal: let non-developers author forms by uploading a PDF or sketching on a visual canvas, producing a schema that renders everywhere in the EMR.

Key Constraints

Print fidelity: 1:1 A4 output, DPI-independent, legally-mandated layouts for regulatory forms
Authoring by non-developers — no JSON editing, precise but not intimidating canvas
LLM vision is semantically strong but spatially weak on dense tabular layouts
Dozens of legacy hand-coded form components targeted for replacement
One schema must drive view / input / print contexts everywhere in the EMR
Library-first design for cross-product reuse (EMR today, ERP next)

Our Approach

Built a schema-driven, library-first authoring package where a single JSON schema is the source of truth for a visual editor, a passive renderer, and the AI layer. A non-developer drops a PDF; an LLM-vision pipeline emits structured tool calls against a typed block catalog, each validated before composition. A 2D bin-packing composer assembles the blocks and fits them to the page. Output is mm-native for pixel-stable A4 printing, and forms persist as versioned schemas in the database rather than as code.

Key Technical Decisions

Tool-use over free-form JSON - the model calls typed block functions instead of emitting a full schema, so it can't invent invalid structures and every call is validated before anything reaches the renderer
AI emits content + relative ratios; code resolves absolute millimeters - sidesteps the LLM's spatial-precision ceiling, so a ratio error degrades to sub-millimeter padding instead of a half-page overflow
Two extraction modes - Mirror reproduces a regulatory layout 1:1, Adapted re-authors a form into the design system; same pipeline, different prompt and fit target
AI for structure, OCR-overlay for pixel-perfect regulatory reproduction - picking the tool by goal rather than hype
Deleted a multi-layer post-fix pipeline (~3k LoC) once the layering itself proved to be the problem - replaced with typed block functions, and output quality went up the same day
mm-native layout, not px - browser printing is mm-stable, giving DPI-independent 1:1 A4 without media-query forking
DOM, not Canvas - real inputs keep native accessibility, keyboard navigation, input methods, and print for free
Schema versioned in the database, not the codebase - adding a form is a data write, not a deploy, enabling rollback and per-clinic variation

Timeline: ~3 months from POC to MVP. Visual builder POC, pivot to an mm-grid canvas with direct-DOM drag, multi-page and undo/redo, then the AI extraction layer, the two extraction modes, and comment-driven region patches.

Implementation

Phase 1: Visual Designer Foundation

mm-native grid canvas on A4 with 1mm precision, drag/resize that bypasses React mid-gesture for smooth interaction, multi-select with group operations, undo/redo with a capped history, and a table primitive with browser-measured row heights. The JSON schema is the single source of truth feeding the editor, the renderer, and the AI tool-spec.

~4-5 weeks

Phase 2: AI Extraction Layer

A typed block catalog is exposed to LLM vision as a tool list; the model emits structured tool calls that are validated per-block against JSON Schema before composition. A 2D bin-packing composer assembles the blocks, and a slack-distribution post-process fits the result to the page by distributing the gap between growable blocks and inside table cells.

~4-6 weeks

Phase 3: Modes & Comment-Driven Patches

Mirror vs Adapted extraction modes driven by the system prompt and fit target. A third editing path beyond mouse-edit and PDF-extract: the user selects a region, writes a natural-language instruction, and the AI re-emits only that region while the rest of the schema stays byte-identical — rolled back as a single undo step.

~3-4 weeks

Technology Stack

React 18TypeScriptmm-native grid canvasZero runtime depsClaude (vision)GPT-5 (vision)Tool-use / structured outputPer-block JSON Schema validationSchema-driven renderer2D bin-packing composerPostgreSQL JSONB (versioned schemas)

Results & Impact

4-6xThroughput Lift

Form authoring from ~1 hour (manual PDF re-draw) to 10-15 minutes (AI extract + polish) on complex forms

Same dayNew-Form Lead Time

From several days to a week (developer hand-codes, reviews, deploys) to same-day authoring with no deploy

~70Components Replaced

Legacy hand-coded form components targeted for replacement by a single schema-driven renderer

~75%First-Pass Fidelity

Visual fidelity of AI extraction on dense regulatory forms, reviewed and polished from there

8 blocksTool Catalog

Typed block functions form the AI's structured-output surface, each with a tight JSON Schema input

~$0.04Cost per Extract

Per-form extraction cost on a fast vision model, low enough to iterate freely

Productivity: 4-6x faster form authoring, with the role shifting from manual transcription to schema review
Process: forms move out of the engineering backlog and into the product team's own hands
Maintainability: forms become versioned data, not code — rollback and per-clinic variation are a write, not a deploy
Reuse: library-first design enables cross-product reuse (EMR to ERP) with one engine and one schema
Accessibility: non-developers author and edit forms without touching code
Print fidelity: DPI-independent 1:1 A4 output for legally-mandated layouts

What We Learned

The third fix in the same layer is an architectural signal, not a bug - deleting the post-fix pipeline improved output the same day it died.
Tool-use beats free-form JSON for structured output - typed functions give a validation point and a smaller, sharper concept surface than a 30-type schema in the prompt.
AI is strong at structure, weak at exact position - emit relative ratios and resolve them to absolute millimeters in code.
When an LLM follows a prompt rule literally and produces broken output, the schema is missing a primitive - extend the data shape instead of piling on guard-rails.
Pick AI vs OCR-overlay by goal - editable, data-bound forms want AI extraction; pixel-perfect regulatory reproduction wants OCR-overlay.
Library-first pays off later - a schema-only contract with zero host-specific code lets the same engine lift productivity across new products without re-implementation.