The Challenge
A clinical EMR accumulates dozens of paper-derived forms — consent, anamnesis, referral, regulatory filings, lab orders. In the legacy system each one is a hand-coded form component, so every new or changed form goes through the engineering queue: a developer writes code, it gets reviewed, built, and deployed. Non-developers can't author forms, competitors already ship visual designers, and the hand-coded components are brittle and hard to maintain.
Medical forms are printed and filed, so output must be 1:1 on A4 with legally-mandated layouts that don't tolerate drift. Authoring has to be accessible to product and clinical staff, not just engineers. A dedicated operator was re-drawing forms from PDF by hand at roughly 8-10 forms per day, with complex forms taking about an hour each. The goal: let non-developers author forms by uploading a PDF or sketching on a visual canvas, producing a schema that renders everywhere in the EMR.
Key Constraints
- Print fidelity: 1:1 A4 output, DPI-independent, legally-mandated layouts for regulatory forms
- Authoring by non-developers — no JSON editing, precise but not intimidating canvas
- LLM vision is semantically strong but spatially weak on dense tabular layouts
- Dozens of legacy hand-coded form components targeted for replacement
- One schema must drive view / input / print contexts everywhere in the EMR
- Library-first design for cross-product reuse (EMR today, ERP next)
Our Approach
We built a schema-driven, library-first authoring package in which a single JSON schema is the source of truth for a visual editor, a passive renderer, and the AI layer. A non-developer drops in a PDF; an LLM-vision pipeline emits structured tool calls against a typed block catalog, and each call is validated before composition. A 2D bin-packing composer assembles the blocks and fits them to the page. Output is mm-native for stable 1:1 A4 printing, and forms persist as versioned schemas in the database rather than as code.
Key Technical Decisions
- Tool-use over free-form JSON - the model calls typed block functions instead of emitting a full schema, so it can't invent invalid structures and every call is validated before anything reaches the renderer
- AI emits content + relative ratios; code resolves absolute millimeters - sidesteps the LLM's spatial-precision ceiling, so a ratio error degrades to sub-millimeter padding instead of a half-page overflow
- Two extraction modes - Mirror reproduces a regulatory layout 1:1, Adapted re-authors a form into the design system; same pipeline, different prompt and fit target
- AI for structure, OCR-overlay for pixel-perfect regulatory reproduction - picking the tool by goal rather than hype
- Deleted a multi-layer post-fix pipeline (~3k LoC) once the layering itself proved to be the problem - replaced with typed block functions, and output quality went up the same day
- mm-native layout, not px - browser printing is mm-stable, giving DPI-independent 1:1 A4 without media-query forking
- DOM, not Canvas - real inputs keep native accessibility, keyboard navigation, input methods, and print for free
- Schema versioned in the database, not the codebase - adding a form is a data write, not a deploy, enabling rollback and per-clinic variation
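The "relative ratios in, absolute millimeters out" decision can be illustrated with a short sketch. It is not the package's actual code: the `resolveWidths` name, the 10 mm margin, and the shape of the ratio input are assumptions made for the example. What it shows is that normalizing in code keeps a row inside the page even when the model's ratios are slightly off.

```typescript
// Hypothetical sketch: function name and margin value are illustrative assumptions.
const A4_WIDTH_MM = 210;

/** Resolve model-emitted relative widths into absolute millimeters for one row. */
function resolveWidths(ratios: number[], marginMm = 10): number[] {
  const usable = A4_WIDTH_MM - 2 * marginMm;
  // Treat non-finite or non-positive ratios as zero so one bad value cannot break the row.
  const clean = ratios.map((r) => (Number.isFinite(r) && r > 0 ? r : 0));
  const total = clean.reduce((sum, r) => sum + r, 0);
  if (total === 0) {
    // No usable ratios at all: fall back to equal columns.
    return ratios.map(() => usable / ratios.length);
  }
  // Dividing by the total means the widths always sum to the usable width,
  // whatever the model's ratios happened to add up to.
  return clean.map((r) => (r / total) * usable);
}

// The ratios below sum to 1.05 rather than 1; the row still fits the page.
const widths = resolveWidths([0.3, 0.5, 0.25]);
```

With this approach the model's error shows up as slightly different column proportions, never as content pushed past the margin.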
Timeline: ~3 months from POC to MVP. The sequence was a visual builder POC, a pivot to an mm-grid canvas with direct-DOM drag, multi-page and undo/redo, then the AI extraction layer, the two extraction modes, and comment-driven region patches.
Implementation
Phase 1: Visual Designer Foundation
mm-native grid canvas on A4 with 1mm precision, drag/resize that bypasses React mid-gesture for smooth interaction, multi-select with group operations, undo/redo with a capped history, and a table primitive with browser-measured row heights. The JSON schema is the single source of truth feeding the editor, the renderer, and the AI tool-spec.
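A minimal sketch of how a drag gesture might land on the 1 mm grid is given here. The function names and the `zoom` parameter are assumptions for illustration; the only fixed fact it relies on is the CSS definition of 1in = 96px = 25.4mm. Pointer input arrives in pixels, but the value written back to the schema is always in millimeters.

```typescript
// Hypothetical sketch: function names and the zoom parameter are assumptions.
const CSS_PX_PER_MM = 96 / 25.4; // CSS defines 1in = 96px = 25.4mm

/** Convert a pointer delta in CSS pixels to millimeters at the current canvas zoom. */
function pxToMm(px: number, zoom = 1): number {
  return px / (CSS_PX_PER_MM * zoom);
}

/** Snap a millimeter value to the grid (1 mm by default). */
function snapMm(valueMm: number, gridMm = 1): number {
  return Math.round(valueMm / gridMm) * gridMm;
}

/** New block position after a drag, stored in mm so the schema never contains pixels. */
function dragTo(startMm: number, deltaPx: number, zoom = 1): number {
  return snapMm(startMm + pxToMm(deltaPx, zoom));
}
```

For example, `dragTo(20, 96)` moves a block that starts at 20 mm by 96 px (25.4 mm) and snaps the result to 45 mm.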
~4-5 weeks
Phase 2: AI Extraction Layer
A typed block catalog is exposed to LLM vision as a tool list; the model emits structured tool calls that are validated per-block against JSON Schema before composition. A 2D bin-packing composer assembles the blocks, and a slack-distribution post-process fits the result to the page by distributing the gap between growable blocks and inside table cells.
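The slack-distribution step can be sketched for the simplest case, a vertical stack of blocks on one page. The `Block` shape, the equal-share policy, and the 10 mm margins are assumptions for the example; the real post-process also distributes space inside table cells, which is not shown.

```typescript
// Hypothetical sketch: the Block shape and the equal-share policy are assumptions.
interface Block {
  id: string;
  heightMm: number;
  growable: boolean;
}

/** Share leftover vertical space equally among growable blocks so the stack fills the page. */
function distributeSlack(blocks: Block[], pageHeightMm: number): Block[] {
  const used = blocks.reduce((sum, b) => sum + b.heightMm, 0);
  const slack = pageHeightMm - used;
  const growableCount = blocks.filter((b) => b.growable).length;
  // Nothing to do when the content already fills (or overflows) the page,
  // or when no block is allowed to grow.
  if (slack <= 0 || growableCount === 0) return blocks;
  const share = slack / growableCount;
  return blocks.map((b) => (b.growable ? { ...b, heightMm: b.heightMm + share } : b));
}

const fitted = distributeSlack(
  [
    { id: "header", heightMm: 30, growable: false },
    { id: "notes", heightMm: 80, growable: true },
    { id: "signature", heightMm: 40, growable: true },
  ],
  277, // A4 height (297 mm) minus assumed 10 mm top and bottom margins
);
```

Here the 127 mm of slack is split between the two growable blocks, and the fixed-height header is left untouched.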
~4-6 weeks
Phase 3: Modes & Comment-Driven Patches
Mirror and Adapted extraction modes are driven by the system prompt and fit target. A third editing path joins mouse editing and PDF extraction: the user selects a region, writes a natural-language instruction, and the AI re-emits only that region while the rest of the schema stays byte-identical. The whole patch can be rolled back as a single undo step.
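A region patch of this kind might look like the sketch here. The schema shape, the function names, and the simple array-based history are assumptions; the AI call that produces the replacement blocks is left out. The idea it illustrates is that untouched blocks are reused as-is and the whole patch costs exactly one history entry.

```typescript
// Hypothetical sketch: the schema shape and function names are assumptions.
interface FormBlock {
  id: string;
  type: string;
  props: Record<string, unknown>;
}
interface FormSchema {
  blocks: FormBlock[];
}

/**
 * Swap the selected blocks for the re-emitted ones, placed where the first
 * selected block was. Blocks outside the selection are reused by reference.
 */
function applyRegionPatch(schema: FormSchema, selectedIds: Set<string>, replacement: FormBlock[]): FormSchema {
  const blocks: FormBlock[] = [];
  let inserted = false;
  for (const block of schema.blocks) {
    if (!selectedIds.has(block.id)) {
      blocks.push(block);
    } else if (!inserted) {
      blocks.push(...replacement);
      inserted = true;
    }
  }
  return { blocks };
}

// One history entry per patch, so a single undo restores the previous schema.
const history: FormSchema[] = [];
function patchWithUndo(current: FormSchema, ids: Set<string>, replacement: FormBlock[]): FormSchema {
  history.push(current);
  return applyRegionPatch(current, ids, replacement);
}

const before: FormSchema = {
  blocks: [
    { id: "title", type: "heading", props: { text: "Consent" } },
    { id: "allergies", type: "text", props: { label: "Allergies" } },
    { id: "signature", type: "signature", props: {} },
  ],
};
const after = patchWithUndo(before, new Set(["allergies"]), [
  { id: "allergies-table", type: "table", props: { columns: 2 } },
]);
```

Undo is then just popping `history` and restoring that schema; the input schema is never mutated.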
~3-4 weeks
Technology Stack
Results & Impact
Form authoring from ~1 hour (manual PDF re-draw) to 10-15 minutes (AI extract + polish) on complex forms
New-form turnaround from several days to a week (developer hand-codes, reviews, deploys) to same-day authoring with no deploy
Legacy hand-coded form components targeted for replacement by a single schema-driven renderer
Visual fidelity of AI extraction on dense regulatory forms, reviewed and polished from there
Typed block functions form the AI's structured-output surface, each with a tight JSON Schema input
Per-form extraction cost on a fast vision model, low enough to iterate freely
- Productivity: 4-6x faster form authoring, with the role shifting from manual transcription to schema review
- Process: forms move out of the engineering backlog and into the product team's own hands
- Maintainability: forms become versioned data, not code — rollback and per-clinic variation are a write, not a deploy
- Reuse: library-first design enables cross-product reuse (EMR to ERP) with one engine and one schema
- Accessibility: non-developers author and edit forms without touching code
- Print fidelity: DPI-independent 1:1 A4 output for legally-mandated layouts
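To make "forms become versioned data" concrete, a small sketch of an append-only schema store is included here. An in-memory `Map` stands in for the database, and the `FormStore` class and its method names are assumptions, not the product's API. Rollback is modeled as re-publishing an earlier version, so history is never rewritten.

```typescript
// Hypothetical sketch: an in-memory Map stands in for the database table.
interface SchemaVersion {
  version: number;
  schema: unknown;
}

class FormStore {
  private versions = new Map<string, SchemaVersion[]>();

  /** Publishing appends a new version; nothing is overwritten. */
  publish(formId: string, schema: unknown): number {
    const list = this.versions.get(formId) ?? [];
    const version = list.length + 1;
    list.push({ version, schema });
    this.versions.set(formId, list);
    return version;
  }

  /** The latest version is the active one. */
  current(formId: string): SchemaVersion | undefined {
    const list = this.versions.get(formId);
    return list ? list[list.length - 1] : undefined;
  }

  /** Rollback re-publishes an earlier version, keeping the history append-only. */
  rollback(formId: string, toVersion: number): number {
    const target = this.versions.get(formId)?.find((v) => v.version === toVersion);
    if (!target) throw new Error(`No version ${toVersion} for form "${formId}"`);
    return this.publish(formId, target.schema);
  }
}

const store = new FormStore();
store.publish("consent", { title: "Consent v1" });
store.publish("consent", { title: "Consent v2" });
const restored = store.rollback("consent", 1); // becomes version 3 with v1 content
```

Per-clinic variation could use the same mechanism with a key that includes a clinic identifier, which is again an assumption about how the data might be keyed.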
What We Learned
- The third fix in the same layer is an architectural signal, not a bug - deleting the post-fix pipeline improved output the same day it died.
- Tool-use beats free-form JSON for structured output - typed functions give a validation point and a smaller, sharper concept surface than a 30-type schema in the prompt.
- AI is strong at structure, weak at exact position - emit relative ratios and resolve them to absolute millimeters in code.
- When an LLM follows a prompt rule literally and produces broken output, the schema is missing a primitive - extend the data shape instead of piling on guard-rails.
- Pick AI vs OCR-overlay by goal - editable, data-bound forms want AI extraction; pixel-perfect regulatory reproduction wants OCR-overlay.
- Library-first pays off later - a schema-only contract with zero host-specific code lets the same engine lift productivity across new products without re-implementation.
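The "validation point" that tool-use provides can be sketched as follows. The block names, their fields, and the hand-written checks are all assumptions; the checks stand in for the per-block JSON Schema validation described above, which is not reproduced here. The point is that every call is either accepted whole or rejected with reasons before anything reaches the composer.

```typescript
// Hypothetical sketch: block names, fields, and the hand-rolled checks are assumptions
// standing in for real per-block JSON Schema validation.
interface ToolCall {
  name: string;
  input: Record<string, unknown>;
}
type Validator = (input: Record<string, unknown>) => string[];
interface Rejection {
  call: ToolCall;
  errors: string[];
}

const catalog = new Map<string, Validator>([
  [
    "text_field",
    (input) => {
      const errors: string[] = [];
      const label = input.label;
      const ratio = input.widthRatio;
      if (typeof label !== "string" || label.length === 0) errors.push("label must be a non-empty string");
      if (typeof ratio !== "number" || ratio <= 0 || ratio > 1) errors.push("widthRatio must be in (0, 1]");
      return errors;
    },
  ],
  ["checkbox", (input) => (typeof input.label === "string" ? [] : ["label must be a string"])],
]);

/** Split model tool calls into those that pass validation and those that do not. */
function validateCalls(calls: ToolCall[]): { accepted: ToolCall[]; rejected: Rejection[] } {
  const accepted: ToolCall[] = [];
  const rejected: Rejection[] = [];
  for (const call of calls) {
    const validate = catalog.get(call.name);
    // A call to a block that is not in the catalog is rejected outright.
    const errors = validate ? validate(call.input) : [`unknown block "${call.name}"`];
    if (errors.length === 0) accepted.push(call);
    else rejected.push({ call, errors });
  }
  return { accepted, rejected };
}

const result = validateCalls([
  { name: "text_field", input: { label: "Patient name", widthRatio: 0.6 } },
  { name: "text_field", input: { label: "Date of birth", widthRatio: 1.4 } },
  { name: "free_text_blob", input: {} },
]);
```

In this run one call is accepted and two are rejected: one for an out-of-range ratio and one for a block type the catalog does not define.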





