How I wrote a PDF editor that really edits a PDF
The goal was simple: click on a word in a PDF and change it, like you would in Notepad. Two hours later I had learned more about coordinate systems than I ever wanted to know.
Why editing a PDF is hard
A PDF is not a document. It’s a set of print instructions. Each character is placed at absolute coordinates on the page, with no concept of “word”, “paragraph”, or “reflow”. When you open a PDF in a text editor you see something like:
q 0.75 0 0 -0.75 72 770 cm 1 0 0 -1 147.47 13.76 cm BT /F1 14.67 Tf 0 0 Td (i) Tj ET Q
That’s one letter. Every letter has its own transformation matrix.
The first approach (broken)
I used PDFI.js, an open source library by Photopea. It reads a PDF and calls methods on a Writer object for every graphic operation: StartPage, PutText, Fill, Stroke, PutImage, ShowPage. You provide the Writer, so you decide what to do with the data.
The pipeline was:
- Feed the original PDF to PDFI.Parse with ToPDF as the Writer; ToPDF regenerates a clean, normalized PDF buffer
- Run regex on the buffer to find and replace the target word
- Shift the X coordinates of all subsequent characters on the same line by delta = (newLength - oldLength) * charWidth
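The shift step can be sketched like this. It treats the normalized buffer as a string for simplicity, and assumes each glyph is placed by its own `1 0 0 -1 x y cm` matrix, as in the snippet above; `shiftLine` and the fixed `charWidth` are illustrative, not PDFI.js API:

```javascript
// Naive reflow: shift the X coordinate of every glyph placed after the
// edited word on the same line. Assumes one "1 0 0 -1 x y cm" matrix
// per glyph in the normalized buffer.
function shiftLine(buffer, lineY, fromX, delta) {
  return buffer.replace(/1 0 0 -1 ([\d.]+) ([\d.]+) cm/g, (m, x, y) => {
    x = parseFloat(x);
    y = parseFloat(y);
    // only glyphs on the same line, to the right of the edit point
    if (y === lineY && x > fromX) x += delta;
    return `1 0 0 -1 ${x} ${y} cm`;
  });
}

// delta for "ciao" -> "ciaone" with an assumed 7.2pt average char width
const delta = ("ciaone".length - "ciao".length) * 7.2; // 14.4
```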
This worked. Replacing “ciao” with “ciaone” produced a correct PDF.
The editing UI (first attempt)
Inspired by how pdf.js handles text selection, the UI stacks two layers: a canvas layer where ToContext2D renders the page visually, and a transparent div overlaid on top containing one <span> per word, positioned exactly over the rendered text.
When you click a span, it hides itself and shows an <input> in its place with the same font size and position. You type the new word, press Enter, the input disappears, the canvas updates.
The span approach means no hit-testing on pixel coordinates. You’re clicking real DOM elements, exactly like pdf.js does.
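The span-to-input swap can be sketched as follows. The function name and styling are illustrative, not taken from the actual code; `onCommit` stands in for whatever regenerates the PDF and repaints the canvas:

```javascript
// Each word gets a positioned <span>; clicking it swaps in an <input>
// with the same position and font size. Enter commits the edit.
function makeEditableWord(word, x, y, fontSize, onCommit) {
  const span = document.createElement("span");
  span.textContent = word;
  span.style.cssText =
    `position:absolute; left:${x}px; top:${y}px; font-size:${fontSize}px;`;
  span.addEventListener("click", () => {
    const input = document.createElement("input");
    input.value = word;
    input.style.cssText = span.style.cssText; // same position and size
    span.replaceWith(input);
    input.focus();
    input.addEventListener("keydown", (e) => {
      if (e.key === "Enter") {
        onCommit(input.value);   // regenerate the PDF, re-render canvas
        input.replaceWith(span); // restore the (now stale) span
      }
    });
  });
  return span;
}
```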
The coordinate problem
This is where things fell apart.
There are three coordinate systems in play.
- PDF space: origin at bottom-left, Y axis pointing up. A typical text position like (147, 13) means 13 points from the bottom of an 842-point-tall page.
- Canvas physical pixels: ToContext2D flips the Y axis internally via ctx.translate(0, h*scale); ctx.scale(scale, -scale), turning a PDF coordinate (x, y) into the physical pixel (x*scale, pageHeight*scale - y*scale).
- CSS pixels: the browser scales the canvas by devicePixelRatio. On this machine dpr = 0.9, which is less than 1, meaning CSS pixels are larger than physical pixels and dividing by 0.9 makes coordinates bigger, not smaller.
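Chained together, the conversion from PDF space to CSS pixels looks like this; `pdfToCss` is my name for the helper, not a library function:

```javascript
// PDF space -> physical canvas pixels -> CSS pixels.
// The Y-flip mirrors what ToContext2D does internally.
function pdfToCss(x, y, pageHeight, scale, dpr) {
  const px = x * scale;
  const py = pageHeight * scale - y * scale; // flip: PDF Y points up
  // dpr < 1 means CSS pixels are larger, so dividing grows the numbers
  return { left: px / dpr, top: py / dpr };
}
```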
The text rendered by ToContext2D ends up near the top of the canvas (correct). The spans computed from the same PDF coordinates end up near the bottom (wrong). Every formula I tried to convert between the three systems produced a different wrong answer.
What worked: loading and rendering a PDF, extracting word positions, replacing a word with basic reflow, showing an editable input on click, generating a modified PDF blob.
What didn’t: span positions misaligned with the canvas, canvas not re-rendering correctly after a commit, sequential edits accumulating offset errors.
The second approach: CrabPDF
Instead of fighting PDFI.js’s coordinate pipeline, I started over with a cleaner split: pdf.js for reading and rendering, pdf-lib for writing.
The key insight is that pdf.js’s getTextContent() already returns text items in viewport space, the same coordinate system the canvas uses. No manual coordinate conversion needed.
Extraction and rendering. pdf.js renders each page onto a <canvas> at scale = 1.5. In parallel, getTextContent() returns every text item with its position, size, font name, and string. These go into a flat array of textItems.
var tx = pdfjsLib.Util.transform(viewport.transform, item.transform);
var x = tx[4], y = tx[5], fs = Math.abs(tx[3]), w = item.width * scale;

The invisible text layer. A transparent <div> sits over the canvas. Each textItem gets a <span> at the exact same pixel coordinates with color: transparent and pointer-events: all. Hover highlights it, double-click opens an inline <input>.
Writing back to the PDF. On commit, pdf-lib loads the current PDF bytes, draws a white rectangle over the original text (the whiteout trick), then redraws the new string at the same position.
page.drawRectangle({ x: pdfX-1, y: pdfY-1, width: ..., height: ..., color: PDFLib.rgb(1,1,1) });
page.drawText(newStr, { x: pdfX, y: pdfY, size: fontSize, font: font, color: pdfColor });

The coordinate conversion is clean because both systems share the same origin once you account for the Y-flip: pdfY = pageHeight - (canvasY + itemHeight) / scale.
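That Y-flip, written out as a standalone helper (the name is illustrative): canvasY and itemHeight are in canvas pixels, pageHeight in PDF points, and scale is the render scale (1.5 here).

```javascript
// Convert a canvas-space top edge back to a PDF-space baseline-ish Y.
function toPdfY(canvasY, itemHeight, pageHeight, scale) {
  return pageHeight - (canvasY + itemHeight) / scale;
}

// e.g. an item at canvasY = 100, height 14, on an 842pt page at 1.5x:
// 842 - 114/1.5 = 766
```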
Undo/redo. Every operation snapshots the full PDF bytes and the textItems array. Undo restores the previous snapshot and re-renders. Not efficient for large files, but correct and simple.
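The snapshot scheme boils down to two stacks. A minimal sketch, with illustrative names; the real state would be the PDF bytes plus the textItems array:

```javascript
// Full-snapshot undo/redo: simple and correct, but memory-hungry
// because every edit stores a complete copy of the state.
class History {
  constructor() { this.past = []; this.future = []; }
  snapshot(state) {
    this.past.push(state);
    this.future = []; // a new edit invalidates the redo stack
  }
  undo(current) {
    if (!this.past.length) return current;
    this.future.push(current);
    return this.past.pop();
  }
  redo(current) {
    if (!this.future.length) return current;
    this.past.push(current);
    return this.future.pop();
  }
}
```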
Multi-line group editing. Shift-drag draws a selection rectangle. Matched text items are clustered into groups using either a proximity-based union-find algorithm or a strict line-detection algorithm. The group gets an orange overlay and double-click opens a <textarea> for multi-line editing.
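The proximity-based clustering might look like the sketch below. The threshold and distance test are assumptions (the real code also has the strict line-detection mode); the union-find itself is the standard path-compressed version:

```javascript
// Cluster text items whose positions are within maxGap of each other,
// using union-find. Returns arrays of item indices, one per group.
function clusterItems(items, maxGap) {
  const parent = items.map((_, i) => i);
  const find = (i) => (parent[i] === i ? i : (parent[i] = find(parent[i])));
  const union = (a, b) => { parent[find(a)] = find(b); };

  for (let i = 0; i < items.length; i++) {
    for (let j = i + 1; j < items.length; j++) {
      const dx = Math.abs(items[i].x - items[j].x);
      const dy = Math.abs(items[i].y - items[j].y);
      if (dx < maxGap && dy < maxGap) union(i, j);
    }
  }

  // collect indices by root
  const groups = new Map();
  items.forEach((_, i) => {
    const r = find(i);
    if (!groups.has(r)) groups.set(r, []);
    groups.get(r).push(i);
  });
  return [...groups.values()];
}
```

The O(n²) pair scan is fine for a page's worth of text items; a spatial index would only matter for very dense pages.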
What still doesn’t work
Font matching is approximate. pdf.js reports the embedded font family name (after stripping the 6-char subset prefix), but pdf-lib can only embed standard fonts or custom TTFs. If the original PDF uses a proprietary font, the replacement text will look different. The whiteout rectangle also sometimes leaves a faint border on screen, a canvas rendering artifact. And there’s no reflow yet, so editing a long paragraph still requires manual line breaks.
You can try it at crabpdf.com, free, no uploads, no backend, everything runs in the browser.
What I learned
Modifying a PDF is not like editing a text file. The format was designed for faithful reproduction, not editing. Every tool that does it well works either directly with the binary format at a very low level, or converts the PDF into an internal editable model and rebuilds it on export.
The pdf.js + pdf-lib split works better than the regex-on-buffer approach precisely because it never tries to mutate the binary stream. It reads cleanly with one library and writes cleanly with another. The coordinate alignment problem, which killed the first attempt, disappears entirely when you let pdf.js own the viewport transform and only convert back to PDF space at the last moment before writing.
