Why teams regret choosing wkhtmltopdf a year later
wkhtmltopdf looks like a fast win for converting HTML to PDFs, until the build breaks, the rendering drifts, and ops is babysitting a leaking binary. Here’s why teams regret that choice after twelve months.
The initial win: ship PDFs fast
There’s a familiar arc: a product needs PDFs, engineering time is scarce, and wkhtmltopdf is a one-binary promise. It takes HTML you already have, outputs a PDF, and developers can wire it up in an afternoon. No React-to-PDF library to fight, no designer handoffs — just an executable invoked from your server or job worker. That initial velocity is real. Everyone recognizes it.
The cracks appear: maintenance, scale, and security
Fast-forward a few months. The team ships more templates, adds fonts, and starts automating exports. Now the problems you didn't see at demo time show up: rendering differences across releases, random crashes, slow cold starts in containers, font fallbacks that look wrong on production PDFs, and security advisories for an old Qt dependency. The operational cost is not one weekend upgrade — it’s a recurring tax. You stop thinking of wkhtmltopdf as a tool and start thinking of it as a component your SREs must babysit.
Real failure modes — a concrete example
Concrete example — how a typical failure unfolds.
Architecture (simplified):
- Rails app → enqueue PDF job to Sidekiq
- Sidekiq job runs in a Kubernetes pod and shells out to wkhtmltopdf inside the pod
- Result written to S3, URL returned to user
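The job body in that architecture can be sketched as plain Ruby with the renderer and uploader injected, so the flow is visible without wkhtmltopdf or S3. All names here are illustrative assumptions, not a real API:

```ruby
# Minimal sketch of the PDF job flow. In production, `render` shells out to
# wkhtmltopdf and `upload` writes to S3; here both are injected lambdas.
def process_pdf_job(job_id, html, render:, upload:)
  pdf_path = "/tmp/job-#{job_id}.pdf"
  render.call(html, pdf_path)   # e.g. shells out to wkhtmltopdf
  upload.call(pdf_path)         # e.g. pushes to S3, returns the URL
end
```

Keeping the external calls injected like this is also what makes the unhappy paths (crash, missing font, OOM) testable without a real render.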
What breaks:
- Memory leak in wkhtmltopdf process after rendering dozens of complex invoices. Pod OOMs and Kubernetes restarts the worker.
- A font substitution bug: the font used in dev is available system-wide but not packaged in the container. Production PDFs replace it with a fallback, shifting layout and overflowing pages.
- A silent failure: wkhtmltopdf exits with status 127 in some pods because a shared library it needs (libXrender, or the patched Qt build) isn’t installed in the base image, so the dynamic loader refuses to start the binary at all.
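The missing-binary and missing-font failures above can be caught at worker boot instead of mid-job. A minimal sketch of such a pre-flight check, assuming fontconfig's `fc-list` is available in the image (the default path and font name are illustrative):

```ruby
# Sketch: fail loudly at deploy time if the binary or fonts are missing,
# instead of discovering it as exit 127 or a fallback font in production.
def preflight!(binary: "/usr/local/bin/wkhtmltopdf", fonts: ["DejaVu Sans"])
  raise "missing or non-executable binary: #{binary}" unless File.executable?(binary)

  installed = `fc-list 2>/dev/null`          # empty string if fc-list is absent
  missing = fonts.reject { |f| installed.include?(f) }
  raise "fonts not packaged in image: #{missing.join(', ')}" unless missing.empty?
end
```

Running this once in the container's entrypoint turns two of the three failure modes above into a failed rollout rather than broken customer PDFs.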
Example job code (Ruby-ish, close to what teams run in production):

    def spawn_pdf(job)
      html = render_template(job.data)
      tmp_html = "/tmp/job-#{job.id}.html"
      tmp_pdf  = "/tmp/job-#{job.id}.pdf"
      File.write(tmp_html, html)
      # Blocking call: this worker is tied up for the whole render
      ok = system("/usr/local/bin/wkhtmltopdf", "--print-media-type", tmp_html, tmp_pdf)
      raise "wkhtmltopdf failed (exit #{$?.exitstatus})" unless ok
      upload_to_s3(tmp_pdf)
    end
Failure mode trace:
- Sidekiq worker logs: "child killed by signal 9"
- Pod events: OOMKilled
- Reproduced locally: running the same job 50 times grows RSS until the process dies
You can patch around this with restart policies, lower concurrency, extra monitoring, or recycling containers between jobs — but each workaround adds complexity and cost. And none of it fixes subtle rendering drift when wkhtmltopdf or system fonts change.
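One of those workarounds — recycling workers before the kernel does it for you — can be sketched in a few lines. This assumes a Linux host (it reads /proc) and a supervisor like Kubernetes or Sidekiq's process manager that restarts exited workers; the 1 GB threshold is an illustrative assumption:

```ruby
# Crude RSS watchdog: after each render, check our own resident set size
# and exit cleanly for the supervisor to restart us, rather than waiting
# for the OOM killer (the "child killed by signal 9" in the logs above).
def rss_kb
  File.read("/proc/self/status")[/VmRSS:\s+(\d+)/, 1].to_i
end

def maybe_recycle!(limit_kb: 1_048_576)  # ~1 GB, tune per workload
  return if rss_kb <= limit_kb
  warn "RSS #{rss_kb} kB over #{limit_kb} kB limit; recycling worker"
  exit 0
end
```

A clean exit between jobs is far cheaper than an OOMKill mid-render, but note this is damage control, not a fix for the underlying leak.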
Checklist
- Pin and test your base image and font packages; treat the image like code.
- Run renders at volume early to catch memory leaks and slow templates.
- Use a separate worker fleet or service for PDF jobs; never block web workers.
- Add PDF visual regression tests that tolerate metadata differences (use normalized comparisons).
- Plan for upgrades: smoke render on every release of the binary or OS layer.
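For the visual-regression item above, the simplest normalized comparison is to strip volatile PDF metadata before diffing bytes. A sketch, with the caveat that the regexes cover only common volatile fields and real-world suites usually rasterize pages and diff pixels instead:

```ruby
# Zero out fields that legitimately change between identical renders
# (timestamps, document IDs) so a byte diff only fails on layout changes.
def normalize_pdf(bytes)
  bytes
    .gsub(%r{/CreationDate \(D:[^)]*\)}, "/CreationDate (D:0)")
    .gsub(%r{/ModDate \(D:[^)]*\)}, "/ModDate (D:0)")
    .gsub(%r{/ID \[[^\]]*\]}, "/ID []")
end

def pdfs_match?(a, b)
  normalize_pdf(a) == normalize_pdf(b)
end
```

Run this against a set of golden PDFs on every image or binary bump, and the "smoke render on every release" item becomes an automated gate instead of a manual eyeball check.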
Closing
Picking wkhtmltopdf isn’t wrong for prototypes. But treating it like a maintenance-free dependency is a mistake teams regret after a year when crashes, font issues, and ops cost pile up. If you want to avoid babysitting rendering binaries, try an API-first PDF service — or be prepared to run one yourself. (If you want a hands-off option, DuckSlide exists.)