Why "redacted" PDFs still leak their text

6 min read May 28, 2026
pdf, redaction, security

Drawing a black rectangle over text hides nothing. The words sit underneath, ready to be copied out by anyone who knows where to click.


You black out a name. You draw a neat dark rectangle over a Social Security number, a salary figure, a witness. The PDF looks redacted. You email it, file it, post it. And the text underneath is still sitting right there, fully intact, waiting for the first person who thinks to drag their cursor across the page.

This is one of the most common and most embarrassing data leaks there is, and it happens because of a simple misunderstanding about how PDFs are built.

TL;DR: A black box is paint on top of the page, not an eraser. The characters underneath remain in the file and can be copied, extracted, or uncovered in seconds. Real redaction either deletes the text from the page or rasterizes the page into a flat image, so nothing recoverable is left.

A PDF is layers, not a photo

When you look at a PDF you see a finished page. Inside the file, that page is a set of instructions. There is a content stream that says “draw the letter J at this position, then the letter o, then h, then n” along with font, size, and coordinates. Those character instructions are the real text. They are what lets you select a paragraph and copy it.

When you open the file in a viewer or a basic PDF editor and draw a rectangle over a word, you are not touching that content stream at all. You are adding a new object on top of it. In PDF terms it is usually an annotation or a vector shape, painted last so it lands above everything else. Visually it covers the word. Structurally the word is untouched, one layer down, still spelling itself out in full.

So the box and the text coexist. The box hides the text from your eyes. It hides nothing from the file.
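To make that concrete, here is a rough sketch of what a page's instructions can look like once a box has been drawn over a name. The stream below is a hypothetical, uncompressed toy (the name, font label, and coordinates are all made up for illustration), and real files usually compress these streams and may store the box as an annotation instead.

```python
# A toy, uncompressed PDF content stream. Everything in it is hypothetical.
content_stream = b"""
BT
/F1 12 Tf
72 700 Td
(Witness: John Smith) Tj
ET
0 0 0 rg
120 696 70 14 re
f
"""

# "(...) Tj" shows text. "re" followed by "f" paints a filled rectangle.
# They are separate instructions, so adding the box changes nothing about the text.
has_text = b"(Witness: John Smith) Tj" in content_stream
has_box = b"120 696 70 14 re" in content_stream
print(has_text, has_box)  # True True
```

The rectangle comes later in the stream, so it is painted over the name. The name is still the first thing the file says.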

How the text gets out

You do not need a forensic lab to defeat a fake redaction. There are at least three trivial ways.

Select and copy. Open the PDF, drag across the blacked-out region as if you were highlighting it, and paste into any text field. The characters under the box come right out, because selection reads the content stream, not the pixels you see.

Extract the text. Any tool that pulls text out of a PDF, including the search box in your own reader, reads the same underlying stream. Search for a name you think you hid and watch it light up under its own black box.

Delete the box. The rectangle is just another object. An editor can select it and remove it, and the original word is revealed underneath, unchanged. The “redaction” was one keystroke deep.

None of this is exotic. The text was never gone. It was only out of sight.
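The second and third routes can be sketched in a few lines. This is a simplified illustration that assumes an uncompressed stream with plain `(...) Tj` strings and a box drawn as a filled rectangle; `extract_text` and `delete_boxes` are hypothetical helper names, not functions from any PDF library.

```python
import re

# Hypothetical page: a name, then a black rectangle painted over it.
page = (
    b"BT /F1 12 Tf 72 700 Td (Witness: John Smith) Tj ET\n"
    b"0 0 0 rg\n"
    b"120 696 70 14 re\n"
    b"f\n"
)

def extract_text(stream: bytes) -> list[str]:
    """Collect every literal string shown with the Tj operator."""
    found = re.findall(rb"\(((?:[^()\\]|\\.)*)\)\s*Tj", stream)
    return [raw.decode("latin-1") for raw in found]

def delete_boxes(stream: bytes) -> bytes:
    """Drop 'x y w h re' + 'f' (filled rectangle) instructions, keep the rest."""
    return re.sub(rb"[-\d.]+ [-\d.]+ [-\d.]+ [-\d.]+ re\s+f\s*", b"", stream)

# Extraction never looks at the rectangle at all.
print(extract_text(page))  # ['Witness: John Smith']

# Deleting the box leaves the text instruction exactly as it was.
print(extract_text(delete_boxes(page)))  # ['Witness: John Smith']
```

Neither function needed to defeat anything. The box was simply never part of the text.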

This keeps happening to people who should know better

This is not a beginner-only mistake. Courts, law firms, government agencies, and large companies have all shipped documents where sensitive names, addresses, and figures sat recoverable under black boxes.

In some high-profile legal filings, lawyers redacted witness names and confidential details with overlay boxes, and reporters lifted the supposedly hidden text out of the published PDF within minutes. Government bodies have released contracts and reports where pricing or personal data was “blacked out” but trivially copyable. The pattern repeats because the failure looks like success. The document appears redacted on screen, the person sees a clean black bar, and they assume the job is done. The leak is invisible until someone goes looking, and by then the file is already public.

If organizations with legal teams keep doing this, the honest takeaway is that the tool fooled them, not that they were careless. A black rectangle in the wrong workflow is a trap dressed as a feature.

What true redaction actually means

Real redaction has one requirement: the sensitive content must not exist in the file anymore. Not hidden, not covered, not behind a layer. Gone from the page’s instructions.

There are two reliable ways to get there.

Remove the content, then flatten. Proper redaction deletes the underlying text and any objects in the covered region from the content stream, then writes out a clean page. After this, selecting the area gives you nothing, because there is nothing there. Search finds nothing. Deleting the box reveals nothing, because the words it was hiding no longer exist. This is what “redact” should mean, and it is different from “draw a box.”

Cover, then rasterize. The other dependable approach is to place your covering marks and then convert the page into a flat image, so the page becomes pixels rather than text-plus-shapes. A rasterized page has no selectable characters and no separable box layer. What you see is genuinely all that is in the file. The tradeoff is that the whole page loses its real text, so it is no longer searchable or copyable anywhere, and the file is usually larger. For a single page of something that must never leak, that tradeoff is often worth it.

Both approaches share the same principle. The hidden text is not behind something. It is destroyed.
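As a minimal sketch of the first approach, the function below deletes text objects whose starting position falls inside the redaction rectangle and only then paints the box. It assumes a toy stream where each text object has one `Td` position and one `Tj` string; `redact` and `TEXT_OBJECT` are hypothetical names. Real pages involve compressed streams, transforms, `TJ` arrays, and runs that only partly overlap the box, so a real redaction tool has to do far more than this.

```python
import re

# Matches one simple text object: BT /Font size Tf x y Td (string) Tj ET
TEXT_OBJECT = re.compile(
    rb"BT\s+/\w+ [\d.]+ Tf\s+([\d.]+) ([\d.]+) Td\s+"
    rb"\((?:[^()\\]|\\.)*\)\s*Tj\s+ET\s*"
)

def redact(stream: bytes, box: tuple[float, float, float, float]) -> bytes:
    """Delete text objects that start inside box (x, y, width, height), then paint it."""
    x, y, w, h = box

    def drop_if_inside(match: re.Match) -> bytes:
        tx, ty = float(match.group(1)), float(match.group(2))
        inside = x <= tx <= x + w and y <= ty <= y + h
        return b"" if inside else match.group(0)

    cleaned = TEXT_OBJECT.sub(drop_if_inside, stream)
    # The visible mark is added last, and it now covers nothing.
    mark = b"0 0 0 rg\n%g %g %g %g re\nf\n" % (x, y, w, h)
    return cleaned + mark

# Hypothetical page with a label and a name as separate text objects.
page = (
    b"BT /F1 12 Tf 72 700 Td (Witness:) Tj ET\n"
    b"BT /F1 12 Tf 120 700 Td (John Smith) Tj ET\n"
)
redacted = redact(page, (118, 696, 70, 14))
print(b"John Smith" in redacted)  # False
```

The order matters: the content is removed first, and the black bar is only a visual marker of where something used to be.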

A decision you can actually use

Here is the rule I would give anyone handling sensitive documents.

If it would be fine for the covered text to surface later, you do not really need redaction, and a visual box is your own business. If the covered text must never be recoverable, a visual box is not enough and never will be. In that case you have two acceptable outcomes: use real redaction that deletes the underlying content, or cover the area and then rasterize the page into a flat image so it carries no recoverable text. Flattening that only merges the box into the page, without rasterizing, leaves the characters in place. Anything that leaves the original characters in the file is not redaction. It is a costume.

One more habit worth keeping. After you redact, test your own work before you send it. Open the finished file, try to select across the area you hid, and paste it somewhere. Run a search for the exact name or number you meant to remove. If nothing comes out and nothing is found, the redaction held. If the text appears, you just caught a leak before it left your machine, which is exactly where you want to catch it.
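Part of that check can be scripted. The sketch below scans a file's raw bytes, plus any streams it can inflate with zlib, for strings you meant to remove. `searchable_bytes` and `find_leaks` are hypothetical helper names, and the approach is deliberately crude: it will miss text stored as hex strings, split across several operators, encoded through a custom font mapping, or inside an encrypted file. Treat a hit as proof of a leak, and an empty result as a hint, not a guarantee.

```python
import re
import zlib

def searchable_bytes(pdf_bytes: bytes) -> bytes:
    """Return the raw file bytes plus the inflated contents of zlib-compressed streams."""
    chunks = [pdf_bytes]
    for match in re.finditer(rb"stream\r?\n(.*?)endstream", pdf_bytes, re.DOTALL):
        try:
            # decompressobj tolerates the trailing newline left before "endstream"
            chunks.append(zlib.decompressobj().decompress(match.group(1)))
        except zlib.error:
            pass  # not zlib data; its raw bytes are already in the first chunk
    return b"\n".join(chunks)

def find_leaks(pdf_bytes: bytes, secrets: list[str]) -> list[str]:
    """List every secret that still appears somewhere in the searchable bytes."""
    haystack = searchable_bytes(pdf_bytes)
    return [s for s in secrets if s.encode("latin-1") in haystack]

# Two hypothetical file fragments: one still carries the name, one does not.
leaky = b"stream\n" + zlib.compress(b"BT (John Smith) Tj ET") + b"\nendstream"
clean = b"stream\n" + zlib.compress(b"0 0 0 rg 118 696 70 14 re f") + b"\nendstream"

print(find_leaks(leaky, ["John Smith"]))  # ['John Smith']
print(find_leaks(clean, ["John Smith"]))  # []
```

On a real file you would read the bytes with `open(path, "rb").read()` and pass the exact names and numbers you redacted. Keep doing the manual select-and-paste test as well, since it exercises the same path a curious reader would use.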

Doing this without uploading the document

There is a quiet wrinkle in all of this. The documents that most need redacting (contracts, medical records, legal filings, identity documents) are the same documents you least want to upload to some stranger’s server just to black out a line. Sending a confidential file to an online tool to make it more confidential is its own kind of leak.

That is why redaction belongs in tools that run entirely in your browser, where the file never leaves your device. Local PDF tools for redacting, flattening, and rasterizing pages are coming soon to pdf.hivly.net, built so the work happens on your machine, with no upload and no account. The point is simple. If you are removing secrets from a file, the act of removing them should not hand the file to anyone else.

Until then, remember the one thing that matters. A black box is decoration. If the text must be gone, make it gone.


Frequently asked questions

Does a black box over text in a PDF actually delete the text?
No. A black box is usually an annotation or a shape drawn on top of the page. The original characters stay in the content stream underneath, so they can still be selected, copied, or revealed by deleting the box.
How can someone read text under a redaction box?
They can drag-select across the page and paste the hidden words into a text editor, run text extraction on the file, or open the PDF in an editor and delete the box. None of this requires special hacking tools.
What does true redaction actually do?
True redaction removes the underlying text and objects from the content stream, not just the visual layer. After that the words no longer exist in the file, so there is nothing to copy or recover.
Is flattening a PDF the same as redacting it?
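Not on its own. Flattening merges annotations and form fields into the page content, which by itself leaves the original text selectable. It removes the hidden words only when the page is also rasterized into an image, which drops the selectable text. If you cover the text first and then rasterize, the hidden words are gone from that page.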
