What WCAG tools miss (the accessibility issues no scanner will ever flag)

WCAG is a remarkable document. It's the closest the web has to a written accessibility standard, and tooling that checks against it (axe, WAVE, Lighthouse, Pa11y, Siteimprove and friends) does genuinely useful work. We're not here to tell you to stop using them.

We are here to point out that the most damaging accessibility failures we surface in real-user testing are issues no scanner has ever flagged or ever will flag. They're structural. Automated tools work on the DOM and CSS. The issues below live in the gap between what the code says and what humans actually experience.

If your accessibility programme begins and ends with WCAG audits, you are systematically missing the failures that drive customers away.

1. Character confusion (l vs I vs 1, O vs 0)

This one is a particularly sharp illustration of the gap.

In Times New Roman, Helvetica, and Arial (three of the most-used fonts on the web), the lowercase l, uppercase I, and digit 1 are nearly identical at standard reading sizes. So are uppercase O and digit 0. Under perfect viewing conditions, sighted users disambiguate from context. Under any kind of stress (magnification, glare, cataracts, fatigue, dyslexia), they don't.

Where does this matter? Passwords. Account numbers. Verification codes. URLs. Reference numbers. Medical record IDs. Banking transaction descriptions. Anywhere a single character carries meaning and a single wrong character means the user re-types or gets locked out.

No automated tool will flag this. The characters are technically distinct in the font file. They pass every contrast check. They render identically across browsers. From the scanner's perspective, the page is fine.

From the user's perspective, the page is the reason they call your contact centre instead.

Atkinson Hyperlegible, the font commissioned by the Braille Institute precisely for this, bakes in deliberate character distinction. We covered this in detail in "For the typography nerds". The relevant point here is that font choice is an accessibility decision, and no WCAG tool weighs in on it.
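Font choice is one lever. Another is to keep look-alike characters out of the codes you issue and to forgive them in what users type back. The sketch below is a minimal illustration, not a description of any particular product: it borrows the alphabet from Crockford's Base32 (which omits I, L, O and U), and the helper names normaliseCode and isValidCode are our own hypothetical choices.

```typescript
// Alphabet borrowed from Crockford's Base32: digits plus uppercase letters,
// with I, L, O and U left out so no letter can be mistaken for 1 or 0.
const UNAMBIGUOUS_ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";

// Hypothetical helper: tidy up what a user typed before comparing it with
// the stored code, forgiving the classic look-alike mistakes.
function normaliseCode(input: string): string {
  return input
    .toUpperCase()
    .replace(/[\s-]/g, "")  // ignore spaces and hyphens used for grouping
    .replace(/O/g, "0")     // letter O becomes digit 0
    .replace(/[IL]/g, "1"); // letters I and L become digit 1
}

// Hypothetical helper: true when the code is non-empty and every character
// belongs to the unambiguous alphabet.
function isValidCode(code: string): boolean {
  return (
    code.length > 0 &&
    code.split("").every((ch) => UNAMBIGUOUS_ALPHABET.indexOf(ch) !== -1)
  );
}
```

With this approach a user who reads a zero as the letter O, or a one as a lowercase l, still gets through, because both spellings normalise to the same stored value. It only works for codes you generate yourself; it does nothing for passwords or URLs, where the font still has to carry the distinction.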

2. Dark mode: the most-sought-after adaptation that WCAG doesn't mention

Here's a statistic worth sitting with: across hundreds of See Me Please testing projects, dark mode is the single most-requested accessibility adaptation we hear from participants. It's not close. It outranks read-aloud widgets, font-size toggles, accessibility statement links, and most of the other adaptations teams routinely budget for.

It's also not anywhere in WCAG.

The Web Content Accessibility Guidelines set contrast minimums (1.4.3 at AA, 1.4.6 at AAA). They say nothing about whether a user can choose a dark theme. A product can be fully WCAG 2.2 AAA conformant and have no dark mode at all. By the scanner's reading, it's perfect. By the participant's lived experience, it's the reason their eyes hurt within ten minutes.

We estimate around 75% of our low-vision testers and at least half of our blind testers express a strong preference or genuine need for dark mode. The reasons are concrete and physical:

Glare reduction. Bright white backgrounds cause real pain for low-vision users with light sensitivity, and many blind users retain enough light perception for glare to be debilitating.
Halation prevention. Dark text on bright white backgrounds bleeds visually for users with degenerative eye conditions, making letterforms blur into each other.
Sustained engagement. Many participants describe setting their entire device to dark mode permanently to extend the time they can comfortably engage with screens before fatigue forces them to stop.

This is the cleanest example of the WCAG-versus-real-user gap. The most-asked-for feature among the population accessibility tools are meant to serve isn't in the rules accessibility tools check. We covered the research in depth at "Dark mode, essential not a preference", with verbatim quotes from low-vision testers describing what bright screens actually feel like to them.

If your product doesn't support dark mode, your scanner won't tell you. Your customers eventually will, by leaving.
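Supporting dark mode usually comes down to two inputs: what the user explicitly chose in your product, and what their device already asks for. The sketch below shows one way to combine them, assuming a three-way setting ("light", "dark" or "system"). The names Theme, ThemeChoice and resolveTheme are hypothetical; the only real browser API mentioned is the prefers-color-scheme media query, referenced in the comments.

```typescript
type Theme = "light" | "dark";
type ThemeChoice = Theme | "system";

// Hypothetical helper: an explicit choice made in the product always wins;
// "system" defers to whatever the device is set to.
function resolveTheme(choice: ThemeChoice, systemPrefersDark: boolean): Theme {
  if (choice === "system") {
    return systemPrefersDark ? "dark" : "light";
  }
  return choice;
}

// In a browser, the second argument could come from the standard media query:
//   window.matchMedia("(prefers-color-scheme: dark)").matches
// and the result could be written to an attribute that the stylesheet keys off.
```

Defaulting new users to "system" means participants who have already set their whole device to dark mode get a dark interface without having to hunt for a toggle.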

3. Cognitive load in authentication

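WCAG 3.3.8 (Accessible Authentication (Minimum)) says no step in an authentication process may require a cognitive function test, such as remembering a password or solving a puzzle, unless an alternative or assistance mechanism is available. Most teams interpret this narrowly: don't require the user to memorise a password. OK, fine, we allow password managers and we're done.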

The deeper issue is that real authentication flows are stacked with cognitive load that no scanner sees:

Real-time password validation widgets that announce every keystroke via aria-live, turning a 12-character password into a 240-token wall of speech for screen reader users
One-time codes buried in dense email paragraphs, readable in 8pt but invisible to a user navigating with magnification or screen reader heading-jump
Missing show/hide toggles on password fields, forcing users with motor or vision constraints to type complex strings without visual confirmation
Aggressive session timeouts that punish users for taking the time their assistive technology requires
Captchas with no audio alternative, or with audio so distorted that DeafBlind users can't get past them

Every one of these issues passes every automated audit. Every one of these is in our top-five abandonment reasons for actual users. We unpacked the patterns at "Authentication... when logging in becomes the lockout".
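To make the first item in that list concrete: one way to quieten a chatty validation widget is to build a single summary message and announce it once, for example when the user leaves the field or pauses typing, rather than on every keystroke. The sketch below covers only the message-building step. The rules, their wording and the names PasswordRule and summariseValidation are illustrative assumptions, not a recommended password policy.

```typescript
interface PasswordRule {
  description: string;
  test: (pw: string) => boolean;
}

// Illustrative rules only; substitute your own policy.
const RULES: PasswordRule[] = [
  { description: "at least 12 characters", test: (pw) => pw.length >= 12 },
  { description: "a number", test: (pw) => /\d/.test(pw) },
  { description: "an uppercase letter", test: (pw) => /[A-Z]/.test(pw) },
];

// Hypothetical helper: collapse all rule results into one sentence, so a
// screen reader hears a single announcement instead of one per rule per key.
function summariseValidation(pw: string, rules: PasswordRule[] = RULES): string {
  const unmet = rules.filter((r) => !r.test(pw)).map((r) => r.description);
  if (unmet.length === 0) {
    return "Password meets all requirements.";
  }
  return "Password still needs: " + unmet.join(", ") + ".";
}
```

The returned string would then be written into one polite live region at a moment the user controls. Whether that actually works for your users is exactly the kind of thing to watch a screen reader user try.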

4. Plain language and comprehension

WCAG 3.1.5 (Reading Level) is AAA, the highest conformance tier. Almost no team aims for AAA. Yet the most common abandonment cause we see in financial-services and government testing is not a WCAG violation; it's a user being unable to understand a sentence written in legal-policy dialect.

A product disclosure statement that reads at a year-12 reading level technically passes WCAG 2.2 AA. It systematically excludes:

ESL users (about 20% of the Australian adult population)
Users with cognitive disabilities
Users with low literacy
Users in distress (financial stress, medical anxiety, recent bereavement) where reading capacity drops temporarily

A scanner can flag heading order. It cannot tell you that "indemnity exclusion applies to consequential loss arising from acts of insured parties" is going to lose a third of your customers at the comprehension step.
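A readability formula is no substitute for watching someone try to understand the sentence, but it can act as a crude early warning. The sketch below computes the Flesch-Kincaid grade level, 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59, using a deliberately rough syllable estimate that counts vowel groups. The function names are hypothetical, and the syllable heuristic will miscount plenty of English words.

```typescript
// Rough syllable estimate: count groups of consecutive vowels.
// This over- and under-counts some words; it is a heuristic, not a dictionary.
function countSyllables(word: string): number {
  const groups = word.toLowerCase().match(/[aeiouy]+/g);
  return Math.max(1, groups ? groups.length : 0);
}

// Flesch-Kincaid grade level for a block of English text.
function fleschKincaidGrade(text: string): number {
  const sentences = text.split(/[.!?]+/).filter((s) => s.trim().length > 0);
  const words: string[] = text.match(/[A-Za-z']+/g) || [];
  if (sentences.length === 0 || words.length === 0) {
    return 0;
  }
  const syllables = words.reduce((sum, w) => sum + countSyllables(w), 0);
  return (
    0.39 * (words.length / sentences.length) +
    11.8 * (syllables / words.length) -
    15.59
  );
}

const legal = fleschKincaidGrade(
  "Indemnity exclusion applies to consequential loss arising from acts of insured parties."
);
const plain = fleschKincaidGrade("You are not covered if you cause the loss.");
console.log(legal > plain); // the legal-dialect sentence scores several grades higher
```

A score like this can flag a page for review. It cannot tell you whether a user in distress understood what they were agreeing to.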

5. The "edge case" failure pattern

We've worked with enough product teams to recognise the pattern. It looks like this: an audit passes, the team ships, and within weeks a small number of users complain bitterly about issues nobody can reproduce on the demo laptop. The standard response is to label them edge cases, assign a P3 ticket, and move on.

The users in those reports are routinely:

Older users on legacy devices and older browsers, exercising the prefers-reduced-motion code path nobody ever tested
Low-vision users at 400% zoom, where every fixed-position element you didn't think about overlaps the content
Switch users, for whom the focus-order issues your scanner ignored become impassable
Deaf and Hard of Hearing users on video content, where the auto-captions you trusted are 70% accurate, which means they fail on roughly every third sentence
DeafBlind users, for whom every single assistive-technology assumption your team holds is wrong

The pattern of complaint is identifiable, repeated across our projects, and structurally invisible to automated tooling. The complainants are not edge cases. They're the test set you didn't run.

6. The friction of "technically reachable"

This is the deepest gap. WCAG 2.4.6 (Headings and Labels) says headings and labels must describe topic or purpose. A scanner can confirm a heading exists. It cannot tell you the heading is on page seventeen of an FAQ when the user needed an answer in the first 30 seconds.

Reachability passes the audit. Findability doesn't. Task completion doesn't. Comprehension at the point of decision doesn't. These are usability dimensions, not WCAG dimensions, and they decide whether your customers convert or churn.

The honest summary: WCAG audits are not accessibility user testing

WCAG tools are necessary. They catch the syntactic issues that should never have shipped. They are not sufficient. Roughly half of WCAG violations are programmatically detectable; the other half need a human. And the issues above (character confusion, dark mode, authentication cognitive load, plain language, "edge case" cohorts, findability) sit outside WCAG entirely.

If your accessibility programme assumes the scanner is the test, it's not testing accessibility. It's testing the absence of one specific category of defect.

The fix isn't to abandon automated tooling. The fix is to recognise it's a build-pipeline guardrail, not an outcome measurement. The outcome measurement is whether real disabled users complete real tasks on your product. That requires watching them do it.

See Me Please is a user testing platform that connects organisations with diverse and disabled participants to evaluate real-world usability beyond WCAG compliance.
