Unicode normalizer
Run any string through all four Unicode normalization forms at once. See which ones change the input, and by how much.
The four normalization forms are defined by Unicode Standard Annex #15. They give every Unicode string a canonical representation so that visually identical strings compare as equal. The tool runs the input through each form using the platform's String.prototype.normalize method, then compares each result to the input and reports the differences.
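The comparison described above can be sketched in a few lines. This is an illustration only, not the tool's actual source: the names FORMS and normalizeAll are hypothetical, and the only platform call assumed is String.prototype.normalize.

```javascript
// Hypothetical helper, not the tool's real code.
const FORMS = ["NFC", "NFD", "NFKC", "NFKD"];

function normalizeAll(input) {
  // Spread the string to count code points rather than UTF-16 code units.
  const inputLength = [...input].length;
  return FORMS.map((form) => {
    const output = input.normalize(form);
    return {
      form,
      output,
      changed: output !== input,
      codePointDelta: [...output].length - inputLength,
    };
  });
}

// Precomposed é (U+00E9): only the decomposing forms change it.
const report = normalizeAll("\u00E9");
for (const row of report) {
  console.log(row.form, row.changed, row.codePointDelta);
}
```

For the precomposed é, NFC and NFKC report no change, while NFD and NFKD each report one extra code point.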
NFD (canonical decomposition): é (U+00E9) becomes e + U+0301 COMBINING ACUTE ACCENT.
NFC (canonical decomposition, then canonical composition): é stays as é.
NFKD (compatibility decomposition): ﬁ (U+FB01) becomes f + i. Superscripts and subscripts lose their formatting. Half-width and full-width forms collapse to their normal-width counterparts.
NFKC (compatibility decomposition, then canonical composition): use it when you want ﬁ and fi to compare equal but you don't want unnecessary decomposition.
The two key principles to remember: canonical normalization preserves visual identity and loses no information when converting between NFC and NFD; compatibility normalization additionally collapses formatting-only distinctions and can throw away information. Use NFC for storage, NFKC for matching.
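A small sketch of the canonical versus compatibility distinction, using the same characters as the descriptions above. The codePoints helper and the variable names are hypothetical, introduced only for this illustration.

```javascript
// Hypothetical helper: list a string's code points as U+XXXX labels.
const codePoints = (text) =>
  [...text].map((ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0"));

const eAcute = "\u00E9";   // é, precomposed
const ligature = "\uFB01"; // the fi ligature

console.log(codePoints(eAcute.normalize("NFD")));    // canonical decomposition
console.log(codePoints(eAcute.normalize("NFC")));    // already composed
console.log(codePoints(ligature.normalize("NFC")));  // canonical forms keep the ligature
console.log(codePoints(ligature.normalize("NFKD"))); // compatibility forms expand it

// Canonical forms convert back and forth; compatibility mapping is one-way.
const canonicalRoundTrip = eAcute.normalize("NFD").normalize("NFC") === eAcute;
const ligatureRecovered = ligature.normalize("NFKC").normalize("NFC") === ligature;
console.log(canonicalRoundTrip, ligatureRecovered);

// Superscript two, full-width A, and half-width katakana KA under NFKC.
console.log("x\u00B2".normalize("NFKC"));
console.log("\uFF21\uFF76".normalize("NFKC"));
```

The round-trip check is true for é and false for the ligature: once NFKC has replaced the ligature with f + i, no normalization form brings it back.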
Three situations almost always call for normalization:
café (precomposed) and café (decomposed) as distinct accounts.
Don't normalize blindly. NFKC will turn the mathematical italic letter 𝑎 (U+1D44E) into a plain ASCII a. That's the right behaviour for identifier matching, but the wrong behaviour for math typesetting. Know which property you care about.
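As a hedged illustration of the trade-off, the sketch below compares strings by their NFC form and builds a match key with NFKC. The function names canonicallyEqual and matchKey are hypothetical, and a real matching pipeline would likely need more than normalization alone.

```javascript
// Hypothetical helpers for illustration only.
function canonicallyEqual(a, b) {
  // Equal once both sides are in the same canonical form.
  return a.normalize("NFC") === b.normalize("NFC");
}

function matchKey(text) {
  // Compatibility form: also folds formatting-only variants.
  return text.normalize("NFKC");
}

const precomposed = "caf\u00E9"; // é as a single code point
const decomposed = "cafe\u0301"; // e followed by U+0301

console.log(precomposed === decomposed);                // raw comparison
console.log(canonicallyEqual(precomposed, decomposed)); // after NFC

const mathItalicA = "\u{1D44E}"; // MATHEMATICAL ITALIC SMALL A
console.log(matchKey(mathItalicA) === "a");                // NFKC folds it to ASCII
console.log(mathItalicA.normalize("NFC") === mathItalicA); // NFC leaves it alone
```

The raw comparison is false while the NFC comparison is true. The last two lines show why the choice of form matters: NFKC erases the italic letter, NFC preserves it.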
The input cafe + U+0301 — five — ① contains:
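cafe + U+0301: NFC composes it to café (4 codepoints instead of 5). NFD and NFKD leave it as-is. NFKC composes it to café.
The ﬁ ligature (U+FB01): NFC and NFD leave it as-is. NFKC and NFKD expand it to f + i (2 codepoints instead of 1).
The circled digit ① (U+2460): NFC and NFD leave it as-is. NFKC and NFKD replace it with a plain 1.
That's a single string with three different normalization signatures. The byte-length display under each pane makes the size changes concrete: NFD forms are usually longer (more codepoints); NFKC and NFKD discard formatting and can be either longer or shorter depending on the input.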
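The size changes can be reproduced with a short sketch. It assumes the sample is cafe + U+0301, the word "five" spelled with the ﬁ ligature, and ①, separated by a space, U+2014, and a space; the exact separators are an assumption. The names sample, measure, and sizes are hypothetical, and byte lengths are taken as UTF-8 via TextEncoder, which may differ from what the tool's panes count.

```javascript
// Assumed sample: "cafe" + U+0301, the ligature word, then the circled one,
// joined by " \u2014 " (the separators are an assumption).
const sample = "cafe\u0301 \u2014 \uFB01ve \u2014 \u2460";
const encoder = new TextEncoder(); // always encodes to UTF-8

function measure(text) {
  return {
    codePoints: [...text].length,
    utf8Bytes: encoder.encode(text).length,
  };
}

const sizes = Object.fromEntries(
  ["NFC", "NFD", "NFKC", "NFKD"].map((form) => [form, measure(sample.normalize(form))])
);

console.log("input", measure(sample));
console.table(sizes);
```

Under those assumptions the input is 15 code points and 24 bytes. NFC gives 14 code points and 23 bytes, NFD leaves the string unchanged, NFKC gives 15 code points and 20 bytes, and NFKD gives 16 code points and 21 bytes. NFKD is therefore longer than the input in code points yet shorter in bytes, because the ligature and the circled digit are replaced by ASCII.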