Most of Unicode is a registry of agreed meanings: U+0041 will be the Latin capital A forever, U+1F600 will be the grinning emoji forever, and no future version will quietly reassign them. The Private Use Areas are the exception. They are three large blocks of codepoints that the standard explicitly leaves unassigned, with a guarantee that they will remain unassigned. Anyone — a font designer, a constructed-language enthusiast, an icon-set vendor, a medievalist transcription project — is free to put their own characters there. The cost is that nobody else will recognise them.
The three ranges
| Range | Name | Plane | Codepoints |
|---|---|---|---|
| U+E000 – U+F8FF | Private Use Area | 0 (BMP) | 6,400 |
| U+F0000 – U+FFFFD | Supplementary Private Use Area-A | 15 | 65,534 |
| U+100000 – U+10FFFD | Supplementary Private Use Area-B | 16 | 65,534 |
Together that is 137,468 codepoints. The BMP block was reserved in Unicode 1.0; the two supplementary planes were added when supplementary planes were introduced in Unicode 2.0 (1996). The default character properties for everything in these blocks are uniform: general category Co (other, private use), no name, no decomposition, and a default bidirectional class of L (left-to-right), which a private agreement may override. They render only if a font in the active fallback chain has a glyph for the requested codepoint.
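The arithmetic and the uniform properties can be checked directly. The following is a minimal Python sketch; the names `PUA_RANGES`, `is_private_use`, and `pua_size` are illustrative and not part of any library, and the property lookups rely on the standard `unicodedata` module.

```python
import unicodedata

# The three Private Use ranges (inclusive), as listed in the table above.
PUA_RANGES = [
    (0xE000, 0xF8FF),      # Private Use Area (BMP)
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A (Plane 15)
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B (Plane 16)
]


def is_private_use(ch: str) -> bool:
    """Return True if the single character ch falls in any PUA range."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in PUA_RANGES)


def pua_size() -> int:
    """Total number of private-use codepoints across the three ranges."""
    return sum(hi - lo + 1 for lo, hi in PUA_RANGES)


sample = "\uE000"
print(pua_size())                                # 137468
print(is_private_use(sample))                    # True
print(unicodedata.category(sample))              # Co
print(unicodedata.name(sample, None))            # None: no character name
print(repr(unicodedata.decomposition(sample)))   # '': no decomposition
```

A range check and a `category(ch) == "Co"` check should agree for every codepoint; the range version is shown because it does not depend on the Unicode data shipped with the interpreter.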
What lives in U+E000–U+F8FF
The BMP PUA is the most-used region because BMP codepoints fit in a single UTF-16 unit and survive systems that handle supplementary characters badly. A short list of established conventions:
- U+F8FF: The Apple logo. Apple has used this codepoint for the glyph in its system fonts since Mac OS 8.5. On any non-Apple device the codepoint renders as the missing-glyph box or as whatever the local font happens to put there.
- U+E000–U+F8FF (scattered): The Material Symbols and Material Icons fonts from Google expose their icons as text through private-use codepoints. Each icon (home, search, settings, etc.) is a single PUA codepoint that the font draws as the corresponding pictogram; the settings gear in Material Icons, for example, is U+E8B8.
- Library-specific (variable): Font Awesome, Phosphor, Bootstrap Icons, and most CSS icon libraries each pick their own PUA codepoints, and not all of them stay in the lower part of the block: Font Awesome's gear icon, for instance, is U+F013. The exact codepoint for a given icon is library-specific and can change between versions.
- U+E000–U+F8FF (allocated piecemeal): The ConScript Unicode Registry (CSUR), a community project run by Michael Everson and others, assigns PUA ranges to constructed and reconstructed scripts: Tengwar (Tolkien's Elvish), Cirth (Tolkien's runic), Klingon pIqaD, Aurebesh (Star Wars), Shavian extensions, and dozens more. The CSUR is not part of the Unicode standard, but its codepoints are widely respected by fonts and tools in that community.
- U+E000–U+E8FF (MUFI): The Medieval Unicode Font Initiative assigns codepoints to characters used in medieval Latin manuscripts but not yet in Unicode itself: scribal abbreviations, special combining forms, and ligatures. MUFI 4.0 covers around 1,500 characters.
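The claim at the start of this section, that BMP private-use codepoints fit in a single UTF-16 unit while supplementary ones do not, can be checked with a short Python sketch. The helper names `utf16_units` and `utf8_bytes` are illustrative only.

```python
def utf16_units(ch: str) -> int:
    """Number of 16-bit code units needed to encode ch in UTF-16."""
    return len(ch.encode("utf-16-le")) // 2


def utf8_bytes(ch: str) -> int:
    """Number of bytes needed to encode ch in UTF-8."""
    return len(ch.encode("utf-8"))


for ch in ("\uF8FF", "\U000F0000", "\U0010FFFD"):
    print(f"U+{ord(ch):04X}: {utf16_units(ch)} UTF-16 unit(s), "
          f"{utf8_bytes(ch)} UTF-8 byte(s)")

# The supplementary codepoint is stored as a surrogate pair.
print("\U000F0000".encode("utf-16-be").hex())  # db80dc00
```

Software that assumes one UTF-16 unit per character can split such a surrogate pair, which is one reason icon fonts and script registries tend to prefer the BMP block.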
What lives in the supplementary planes
The supplementary PUA blocks (Plane 15 and Plane 16) are too large for casual use and are mostly reserved by organisations that need a generous private space. Some of the conventions there:
- Plane 15 (U+F0000–U+FFFFD) is reserved by some operating systems for private metadata characters and by the CSUR for scripts that do not fit in the BMP PUA (high-range Tengwar, etc.).
- Plane 16 (U+100000–U+10FFFD) is largely unused. Some font foundries reserve sections of it for proprietary glyph IDs; some scientific notation projects use it for symbols they do not want to submit to the Unicode Technical Committee.
- The noncharacters are not PUA: the last two codepoints of each plane (U+xFFFE and U+xFFFF) and the 32-codepoint block U+FDD0–U+FDEF, 66 codepoints in total. They are permanently reserved and are not meant to appear in open interchange, which is why each supplementary PUA block stops at U+xFFFD. Programs may use them internally as sentinels.
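The noncharacter rule is easy to get wrong by one, so a small check can help. This is a sketch in Python; `is_noncharacter` is a hypothetical helper name, not a standard-library function.

```python
def is_noncharacter(cp: int) -> bool:
    """Return True if codepoint cp is one of Unicode's noncharacters."""
    if not 0 <= cp <= 0x10FFFF:
        return False
    # The contiguous block in the Arabic Presentation Forms-A area.
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # The last two codepoints of every plane: U+xFFFE and U+xFFFF.
    return (cp & 0xFFFE) == 0xFFFE


total = sum(is_noncharacter(cp) for cp in range(0x110000))
print(total)                     # 66 = 32 + 2 per plane * 17 planes
print(is_noncharacter(0xFFFFD))  # False: last codepoint of the Plane 15 PUA
print(is_noncharacter(0xFFFFE))  # True: first codepoint after it
```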
Tolkien's Elvish
Tengwar — the script Tolkien designed for the Elves of Middle-earth, used in the title pages of The Lord of the Rings and on the One Ring's inscription — has been assigned the ConScript range U+E000 through U+E07F since the 1990s. The CSUR allocations include:
U+E000 TENGWAR LETTER TINCO
U+E001 TENGWAR LETTER PARMA
U+E002 TENGWAR LETTER CALMA
U+E003 TENGWAR LETTER QUESSE
U+E020 TENGWAR LETTER ROOMEN
U+E024 TENGWAR LETTER ESSE
U+E030 TENGWAR VOWEL SIGN THREE DOTS (combining)
...
Fonts that ship with these glyphs — Annatar, Tengwar Sindarin, several free variants — let you type Tolkien's script in any application that handles Unicode text. A document written in CSUR Tengwar on one machine and opened on a machine without a Tengwar font shows as a row of missing-glyph boxes. The codepoints are valid; they simply have no agreed meaning outside the community.
Klingon pIqaD has been assigned U+F8D0 through U+F8FF since 1997, with the 26 letters at U+F8D0 (a), U+F8D1 (b), and so on through U+F8E8 (y) and U+F8E9 (the glottal stop, romanised as an apostrophe). The Klingon Language Institute and CBS/Paramount have both used these codepoints in published material. A formal Unicode proposal for Klingon was rejected in 2001 on the grounds that nobody used pIqaD in everyday writing; the PUA assignment remains the de facto standard.
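To make the idea of a community allocation concrete, here is a hedged Python sketch that maps romanised Klingon onto the pIqaD range. It assumes the 26 letters sit at consecutive codepoints from U+F8D0 in the conventional Klingon alphabet order; check the registry's own chart before relying on it. The names `KLINGON_LETTERS` and `to_piqad` are illustrative.

```python
PIQAD_BASE = 0xF8D0

# Assumed order: the conventional Klingon alphabet, one codepoint per letter.
KLINGON_LETTERS = [
    "a", "b", "ch", "D", "e", "gh", "H", "I", "j", "l", "m", "n", "ng",
    "o", "p", "q", "Q", "r", "S", "t", "tlh", "u", "v", "w", "y", "'",
]
LETTER_TO_CHAR = {
    letter: chr(PIQAD_BASE + i) for i, letter in enumerate(KLINGON_LETTERS)
}
# Try longer letters first so "tlh" wins over "t" and "ng" over "n".
_LONGEST_FIRST = sorted(KLINGON_LETTERS, key=len, reverse=True)


def to_piqad(romanized: str) -> str:
    """Greedy, case-sensitive transliteration into PUA codepoints."""
    out = []
    i = 0
    while i < len(romanized):
        for letter in _LONGEST_FIRST:
            if romanized.startswith(letter, i):
                out.append(LETTER_TO_CHAR[letter])
                i += len(letter)
                break
        else:
            # Not a Klingon letter (space, punctuation): pass it through.
            out.append(romanized[i])
            i += 1
    return "".join(out)


word = to_piqad("tlhIngan")
print([f"U+{ord(c):04X}" for c in word])
```

Greedy matching is a simplification: it cannot tell "n" followed by "gh" from "ng" followed by "H" without a dictionary. The output is also only readable on a machine with a pIqaD font installed.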
The risks
PUA codepoints are deliberately invisible to most of the Unicode infrastructure. That has consequences:
- No portability: A document that uses PUA codepoints depends entirely on the receiving system having the right font. Email the same Tengwar text to two people and one will see the script and one will see boxes.
- No semantic meaning: Search engines, screen readers, language-detection libraries, and accessibility tools cannot interpret PUA codepoints. A screen reader that reaches a Material Icons gear icon may announce something like "private use character U+E8B8" or skip it entirely. Always provide accessible text alternatives.
- No collation: Sorting Tengwar text with a default Unicode collation algorithm produces meaningless results. The codepoints have no tailored sort weights, so they fall back to implicit weights and sort in raw codepoint order.
- Collisions: If two icon fonts both use U+E000 for their first icon, the glyph that appears is whichever font the stylesheet's font-family list resolves first. A document that mixes icon fonts can produce surprising glyphs.
- Excluded from identifiers: PUA codepoints are not part of the IDNA 2008 allowed-character set, and they are not in the PRECIS identifier profile. They cannot appear in domain names, in URLs without percent-encoding, or in many protocol identifier slots.
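Given these risks, it can be worth scanning text for private-use characters before it leaves a system you control. The following Python sketch uses the standard `unicodedata` module; `find_private_use` is a hypothetical helper name and the sample string is made up for illustration.

```python
import unicodedata


def find_private_use(text: str) -> list[tuple[int, str]]:
    """Return (index, 'U+XXXX') for every private-use character in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Co"
    ]


text = "Settings \uE8B8 and logo \uF8FF"
print(find_private_use(text))  # [(9, 'U+E8B8'), (20, 'U+F8FF')]

# Normalization leaves PUA characters untouched, so it will not
# turn them into anything a search index can match.
print(unicodedata.normalize("NFKC", text) == text)  # True
```

A check like this fits naturally into a pre-send or pre-publish step, where a hit can trigger a warning or a substitution with a text alternative.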
When to use the PUA
The Private Use Areas are appropriate when:
- You are building a closed system where you control both the font and the consuming software (an icon font inside your own web app, for example).
- You need a glyph that has not been added to Unicode yet, and you can ship the font alongside the content (academic transcriptions of historic manuscripts).
- You are using a constructed script community convention (CSUR for Tengwar, KLI for Klingon) and you accept the limited portability.
They are not appropriate when:
- The content needs to be searchable, indexable, or accessible.
- The content will be sent through systems you do not control.
- A real Unicode codepoint exists for the character. The MUFI assignments, for example, have been progressively absorbed into the official Latin Extended-D and Latin Extended-E blocks as the Unicode Technical Committee accepts them; use the official codepoints where they exist.
If you are tempted to use the PUA for your own font's icons, consider whether the icons could be SVG inside <svg> elements instead. SVG icons can carry an accessible name that assistive tools understand, never collide with another font's PUA assignments, and remain visible if the icon font fails to load.
What to remember
The Private Use Areas are a deliberate hole in the standard, useful in proportion to how much of the rendering pipeline you control. For closed systems and constructed-script communities they work. For interchange, they are a fallback that you will spend time apologising for. The 137,468 codepoints are there if you need them; most documents should not use them.