
Numbers

Decimal digits, Roman numerals, fractions, circled digits — every codepoint with a numeric meaning.

The Number group collects every codepoint whose General_Category begins with N. More than 1,800 codepoints carry an N category in Unicode 16.0. A few numeric characters fall outside the group: Han ideographs such as 五 have a Numeric_Value but are categorised Lo. The group is split into three subcategories that look superficially similar but behave very differently in parsing and locale-aware text processing.

The subcategories

Nd
Number, decimal digit — codepoints belonging to a script's 0–9 cycle, with an integer Numeric_Value from 0 to 9. There are over 700 Nd codepoints because dozens of scripts have their own digit set. Examples: 0–9 (ASCII), ٠١٢٣٤٥٦٧٨٩ (Arabic-Indic), ۰۱۲۳۴۵۶۷۸۹ (Extended Arabic-Indic), ०१२३४५६७८९ (Devanagari), ০১২৩৪৫৬৭৮৯ (Bengali), ๐๑๒๓๔๕๖๗๘๙ (Thai), 𝟎𝟏𝟐𝟑 (Mathematical Bold).
Nl
Number, letter — numerals whose codepoints are letter-shaped. Roman numerals occupy U+2160–U+2188 (Ⅰ Ⅱ Ⅲ Ⅳ Ⅴ Ⅵ Ⅶ Ⅷ Ⅸ Ⅹ Ⅼ Ⅽ Ⅾ Ⅿ), except U+2183 and U+2184, which are letters (Lu and Ll). The Greek acrophonic numerals, the Hangzhou (Suzhou) numerals at U+3021–U+3029 and U+3038–U+303A, and the Cuneiform numeric signs are also Nl. These codepoints are useful when a Roman numeral has to be a single character with numeric semantics rather than a run of Latin letters.
No
Number, other — everything else: vulgar fractions such as ½ ¼ ¾ ⅓ ⅔ ⅛, superscripts ¹ ² ³, subscripts ₁ ₂ ₃, circled and parenthesised digits ① ② ⑴ ⑵, the Aegean numbers, the Old Italic (Etruscan) numerals, the Counting Rod numerals, and dozens of historical fractional signs. Many No codepoints carry a fractional Numeric_Value (½ has the value 0.5).
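To see the three-way split on your own machine, the sketch below uses only Python's standard unicodedata module to tally every N codepoint, count the separate 0–9 digit sets (one Nd zero per set), and print a one-line summary for a few characters. The helper names are illustrative, not part of any library. Python bundles its own copy of the Unicode database, so the totals reflect unicodedata.unidata_version for your interpreter and may differ slightly from the Unicode 16.0 figures above.

```python
import sys
import unicodedata
from collections import Counter


def tally_number_categories() -> Counter:
    """Count codepoints in Nd, Nl and No for the Unicode version
    bundled with the running Python interpreter."""
    counts = Counter()
    for cp in range(sys.maxunicode + 1):
        category = unicodedata.category(chr(cp))
        if category.startswith("N"):
            counts[category] += 1
    return counts


def digit_set_zeros() -> list:
    """Return the zero of every Nd digit run; each zero marks a separate
    0-9 set (ASCII, Arabic-Indic, Devanagari, Mathematical Bold, ...)."""
    return [
        chr(cp)
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) == "Nd" and unicodedata.decimal(chr(cp)) == 0
    ]


def describe(ch: str) -> str:
    """One-line summary of a character: codepoint, category and name."""
    return f"U+{ord(ch):04X} {unicodedata.category(ch)} {unicodedata.name(ch, '?')}"


if __name__ == "__main__":
    counts = tally_number_categories()
    print("Unicode", unicodedata.unidata_version, dict(counts))
    print("decimal digit sets:", len(digit_set_zeros()))
    for ch in "7\u0667\u2166\u00BD\u2461":  # 7 ٧ Ⅶ ½ ②
        print(describe(ch))
```

Scanning the whole codepoint range is slow compared with reading DerivedGeneralCategory.txt directly, but it needs no downloads and always matches the runtime you are actually parsing with.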

Decimal digit scripts

Software that lets users enter numbers in their own script must understand the Nd subcategory. ICU's UNumberFormat, Python's str.isdecimal() and int(), and JavaScript's \p{Nd} (which requires the u flag) all rely on it. Python's str.isdigit() is broader than Nd: it also accepts superscripts such as ². ASCII 0–9 are U+0030 through U+0039 with Numeric_Value 0–9. The Arabic-Indic digits at U+0660–U+0669 carry the same values, which is why a locale-aware parser can read text rendered ٢٠٢٦ as "2026". The catch is that \d does not mean the same thing everywhere. In JavaScript (even in Unicode mode), in PCRE's default mode and in Python 2 byte-string regexes it matches only ASCII digits, whereas in Python 3 str patterns it matches all 700+ Nd codepoints. When you mean ASCII digits, write [0-9] explicitly (or pass re.ASCII in Python).
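A short Python sketch makes the difference concrete. Python's built-in int() already accepts any string of Nd digits; the parse_decimal_digits helper below is hypothetical and only spells out the per-digit Numeric_Value lookup with unicodedata.decimal(). The remaining lines contrast Python 3's Unicode-aware \d with re.ASCII and [0-9], and isdigit() with isdecimal().

```python
import re
import unicodedata

ARABIC_INDIC_2026 = "\u0662\u0660\u0662\u0666"  # ٢٠٢٦


def parse_decimal_digits(text: str) -> int:
    """Convert a string made only of Nd characters into an int.

    Hypothetical helper for illustration: Python's built-in int() already
    does this; the loop just makes the Numeric_Value lookup explicit.
    Digits from different scripts are not rejected."""
    if not text or any(unicodedata.category(ch) != "Nd" for ch in text):
        raise ValueError(f"not a decimal-digit string: {text!r}")
    value = 0
    for ch in text:
        value = value * 10 + unicodedata.decimal(ch)
    return value


if __name__ == "__main__":
    print(int(ARABIC_INDIC_2026))                   # 2026
    print(parse_decimal_digits(ARABIC_INDIC_2026))  # 2026
    # str patterns are Unicode-aware in Python 3, so \d matches ٢٠٢٦ ...
    print(bool(re.fullmatch(r"\d+", ARABIC_INDIC_2026)))            # True
    # ... unless re.ASCII is passed or the class is spelled [0-9].
    print(bool(re.fullmatch(r"\d+", ARABIC_INDIC_2026, re.ASCII)))  # False
    print(bool(re.fullmatch(r"[0-9]+", ARABIC_INDIC_2026)))         # False
    # isdigit() is broader than Nd: superscript two has Numeric_Type=Digit.
    print("\u00B2".isdigit(), "\u00B2".isdecimal())                  # True False
```

If your application should accept only one script per number, add that check explicitly; neither int() nor the helper above enforces it.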

Roman numerals and Nl pitfalls

The Roman numeral codepoints look identical to the corresponding Latin capitals (Ⅹ U+2169 versus X U+0058), but they are different characters with different categories: Ⅹ is Nl, X is Lu. Mixing the two in a single document confuses search and sort. The Unicode Standard recommends using ordinary Latin letters for Roman numerals in running text and reserving U+2160–U+2188 for cases where a single-character glyph (for typographic spacing, vertical text in CJK, or screen-reader cues) is wanted.
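One way to keep search working when both forms appear is to fold the numeral codepoints to Latin letters before indexing. Most of U+2160–U+217F have compatibility decompositions, so NFKC turns Ⅻ into "XII". The sketch below assumes you want to fold only Roman numerals; fold_roman_numerals is a hypothetical helper, and it deliberately leaves other compatibility characters alone instead of normalising the whole string.

```python
import unicodedata


def fold_roman_numerals(text: str) -> str:
    """Replace Roman numeral codepoints with their NFKC form in Latin letters.

    Only Nl characters in U+2160-U+2188 are touched, so other compatibility
    characters (ligatures, full-width forms) survive. Numerals with no
    compatibility decomposition, such as U+2180-U+2182, pass through unchanged."""
    folded = []
    for ch in text:
        if 0x2160 <= ord(ch) <= 0x2188 and unicodedata.category(ch) == "Nl":
            folded.append(unicodedata.normalize("NFKC", ch))
        else:
            folded.append(ch)
    return "".join(folded)


if __name__ == "__main__":
    print("\u2169" == "X")                          # False: Ⅹ is not X
    print(fold_roman_numerals("Chapter \u216B"))    # Chapter XII
    print(fold_roman_numerals("\u2173 \uFB01nal"))  # iv ﬁnal
```

Applying plain NFKC to the whole text would also work for search, but it rewrites ligatures, full-width letters and superscripts too, which may not be what you want in stored data.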

Fractions and superscripts

The vulgar fractions in Latin-1 Supplement (¼ ½ ¾) and the additional fractions at U+2150–U+215F are convenient typographically but cover only a small, fixed set of fractions. Many modern OpenType fonts can render arbitrary fractions through the frac feature, which is almost always the better approach for body text. The superscript and subscript digits (U+00B9, U+00B2, U+00B3 and those in U+2070–U+209F) should likewise be reserved for plain-text contexts (CSV, JSON, log lines) where you cannot rely on font features.
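When such characters do turn up in data, their values are recoverable. The sketch below, assuming Python's unicodedata and fractions modules, rebuilds an exact Fraction from a vulgar-fraction codepoint: NFKC expands ½ to "1⁄2" using U+2044 FRACTION SLASH, whereas unicodedata.numeric() only returns a float. It also shows that NFKC flattens superscripts, one reason they are risky outside plain-text contexts. vulgar_to_fraction is a hypothetical name.

```python
import unicodedata
from fractions import Fraction


def vulgar_to_fraction(ch: str) -> Fraction:
    """Exact value of a single-codepoint vulgar fraction such as '½'.

    unicodedata.numeric() returns a float (0.333... for '⅓'), so the exact
    ratio is rebuilt from the NFKC form, which spells '½' as '1⁄2' with
    U+2044 FRACTION SLASH between numerator and denominator."""
    numerator, slash, denominator = unicodedata.normalize("NFKC", ch).partition("\u2044")
    if not slash or not numerator or not denominator:
        raise ValueError(f"{ch!r} is not a complete vulgar fraction")
    return Fraction(int(numerator), int(denominator))


if __name__ == "__main__":
    for ch in "\u00BD\u2153\u215E":  # ½ ⅓ ⅞
        print(ch, vulgar_to_fraction(ch), unicodedata.numeric(ch))
    # NFKC flattens superscripts, so m² and m2 become indistinguishable.
    print(unicodedata.normalize("NFKC", "m\u00B2"))  # m2
```

U+215F FRACTION NUMERATOR ONE (⅟) decomposes to "1⁄" with no denominator, so the helper rejects it rather than guessing.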

Numeric value, not just category

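Beyond its category, each numeric codepoint has numeric properties. UnicodeData.txt stores up to three numeric fields per codepoint (decimal digit value, digit value and numeric value), and from which of them are filled the UCD derives two properties: Numeric_Type (Decimal, Digit, Numeric or None) and Numeric_Value. Numeric_Value is the most general: every character whose Numeric_Type is not None has one, while only Nd characters have Numeric_Type Decimal. For ½, Numeric_Type is Numeric and Numeric_Value is 0.5 (stored as the rational 1/2). For Ⅹ, the type is Numeric and the value is 10. For superscript ², the type is Digit and the value is 2. For ASCII 3, the type is Decimal and the value is 3. The character inspector displays these for any input.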
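Python's unicodedata exposes the three UnicodeData.txt fields as decimal(), digit() and numeric(), each raising ValueError when the field is empty. Because a Decimal character always has all three and a Digit character has the last two, Numeric_Type can be derived by trying them in order. The numeric_type helper below is a hypothetical sketch of that derivation, not a standard-library function.

```python
import unicodedata


def numeric_type(ch: str) -> str:
    """Derive Numeric_Type from the values unicodedata exposes, which
    mirror the decimal, digit and numeric fields of UnicodeData.txt."""
    try:
        unicodedata.decimal(ch)  # filled only for Nd characters
        return "Decimal"
    except ValueError:
        pass
    try:
        unicodedata.digit(ch)  # e.g. superscripts and circled digits
        return "Digit"
    except ValueError:
        pass
    try:
        unicodedata.numeric(ch)  # fractions, Roman numerals, ...
        return "Numeric"
    except ValueError:
        return "None"


if __name__ == "__main__":
    for ch in "3\u00BD\u2169\u00B2a":  # 3 ½ Ⅹ ² a
        print(ch, numeric_type(ch), unicodedata.numeric(ch, None))
```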

Example characters

U+0030 · Nd · 0 · Digit Zero
U+0039 · Nd · 9 · Digit Nine
U+0660 · Nd · ٠ · Arabic-Indic Digit Zero
U+06F0 · Nd · ۰ · Extended Arabic-Indic Digit Zero
U+0966 · Nd · ० · Devanagari Digit Zero
U+09E6 · Nd · ০ · Bengali Digit Zero
U+0E50 · Nd · ๐ · Thai Digit Zero
U+2160 · Nl · Ⅰ · Roman Numeral One
U+2161 · Nl · Ⅱ · Roman Numeral Two
U+2162 · Nl · Ⅲ · Roman Numeral Three
U+2169 · Nl · Ⅹ · Roman Numeral Ten
U+00BC · No · ¼ · Vulgar Fraction One Quarter
U+00BD · No · ½ · Vulgar Fraction One Half
U+00BE · No · ¾ · Vulgar Fraction Three Quarters
U+00B3 · No · ³ · Superscript Three
U+2460 · No · ① · Circled Digit One
