Hebrew (U+0590–U+05FF) — UnicodeCharacter

The Hebrew block, U+0590 through U+05FF, holds the entire working machinery of one of the oldest continuously used writing systems on the planet. Hebrew is an abjad: a script in which letters represent consonants, and vowels are either inferred from context or supplied by separate marks placed above, below, or inside the consonants. Reading direction is right-to-left, which makes Hebrew the original headache that Unicode's bidirectional algorithm — defined in UAX #9 — was designed to solve.

About this block

Hebrew has been in the standard since Unicode 1.0, released in October 1991. The original allocation roughly mirrored the legacy ISO 8859-8 code page for modern Israeli Hebrew, then extended it with the full set of niqqud vowel points and the much larger set of cantillation marks used in the Tanakh. The block is 256 codepoints wide; most positions are assigned, but a number of reserved slots remain — Unicode is deliberately conservative about filling Hebrew because the script has accumulated more than two millennia of scholarly convention and any addition risks contradicting an existing typesetting tradition.

The twenty-two consonants live at U+05D0 through U+05EA: א alef, ב bet, ג gimel, ד dalet, ה he, ו vav, ז zayin, ח het, ט tet, י yod, כ kaf, ל lamed, מ mem, נ nun, ס samekh, ע ayin, פ pe, צ tsadi, ק qof, ר resh, ש shin, and ת tav. Five of these letters take a distinct "final form" — called sofit — when they appear at the end of a word: ך final kaf (U+05DA), ם final mem (U+05DD), ן final nun (U+05DF), ף final pe (U+05E3), and ץ final tsadi (U+05E5). Crucially, these final forms are given separate codepoints rather than treated as contextual glyph variants. The decision keeps Hebrew text reversibly convertible to and from legacy encodings, but it pushes the burden of positional correctness onto authors and input methods: a misplaced kaf-versus-final-kaf is a spelling error, not a rendering one.

Below and inside the consonants sit the niqqud, the vowel-pointing system codified by the Tiberian Masoretes between roughly the 7th and 10th centuries CE. Niqqud occupy U+05B0–U+05BC and a few neighbouring codepoints — sheva, hataf segol, hataf patah, hataf qamats, hiriq, tsere, segol, patah, qamats, holam, qubuts, and the central dot dagesh. They are combining marks (Unicode general category Mn), which means they attach to the preceding consonant and do not advance the cursor. Modern Israeli newspapers and street signs omit niqqud entirely; liturgical texts, dictionaries, children's primers, and poetry use them religiously. Above the consonants — and sometimes below — sit the te'amim, the cantillation marks used to chant the Torah. The block reserves an enormous range, U+0591 through U+05AF, for these marks: etnahta, segol-ta'am, shalshelet, zaqef qatan, revia, pashta, yetiv, tevir, geresh, gershayim, qarne para, telisha gedola, telisha qetana, pazer, and dozens more. Each mark cues a specific musical motif in the synagogue chant, and the system's complexity — both visually and semantically — is why the Tanakh remains one of the most difficult things in the world to typeset.

Hebrew punctuation is small but distinct. ׃ U+05C3 SOF PASUQ ends a biblical verse (it looks like a colon but is semantically different). ׳ U+05F3 GERESH and ״ U+05F4 GERSHAYIM resemble an apostrophe and a quotation mark but are dedicated codepoints for marking abbreviations, acronyms, and gematria numerals. The block also encodes the precomposed Yiddish digraphs װ U+05F0 double vav, ױ U+05F1 vav-yod, and ײ U+05F2 double yod — these existed in the old Yiddish codepages and were preserved verbatim. Three precomposed letter-plus-dagesh combinations and the shin/sin dot variants (שׁ shin with shin dot, שׂ shin with sin dot) round out the encoding. For typographically richer presentation forms — alef with mappiq, bet with dagesh, lamed with holam, and so on — see the Hebrew Presentation Forms block at U+FB1D–U+FB4F, kept around for compatibility with legacy fonts. Because Hebrew is Bidi class R (right-to-left strong), digits embedded in Hebrew prose remain LTR per the bidirectional algorithm, which is why a phone number written in a Hebrew sentence still reads "left to right" inside an otherwise right-flowing line. The tetragrammaton, conventionally typed as יהוה, and the prefix א that opens the Shema, both render correctly only when the renderer respects this block's combining-mark and reading-direction semantics.

Hebrew

About this block

Notable characters

Every character in the block

About this block

Notable characters

Every character in the block

Related blocks