Basic Latin is the first block in the Unicode standard and among its oldest. It contains exactly 128 codepoints, from U+0000 to U+007F, and they are an exact, codepoint-for-codepoint copy of ASCII, whose modern layout dates from the 1967 revision. When Unicode 1.0 was published in 1991, a founding decision was that ASCII would be honored in place: every seven-bit ASCII text file, every ASCII C string, every ASCII UNIX configuration file would keep its code values unchanged in Unicode.

About this block

ASCII, the American Standard Code for Information Interchange, was first published by the American Standards Association as ASA X3.4-1963, then revised substantially in 1967, with further revisions through 1986. It is a seven-bit code (128 values) designed for teletype hardware. The committee, whose best-known contributor was Bob Bemer, arranged the code so that the first 32 positions were reserved for non-printing control codes inherited from teleprinter conventions and so that decimal digits encoded their numeric value in the low four bits (0x30 + digit). Lowercase letters arrived with the 1967 revision, placed so that capital and lowercase letters differ by a single bit (0x20).
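The bit-level layout described above can be checked directly. The following is a minimal Python sketch; the helper names toggle_case and digit_value are illustrative, not part of any standard library.

```python
def toggle_case(ch: str) -> str:
    """Flip bit 0x20 of an ASCII letter; return any other character unchanged."""
    if ch.isascii() and ch.isalpha():
        return chr(ord(ch) ^ 0x20)
    return ch


def digit_value(ch: str) -> int:
    """Return the numeric value of an ASCII digit, read from its low four bits."""
    if not ("0" <= ch <= "9"):
        raise ValueError(f"not an ASCII digit: {ch!r}")
    return ord(ch) & 0x0F


# 'A' is 0x41 and 'a' is 0x61: the two differ only in bit 0x20.
print(hex(ord("A")), hex(ord("a")))
print(toggle_case("a"), toggle_case("Z"))
# '7' is 0x37: masking off the high bits leaves the value 7.
print(digit_value("7"))
```

The guard in toggle_case matters: flipping 0x20 on a non-letter such as "@" (0x40) would produce an unrelated character, so the trick is only safe once the input is known to be an ASCII letter.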

Those control characters, U+0000 through U+001F plus U+007F DEL, are still present in Basic Latin today. Some remain in everyday use: TAB (U+0009), LF (U+000A), and CR (U+000D) handle indentation and line breaks in terminals and source files; BEL (U+0007) still rings the terminal bell; ESC (U+001B) introduces ANSI escape sequences. Others are largely vestigial. ACK, ENQ, SYN, ETB, and DC1–DC4 belonged to a world of synchronous data links, tape devices, and start/stop teleprinters, although DC1 and DC3 live on as XON and XOFF in software flow control. They survive in Unicode because removing them would break the round-trip guarantee with legacy data.
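As a quick check of the ranges above, this Python sketch enumerates the control characters in Basic Latin using the general category "Cc" from the standard unicodedata module. The function name basic_latin_controls is an illustrative choice.

```python
import unicodedata


def basic_latin_controls() -> list[int]:
    """Return every codepoint in U+0000..U+007F whose general category is Cc."""
    return [cp for cp in range(0x80) if unicodedata.category(chr(cp)) == "Cc"]


controls = basic_latin_controls()
# The C0 range U+0000..U+001F plus DEL (U+007F) gives 33 control characters,
# leaving 95 printable positions (space through tilde).
print(len(controls), 0x80 - len(controls))

# The controls still in everyday use have their own string escapes in Python.
for name, ch in [("BEL", "\a"), ("TAB", "\t"), ("LF", "\n"), ("CR", "\r"), ("ESC", "\x1b")]:
    print(f"{name}: U+{ord(ch):04X}")
```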

The seven-bit boundary matters even now. UTF-8 was specifically designed so that any byte with the high bit clear is a complete, standalone ASCII character: never a continuation byte, never a leading byte of a multi-byte sequence. This means UTF-8 text containing only ASCII is ASCII text, because the two encodings are byte-identical for the lower 128 codepoints. Email headers, HTTP request lines, JSON punctuation, programming language keywords, and machine-readable protocols of every kind exploit this fact daily. It is a major reason UTF-8 won the web: adopting it cost existing ASCII text nothing, which made the switch nearly free for the English-speaking world.
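The compatibility property can be demonstrated in a few lines of Python. This is a minimal sketch; is_ascii_bytes is a hypothetical helper, and the sample strings are arbitrary.

```python
def is_ascii_bytes(data: bytes) -> bool:
    """True if every byte has the high bit clear, i.e. the data is pure ASCII."""
    return all(b < 0x80 for b in data)


# ASCII-only text produces identical bytes under both encodings.
request_line = "GET /index.html HTTP/1.1"
assert request_line.encode("ascii") == request_line.encode("utf-8")

# Every byte of a multi-byte UTF-8 sequence has the high bit set, so a byte
# below 0x80 can never be part of a non-ASCII character.
for ch in "é€":
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X}", encoded.hex(" "), all(b >= 0x80 for b in encoded))

print(is_ascii_bytes("hello".encode("utf-8")))
print(is_ascii_bytes("héllo".encode("utf-8")))
```

One practical consequence: a parser can scan UTF-8 bytes for an ASCII delimiter such as "/" or ":" without decoding first, because that byte value cannot occur inside a multi-byte sequence.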

The Basic Latin block is maintained by the Unicode Consortium but is effectively frozen. Its contents are fixed by the Unicode stability policies and by synchronization with ISO/IEC 10646, not by any decision that could be revisited. Properties are clarified periodically in the Unicode Character Database, but all 128 positions are already assigned, and no codepoint will ever be added, removed, or renamed inside this range.