Codepoint converter
Type any character, U+XXXX notation, decimal number, or hexadecimal value. Every other format appears below.
The input field accepts any of seven different ways of referring to a Unicode codepoint. The tool tries each parser in order and uses whichever matches first. Once it has a single codepoint, it computes every common output format and renders them as a grid.
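The try-in-order behaviour can be sketched as a list of small parsers that each return a codepoint or null. This is only an illustration: it covers the four forms named in the introduction (literal character, U+XXXX, decimal, hexadecimal), and the function names, the 0x prefix for hex, the two-digit minimum for decimal, and the ordering are all assumptions rather than the tool's actual rules.

```javascript
// Hypothetical parser chain; each parser returns a codepoint number or null.
const parsers = [
  // U+XXXX notation with 1 to 6 hex digits.
  (s) => {
    const m = /^U\+([0-9A-F]{1,6})$/i.exec(s);
    return m ? parseInt(m[1], 16) : null;
  },
  // Hexadecimal, assumed here to require a 0x prefix.
  (s) => {
    const m = /^0x([0-9A-F]+)$/i.exec(s);
    return m ? parseInt(m[1], 16) : null;
  },
  // Decimal, assumed to need two or more digits so that a lone
  // digit such as "5" is treated as a literal character instead.
  (s) => (/^\d{2,}$/.test(s) ? parseInt(s, 10) : null),
  // A single literal character. Spreading iterates by codepoint,
  // so an emoji made of a surrogate pair counts as one character.
  (s) => ([...s].length === 1 ? s.codePointAt(0) : null),
];

function parseCodepoint(input) {
  // Trim surrounding whitespace, unless the input is only whitespace
  // (so that a single space can still be looked up as U+0020).
  const s = input.trim() === "" ? input : input.trim();
  for (const parse of parsers) {
    const cp = parse(s);
    if (cp !== null) return cp; // first match wins
  }
  return null;
}

console.log(parseCodepoint("U+1F600")); // 128512
console.log(parseCodepoint("0xE9"));    // 233
console.log(parseCodepoint("233"));     // 233
console.log(parseCodepoint("é"));       // 233
```

Because the first match wins, the order of the list decides how ambiguous input is interpreted, which is why the decimal parser sits ahead of the literal-character parser in this sketch.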
The accepted input forms are:
codePointAt(0) handles the surrogate pair correctly.
The output formats include the canonical Unicode notation, decimal value, hex, padded binary, the UTF-8 byte sequence (computed with TextEncoder, which always emits UTF-8), the UTF-16 code unit sequence (one unit for codepoints in the Basic Multilingual Plane, two units forming a surrogate pair for any codepoint at or above U+10000), HTML decimal and hex numeric entities, the CSS string escape (a backslash followed by the hex codepoint and a trailing space), the JavaScript escape (using \u{...} for supplementary planes), the Python escape (\xHH, \uHHHH, or \U00HHHHHH depending on width), and the percent-encoded form for URLs.
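Most of these formats can be derived from the codepoint with a few lines of JavaScript. The sketch below is not the tool's source: formatsFor and its property names are hypothetical, padded binary is omitted because the padding width is not stated, and the input is assumed to be an already validated codepoint.

```javascript
// Hypothetical helper: derive several output formats from one codepoint.
function formatsFor(cp) {
  const ch = String.fromCodePoint(cp);
  const hex = cp.toString(16).toUpperCase();
  const pad = (n, width) => n.toString(16).toUpperCase().padStart(width, "0");

  // TextEncoder always produces UTF-8 bytes.
  const utf8 = [...new TextEncoder().encode(ch)];

  // JavaScript strings are UTF-16, so charCodeAt yields the code units.
  const utf16 = [];
  for (let i = 0; i < ch.length; i++) utf16.push(ch.charCodeAt(i));

  let python;
  if (cp <= 0xFF) python = `\\x${pad(cp, 2)}`;
  else if (cp <= 0xFFFF) python = `\\u${pad(cp, 4)}`;
  else python = `\\U${pad(cp, 8)}`;

  return {
    notation: "U+" + hex.padStart(4, "0"),
    decimal: cp,
    utf8: utf8.map((b) => pad(b, 2)).join(" "),
    utf16: utf16.map((u) => pad(u, 4)).join(" "),
    htmlDecimal: `&#${cp};`,
    htmlHex: `&#x${hex};`,
    css: `\\${hex} `, // backslash, hex digits, trailing space
    js: cp > 0xFFFF ? `\\u{${hex}}` : `\\u${pad(cp, 4)}`,
    python,
    // Escapes every UTF-8 byte, including ASCII ones.
    percent: utf8.map((b) => "%" + pad(b, 2)).join(""),
  };
}

const f = formatsFor(0x1F600);
console.log(f.utf8);    // F0 9F 98 80
console.log(f.utf16);   // D83D DE00
console.log(f.percent); // %F0%9F%98%80
```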
Take U+1F600 GRINNING FACE (😀). It sits in the Supplementary Multilingual Plane, and its decimal value is 128512. In UTF-8 it occupies four bytes: F0 9F 98 80. In UTF-16 it requires a surrogate pair, with high surrogate D83D and low surrogate DE00. Because of this, JavaScript's "😀".length returns 2, a frequent source of bugs when slicing strings. Use [..."😀"].length or Array.from("😀").length to count codepoints. To embed it in CSS content, write \1F600 followed by a space; in JavaScript, "\u{1F600}"; in Python 3, "\U0001F600". As an HTML numeric entity it is &#128512; (decimal) or &#x1F600; (hex). There is no named entity for emoji. Percent-encoded for URLs it becomes %F0%9F%98%80, since URL encoding operates on UTF-8 bytes, not codepoints.
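The surrogate pair for U+1F600 can be reproduced by hand with the standard UTF-16 arithmetic: subtract 0x10000, then split the remaining 20 bits into two 10-bit halves. The helper name toSurrogatePair is made up for this illustration.

```javascript
// Hypothetical helper: split a supplementary codepoint into UTF-16 surrogates.
function toSurrogatePair(cp) {
  if (cp < 0x10000 || cp > 0x10FFFF) {
    throw new RangeError("not a supplementary-plane codepoint");
  }
  const offset = cp - 0x10000;            // 20-bit value
  const high = 0xD800 + (offset >> 10);   // top 10 bits
  const low = 0xDC00 + (offset & 0x3FF);  // bottom 10 bits
  return [high, low];
}

const hex4 = (n) => n.toString(16).toUpperCase().padStart(4, "0");
const face = String.fromCodePoint(0x1F600);

console.log(toSurrogatePair(0x1F600).map(hex4).join(" ")); // D83D DE00
console.log(face.length);             // 2 (UTF-16 code units)
console.log([...face].length);        // 1 (codepoints)
console.log(Array.from(face).length); // 1
```

Slicing by code unit, as in face.slice(0, 1), returns only the high surrogate, which is the kind of bug the length mismatch tends to cause.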
For BMP characters the picture is simpler. U+00E9 LATIN SMALL LETTER E WITH ACUTE (é) has decimal value 233, occupies two UTF-8 bytes (C3 A9), is a single UTF-16 code unit, has the named HTML entity &eacute;, and percent-encodes to %C3%A9. Note the symmetry: the UTF-8 bytes and the percent-encoded form are the same bytes, just written differently.
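The symmetry between UTF-8 bytes and the percent-encoded form is easy to check. In this rough sketch, utf8Bytes and percentBytes are hypothetical helpers, and percentBytes only works for input in which every byte gets escaped (that is, non-ASCII characters).

```javascript
// Bytes as produced by TextEncoder (always UTF-8).
const utf8Bytes = (s) => [...new TextEncoder().encode(s)];

// Bytes recovered from the %XX escapes of encodeURIComponent.
// Assumes the whole string is escaped, which holds for non-ASCII input.
const percentBytes = (s) =>
  encodeURIComponent(s)
    .split("%")
    .filter(Boolean)
    .map((h) => parseInt(h, 16));

console.log(encodeURIComponent("é")); // %C3%A9
console.log(utf8Bytes("é"));          // [ 195, 169 ]
console.log(percentBytes("é"));       // [ 195, 169 ]
```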
The converter rejects values above 0x10FFFF (the upper limit of the Unicode codespace) and the surrogate range U+D800 through U+DFFF, which is reserved for UTF-16 and does not contain legal Unicode characters. Lone surrogates cannot be encoded in well-formed UTF-8: TextEncoder replaces them with U+FFFD (bytes EF BF BD) rather than encoding them, and encodeURIComponent throws a URIError. For codepoints below U+0020, the output character may be invisible or render as a control picture; the meta fields still resolve. Unreserved ASCII characters such as letters and digits are left unchanged by encodeURIComponent (A stays A, and becomes %41 only if escaping is forced), while other ASCII characters such as the space are escaped (%20).
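A minimal sketch of the range check and of the percent-encoding edge cases follows. The names isEncodableCodepoint and forcePercentEncode are hypothetical and are not taken from the tool.

```javascript
// Hypothetical validity check: inside the codespace and not a surrogate.
function isEncodableCodepoint(cp) {
  if (!Number.isInteger(cp) || cp < 0 || cp > 0x10FFFF) return false;
  if (cp >= 0xD800 && cp <= 0xDFFF) return false; // surrogate range
  return true;
}

// Hypothetical "forced" encoder: escapes every UTF-8 byte, even ASCII.
function forcePercentEncode(ch) {
  return [...new TextEncoder().encode(ch)]
    .map((b) => "%" + b.toString(16).toUpperCase().padStart(2, "0"))
    .join("");
}

console.log(isEncodableCodepoint(0x10FFFF)); // true
console.log(isEncodableCodepoint(0x110000)); // false
console.log(isEncodableCodepoint(0xD800));   // false

console.log(encodeURIComponent("A"));  // A
console.log(forcePercentEncode("A"));  // %41
console.log(encodeURIComponent(" "));  // %20

// encodeURIComponent throws on a lone surrogate.
try {
  encodeURIComponent("\uD800");
} catch (err) {
  console.log(err instanceof URIError); // true
}
```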