Codepoint converter

Type any character, U+XXXX notation, decimal number, or hexadecimal value. Every other format appears below.

How it works

The input field accepts any of seven different ways of referring to a Unicode codepoint. The tool tries each parser in order and uses whichever matches first. Once it has a single codepoint, it computes every common output format and renders them as a grid.

The accepted input forms are:

A single character — anything visible, including supplementary-plane characters that take two UTF-16 code units in JavaScript. codePointAt(0) handles the surrogate pair correctly.
U+XXXX — the canonical Unicode notation. Case-insensitive. Between four and six hex digits.
0xXXXX — C-style hex literal.
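\uXXXX or \u{XXXX} — escape syntax. The four-digit \uXXXX form works in both JavaScript and Python; the braced \u{XXXX} form is JavaScript-only (ES2015 and later) and is needed to write a codepoint above U+FFFF as a single escape rather than as a pair of \uXXXX surrogate escapes.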
&#NNNN; — HTML decimal numeric character reference.
&#xHHHH; — HTML hexadecimal numeric character reference.
A bare decimal integer between 0 and 1,114,111 (the maximum legal codepoint, U+10FFFF).
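The parse-in-order approach can be sketched in JavaScript as follows. This is a minimal illustration, not the tool's source: the names parseCodepoint and PARSERS are hypothetical, and so is the order, which tries the prefixed notations first, then a bare decimal, and finally a single character. The range and surrogate checks described under Edge cases are left to a separate step.

```javascript
// Hypothetical parser: returns a codepoint number, or null if no notation matches.
// Range and surrogate validation are deliberately left to a separate step.
const PARSERS = [
  [/^U\+([0-9A-F]{4,6})$/i, 16],     // U+XXXX, case-insensitive, 4 to 6 hex digits
  [/^0x([0-9A-F]+)$/i, 16],          // 0xXXXX, C-style hex literal
  [/^\\u\{([0-9A-F]{1,6})\}$/i, 16], // \u{XXXX}, JavaScript braced escape
  [/^\\u([0-9A-F]{4})$/i, 16],       // \uXXXX, JavaScript/Python escape
  [/^&#x([0-9A-F]+);$/i, 16],        // &#xHHHH; HTML hex reference
  [/^&#([0-9]+);$/, 10],             // &#NNNN; HTML decimal reference
  [/^([0-9]+)$/, 10],                // bare decimal integer
];

function parseCodepoint(input) {
  // Ignore surrounding whitespace for the notations...
  const s = input.trim();
  for (const [pattern, radix] of PARSERS) {
    const m = s.match(pattern);
    if (m) return parseInt(m[1], radix);
  }
  // ...but fall back to the untrimmed input, so a lone space is still a character.
  // Spreading a string iterates by codepoint, so "😀" counts as one element.
  const chars = [...input];
  return chars.length === 1 ? chars[0].codePointAt(0) : null;
}

console.log(parseCodepoint("U+1F600")); // 128512
console.log(parseCodepoint("&#xE9;"));  // 233
console.log(parseCodepoint("😀"));      // 128512
console.log(parseCodepoint("ab"));      // null
```

With this order, a single digit such as 7 is read as the decimal value 7 rather than as the character U+0037; trying the single-character parser first would reverse that choice.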

The output formats are:

U+XXXX, the canonical Unicode notation
Decimal value
Hexadecimal value
Padded binary
UTF-8 byte sequence, computed with TextEncoder (which always emits UTF-8)
UTF-16 code unit sequence: one unit for codepoints in the Basic Multilingual Plane, or two units forming a surrogate pair for any codepoint at or above U+10000
HTML decimal and hexadecimal numeric character references
CSS string escape: a backslash, the hex codepoint, and a trailing space
JavaScript escape, using \u{...} for supplementary-plane codepoints
Python escape: \xHH, \uHHHH, or \U00HHHHHH depending on width
Percent-encoded form for URLs
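A sketch of how each format can be derived from one codepoint with JavaScript built-ins (String.fromCodePoint, TextEncoder, charCodeAt, and encodeURIComponent). The function name formatCodepoint, the result keys, and padding the binary form to whole bytes are assumptions made for illustration; the input is assumed to have already passed the range and surrogate checks.

```javascript
// Hypothetical helper: compute the listed output formats for one codepoint that
// is assumed to be valid (0..0x10FFFF and not a surrogate).
function formatCodepoint(cp) {
  const hex = cp.toString(16).toUpperCase();
  const pad = (s, width) => s.padStart(width, "0");
  const ch = String.fromCodePoint(cp);

  // TextEncoder always produces UTF-8.
  const utf8 = Array.from(new TextEncoder().encode(ch));
  // A JavaScript string is a sequence of UTF-16 code units: 1 in the BMP, 2 above U+FFFF.
  const utf16 = [];
  for (let i = 0; i < ch.length; i++) utf16.push(ch.charCodeAt(i));

  const bits = cp.toString(2);
  return {
    unicode: "U+" + pad(hex, 4),
    decimal: String(cp),
    hex: "0x" + hex,
    binary: pad(bits, Math.ceil(bits.length / 8) * 8), // padded to whole bytes (an assumption)
    utf8: utf8.map((b) => pad(b.toString(16).toUpperCase(), 2)).join(" "),
    utf16: utf16.map((u) => pad(u.toString(16).toUpperCase(), 4)).join(" "),
    htmlDecimal: `&#${cp};`,
    htmlHex: `&#x${hex};`,
    css: `\\${hex} `, // backslash, hex digits, trailing space
    javascript: cp > 0xffff ? `\\u{${hex}}` : `\\u${pad(hex, 4)}`,
    python:
      cp < 0x100 ? `\\x${pad(hex, 2)}` : cp <= 0xffff ? `\\u${pad(hex, 4)}` : `\\U${pad(hex, 8)}`,
    percent: encodeURIComponent(ch), // leaves unreserved ASCII such as "A" unescaped
  };
}

console.log(formatCodepoint(0xe9).utf8);     // C3 A9
console.log(formatCodepoint(0x1f600).utf16); // D83D DE00
console.log(formatCodepoint(0x1f600));
```

Using encodeURIComponent for the percent form keeps the output identical to what a browser would put in a URL, which is why a plain letter comes back unchanged rather than as %XX.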

Worked example

Take U+1F600 GRINNING FACE (😀). It sits in the Supplementary Multilingual Plane (plane 1), and its decimal value is 128512. In UTF-8 it occupies four bytes: F0 9F 98 80. In UTF-16 it requires a surrogate pair, with high surrogate D83D and low surrogate DE00. This is why JavaScript's "😀".length returns 2, a frequent source of bugs when slicing strings; use [..."😀"].length or Array.from("😀").length to count codepoints instead. To embed it in a CSS content string, write "\1F600"; in JavaScript, "\u{1F600}"; in Python 3, "\U0001F600". As an HTML numeric character reference it is &#128512; or &#x1F600;. There is no named HTML entity for emoji. Percent-encoded for URLs it becomes %F0%9F%98%80, since URL encoding operates on UTF-8 bytes, not codepoints.
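The surrogate values above come from a fixed calculation: subtract 0x10000, then add the top 10 bits of the 20-bit result to D800 and the bottom 10 bits to DC00. A small JavaScript sketch (toSurrogatePair is a hypothetical name) that reproduces the D83D DE00 pair and the length behavior:

```javascript
// Split a supplementary-plane codepoint (U+10000..U+10FFFF) into UTF-16 surrogates.
function toSurrogatePair(cp) {
  const offset = cp - 0x10000;           // 20-bit value
  const high = 0xd800 + (offset >> 10);  // top 10 bits -> D800..DBFF
  const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits -> DC00..DFFF
  return [high, low];
}

const [high, low] = toSurrogatePair(0x1f600);
console.log(high.toString(16).toUpperCase(), low.toString(16).toUpperCase()); // D83D DE00

const face = "\u{1F600}";
console.log(face === "\uD83D\uDE00"); // true: the braced escape and the pair build the same string
console.log(face.length);             // 2 (UTF-16 code units)
console.log([...face].length);        // 1 (codepoints)
console.log(face.codePointAt(0) === 0x1f600, face.charCodeAt(0) === high); // true true
```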

For BMP characters the picture is simpler. U+00E9 LATIN SMALL LETTER E WITH ACUTE (é) has decimal value 233, occupies two UTF-8 bytes (C3 A9), is a single UTF-16 code unit (00E9), has the named HTML entity &eacute;, and percent-encodes to %C3%A9. Note the correspondence: the UTF-8 bytes and the percent-encoded form are the same bytes, just written differently.
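To make the byte patterns concrete, here is a hand-rolled UTF-8 encoder in JavaScript. It is illustrative only, since TextEncoder already does this, and the names utf8Bytes and percentEncodeAll are hypothetical. It reproduces C3 A9 for é and F0 9F 98 80 for 😀, and shows that percent-encoding is each UTF-8 byte written as %XX.

```javascript
// Hand-rolled UTF-8 encoder for one valid codepoint, showing the bit layout
// that TextEncoder applies internally.
function utf8Bytes(cp) {
  if (cp < 0x80) return [cp];                                    // 0xxxxxxx
  if (cp < 0x800) return [0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)]; // 110xxxxx 10xxxxxx
  if (cp < 0x10000) {
    // 1110xxxx 10xxxxxx 10xxxxxx
    return [0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f)];
  }
  // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
  return [
    0xf0 | (cp >> 18),
    0x80 | ((cp >> 12) & 0x3f),
    0x80 | ((cp >> 6) & 0x3f),
    0x80 | (cp & 0x3f),
  ];
}

const toHex2 = (b) => b.toString(16).toUpperCase().padStart(2, "0");
// Percent-encode every byte (the "forced" form: it would turn "A" into %41).
const percentEncodeAll = (cp) => utf8Bytes(cp).map((b) => "%" + toHex2(b)).join("");

console.log(utf8Bytes(0xe9).map(toHex2).join(" "));              // C3 A9
console.log(percentEncodeAll(0xe9));                             // %C3%A9
console.log(percentEncodeAll(0xe9) === encodeURIComponent("é")); // true
console.log(utf8Bytes(0x1f600).map(toHex2).join(" "));           // F0 9F 98 80
```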

Edge cases

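The converter rejects values above 0x10FFFF (the upper limit of the Unicode codespace) and the surrogate range U+D800 through U+DFFF. Surrogate codepoints are reserved for UTF-16's internal use and are not Unicode scalar values, so they cannot stand alone as characters. A lone surrogate also has no UTF-8 encoding: TextEncoder does not throw on one but silently replaces it with U+FFFD REPLACEMENT CHARACTER (EF BF BD), while encodeURIComponent throws a URIError.

For codepoints below U+20 (the C0 control characters), the rendered character may be invisible or shown as a control picture; the other output fields are still computed. Not every ASCII character percent-encodes to itself. encodeURIComponent leaves letters, digits, and - _ . ! ~ * ' ( ) unchanged (A stays A; %41 is equivalent but appears only if you force it), while other ASCII characters become a single %XX escape, such as %20 for a space or %2F for a slash.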
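The edge-case behavior can be checked directly in JavaScript. The validity rule below (isScalarValue is a hypothetical name) restates the two rejection rules; the remaining lines exercise real TextEncoder and encodeURIComponent behavior for a lone surrogate and for ASCII input.

```javascript
// Hypothetical validity check: inside the Unicode codespace and outside
// the surrogate range U+D800..U+DFFF.
function isScalarValue(cp) {
  return Number.isInteger(cp) && cp >= 0 && cp <= 0x10ffff && (cp < 0xd800 || cp > 0xdfff);
}

console.log(isScalarValue(0x1f600), isScalarValue(0xd83d), isScalarValue(0x110000)); // true false false

// A lone surrogate does not make TextEncoder throw; it is replaced by U+FFFD.
const lone = String.fromCharCode(0xd83d);
const bytes = Array.from(new TextEncoder().encode(lone), (b) => b.toString(16).toUpperCase());
console.log(bytes.join(" ")); // EF BF BD

// encodeURIComponent, by contrast, throws a URIError on the same input.
try {
  encodeURIComponent(lone);
} catch (err) {
  console.log(err instanceof URIError); // true
}

// Only unreserved ASCII passes through encodeURIComponent unchanged.
console.log(encodeURIComponent("A"), encodeURIComponent(" "), encodeURIComponent("/")); // A %20 %2F
```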

Related

UTF-8 encoder — bytes for an entire string, not just one character
Character inspector — codepoint-by-codepoint breakdown of a string
HTML entity encoder — named entity lookup
URL encoder — full-string percent encoding
What is Unicode?
UTF-8, UTF-16, UTF-32 compared
Codepoint, character, glyph, grapheme
HTML entities and escapes
Basic Latin block