Unicode Converter

Convert between text and Unicode/HTML entities

Note

Unicode is an international standard for representing characters worldwide
HTML entities are used to display special characters on web pages
Emoji may consist of multiple code points
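
As a rough illustration of the text-to-entity conversion described above, the following Python sketch replaces non-ASCII and HTML-special characters with hexadecimal numeric entities. The function name to_numeric_entities is hypothetical and is not necessarily how this converter works; html.unescape from the standard library reverses the conversion.

```python
import html

def to_numeric_entities(text: str) -> str:
    """Replace non-ASCII and HTML-special characters with hex numeric entities."""
    special = set("&<>\"'")
    return "".join(
        f"&#x{ord(ch):X};" if ord(ch) > 0x7F or ch in special else ch
        for ch in text
    )

encoded = to_numeric_entities("café <b>")
print(encoded)                 # caf&#xE9; &#x3C;b&#x3E;
print(html.unescape(encoded))  # café <b>

# An emoji with a skin-tone modifier is two code points, so it yields two entities.
print(to_numeric_entities("\U0001F44D\U0001F3FD"))  # &#x1F44D;&#x1F3FD;
```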

Use Cases

Web Development

Convert to HTML entities for special characters

Programming

String encoding and escape handling

Data Analysis

Resolve encoding issues in text data

Debugging

Identify and analyze invisible characters
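
For the debugging use case, one possible approach is to scan a string for format characters, control characters, and non-ASCII spaces using Unicode general categories. This is a minimal sketch; the helper name find_invisible and the choice of categories are assumptions, and a real tool may flag a different set.

```python
import unicodedata

def find_invisible(text: str) -> list[tuple[int, str, str]]:
    """Return (index, U+XXXX, name) for characters that are usually invisible."""
    hits = []
    for i, ch in enumerate(text):
        category = unicodedata.category(ch)
        # Cf = format, Cc = control, Zs/Zl/Zp = space, line and paragraph separators
        if category in ("Cf", "Cc", "Zs", "Zl", "Zp") and ch not in " \n\r\t":
            name = unicodedata.name(ch, "<unnamed>")
            hits.append((i, f"U+{ord(ch):04X}", name))
    return hits

sample = "pass\u200bword\u00a0ok\ufeff"
for hit in find_invisible(sample):
    print(hit)
# (4, 'U+200B', 'ZERO WIDTH SPACE')
# (9, 'U+00A0', 'NO-BREAK SPACE')
# (12, 'U+FEFF', 'ZERO WIDTH NO-BREAK SPACE')
```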

Frequently Asked Questions

What is Unicode?

Unicode is an international standard for representing all the world's characters in a unified system. It assigns each character a unique code point, written U+XXXX, from a code space of 1,114,112 positions (U+0000 to U+10FFFF). The repertoire covers scripts such as Korean, Latin, Chinese, and Arabic, as well as emoji. Unicode 15.0 defines 149,186 characters, and Unicode 15.1 raised the total to 149,813.
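
The U+XXXX notation can be reproduced in a few lines of Python. This is only a sketch; the function names code_points and from_code_point are hypothetical.

```python
def code_points(text: str) -> list[str]:
    """List the code point of every character in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]

def from_code_point(notation: str) -> str:
    """Turn a string such as 'U+AC00' back into the character it names."""
    return chr(int(notation.removeprefix("U+"), 16))

print(code_points("A가😀"))       # ['U+0041', 'U+AC00', 'U+1F600']
print(from_code_point("U+AC00"))  # 가
```

str.removeprefix requires Python 3.9 or later.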

What is the difference between UTF-8 and UTF-16?

UTF-8 is a variable-length encoding that uses 1 to 4 bytes per character: ASCII characters use 1 byte, and Korean Hangul syllables use 3 bytes. It is the most widely used encoding on the web and on Linux, and it is backward-compatible with ASCII. UTF-16 represents most characters in 2 bytes and the rest (including most emoji) in 4 bytes. It is used internally by Windows, Java, and JavaScript strings.
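
The size difference can be checked directly by encoding the same character both ways. The sketch below assumes a hypothetical helper named encoded_sizes.

```python
def encoded_sizes(ch: str) -> tuple[int, int]:
    """Return (UTF-8 byte count, UTF-16 byte count) for a string."""
    # "utf-16-le" is used so that no byte order mark is added to the output.
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))

for ch in ["A", "é", "가", "\U0001F600"]:
    utf8_len, utf16_len = encoded_sizes(ch)
    print(f"U+{ord(ch):04X}: UTF-8 {utf8_len} bytes, UTF-16 {utf16_len} bytes")
# U+0041: UTF-8 1 bytes, UTF-16 2 bytes
# U+00E9: UTF-8 2 bytes, UTF-16 2 bytes
# U+AC00: UTF-8 3 bytes, UTF-16 2 bytes
# U+1F600: UTF-8 4 bytes, UTF-16 4 bytes

# ASCII text produces identical bytes in ASCII and UTF-8.
print("Hello".encode("utf-8") == "Hello".encode("ascii"))  # True
```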

How are emoji represented in Unicode?

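Most emoji are located in the Supplementary Multilingual Plane (Plane 1), in blocks such as Emoticons, which starts at U+1F600; a few older symbols used as emoji, such as U+2764 (heavy black heart), sit in the Basic Multilingual Plane. A supplementary-plane emoji is represented in UTF-16 as one surrogate pair (two 16-bit code units) and in UTF-8 as 4 bytes. Some emoji are sequences of multiple emoji joined by ZWJ (Zero Width Joiner, U+200D).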
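
The UTF-16 arithmetic for a code point above U+FFFF can be sketched as follows. The function names are hypothetical; the result is compared against Python's built-in UTF-16 encoder, and each such code point yields exactly one surrogate pair, that is, two 16-bit code units.

```python
import struct

def surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a supplementary code point (U+10000 and above) into UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low

def utf16_code_units(text: str) -> list[int]:
    """Return the 16-bit code units produced by Python's UTF-16 encoder."""
    data = text.encode("utf-16-be")
    return [unit for (unit,) in struct.iter_unpack(">H", data)]

grinning = "\U0001F600"
print([hex(u) for u in surrogate_pair(ord(grinning))])  # ['0xd83d', '0xde00']
print(utf16_code_units(grinning) == list(surrogate_pair(ord(grinning))))  # True
print(len(grinning.encode("utf-8")))  # 4

# A ZWJ sequence: man + ZWJ + woman + ZWJ + girl renders as one family emoji.
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"
print(len(family))                  # 5 code points
print(len(family.encode("utf-8")))  # 18 bytes
```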

What is the Unicode range for Korean characters?

Precomposed Hangul syllables are assigned from U+AC00 (가) to U+D7A3 (힣), covering 11,172 characters. Hangul Jamo (individual consonants and vowels) are located in the U+1100–U+11FF range. Hangul syllables are composed of an initial consonant, a medial vowel, and an optional final consonant, and Unicode supports both the precomposed syllables and the individual Jamo.
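
Because the syllable block is laid out as 19 initials x 21 medials x 28 finals (including "no final"), a precomposed syllable can be split into Jamo with simple arithmetic. The following is a minimal sketch with a hypothetical function name, decompose; the result should match the standard NFD normalization.

```python
import unicodedata

HANGUL_BASE = 0xAC00
MEDIAL_COUNT, FINAL_COUNT = 21, 28  # 19 x 21 x 28 = 11,172 syllables

def decompose(syllable: str) -> str:
    """Split one precomposed Hangul syllable into its conjoining Jamo."""
    cp = ord(syllable)
    if not 0xAC00 <= cp <= 0xD7A3:
        raise ValueError("not a precomposed Hangul syllable")
    index = cp - HANGUL_BASE
    initial = index // (MEDIAL_COUNT * FINAL_COUNT)
    medial = (index % (MEDIAL_COUNT * FINAL_COUNT)) // FINAL_COUNT
    final = index % FINAL_COUNT  # 0 means the syllable has no final consonant
    jamo = chr(0x1100 + initial) + chr(0x1161 + medial)
    if final:
        jamo += chr(0x11A7 + final)
    return jamo

print([f"U+{ord(j):04X}" for j in decompose("한")])  # ['U+1112', 'U+1161', 'U+11AB']
print(decompose("한") == unicodedata.normalize("NFD", "한"))  # True
```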

History of Unicode Standards and Character Encoding

Character Encoding Before Unicode

Before Unicode, each country used its own character encoding system: English-speaking regions used ASCII (128 characters), Korea used EUC-KR, Japan used Shift-JIS, and China used GB2312. Hundreds of encodings coexisted, and data exchanged between systems that assumed different encodings was frequently displayed as garbled text (mojibake).
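
Mojibake is easy to reproduce: encode text with one legacy encoding and decode the bytes with another. The sketch below assumes, purely for illustration, a receiver that wrongly treats EUC-KR bytes as Latin-1.

```python
original = "한글"

# The sender writes the text as EUC-KR bytes.
raw = original.encode("euc-kr")
print(raw.hex(" "))  # c7 d1 b1 db

# A receiver that assumes Latin-1 sees four unrelated characters.
garbled = raw.decode("latin-1")
print(garbled)  # ÇÑ±Û

# No bytes were lost, so reversing the wrong step recovers the text.
print(garbled.encode("latin-1").decode("euc-kr"))  # 한글
```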

The Unicode Consortium and Standardization

In 1987, engineers from Xerox and Apple started the Unicode project, and Unicode 1.0 was released in 1991. Today, major tech companies including Apple, Google, Microsoft, and IBM participate in the Unicode Consortium. Unicode is kept synchronized with ISO/IEC 10646, the corresponding international standard.

Character Encoding on the Modern Web

The HTML standard specifies UTF-8 as the encoding for new documents, and most modern websites use it. JavaScript strings are sequences of UTF-16 code units, while the encodeURIComponent() function percent-encodes the UTF-8 bytes of each character. Databases are also moving toward full Unicode encodings as the default, such as utf8mb4 in MySQL.
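
The same behavior can be approximated outside the browser. This Python sketch uses urllib.parse.quote with a safe set chosen to roughly mirror the characters encodeURIComponent() leaves unescaped, and counts UTF-16 code units the way a JavaScript string length would. Both helper names are hypothetical, and the match with JavaScript is an approximation rather than a guarantee.

```python
from urllib.parse import quote

def percent_encode(text: str) -> str:
    """Percent-encode the UTF-8 bytes of text, keeping only unreserved characters."""
    # quote() uses UTF-8 by default and never escapes letters, digits, "_", ".", "-", "~".
    return quote(text, safe="!*'()")

def utf16_length(text: str) -> int:
    """Count UTF-16 code units in text."""
    return len(text.encode("utf-16-le")) // 2

print(percent_encode("가 b"))      # %EA%B0%80%20b
print(len("\U0001F600"))           # 1 code point
print(utf16_length("\U0001F600"))  # 2 code units
```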
