Unicode equivalence is the specification by the Unicode character encoding standard that some sequences of code points represent essentially the same character. This feature was introduced in the standard to allow compatibility with preexisting standard character sets, which often included similar or identical characters.
Unicode provides two such notions: canonical equivalence and compatibility. Code point sequences that are defined as canonically equivalent are assumed to have the same appearance and meaning when printed or displayed. For example, the code point U+006E n LATIN SMALL LETTER N followed by U+0303 ◌̃ COMBINING TILDE is defined by Unicode to be canonically equivalent to the single code point U+00F1 ñ LATIN SMALL LETTER N WITH TILDE (of the Spanish alphabet). Therefore, those sequences should be displayed in the same manner, should be treated in the same way by applications (for example, when alphabetizing names or searching), and may be substituted for each other. Similarly, each Hangul syllable block that is encoded as a single character may be equivalently encoded as a combination of a leading conjoining jamo, a vowel conjoining jamo, and, if appropriate, a trailing conjoining jamo.
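As a minimal sketch of canonical equivalence, the following uses Python's standard `unicodedata` module. The helper name `canonically_equivalent` and the choice of the syllable 한 (U+D55C) are illustrative assumptions, not part of the standard. The helper compares two strings after canonical decomposition, which is one common way to test for canonical equivalence.

```python
import unicodedata

# Two encodings of "ñ": a base letter plus combining tilde, and the precomposed form.
decomposed = "\u006e\u0303"   # LATIN SMALL LETTER N + COMBINING TILDE
precomposed = "\u00f1"        # LATIN SMALL LETTER N WITH TILDE

def canonically_equivalent(a: str, b: str) -> bool:
    """Hypothetical helper: compare the canonical decompositions (NFD) of two strings."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

# A plain code point comparison treats the two strings as different...
print(decomposed == precomposed)                         # False
# ...but they compare equal once both are canonically decomposed.
print(canonically_equivalent(decomposed, precomposed))   # True

# A precomposed Hangul syllable decomposes into its conjoining jamo.
syllable = "\ud55c"  # 한
jamo = unicodedata.normalize("NFD", syllable)
print([f"U+{ord(c):04X}" for c in jamo])  # ['U+1112', 'U+1161', 'U+11AB']
```

A raw `==` on strings compares code points only, so software that needs the "same character" semantics described above would typically normalize both sides first.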
Sequences that are defined as compatible are assumed to have possibly distinct appearances, but the same meaning in some contexts. Thus, for example, the code point U+FB00 (the typographic ligature "ﬀ") is defined to be compatible, but not canonically equivalent, to the sequence U+0066 U+0066 (two Latin "f" letters). Compatible sequences may be treated the same way in some applications (such as sorting and indexing), but not in others; and may be substituted for each other in some situations, but not in others. Sequences that are canonically equivalent are also compatible, but the converse is not necessarily true.
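The difference between the two notions can be illustrated with the ligature example, again as a sketch using Python's `unicodedata` module. The variable names are illustrative; `"NFD"` applies canonical decomposition only, while `"NFKD"` also applies compatibility decomposition.

```python
import unicodedata

ligature = "\ufb00"   # LATIN SMALL LIGATURE FF
plain = "ff"          # U+0066 U+0066

# Canonical decomposition leaves the ligature untouched, so the two
# sequences are NOT canonically equivalent.
same_canonically = (
    unicodedata.normalize("NFD", ligature) == unicodedata.normalize("NFD", plain)
)

# Compatibility decomposition maps the ligature to two "f" letters, so the
# two sequences ARE compatible.
same_by_compatibility = (
    unicodedata.normalize("NFKD", ligature) == unicodedata.normalize("NFKD", plain)
)

print(same_canonically)        # False
print(same_by_compatibility)   # True
```

Because compatibility mappings can discard formatting distinctions such as the ligature, applying them is usually a deliberate choice for tasks like search or indexing rather than a lossless transformation.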
The standard also defines a text normalization procedure, called Unicode normalization, that replaces equivalent sequences of characters so that any two texts that are equivalent will be reduced to the same sequence of code points, called the normalization form or normal form of the original text. For each of the two equivalence notions, Unicode defines two normal forms, one fully composed (where multiple code points are replaced by single points whenever possible), and one fully decomposed (where single points are split into multiple ones).
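The four resulting normal forms are commonly abbreviated NFC and NFD (canonical, composed and decomposed) and NFKC and NFKD (compatibility, composed and decomposed). The sketch below applies each of them to a short sample string using Python's `unicodedata.normalize`; the sample string and the helper `code_points` are assumptions chosen for illustration.

```python
import unicodedata

def code_points(text: str) -> list[str]:
    """Hypothetical helper: render a string as a list of U+XXXX labels."""
    return [f"U+{ord(c):04X}" for c in text]

# Sample text: the "ff" ligature followed by precomposed "ñ".
sample = "\ufb00\u00f1"

forms = {name: unicodedata.normalize(name, sample)
         for name in ("NFC", "NFD", "NFKC", "NFKD")}

for name, value in forms.items():
    print(name, code_points(value))
# NFC  ['U+FB00', 'U+00F1']                      canonical, composed
# NFD  ['U+FB00', 'U+006E', 'U+0303']            canonical, decomposed
# NFKC ['U+0066', 'U+0066', 'U+00F1']            compatibility, composed
# NFKD ['U+0066', 'U+0066', 'U+006E', 'U+0303']  compatibility, decomposed

# Two canonically equivalent inputs reduce to the same sequence under NFC.
print(unicodedata.normalize("NFC", "n\u0303") == unicodedata.normalize("NFC", "\u00f1"))  # True
```

Which form to use depends on the application: the canonical forms preserve distinctions such as the ligature, while the compatibility forms fold them away.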