
Unicode equivalence

Unicode equivalence is the specification by the Unicode character encoding standard that some sequences of code points represent essentially the same character. This feature was introduced in the standard to allow compatibility with preexisting standard character sets, which often included similar or identical characters.

Unicode provides two such notions: canonical equivalence and compatibility equivalence. Code point sequences that are defined as canonically equivalent are assumed to have the same appearance and meaning when printed or displayed. For example, the code point U+006E n LATIN SMALL LETTER N followed by U+0303 ◌̃ COMBINING TILDE is defined by Unicode to be canonically equivalent to the single code point U+00F1 ñ LATIN SMALL LETTER N WITH TILDE (a letter of the Spanish alphabet). Therefore, those sequences should be displayed in the same manner, should be treated in the same way by applications such as alphabetizing names or searching, and may be substituted for each other. Similarly, each Hangul syllable block that is encoded as a single character may be equivalently encoded as a combination of a leading conjoining jamo, a vowel conjoining jamo, and, if appropriate, a trailing conjoining jamo.
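Both kinds of canonical equivalence in the paragraph above can be checked with Python's standard `unicodedata` module. The sketch below is illustrative, not an official implementation. `decompose_hangul` is a hypothetical helper that reproduces the arithmetic decomposition the Unicode Standard specifies for precomposed Hangul syllables, so that its output can be compared against the library's canonical decomposition (NFD).

```python
import unicodedata

# Constants of the Hangul syllable decomposition arithmetic.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT  # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT  # 11172 precomposed syllables


def decompose_hangul(syllable: str) -> str:
    """Hypothetical helper: split one precomposed Hangul syllable into conjoining jamo."""
    s_index = ord(syllable) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return syllable  # not a precomposed Hangul syllable; leave it unchanged
    leading = L_BASE + s_index // N_COUNT
    vowel = V_BASE + (s_index % N_COUNT) // T_COUNT
    t_index = s_index % T_COUNT
    jamo = chr(leading) + chr(vowel)
    if t_index:  # a trailing consonant is present
        jamo += chr(T_BASE + t_index)
    return jamo


# "n" + COMBINING TILDE and the precomposed ñ differ as code point sequences...
decomposed = "n\u0303"
precomposed = "\u00f1"
print(decomposed == precomposed)  # False
# ...but normalize to the same string, because they are canonically equivalent.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True

# U+AC01 decomposes into a leading, a vowel, and a trailing conjoining jamo.
print([hex(ord(c)) for c in decompose_hangul("\uac01")])  # ['0x1100', '0x1161', '0x11a8']
print(decompose_hangul("\uac01") == unicodedata.normalize("NFD", "\uac01"))  # True
```

Because the Hangul decomposition is purely arithmetic, implementations can handle all 11,172 precomposed syllables without storing a per-character table.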

Sequences that are defined as compatible are assumed to have possibly distinct appearances but the same meaning in some contexts. For example, the code point U+FB00 (the typographic ligature "ff") is defined to be compatible with, but not canonically equivalent to, the sequence U+0066 U+0066 (two Latin "f" letters). Compatible sequences may be treated the same way in some applications (such as sorting and indexing) but not in others, and may be substituted for each other in some situations but not in others. Sequences that are canonically equivalent are also compatible, but the converse is not necessarily true.
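The difference between the two notions can be observed with Python's `unicodedata` module. This is a minimal sketch: the helper names `canonically_equivalent` and `compatible` are illustrative, not part of any library. Each helper uses the fact that two strings are equivalent under a given notion exactly when their fully decomposed normal forms (NFD for canonical, NFKD for compatibility) are identical.

```python
import unicodedata


def canonically_equivalent(a: str, b: str) -> bool:
    # Canonically equivalent strings have identical canonical decompositions (NFD).
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def compatible(a: str, b: str) -> bool:
    # Compatible strings have identical compatibility decompositions (NFKD).
    return unicodedata.normalize("NFKD", a) == unicodedata.normalize("NFKD", b)


ligature = "\ufb00"  # U+FB00 LATIN SMALL LIGATURE FF
two_fs = "ff"  # U+0066 U+0066

print(canonically_equivalent(ligature, two_fs))  # False
print(compatible(ligature, two_fs))  # True

# Canonical equivalence implies compatibility, but not the other way round.
print(canonically_equivalent("n\u0303", "\u00f1"), compatible("n\u0303", "\u00f1"))  # True True
```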

The standard also defines a text normalization procedure, called Unicode normalization, that replaces equivalent sequences of characters so that any two texts that are equivalent are reduced to the same sequence of code points, called the normalization form or normal form of the original text. For each of the two equivalence notions, Unicode defines two normal forms: one fully composed (where multiple code points are replaced by single code points whenever possible) and one fully decomposed (where single code points are split into multiple ones). The four resulting forms are NFC and NFD (the canonical composed and decomposed forms) and NFKC and NFKD (the compatibility composed and decomposed forms).
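The four normal forms can be compared on a single string with Python's `unicodedata.normalize`. In the sketch below, the function name `normal_forms` and the sample string (the "ff" ligature followed by a precomposed ñ) are illustrative choices. `unicodedata.is_normalized` requires Python 3.8 or later.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def normal_forms(text: str) -> dict:
    """Illustrative helper: map each normalization form name to the normalized text."""
    return {form: unicodedata.normalize(form, text) for form in FORMS}


sample = "\ufb00\u00f1"  # U+FB00 ligature "ff" followed by precomposed ñ

for form, normalized in normal_forms(sample).items():
    code_points = " ".join(f"U+{ord(c):04X}" for c in normalized)
    print(f"{form:4}  {code_points}")
# NFC   U+FB00 U+00F1
# NFD   U+FB00 U+006E U+0303
# NFKC  U+0066 U+0066 U+00F1
# NFKD  U+0066 U+0066 U+006E U+0303

# Normalizing an already-normalized string leaves it unchanged.
nfc = unicodedata.normalize("NFC", sample)
print(unicodedata.normalize("NFC", nfc) == nfc)  # True
# The NFC form still contains the ligature, so it is not in NFKC.
print(unicodedata.is_normalized("NFKC", nfc))  # False
```

The canonical forms keep the ligature, and only the compatibility forms replace it with two letters. In both pairs, the composed form merges "n" and the combining tilde into ñ, and the decomposed form splits them apart.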
