Universal Character Set characters


The Unicode Consortium and ISO/IEC JTC 1/SC 2/WG 2 collaborate on the list of characters in the Universal Coded Character Set. The Universal Coded Character Set, most commonly called the Universal Character Set (abbr. UCS, official designation: ISO/IEC 10646), is an international standard that maps characters (discrete symbols used in natural language, mathematics, music, and other domains) to unique machine-readable data values. By creating this mapping, the UCS enables software from different vendors to interoperate and to interchange UCS-encoded text strings. Because it is a universal map, it can represent multiple languages at the same time. This avoids the confusion of using multiple legacy character encodings, where the same sequence of codes can have several interpretations depending on the encoding in use, producing mojibake if the wrong one is chosen.
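The ambiguity of legacy encodings described above can be sketched in a few lines of Python. The byte values and the three codecs chosen here are illustrative assumptions, not examples taken from the standard; any set of single-byte encodings that disagree on a byte would show the same effect.

```python
# One byte, three legacy encodings: the character you get depends on the decoder.
raw = b"\xe9"
interpretations = {enc: raw.decode(enc) for enc in ("latin-1", "cp1251", "koi8_r")}

# Decoding UTF-8 bytes with the wrong codec produces mojibake.
utf8_bytes = "\u00e9".encode("utf-8")     # LATIN SMALL LETTER E WITH ACUTE
correct = utf8_bytes.decode("utf-8")      # the original one-character string
mojibake = utf8_bytes.decode("latin-1")   # two unrelated characters instead

if __name__ == "__main__":
    for enc, text in interpretations.items():
        print(f"{enc:8} -> U+{ord(text):04X}")
    print(repr(correct), repr(mojibake))
```

With a single universal mapping, the code point itself is unambiguous; only the byte-level encoding form (UTF-8, UTF-16, and so on) still has to be agreed on.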

UCS has a potential capacity of over 1 million characters. Each UCS character is abstractly represented by a code point, an integer between 0 and 1,114,111 (1,114,112 = 2^20 + 2^16 or 17 × 2^16 = 0x110000 code points), used to represent each character within the internal logic of text processing software. As of Unicode 15.1, released in September 2023, 293,792 (26%) of these code points are allocated, 149,878 (13%) have been assigned characters, 137,468 (12%) are reserved for private use, 2,048 are used to enable the mechanism of surrogates, and 66 are designated as noncharacters, leaving the remaining 820,320 (74%) unallocated. The number of encoded characters is made up as follows:

  • 149,641 graphical characters (some of which do not have a visible glyph, but are still counted as graphical)
  • 237 special purpose characters for control and formatting.
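The code point arithmetic quoted above can be checked directly. The sketch below is only a sanity check, not part of either standard; the specific ranges used for surrogates, private use, and noncharacters are not given in the text and are filled in here as assumptions from the usual published ranges.

```python
import sys

PLANES = 17
PLANE_SIZE = 2**16                      # 65,536 code points per plane
TOTAL = PLANES * PLANE_SIZE

# The three forms given in the text describe the same number.
assert TOTAL == 2**20 + 2**16 == 0x110000 == 1_114_112
# Python strings cover the same code point range.
assert sys.maxunicode == TOTAL - 1

# Assumed ranges (not spelled out in the text above).
surrogates = 0xDFFF - 0xD800 + 1        # U+D800..U+DFFF
private_use = (0xF8FF - 0xE000 + 1) + 2 * (0xFFFD + 1)  # BMP area + planes 15 and 16


def is_noncharacter(cp: int) -> bool:
    """U+FDD0..U+FDEF, plus the last two code points of every plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


noncharacters = sum(is_noncharacter(cp) for cp in range(TOTAL))

# Figures quoted in the text for Unicode 15.1.
allocated, unallocated = 293_792, 820_320
share_allocated = round(100 * allocated / TOTAL)

if __name__ == "__main__":
    print(surrogates, private_use, noncharacters, share_allocated)
```

The counts computed from the assumed ranges agree with the figures in the paragraph (2,048 surrogates, 137,468 private-use code points, 66 noncharacters), and the allocated and unallocated totals sum to the full code space.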

ISO maintains the basic mapping of characters from character name to code point. The terms character and code point are often used interchangeably. When a distinction is made, a code point is the integer assigned to the character, which one might think of as its address, while a character in ISO/IEC 10646 is the combination of the code point and its name. Unicode adds many other useful properties to the character set, such as block, category, script, and directionality.
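The distinction between a code point, its name, and the additional Unicode properties can be illustrated with Python's standard `unicodedata` module. The helper `describe` is a hypothetical name introduced for this sketch. Note that `unicodedata` exposes category and bidirectional class but has no function for block or script, so those two properties are not shown.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Return the code point, name, and two Unicode properties of one character."""
    return {
        "code_point": f"U+{ord(ch):04X}",                # the integer "address"
        "name": unicodedata.name(ch, "<unnamed>"),       # the character name
        "category": unicodedata.category(ch),            # general category
        "bidi_class": unicodedata.bidirectional(ch),     # directionality
    }


if __name__ == "__main__":
    for ch in ("A", "\u05d0"):
        print(describe(ch))
    # The name-to-character mapping also works in reverse.
    print(unicodedata.lookup("LATIN CAPITAL LETTER A"))
```

The values returned depend on the Unicode version bundled with the Python interpreter, available as `unicodedata.unidata_version`.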

In addition to the UCS, the supplementary Unicode Standard (not a joint project with ISO, but a publication of the Unicode Consortium) provides other implementation details such as:

  1. mappings between UCS and other character sets
  2. different collations of characters and character strings for different languages
  3. an algorithm for laying out bidirectional text ("the BiDi algorithm"), where text on the same line may shift between left-to-right ("LTR") and right-to-left ("RTL")
  4. a case-folding algorithm
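Three of the items in the list above have rough counterparts in the Python standard library, sketched below under stated assumptions. Windows-1252 is chosen arbitrarily as the legacy character set, `same_ignoring_case` and `bidi_classes` are hypothetical helper names, and looking up per-character bidirectional classes is only the input to the BiDi algorithm, not the algorithm itself. Language-specific collation is omitted because it needs locale data that the standard library does not bundle.

```python
import unicodedata

# Item 1: mapping between the UCS and a legacy character set (Windows-1252).
euro_cp1252 = "\u20ac".encode("cp1252")   # EURO SIGN as a single legacy byte
euro_back = euro_cp1252.decode("cp1252")  # and back to the UCS code point


# Item 4: case folding for caseless comparison.
def same_ignoring_case(a: str, b: str) -> bool:
    return a.casefold() == b.casefold()


# Item 3: the per-character classes a BiDi implementation starts from.
def bidi_classes(text: str) -> list:
    return [unicodedata.bidirectional(ch) for ch in text]


if __name__ == "__main__":
    print(euro_cp1252, euro_back == "\u20ac")
    print(same_ignoring_case("Stra\u00dfe", "STRASSE"))
    print(bidi_classes("a\u05d01"))   # Latin letter, Hebrew letter, digit
```

Case folding differs from lowercasing: `"Stra\u00dfe".lower()` keeps the sharp s, so a comparison based on `lower()` would treat the two spellings as different.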

End users enter these characters into programs through various input methods, such as physical keyboards or virtual character palettes.

The UCS can be divided in various ways, such as by plane, block, character category, or character property.[1]
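As a rough illustration of these divisions, the sketch below classifies a character by plane, block, and general category. The plane follows from the code point alone (each plane holds 2^16 code points). The block table `BLOCKS_EXCERPT` is a small hand-copied excerpt assumed for this example, not the full block list, and `plane_of`, `block_of`, and `classify` are hypothetical helper names.

```python
import unicodedata

# Assumed excerpt of block ranges: (first code point, last code point, name).
BLOCKS_EXCERPT = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x1F600, 0x1F64F, "Emoticons"),
]


def plane_of(ch: str) -> int:
    """Plane number 0..16: the code point divided by 2**16."""
    return ord(ch) >> 16


def block_of(ch: str):
    """Block name from the excerpt table, or None if not covered by it."""
    cp = ord(ch)
    for first, last, name in BLOCKS_EXCERPT:
        if first <= cp <= last:
            return name
    return None


def classify(ch: str) -> tuple:
    return plane_of(ch), block_of(ch), unicodedata.category(ch)


if __name__ == "__main__":
    for ch in ("A", "\u03b1", "\U0001F600"):
        print(f"U+{ord(ch):04X}", classify(ch))
```

A complete implementation would load the full block ranges from the Unicode Character Database rather than hard-coding a few of them.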

  1. "The Unicode Standard". The Unicode Consortium. Retrieved 2016-08-09.
