Standard set of characters defined by ISO/IEC 10646
Universal Coded Character Set
Alias(es): UCS, Unicode
Language(s): International
Standard: ISO/IEC 10646
Encoding formats: UTF-8, UTF-16, GB 18030 (less common: UTF-32, BOCU, SCSU, UTF-7)
Preceded by: ISO/IEC 8859, ISO/IEC 2022, various others
The Universal Coded Character Set (UCS, Unicode) is a standard set of characters defined by the international standard ISO/IEC 10646, Information technology — Universal Coded Character Set (UCS), plus amendments to that standard. It is the basis of many character encodings, and it grows as characters from previously unrepresented writing systems are added.
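As a rough illustration of the split between the character set and the encodings built on it, the Python sketch below looks up one character's code point and name, then serializes that same code point with several encoding formats from the infobox. Python and the helper name `describe` are choices made only for this illustration; the names and byte values come from Python's bundled Unicode database and standard codecs (`"gb18030"` is Python's codec name for GB 18030).

```python
import unicodedata

# Encoding formats to compare; "gb18030" is Python's codec for GB 18030.
CODECS = ("utf-8", "utf-16-be", "utf-32-be", "gb18030")


def describe(ch: str) -> dict[str, str]:
    """Return the code point, character name, and byte serializations of ch."""
    cp = ord(ch)  # the single UCS code point assigned to this character
    info = {
        "code point": f"U+{cp:04X}",
        "name": unicodedata.name(ch, "<unnamed>"),
    }
    # The same code point, serialized by each encoding format.
    for codec in CODECS:
        info[codec] = ch.encode(codec).hex(" ")
    return info


for key, value in describe("\u00e9").items():
    print(f"{key:>10}: {value}")
```

For U+00E9 the UTF-8 form is `c3 a9`, UTF-16-BE is `00 e9`, and UTF-32-BE is `00 00 00 e9`: one code point, several byte representations.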
The UCS has over 1.1 million possible code points (1,114,112, from U+0000 to U+10FFFF) available for use or allocation, but only the first 65,536, which form the Basic Multilingual Plane (BMP), had entered into common use before 2000. This situation began changing when the People's Republic of China (PRC) ruled in 2006 that all software sold in its jurisdiction would have to support GB 18030. Because GB 18030 covers the whole UCS codespace, including code points outside the BMP, software intended for sale in the PRC had to move beyond the BMP.
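The codespace arithmetic behind these figures can be sketched in a few lines of Python. The sketch assumes the usual division of the codespace into 17 planes of 65,536 code points each, with the BMP as plane 0; the names `plane_of` and `in_bmp` are hypothetical helpers, not part of any standard API. Note that the 1,114,112 total includes the 2,048 surrogate code points (U+D800 to U+DFFF), which are never assigned to characters.

```python
import sys

PLANE_SIZE = 0x10000                   # 65,536 code points per plane
PLANE_COUNT = 17                       # planes 0 through 16
CODESPACE = PLANE_SIZE * PLANE_COUNT   # 1,114,112 code points: U+0000..U+10FFFF


def plane_of(cp: int) -> int:
    """Return the plane number (0-16) that contains code point cp."""
    if not 0 <= cp < CODESPACE:
        raise ValueError(f"not a UCS code point: {cp:#x}")
    return cp >> 16  # bits above the 16-bit offset within the plane


def in_bmp(cp: int) -> bool:
    """True if cp lies in the Basic Multilingual Plane (plane 0)."""
    return plane_of(cp) == 0


print(CODESPACE)                               # 1114112
print(sys.maxunicode == CODESPACE - 1)         # True
print(plane_of(ord("A")), plane_of(0x20000))   # 0 2
```

U+20000, the first code point of CJK Unified Ideographs Extension B, falls in plane 2, which is one reason full GB 18030 support requires handling code points beyond the BMP.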
The standard deliberately leaves many code points unassigned, even in the BMP, to allow for future expansion and to minimise conflicts with other encoding forms.
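One way to see how much of the BMP is still unassigned is to count code points whose Unicode general category is `Cn` ("unassigned"), using Python's standard `unicodedata` module. This is a sketch under two stated assumptions: the result depends on the Unicode version bundled with the running Python (reported by `unicodedata.unidata_version`), and `Cn` also covers permanently reserved noncharacters such as U+FFFE and U+FFFF, so the count slightly overstates the room left for new characters. Surrogates (`Cs`) and private-use code points (`Co`) have their own categories and are not counted.

```python
import unicodedata


def count_unassigned_bmp() -> int:
    """Count BMP code points whose general category is Cn (unassigned)."""
    return sum(
        1
        for cp in range(0x10000)
        if unicodedata.category(chr(cp)) == "Cn"
    )


# The count varies with the Unicode version of Python's database.
print(unicodedata.unidata_version, count_unassigned_bmp())
```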
The original edition of the UCS defined UTF-16, an extension of UCS-2, to represent code points outside the BMP. A range of code points in the S (Special) Zone of the BMP, U+D800 to U+DFFF, remains unassigned to characters. UCS-2 disallows use of code values for these code points, but UTF-16 uses them in pairs: a code unit from the high half of the zone (U+D800 to U+DBFF) followed by one from the low half (U+DC00 to U+DFFF) together encode a single code point in the range U+10000 to U+10FFFF. Unicode also adopted UTF-16, but in Unicode terminology the high-half zone elements are called "high surrogates" and the low-half zone elements "low surrogates".
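The pairing rule can be written out as a short Python sketch: subtract 0x10000 to get a 20-bit value, then place its top 10 bits in a high-half code unit and its bottom 10 bits in a low-half code unit. The function names `to_surrogate_pair` and `from_surrogate_pair` are hypothetical; the arithmetic is the standard UTF-16 mapping, and the result can be cross-checked against Python's built-in `utf-16-be` codec.

```python
def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a code point in U+10000..U+10FFFF into two UTF-16 code units."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:04X} is not outside the BMP")
    offset = cp - 0x10000             # 20-bit value
    high = 0xD800 + (offset >> 10)    # top 10 bits -> U+D800..U+DBFF
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits -> U+DC00..U+DFFF
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a high/low surrogate pair into the code point it encodes."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


hi, lo = to_surrogate_pair(0x1F600)
print(f"{hi:04X} {lo:04X}")                  # D83D DE00
print(f"U+{from_surrogate_pair(hi, lo):X}")  # U+1F600
```

Because high and low halves occupy disjoint ranges, a decoder can always tell which half of a pair it is looking at, even when starting in the middle of a string.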
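Another encoding, UTF-32 (previously named UCS-4), uses four bytes (32 bits in total) to encode a single code point of the codespace. UTF-32 thereby gives every code point in the UCS, including those outside the BMP, a binary representation of the same fixed width, which lets APIs and software applications locate the nth code point by simple offset arithmetic.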
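The practical effect of the fixed width can be sketched in Python: in UTF-32 the nth code point always starts at byte offset 4n, whereas UTF-8 and UTF-16 use a variable number of bytes per code point. The helper `utf32_code_point_at` is hypothetical and assumes big-endian data without a byte order mark; Python's `"utf-32-be"` codec is used because the plain `"utf-32"` codec prepends a BOM.

```python
def utf32_code_point_at(data: bytes, index: int) -> int:
    """Return the index-th code point of UTF-32-BE data by direct byte offset."""
    start = 4 * index                 # every code point occupies exactly 4 bytes
    unit = data[start:start + 4]
    if index < 0 or len(unit) != 4:
        raise IndexError("code point index out of range")
    return int.from_bytes(unit, "big")


# Characters taking 1, 2, 3 and 4 bytes respectively in UTF-8.
text = "A\u00e9\u20ac\U0001F600"
data = text.encode("utf-32-be")

print(len(data) == 4 * len(text))                                 # True
print(len(text.encode("utf-8")), len(text.encode("utf-16-be")))  # 10 10
print(f"U+{utf32_code_point_at(data, 3):04X}")                   # U+1F600
```

The trade-off is size: ASCII-heavy text takes four times as many bytes in UTF-32 as in UTF-8.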