What Is a Code Point?
Code points are also called code positions. The range of integers available for encoding abstract characters is called the codespace, and each particular integer in that range is called a code point. When an abstract character is mapped or assigned to a particular code point in the codespace, it is called an encoded character. [1]
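As a small illustration of these terms, the sketch below treats code points as plain integers and prints them in the usual U+XXXX notation. It assumes the codespace bounds 0 to 0x10FFFF, which are not stated above, and `format_code_point` is a hypothetical helper name chosen for this example.

```python
import unicodedata

# Assumption: the Unicode codespace is the integer range 0..0x10FFFF.
CODESPACE_MAX = 0x10FFFF

def format_code_point(cp: int) -> str:
    """Return the conventional U+XXXX notation for a code point."""
    if not 0 <= cp <= CODESPACE_MAX:
        raise ValueError(f"{cp:#x} is outside the Unicode codespace")
    return f"U+{cp:04X}"

# Each character in this string is an encoded character:
# an abstract character assigned to one code point.
sample = "A\u00e9\u20ac\U0001F600"
for ch in sample:
    cp = ord(ch)              # character -> code point (an integer)
    assert chr(cp) == ch      # code point -> character round-trips
    print(format_code_point(cp), unicodedata.name(ch))
```

Here `ord` and `chr` are Python built-ins that convert between a one-character string and its code point; the character names come from the Unicode data bundled with the interpreter.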
- There are many ways to classify code points. Not all assigned code points represent abstract characters; only graphic, format, control, and private-use code points do. Surrogates and noncharacters are assigned code points, but they are not assigned to abstract characters. Reserved code points are assignable: any of them may be assigned in a later version of the standard. The General Category gives a finer breakdown of graphic characters and also distinguishes the other basic types from one another (except noncharacters from reserved code points). Other properties defined in the Unicode Character Database provide further ways of classifying Unicode code points.
- Control codes: 65 code points (U+0000..U+001F and U+007F..U+009F) are reserved as control codes.
- Code points in the Unicode standard are allocated according to the following principles.
- 1. Where a script has a single recognized standard, the Unicode standard generally follows that standard for the relative order of the characters within the script.
- 2. The first 256 code points follow ISO/IEC 8859-1 (Latin-1), of which 7-bit ASCII (ISO/IEC 646 IRV) occupies the first 128 code positions.
- 3. Characters with common characteristics are placed next to each other. For example, the basic Arabic character block is modeled on ISO/IEC 8859-6. Arabic-script characters used in Persian, Urdu, and other languages are not included in ISO/IEC 8859-6 and are allocated after the basic Arabic character block. Scripts written from right to left are also grouped together.
- 4. As far as possible, a script is allocated so that it does not cross a 128-code-point boundary. For supplementary characters, there is an additional constraint not to cross a 1,024-code-point boundary. These constraints make it easier to optimize tasks such as building tables for access to character properties.
- 5. Codes that represent letters, punctuation, symbols, and diacritical marks that are generally shared by multiple languages or scripts are grouped together in several locations.
- 6. The Unicode standard does not tie character code assignments to language-dependent collation or to letter case.
- 7. CJK unified ideographs are laid out in three parts, each arranged according to the same ideograph ordering, which is based roughly on radical and stroke count. [1]
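Some of the statements in this section can be checked with Python's standard library. The sketch below is only an illustration under stated assumptions: `basic_type` is a hypothetical helper that derives a coarse type from the General Category (treating Cf, Zl, and Zp as format), and the results reflect the Unicode version bundled with the interpreter. It counts the control codes, then compares the first 256 code points with ISO/IEC 8859-1 and a run of Arabic letters with ISO/IEC 8859-6.

```python
import unicodedata

def basic_type(cp: int) -> str:
    """Coarse type of a code point, derived from its General Category."""
    cat = unicodedata.category(chr(cp))
    if cat == "Cc":
        return "control"
    if cat in ("Cf", "Zl", "Zp"):
        return "format"
    if cat == "Cs":
        return "surrogate"
    if cat == "Co":
        return "private-use"
    if cat == "Cn":
        # The General Category does not separate these two types.
        return "noncharacter or reserved"
    return "graphic"

# Count control codes across the whole codespace (assumed 0..0x10FFFF).
control_count = sum(1 for cp in range(0x110000) if basic_type(cp) == "control")
print("control codes:", control_count)

for cp in (0x0041, 0x200E, 0xD800, 0xE000, 0xFFFF):
    print(f"U+{cp:04X}", basic_type(cp))

# Principle 2: the first 256 code points coincide with ISO/IEC 8859-1,
# and the first 128 with ASCII.
latin1_matches = bytes(range(256)).decode("latin-1") == "".join(map(chr, range(256)))
ascii_matches = bytes(range(128)).decode("ascii") == "".join(map(chr, range(128)))
print("Latin-1 aligned:", latin1_matches, "ASCII aligned:", ascii_matches)

# Principle 3: bytes 0xC1..0xDA of ISO/IEC 8859-6 decode to a contiguous
# run of Arabic letters in the same order.
arabic = bytes(range(0xC1, 0xDB)).decode("iso8859_6")
print([f"U+{ord(c):04X}" for c in arabic[:3]])
```

Scanning the full codespace keeps the control-code count independent of any hard-coded ranges, at the cost of about a million category lookups.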