What Is Character Encoding?

Character encoding, also known as a character set code, maps the characters of a character set to objects in a specified set (for example, bit patterns, sequences of natural numbers, octets, or electrical pulses) so that text can be stored in a computer and transmitted over communication networks. Common examples include encoding the Latin alphabet as Morse code and as ASCII. ASCII assigns a number to each letter, digit, and other symbol, and represents that integer in 7 binary bits. An extra bit is usually added so that each character can be stored in one byte.
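The 7-bit claim can be checked directly. The following is a minimal Python sketch; the helper name `ascii_bits` is hypothetical and introduced only for illustration. It prints each character's ASCII number and its one-byte binary form, where the leading bit is always 0.

```python
# Illustrative sketch: ASCII codes are integers below 128, so they fit in
# 7 bits; when stored in one byte, the highest bit is left as 0.
def ascii_bits(ch: str) -> str:
    """Return the 8-bit binary form of an ASCII character's code."""
    code = ord(ch)
    if code > 0x7F:
        raise ValueError(f"{ch!r} is not an ASCII character")
    return format(code, "08b")


for ch in "Az0":
    # Each printed bit string starts with 0: that is the unused extra bit.
    print(ch, ord(ch), ascii_bits(ch))
```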

Character Encoding

ASCII (American Standard Code for Information Interchange) is a scheme that uses 7 or 8 binary bits for encoding and can represent up to 256 characters
Text, pictures and other information seen on the monitor are inside the computer
For expansion
GB2312 is also one of the ANSI codes.
GBK is the Chinese Character Internal Code Extension Specification; the K is the initial of "Kuozhan" ("extension") in Hanyu Pinyin. Its full English name is Chinese Internal Code Specification. The GBK standard is compatible with GB2312. It contains 21,003 Chinese characters and 883 symbols, and provides 1,894 code points for user-defined characters. GB2312 is the national standard Chinese character code for information interchange of the People's Republic of China.
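The compatibility between GBK and GB2312 can be observed with Python's built-in `gb2312` and `gbk` codecs. This is only a sketch: the helper `encodable` is a hypothetical name, and the sample characters are chosen for illustration. Text covered by GB2312 yields the same bytes under both codecs, while a traditional character such as 龍 is available only in the larger GBK repertoire.

```python
# Illustrative sketch: GBK extends GB2312 without changing existing codes.
def encodable(text: str, codec: str) -> bool:
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


simplified = "中文"
# Characters present in GB2312 get identical bytes in GBK.
print(simplified.encode("gb2312") == simplified.encode("gbk"))

# A traditional character outside GB2312 is still encodable in GBK.
print(encodable("龍", "gb2312"), encodable("龍", "gbk"))
```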
Traditional Chinese in Taiwan, Hong Kong and Macau
As the ANSI encodings described above show, many different encodings are in use around the world
UTF-8 was introduced to make Unicode more space-efficient. It is a variable-length encoding that chooses the number of bytes for each character according to the symbol. For example, an English letter needs only one byte.
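The variable length of UTF-8 is easy to see by counting encoded bytes. The short Python sketch below uses a hypothetical helper, `utf8_length`, and a handful of sample characters picked for illustration.

```python
# Illustrative sketch: UTF-8 uses between 1 and 4 bytes per character.
def utf8_length(ch: str) -> int:
    """Return how many bytes UTF-8 needs for a single character."""
    return len(ch.encode("utf-8"))


for ch in ["A", "é", "中", "😀"]:
    print(repr(ch), utf8_length(ch))
```

An ASCII letter takes one byte, an accented Latin letter two, a common Chinese character three, and an emoji four.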
Base64 encoding
Some email systems (such as those overseas) do not support non-English characters (such as Chinese characters). This is due to historical reasons: early systems were designed as if only the United States used email. An English letter stored in ASCII occupies 1 byte (8 bits) of memory, but only 7 of those bits carry data; the highest bit is unused and set to 0. Such systems treat any byte whose highest bit is 1 as an error. Some encodings (such as GB2312) not only use multiple bytes per character but also always set the highest bit of each byte to 1. The mail system resets that bit to 0, so the recipient sees garbled text.
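The corruption described above can be reproduced with a few lines of Python. This is a sketch under the assumption that the mail system simply clears the highest bit of every byte; `strip_high_bit` is a hypothetical helper, not part of any real mail software.

```python
# Illustrative sketch: simulate a 7-bit mail system clearing the top bit.
def strip_high_bit(data: bytes) -> bytes:
    """Clear the highest bit of every byte, as a 7-bit channel would."""
    return bytes(b & 0x7F for b in data)


original = "中文".encode("gb2312")
damaged = strip_high_bit(original)

# Every GB2312 byte of a Chinese character has its highest bit set.
print(all(b & 0x80 for b in original))

# After stripping, the bytes are plain ASCII and no longer decode
# to the original characters.
print(original.hex(), damaged.hex(), damaged.decode("ascii"))
```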
For the mail system to send and receive such messages correctly, text stored in other encodings must be converted to ASCII for transmission. For example, the sender takes the GB2312 bytes -> applies the Base64 rules -> obtains ASCII text; the receiver takes the ASCII text -> reverses the Base64 rules -> recovers the GB2312 bytes.
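The round trip can be sketched with Python's standard `base64` module. The function names `to_mail_safe` and `from_mail_safe` are hypothetical and exist only for this example; real mail software also adds headers that announce the encoding, which this sketch omits.

```python
import base64


# Illustrative sketch: GB2312 bytes -> Base64 -> ASCII text, and back.
def to_mail_safe(text: str, codec: str = "gb2312") -> str:
    """Encode text in the given codec, then wrap the bytes in Base64."""
    return base64.b64encode(text.encode(codec)).decode("ascii")


def from_mail_safe(payload: str, codec: str = "gb2312") -> str:
    """Undo Base64, then decode the recovered bytes with the codec."""
    return base64.b64decode(payload).decode(codec)


payload = to_mail_safe("中文")
# The payload contains only ASCII characters, so a 7-bit channel
# can carry it unchanged.
print(payload, payload.isascii())
print(from_mail_safe(payload))
```

Because Base64 output uses only letters, digits, `+`, `/`, and `=`, no byte has its highest bit set, and the receiver can restore the original GB2312 text exactly.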
