Overview of Codesets

Codesets, or code pages, are a means of providing support for the character sets and keyboard layouts used in different countries. Codeset identifiers are also widely used to specify which encodings a character conversion converts between. Microsoft and IBM have each assigned code page numbers to these encodings, and those numbers are followed throughout the industry.
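Because the vendor code page numbers are so widely adopted, many libraries name their converters after them. The following is a minimal sketch, assuming Python's standard `codecs` module, whose built-in codecs are registered under names of the form `cp<number>`. The particular numbers and the helper `codec_for_code_page` are illustrative choices, not part of any product API.

```python
import codecs

# A few example IBM/Microsoft code page numbers (chosen for illustration).
CODE_PAGES = {
    37: "EBCDIC, US/Canada (mainframe)",
    437: "IBM PC, United States",
    860: "IBM PC, Portugal",
    932: "Japanese (Microsoft Shift JIS variant)",
    1252: "Windows, Western European",
}


def codec_for_code_page(number: int) -> codecs.CodecInfo:
    """Hypothetical helper: look up the Python codec registered as 'cp<number>'.

    Numbers below 100 are zero-padded, so 37 becomes 'cp037'.
    """
    return codecs.lookup(f"cp{number:03d}")


for number, description in CODE_PAGES.items():
    info = codec_for_code_page(number)
    print(f"{number:>5}  {info.name:<8} {description}")
```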

A code point value is assigned to each character in the codeset. This encoded value relates the binary character codes used by a program to keyboard keys and to the appearance of characters on the screen. Within a given code page, a code point has one, and only one, specific meaning. For example, in a Windows environment (US English), the character ‘A’ is represented by the number 65 (hexadecimal ‘41’). In a mainframe environment (US English), which uses EBCDIC, the character ‘A’ is represented by the number 193 (hexadecimal ‘C1’).
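The sketch below makes the ‘A’ example concrete. It assumes Python's built-in `cp1252` codec (Windows, Western European) and `cp037` codec (EBCDIC, US/Canada) as stand-ins for the Windows and mainframe code pages. It encodes the same character under both, then decodes the same byte under both to show that a byte value only has meaning relative to a code page.

```python
# Encode the character 'A' under a Windows code page and an EBCDIC code page.
windows_byte = "A".encode("cp1252")
ebcdic_byte = "A".encode("cp037")

print(windows_byte[0], hex(windows_byte[0]))  # 65 0x41
print(ebcdic_byte[0], hex(ebcdic_byte[0]))    # 193 0xc1

# The same byte value (0xC1) means a different character in each code page.
print(bytes([0xC1]).decode("cp1252"))  # Á
print(bytes([0xC1]).decode("cp037"))   # A
```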

Devices such as the display and the keyboard can be configured to use a specific code page and to switch from one code page (e.g., United States) to another (e.g., Portugal).
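Switching code pages changes how the same stored bytes are displayed. As a sketch, assuming Python's `cp437` (United States) and `cp860` (Portugal) codecs model the two device code pages, the snippet below lists the byte values whose character changes after the switch and shows that a Portuguese character such as ‘ã’ exists only in the Portuguese code page.

```python
US, PORTUGAL = "cp437", "cp860"

# Decode every possible byte value under both code pages.
all_bytes = bytes(range(256))
us_chars = all_bytes.decode(US)
pt_chars = all_bytes.decode(PORTUGAL)

# Byte values that display as a different character after switching.
changed = [b for b in range(256) if us_chars[b] != pt_chars[b]]
print(f"{len(changed)} byte values change meaning")
for b in changed[:5]:
    print(f"0x{b:02X}: {us_chars[b]!r} -> {pt_chars[b]!r}")

# 'ã' can be encoded in the Portuguese code page but not in the US one.
print("ã".encode(PORTUGAL))
try:
    "ã".encode(US)
except UnicodeEncodeError:
    print("'ã' is not in code page 437")
```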

Code pages fall into the following categories:

· Single-byte character sets (SBCS) – A character encoding in which each character is represented by one byte. Single-byte character sets are therefore limited to 256 characters, as in code page 437, the original IBM PC code page for the United States, which extends 7-bit ASCII.

· Double-byte character sets (DBCS) – A character encoding that uses 2-byte codes for characters; in practice, usually a specific type of multibyte character set that mixes 1-byte and 2-byte characters. Because a 2-byte code has 65,536 possible values, double-byte character sets can accommodate far more than 256 characters.

· Multibyte character sets (MBCS) – A mixed-width character set in which some characters consist of more than one byte.

· Unicode – A universal character encoding standard. Unicode was originally designed as a 16-bit encoding that used two bytes for every character, and it enables almost all of the written languages of the world to be represented in text files. (By contrast, an 8-bit single-byte code page cannot represent all of the combinations of letters and diacritical marks that are used with the Roman alphabet.) Early versions of the standard assigned approximately 28,000 of the 65,536 possible 16-bit values, about 21,000 of them to Chinese (CJK) ideographs. The code space has since been extended to 1,114,112 code points (U+0000 to U+10FFFF), stored using encoding forms such as UTF-8, UTF-16 and UTF-32, and current versions assign well over 140,000 characters.
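The practical difference between these categories is how many bytes each character needs. The sketch below assumes Python's built-in codecs as representatives: `cp437` for a single-byte set, `cp932` (Microsoft's Shift JIS variant) for a double-byte set, `gb18030` as an example of a multibyte set whose characters take one, two or four bytes, and `utf-16-be` for Unicode. The sample characters and the helper `byte_length` are arbitrary illustrative choices; a dash in the output means the character does not exist in that code page.

```python
from typing import Optional

SAMPLES = ["A", "é", "日", "中", "😀"]
ENCODINGS = {
    "SBCS (cp437)": "cp437",
    "DBCS (cp932)": "cp932",
    "MBCS (gb18030)": "gb18030",
    "Unicode (UTF-16)": "utf-16-be",  # big-endian, so no byte-order mark is added
}


def byte_length(char: str, encoding: str) -> Optional[int]:
    """Return how many bytes `char` takes in `encoding`, or None if it cannot be encoded."""
    try:
        return len(char.encode(encoding))
    except UnicodeEncodeError:
        return None


print(f"{'char':<6}" + "".join(f"{name:>18}" for name in ENCODINGS))
for ch in SAMPLES:
    cells = []
    for enc in ENCODINGS.values():
        n = byte_length(ch, enc)
        cells.append(f"{'-' if n is None else n:>18}")
    print(f"{ch:<6}" + "".join(cells))

# Unicode code points go beyond 16 bits: U+1F600 needs a surrogate pair (4 bytes) in UTF-16.
print(hex(ord("😀")), "max:", hex(0x10FFFF))
```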

Topic ID: 730088