2.    Codesets

The Tru64 UNIX operating system software supports the following Thai codeset:

The TACTIS codeset, shown in Figure 2-1, combines the ASCII (ISO 646-1991) character set with the TIS 620-2533 character set. It is an 8-bit codeset whose characters are assigned values from 0x00 to 0xFF.

Figure 2-1: TACTIS Codeset
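
The single-byte layout described above can be illustrated with a short Python sketch. It assumes that Python's built-in "tis_620" codec is an acceptable stand-in for the Thai half of TACTIS; that codec is not part of Tru64 UNIX and may not agree with TACTIS at every code point. The function name decode_tactis is hypothetical.

```python
# Assumption: Python's "tis_620" codec approximates the TACTIS codeset.
def decode_tactis(data: bytes) -> str:
    """Decode one byte per character; bytes the codec does not define become U+FFFD."""
    return data.decode("tis_620", errors="replace")

# 0x41 and 0x20 come from the ASCII half, 0xA1 and 0xD2 from the Thai half.
sample = bytes([0x41, 0x20, 0xA1, 0xD2])
text = decode_tactis(sample)
print([f"U+{ord(ch):04X}" for ch in text])
```

Each input byte yields exactly one character, which is the property that makes the per-byte classification in the next section possible.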



2.1    Character Classification

To facilitate processing of characters encoded in the TACTIS codeset, such as displaying Thai characters and checking input sequences, characters are classified into the following categories:

- Leading vowels (LV). The five leading vowels defined in TIS 620-2533.

- Following vowels (FV). The six following vowels defined in TIS 620-2533.

- Below vowels (BV). The two below vowels defined in TIS 620-2533.

- Above vowels (AV). The five above vowels defined in TIS 620-2533.

- Tone marks (TONE). The four tone marks defined in TIS 620-2533.

- Above diacritics (AD). The four above diacritics defined in TIS 620-2533.

- Below diacritic (BD). The below diacritic defined in TIS 620-2533.

- Graphic characters. The 94 graphic characters defined in ISO 646-1991. They include:

* 52 English alphabetic characters (A-Z, a-z)

* 10 digits (0-9)

* 32 special characters: 21-2F, 3A-40, 5B-60, and 7B-7E

- Space. The character code is 20.

- No-Break space. The character code is A0.

- Thai digits. The ten Thai digits defined in TIS 620-2533.

- Word separator. The word separator defined in TIS 620-2533.

- Reserved code points. Six code points are reserved for future use.
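
As a quick check of the graphic-character breakdown in the list above, the following Python sketch counts the three ASCII groups using constants from the standard string module. The variable names are illustrative only.

```python
import string

letters = string.ascii_letters   # A-Z and a-z
digits = string.digits           # 0-9
specials = string.punctuation    # printable ASCII other than letters, digits, and space

graphic = letters + digits + specials
print(len(letters), len(digits), len(specials), len(graphic))

# The three groups together cover every code from 21 to 7E exactly once.
print(sorted(graphic) == [chr(code) for code in range(0x21, 0x7F)])
```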

To meet special requirements of Thai input and output, some character classes, such as FV, BV, AV, and AD, are further divided into subclasses. For details, see Table 2-1.

Table 2-1: Thai Character Classification

Class   Number   Description
-----   ------   -----------
CTRL    66       ISO 646-1991 control codes: 00-1F, 7F, 80-9F, FF
NON     119      • ISO 646-1991 character codes: 20-7E
                 • TIS 620-2533 character codes: A0, CF, DC, DF, E6, EF, F0-F9, FA, FB
                 • Reserved code points: DB, DD, DE, FC, FD, FE
CONS    44       A1-C3, C5, C7-CE
LV      5        E0, E1, E2, E3, E4
FV1     3        D0, D2, D3
FV2     1        E5
FV3     2        C4, C6 (these two characters also behave as LV in the case of LV+CONS)
BV1     1        D8
BV2     1        D9
BD      1        DA
TONE    4        E8, E9, EA, EB
AD1     2        EC, ED
AD2     1        E7
AD3     1        EE
AV1     1        D4
AV2     2        D1, D6
AV3     2        D5, D7
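
The classification in Table 2-1 can be sketched as a lookup table in Python. This is an illustration built only from the code lists in the table, not the Tru64 UNIX implementation; the names CLASS_SPECS, CLASS_OF, expand, and classify are hypothetical.

```python
def expand(spec):
    """Expand a spec such as "A1-C3, C5" into a list of byte values."""
    codes = []
    for part in spec.split(","):
        low, _, high = part.strip().partition("-")
        codes.extend(range(int(low, 16), int(high or low, 16) + 1))
    return codes

# Code lists copied from Table 2-1 (hexadecimal byte values).
CLASS_SPECS = {
    "CTRL": "00-1F, 7F, 80-9F, FF",
    "NON": "20-7E, A0, CF, DC, DF, E6, EF, F0-F9, FA, FB, DB, DD, DE, FC, FD, FE",
    "CONS": "A1-C3, C5, C7-CE",
    "LV": "E0-E4",
    "FV1": "D0, D2, D3",
    "FV2": "E5",
    "FV3": "C4, C6",
    "BV1": "D8",
    "BV2": "D9",
    "BD": "DA",
    "TONE": "E8-EB",
    "AD1": "EC, ED",
    "AD2": "E7",
    "AD3": "EE",
    "AV1": "D4",
    "AV2": "D1, D6",
    "AV3": "D5, D7",
}

# Build the byte-to-class map, rejecting any code listed in two classes.
CLASS_OF = {}
for name, spec in CLASS_SPECS.items():
    for code in expand(spec):
        if code in CLASS_OF:
            raise ValueError(f"{code:02X} is listed in both {CLASS_OF[code]} and {name}")
        CLASS_OF[code] = name

def classify(byte):
    """Return the Table 2-1 class name for a byte value from 0x00 to 0xFF."""
    return CLASS_OF[byte]

# Classify a consonant, an above vowel, and a tone mark.
print([classify(b) for b in bytes([0xA1, 0xD4, 0xE8])])
```

Building the map once keeps each lookup constant-time, and the duplicate check confirms that the code lists assign every byte value to exactly one class.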



2.2    Character Levels

Characters defined in the TACTIS codeset can also be classified according to character levels. There are five character levels:

