Everhart, Glenn

From: Joshua Cope [cope@star.enet.dec.com]
Sent: Wednesday, February 03, 1999 9:10 AM
To: Info-VAX@Mvb.Saic.Com
Subject: ODS-5 supported character set [was: ;32767]

JF Mezei wrote:
> Would files such as "Menu.pour.le.petit.déjeuner.com" work properly ?
>
> (eg: accented characters). Will they be escaped out, or will they appear
> properly on a character cell terminal ? Will pathworks do proper translation
> of character sets between VMS, WINDOWS, DOS and MACs when displaying/handling
> file names ?

All of this is covered in great detail in the documentation, for those of
you who have received version 7.2, or have one of the SDK (or field test)
CDs.

But I might as well get the relevant stuff into DejaNews... this thread is
becoming a series of specific questions, and I seem to remember the same
sort of questions a couple of months ago, and back around v7.2-FT1.

So, here's the entire list of supported characters, and the rules for what
is escaped and what's not.

[Hit "Next" now if you couldn't care less.]

------

OpenVMS Guide to Extended File Specifications, Appendix B

Section 2.1.2: Additional Characters

On ODS-5 volumes, RMS supports access to files and directories with names
that contain arbitrary 8-bit characters, except for the C0 control set
(hexadecimal 00 through 1F) and the following characters:

    Double quotation marks (")
    Asterisk (*)
    Backslash (\)
    Colon (:)
    Left and right angle brackets (< >)
    Slash (/)
    Question mark (?)
    Vertical bar (|)

Note that this explicitly includes both the C1 character set (hex 80-9F) as
well as graphical and other characters between 9F and FF. This allows the
entire ISO Latin-1 character set (with the 7-bit character exclusions noted
above) and any defined Unicode character.

------

Section 2.2.6: Canonical Form of File Specifications

In some cases, there are multiple ways to write the same characters. For
example, ^20, ^ , and ^_ are all equivalent.
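To make that equivalence concrete, here is a minimal Python sketch of a decoder that maps the three spellings to the same space character. It is not RMS code: the function name `decode_escapes` and the sample names are made up for illustration, and the grammar it accepts (a circumflex followed by two hex digits, or a circumflex followed by a single character) is an assumption inferred from the example above. Other escape forms are not handled.

```python
import re

# Assumed escape grammar (inferred from the example, not taken from RMS):
#   ^xx   -> the 8-bit character whose hexadecimal value is xx
#   ^_    -> space
#   "^ "  -> space (escape character followed by a literal space)
# Any other "^c" pair is treated here as the literal character c.
_ESCAPE = re.compile(r"\^(?:([0-9A-Fa-f]{2})|(.))", re.DOTALL)


def decode_escapes(spec: str) -> str:
    """Return spec with the escape sequences above replaced by plain characters."""

    def replace(match: re.Match) -> str:
        hex_digits, single = match.group(1), match.group(2)
        if hex_digits is not None:
            return chr(int(hex_digits, 16))
        if single == "_":
            return " "
        return single

    return _ESCAPE.sub(replace, spec)


# Three hypothetical spellings of the same name, one per escape form.
forms = ["my^20file", "my^ file", "my^_file"]
decoded = {decode_escapes(form) for form in forms}
print(decoded)  # {'my file'}
```

All three inputs collapse to one decoded name, which is why a single canonical output form is needed.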

When RMS outputs a file specification (as a resultant name, for example),
it follows these rules to determine which form to use:

    Any character that cannot be represented with eight bits is
    represented as ^Uxxxx, where xxxx is four hexadecimal digits.

    Space is represented as the escape character followed by an
    underscore (^_).

    Other ISO Latin-1 8-bit characters that have no graphical
    representation or which are used for control functions by other
    OpenVMS software or by terminals are represented by an escape
    character followed by two hexadecimal digits (^xx).

    Otherwise, it is represented by its own character.

The following 8-bit values are output as an escape character followed by
two hexadecimal digits.

    7F     (rubout)
    80-9F  (C1 control characters)
    A0     (nonbreaking space)
    FF     (Latin small letter y diaeresis)

If the file specification is longer than 255 bytes and must be output
through a NAM block, a DID or FID abbreviation is used.

The following characters are output preceded by the escape character (^):

    Exclamation point (!)
    Pound sign (#)
    Ampersand (&)
    Apostrophe (')
    Grave accent (`)
    Left parenthesis (()
    Right parenthesis ())
    Plus sign (+)
    At sign (@)
    Left brace ({)
    Right brace (})
    Period (.)
    Comma (,)
    Semicolon (;)
    Left bracket ([)
    Right bracket (])
    Percent sign (%)
    Circumflex (^)
    Equal sign (=)

------

Hopefully that's enough to hold anyone over until the docsets arrive and
you can read it for yourself ;)

Joshua Cope
OpenVMS Quality, Test & Validation
------------------------------------------------------------
The above opinions and information are not necessarily those of
Compaq Computer Corporation.
------------------------------------------------------------
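
For anyone who wants to experiment before the docsets arrive, the output rules quoted from Section 2.2.6 can be sketched in Python as below. This is only an illustration of the rules as listed, not how RMS implements them. The names `canonical_char` and `canonical_name` are hypothetical, uppercase hex digits are an assumption (the excerpt does not specify case), and the functions are assumed to receive the characters of a single name component, so a period or semicolon used as syntax (between name, type, and version) should not be passed through them. The 255-byte DID/FID abbreviation is not modeled.

```python
# Characters that Section 2.1.2 excludes from ODS-5 names.
FORBIDDEN = set('"*\\:<>/?|')

# Characters that Section 2.2.6 says are output preceded by "^".
ESCAPED_LITERALLY = set("!#&'`()+@{}.,;[]%^=")


def canonical_char(ch: str) -> str:
    """Return the assumed canonical output form of one character."""
    code = ord(ch)
    if code <= 0x1F or ch in FORBIDDEN:
        raise ValueError(f"character {ch!r} is not permitted in an ODS-5 name")
    if code > 0xFF:
        if code > 0xFFFF:
            # ^Uxxxx carries exactly four hex digits, so larger values
            # cannot be written in that form.
            raise ValueError(f"character {ch!r} does not fit in ^Uxxxx")
        return "^U%04X" % code
    if ch == " ":
        return "^_"
    # 7F (rubout), 80-9F (C1 controls), A0 (nonbreaking space), FF
    if code == 0x7F or 0x80 <= code <= 0xA0 or code == 0xFF:
        return "^%02X" % code
    if ch in ESCAPED_LITERALLY:
        return "^" + ch
    return ch


def canonical_name(name: str) -> str:
    """Apply canonical_char to every character of one name component."""
    return "".join(canonical_char(ch) for ch in name)


# The name part of the file asked about at the top of this post.
print(canonical_name("Menu.pour.le.petit.déjeuner"))
# Menu^.pour^.le^.petit^.déjeuner
```

Under these rules the accented é (hex E9) is a graphical Latin-1 character and comes out as itself, while each period inside the name is output with a leading circumflex.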