Mysteries of ASCII, part Ⅰ

Posted on Fri, 20 Jul 2007

I'm currently reading an old (1993) book called Software Internationalization and Localization: An Introduction (ISBN 0-442-01498-8) that a colleague lent me so I can learn about the various aspects of internationalisation for protocol design. It's quite interesting, and I think every student taking any Computer Science or Information Science paper should know about the issues it covers.

One of the things that struck me is the use of spaces to delimit words, when spaces may quite legally appear in places that should not be treated as a word break. One example (§2.2.7) is the thousands separator in numbers in some languages. I don't know which languages follow this rule; if you know, please tell me.
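To make the problem concrete, here is a small Python sketch of a naive word splitter that treats every ordinary space as a word boundary. The function name and the sample sentence are my own illustrations, not taken from the book.

```python
def naive_words(text):
    """Split on the ordinary space (U+0020) only, as a naive tokenizer would."""
    return text.split(" ")


# A number written with ordinary spaces as the thousands separator
# is torn into three separate "words".
print(naive_words("costs 1 234 567 units"))
# ['costs', '1', '234', '567', 'units']

# With NO-BREAK SPACE (U+00A0) as the separator, the number stays whole.
print(naive_words("costs 1\u00a0234\u00a0567 units"))
# ['costs', '1\xa0234\xa0567', 'units']
```

Note that the split is explicitly on " ". Python's argument-less str.split() treats U+00A0 as whitespace too, so it would break the number apart in both cases.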

In HTML, we have the escape sequence &nbsp; for the non-breaking space. Unicode has NO-BREAK SPACE (U+00A0). Does ASCII have such a space? No. Well, not 7-bit ASCII (which is what the term ‘ASCII’ means), but there is one in extended ASCII.
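These claims are easy to check from a Python prompt. The sketch below uses only the standard library; the helper fits_in_ascii is a name I made up for the illustration, and Latin-1 is used here as just one example of an 8-bit extension of ASCII.

```python
import html
import unicodedata

NBSP = "\u00a0"

# The HTML entity for the non-breaking space resolves to U+00A0.
print(html.unescape("&nbsp;") == NBSP)  # True
print(unicodedata.name(NBSP))           # NO-BREAK SPACE


def fits_in_ascii(ch):
    """Return True if the character can be encoded as 7-bit ASCII."""
    try:
        ch.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


print(fits_in_ascii(" "))      # True: the ordinary space is ASCII 32
print(fits_in_ascii(NBSP))     # False: 7-bit ASCII has no non-breaking space
print(NBSP.encode("latin-1"))  # b'\xa0': one 8-bit extension does have it
```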

I remember, as a boy, writing some of the programs in Peter Norton's books about the PC and BASIC programming. One of the pages I would frequently turn to was the chart of the 7- and 8-bit ASCII characters. One thing that puzzled me at the time was why character 255 is the same as a space. Well, it looks the same, but it's not. Character 255 (in those code pages where it is used; I don't know how widely they are deployed) is the non-breaking space.
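Python ships codecs for the old PC code pages, so the chart can be checked directly. This sketch assumes that code page 437 (the original IBM PC character set) is the one those charts showed; describe is a helper name I invented for the example.

```python
import unicodedata


def describe(byte_value, codec):
    """Decode a single byte with the given codec and name the resulting character."""
    ch = bytes([byte_value]).decode(codec)
    return "U+%04X %s" % (ord(ch), unicodedata.name(ch))


print(describe(32, "cp437"))     # U+0020 SPACE
print(describe(255, "cp437"))    # U+00A0 NO-BREAK SPACE
print(describe(255, "latin-1"))  # U+00FF LATIN SMALL LETTER Y WITH DIAERESIS
```

The last line shows why the caveat about code pages matters: the same byte value means something quite different in Latin-1.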