Legacy character sets

September 10, 2023

A number of years ago I noticed that Unicode did not contain all the characters in PETSCII, the character set used by the Commodore 64 and other classic Commodore computers. At the time, the Wikipedia page even explained that some symbols were missing from its character table because Unicode didn't support them.

I decided to post about it to the Unicode mailing list, and some people there agreed. The discussion expanded to cover other 1980s systems whose character sets were also missing.

A working group was formed, which I was part of, and it put together a proposal to add a number of characters used by several old systems. After a few failed attempts, the proposal was finally accepted, and the characters were included in Unicode 13.

I'm quite happy that my small contribution to this project helped fill a gap that I believe needed to be filled.

Let's use these symbols

A few years have now passed, and font support for these symbols is more widespread. This is good, because the character set contains symbols that are not just useful for interoperability with old systems, but are useful in their own right. The BLOCK SEXTANT characters are a good example.

These are the 60 characters starting from U+1FB00, along with 4 more characters that already existed elsewhere. Here they are:

 🬀🬁🬂🬃🬄🬅🬆🬇🬈🬉🬊🬋🬌🬍🬎🬏🬐🬑🬒🬓▌🬔🬕🬖🬗🬘🬙🬚🬛🬜🬝🬞🬟🬠🬡🬢🬣🬤🬥🬦🬧▐🬨🬩🬪🬫🬬🬭🬮🬯🬰🬱🬲🬳🬴🬵🬶🬷🬸🬹🬺🬻█

Consider a character cell that is divided into 3 rows by 2 columns. Together, these characters cover all 64 combinations of filled and empty blocks. In other words, we can use them to draw graphics at a resolution that is twice the width and three times the height of the terminal's character grid.
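To make the ordering concrete, here is a small Python sketch that turns a position in the list above (0 to 63) back into its 3-by-2 pattern. It assumes the numbering implied by the Unicode character names: the top-left cell is the least significant bit, and cells are read left to right, top to bottom. The function name `sextant_pattern` is made up for this illustration.

```python
def sextant_pattern(index):
    """Return the 3x2 pattern at position `index` (0..63) as three strings.

    Assumed bit order: bit 0 is the top-left cell, bit 1 the top-right cell,
    bit 2 the middle-left cell, and so on down to bit 5 at the bottom right.
    """
    if not 0 <= index < 64:
        raise ValueError("index must be in 0..63")
    rows = []
    for row in range(3):
        left = (index >> (2 * row)) & 1
        right = (index >> (2 * row + 1)) & 1
        rows.append(("#" if left else ".") + ("#" if right else "."))
    return rows


# Position 21 is binary 010101: the whole left column is filled,
# which is why LEFT HALF BLOCK sits at that position in the list.
for line in sextant_pattern(21):
    print(line)
```

Position 42 (binary 101010) gives the mirror image, which is where RIGHT HALF BLOCK sits.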

Kap is really good at working with arrays of data, so let's write a program to convert a 2-dimensional array of bits to a character array of BLOCK SEXTANT characters.

First we need to create the list of characters above. Here's the code to do this:

blockMapping ← (@\u2590,)⍢(0x2A↓) (@\u258C,)⍢(0x15↓) @\uA0 , (@\u1FB00+⍳60),@\u2588

Let's break it down. The rightmost part, which is evaluated first, creates a list of the 60 characters in the Symbols for Legacy Computing block, then prepends U+00A0 NO-BREAK SPACE and appends U+2588 FULL BLOCK. These two characters already existed in Unicode, so their code points are outside that block:

@\uA0 , (@\u1FB00+⍳60),@\u2588

We then want to insert the two remaining symbols that also existed previously: U+2590 RIGHT HALF BLOCK and U+258C LEFT HALF BLOCK. We insert these using structural under.

Structural under deserves a blog post of its own (and I've already written extensively about it), so for now we'll just note that the following inserts a at position b in c:

(a,)⍢(b↓) c
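For readers who don't read Kap, here is a rough Python analogue of the two steps so far. It is only a sketch: `insert_at` is a made-up helper that spells out "drop b items, prepend a, then undo the drop", and Python lists stand in for Kap arrays.

```python
NBSP = "\u00a0"        # U+00A0 NO-BREAK SPACE, used for the empty cell
FULL_BLOCK = "\u2588"  # U+2588 FULL BLOCK
LEFT_HALF = "\u258c"   # U+258C LEFT HALF BLOCK
RIGHT_HALF = "\u2590"  # U+2590 RIGHT HALF BLOCK


def insert_at(a, b, c):
    """Return a copy of list c with a inserted at position b."""
    tail = c[b:]         # b↓ : drop the first b items
    tail = [a] + tail    # a, : prepend a to what is left
    return c[:b] + tail  # undo the drop: put the first b items back


# NBSP, then the 60 characters from U+1FB00, then FULL BLOCK.
block_mapping = [NBSP] + [chr(0x1FB00 + i) for i in range(60)] + [FULL_BLOCK]

# Same order as the Kap expression: LEFT HALF BLOCK first, then RIGHT HALF BLOCK.
block_mapping = insert_at(LEFT_HALF, 0x15, block_mapping)
block_mapping = insert_at(RIGHT_HALF, 0x2A, block_mapping)

print(len(block_mapping))  # 64
print("".join(block_mapping))
```

The order of the two insertions matters: inserting at 0x15 first shifts everything after it by one, so 0x2A then refers to a position in the already extended list.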

Now that we have the characters in a string, all we have to do is take each 3-by-2 block in the input array, decode it as a binary number, and pick the character at that position in the string. Here's the full function:

draw ⇐ {
	(2=≢⍴⍵) or throw "Argument must be a 2-dimensional array"
	v ← ((⍴⍵) + 3 2 | -⍴⍵) ↑ ⍵
	blockMapping ⊇⍨ (2⊥⌽)¨ ((¯2+≢v) ⍴ 1 0 0) ⌿ 3,⌿  ((¯1+1⊇⍴v) ⍴ 1 0) / 2,/v
}

The first line in the function simply raises an error if the input is not a 2-dimensional array. The second line extends the array so the width is divisible by 2 and the height is divisible by 3. Finally, the third line does all the work, computing the resulting image.
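The same three steps can be sketched in Python, which may help if the third line is hard to read. This is an analogue under stated assumptions, not a translation of how Kap evaluates the expression: the input is a list of rows containing 0 or 1, rows may be ragged and are padded with zeros, and the names `build_table` and `draw` are chosen for this illustration.

```python
def build_table():
    """The 64 characters, ordered by bit pattern (top-left cell = bit 0)."""
    table = ["\u00a0"] + [chr(0x1FB00 + i) for i in range(60)] + ["\u2588"]
    table.insert(0x15, "\u258c")  # LEFT HALF BLOCK
    table.insert(0x2A, "\u2590")  # RIGHT HALF BLOCK
    return table


def draw(bits):
    """Convert rows of 0/1 values into lines of sextant characters."""
    table = build_table()
    height = len(bits)
    width = max((len(row) for row in bits), default=0)

    # Round the size up so the height divides by 3 and the width by 2.
    padded_h = height + (-height) % 3
    padded_w = width + (-width) % 2
    grid = [list(row) + [0] * (padded_w - len(row)) for row in bits]
    grid += [[0] * padded_w for _ in range(padded_h - height)]

    lines = []
    for top in range(0, padded_h, 3):
        line = []
        for left in range(0, padded_w, 2):
            # Read the cell left to right, top to bottom; first bit is least significant.
            cell = [grid[top + dy][left + dx] for dy in range(3) for dx in range(2)]
            index = sum(bit << i for i, bit in enumerate(cell))
            line.append(table[index])
        lines.append("".join(line))
    return lines


# A triangle: rows of 9, 8, ..., 0 ones.
triangle = [[1] * n for n in range(9, -1, -1)]
print("\n".join(draw(triangle)))
```

The explicit loops make the cell-by-cell work visible, at the cost of being much longer than the array version.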

Let's try it:

io:print draw ⊃ {⍵⍴1}¨ ⌽⍳10

This should print:

███🬝🬀
██🬆  
🬝🬀

The quality of the output depends on the font used, and I note that the default font used on write.as (where this blog is hosted) is not very good. With a proper font it looks a lot better.

Try the full example on the Kap web client.