Elias Mårtenson

@loke@functional cafe

For the last few years, I've solved some of the Advent of Code problems using Kap. It's turned out to work quite well, and Kap has turned out to be a pretty good language for these kinds of problems.

There have been a few other people solving these problems in Kap, but not too many, so I figured that I need to promote the language a bit more.

So, for the 2025 challenge, I'll give out a prize to the person who provides the nicest solutions in Kap to one or more of the problems. I'm not sure what that prize will be yet, but we're talking about a small token thing, around 100€ or so?

If anyone wants to take this on, let me know at @loke@functional.cafe or the Matrix room.

For more information about Kap, visit the website: https://kapdemo.dhsdevelopments.com/

It appears a lot of my inspiration for writing blog posts comes from me listening to podcasts. This time I was listening to A Problem Squared with Bec Hill and Matt Parker. The episode in question was Episode 103 – Heaps Splad and Paul’s Pad.

I strongly recommend listening to the episode (in fact, all of the episodes) because it's really good. But, in case you don't have time to listen to it right now, please allow me to summarise one of the problems being solved:

Are there any words which, when encrypted using a Caesar cipher, yields another valid word?

Matt Parker explained that he iterated over all the words using some (in his words, “terrible”) Python code. Whether the code actually was terrible or not will remain unanswered (the code is likely not bad, since if you are good enough to know your code is bad, you're probably better than average), but regardless of the quality of the code, I think we can write something much more concise. We can make this assumption simply by virtue of the fact that Python was used.

We'll obviously use Kap for this, because why not?

Getting the word list and cleaning it up

I don't know which word list Matt Parker uses, but I found this one on Github which contains 466,550 words, which is in the same ballpark as the one mentioned in the podcast.

Let's load it into a variable, and print the size:

    words ← io:read "words.txt"
    ≢words
466550

The list contains a lot of words with punctuation, numbers, etc. There are also upper case letters in it that we want to turn into lowercase. This is how we fix this:

words ← ∪ ("^[a-z]+$" regex:match)¨⍛/ unicode:toLower¨ words

The entire list is passed through the unique () function to remove any duplicates due to upper and lower case versions of the same word existing in the list.

After doing this cleanup, we're left with 416,292 words.

Computing the ciphered versions of a word

Our goal is to create a function which accepts a string, and returns the 25 different rotated versions of that word (26 letters in the English alphabet, but we are not interested in the 0-rotated version).

To do this, we will compute the index of each letter (by subtracting @a), add the offset, take the result mod 26 and convert back to a letter (by adding @a). Let's write a function called scramble that does this:

scramble ⇐ { @a + 26| ⍺ + ⍵-@a }

Let's try it:

    1 2 3 4 scramble¨ ⊂"abcd"
┌→──────────────────────────┐
│"bcde" "cdef" "defg" "efgh"│
└───────────────────────────┘

Finding all the words

Since we're going to do a lot of word matches, we'll start by putting all the words in a hashmap. Let's call it wordsMap:

wordsMap ← map (⍪words),1

Next, we'll compute a 2-dimensional array where each row represents a word, and each column the n-scramble of that word. We can do that with an outer product with the words on the left and the numbers 0 to 25 on the right (represented using ⍳26):

allScrambledWords ← words scramble⍨⌻ ⍳26

Since this performs the computation on every word this will take a while (about 10 seconds on my workstation, which is reasonably fast) so don't worry if you don't get a result right away.

Now we want to find which of these scrambled words are real words, which means we need to look them up in the hashmap we created earlier.

When using bracket indexing to look up a value in a hashmap, the return value is null if the key does not exist in the hashmap. So, we'll simply do a not-equals comparison against null to get a boolean array that indicates the locations where we found a word. We'll then assign this boolean array to the variable result:

result ← null ≢¨ wordsMap[allScrambledWords]

Analysing the results

Now comes the fun part. We now have all the data we need, so now we can start answering questions about this data.

How many words scramble into at least one word?

To solve this, we drop the left-most column (since that column represents the 0-scramble, or the word itself) and perform an OR-reduce over the rows and sum the result:

    +/ ∨/ 0 1↓ result
10850

Create a table of the results

Now we know the resulting list is 1,0850 elements long. We can

For each row in result and allScrambledWords, we'll pick the values where the row in result is 1, and then filter out any rows which only have a single entry (i.e. it doesn't scramble into another valid word):

groups ← (1<≢¨)⍛/ result ((⊂(/))⍤1) allScrambledWords

We can confirm that the resulting list is 1,0850 elements long:

    ≢groups
10850

We'll also note that if a word scrambles into N different other words, then the same set of words will be repeated N times in this list. We'll eliminate these duplicates by sorting each sublist and calling the function on the result, then assigning that result back to the groups variable:

groups ← ∪ ∧¨ groups

What is the longest word in the result?

To find this, we'll just sort by the length of the first word:

groups[⍒(≢↑)¨groups]

The first result is “abjurer”/“nowhere”, which is 7 letters. It's a bit disappointing that there isn't anything longer, but I was prepared for this after listening to the podcast.

Another interesting results are “unfiber”/“bumpily”, but to be honest, most of the 7-letter results are boring.

Which is the word with the most matches?

To find this, we just sort by the length of each element:

groups[⍒≢¨groups]

This gives us a very uninteresting list, since the top results are all the one and two letter words, because for some reason the word list contains all single letters and almost every combination of two letters, so we'll have to filter them out:

((2<≢↑)¨)⍛/ groups[⍒≢¨groups]

The three letter results are also not interesting at all. Apparently ajc is a word, which scrambles into gpi, mvo, yha, etc.

Four letters is more interesting. We'll display the first 8 rows:

    8 ↑ ⍪ ((3<≢↑)¨)⍛/ groups[⍒≢¨groups]
┌→─────────────────────────────────────────────────┐
↓┌→───────────────────────────────────────────────┐│
││"anan" "bobo" "erer" "lyly" "nana" "pcpc" "vivi"││
│└────────────────────────────────────────────────┘│
│              ┌→─────────────────────────────────┐│
│              │"adad" "bebe" "fifi" "lolo" "ruru"││
│              └──────────────────────────────────┘│
│              ┌→─────────────────────────────────┐│
│              │"afcc" "chee" "diff" "joll" "purr"││
│              └──────────────────────────────────┘│
│              ┌→─────────────────────────────────┐│
│              │"cece" "gigi" "momo" "susu" "yaya"││
│              └──────────────────────────────────┘│
│                     ┌→──────────────────────────┐│
│                     │"abib" "dele" "novo" "stat"││
│                    └───────────────────────────┘│
│                     ┌→──────────────────────────┐│
│                     │"actg" "cevi" "priv" "yare"││
│                     └───────────────────────────┘│
│                     ┌→──────────────────────────┐│
│                     │"adds" "beet" "illa" "lood"││
│                     └───────────────────────────┘│
│                     ┌→──────────────────────────┐│
│                     │"admi" "benj" "firn" "svea"││
│                     └───────────────────────────┘│
└──────────────────────────────────────────────────┘ 

What are the takeaways from all of this?

Some readers may walk away from this wondering why one should bother with this complicated syntax that uses symbols instead of words.

But the key point here is how we could solve this problem without actually writing any code. In a way that is not too dissimilar to how a good Unix user can solve almost anything on the commandline without writing a program, the array languages (and not just Kap) are all very good at letting you get to the result without actually doing any “programming”.

Whether or not it's worth the effort to learn the syntax in order to be able to leverage the power of these tools is up to you. I personally believe it's worth it.

Last weekend I was listening to episode 99 of Array Cast. The topic of the fortnight was array indexing, or the way to read values out of arrays.

A programmer unfamiliar with array languages may wonder how such a simple concept can fill an entire hour-and-a-half episode (and they may even have to extend it to another episode), and to answer that question I would recommend listening to the episode, but the short answer is that array languages focus on organising data in arrays and then performing actions on entire arrays in one operation. This means that reading and writing data to/from arrays is probably the most important operation one does when using an array language.

In the episode, Stephen Taylor talks about how Q allows for indexing the columns of a table by name rather than a number. This is a nice feature that it shares with R.

However, Kap also has this functionality. It's really not well documented at all and I'll try to improve the documentation, but the hope is that this blog post will serve as an introduction to this feature.

Creating labelled arrays

In Kap, arrays can carry some metadata that describes its content. This metadata includes the axis labels, which are updated or accessed using the function labels (docs).

Here's how you can assign labels to the columns of an array:

    content ← 4 3 ⍴ ⍳12
┌→──────┐
↓0  1  2│
│3  4  5│
│6  7  8│
│9 10 11│
└───────┘
    content ← "foo" "bar" "abc" labels content
┌───┬───┬───┐
│foo│bar│abc│
├→──┴───┴───┤
↓  0   1   2│
│  3   4   5│
│  6   7   8│
│  9  10  11│
└───────────┘

The labels is additional metadata and does not change the shape nor content of the array:

    ⍴ content
┌→──┐
│4 3│
└───┘

The labels function accepts an array of strings on the left, and assigns them to the given axis. The number of elements have to match the size of the given axis. By default this function assigns labels to the last axis, but an axis argument can be used to specify which axis to use:

    content ← "Satu" "Dua" "Tiga" "Empat" labels[0] content
┌───┬───┬───┐
│foo│bar│abc│
├→──┴───┴───┤
↓  0   1   2│
│  3   4   5│
│  6   7   8│
│  9  10  11│
└───────────┘

The above call assigned labels to each row, but it's not displayed. The labels are there, but the current version of the text renderer does not include them. However, we can transpose the array to show that they are there:

    ⍉ content
┌────┬───┬────┬─────┐
│Satu│Dua│Tiga│Empat│
├→───┴───┴────┴─────┤
↓   0   3    6     9│
│   1   4    7    10│
│   2   5    8    11│
└───────────────────┘

Calling labels monadically will return the labels for a given axis:

    labels content
┌→────────────────┐
│"foo" "bar" "abc"│
└─────────────────┘

Using labels

Some functions returns arrays with labels where appropriate. The main one being the sql:query (reference) function which adds the column names as labels.

When reading CSV data, the first row is often a set of labels:

    csv ← io:readCsv "file.csv"
┌→──────────────────────────┐
↓"col1" "col2" "col3" "col4"│
│     0      1      2      3│
│     4      5      6      7│
│     8      9     10     11│
│    12     13     14     15│
│    16     17     18     19│
└───────────────────────────┘

When loading a CSV, instead of just dropping the first row, it can be assigned as column labels:

    csv ← (>1↑)«labels»↓ csv
┌────┬────┬────┬────┐
│col1│col2│col3│col4│
├→───┴────┴────┴────┤
↓   0    1    2    3│
│   4    5    6    7│
│   8    9   10   11│
│  12   13   14   15│
│  16   17   18   19│
└───────────────────┘

Selection by label

The Kap standard library contains the s namespace, which contains a few utility functions which are not documented yet, since they are still a work in progress.

One of these functions is the s:cols function, which returns columns from a table based on the column labels:

    "col3" "col4" s:cols csv
┌────┬────┐
│col3│col4│
├→───┴────┤
↓   2    3│
│   6    7│
│  10   11│
│  14   15│
│  18   19│
└─────────┘

There is also the s:col function if you only want to read one column:

    "col2" s:col csv
┌→──────────┐
│1 5 9 13 17│
└───────────┘

Chart functions

If present, axis labels will automatically be used as labels in charts:

temperatures ← `
        "Monday" "Tuesday" "Wednesday" "Thursday" "Friday" labels "Singapore" "London" labels[0] 2 5 ⍴ 31 29 30 31 31 5 7 10 13 13
chart:line temperatures

This will display the following chart:

Labelled axis in chart

This is a short update to my previous post where I will solve the second part. At the end, I will also include one-line solutions to the two problems, which really doesn't involve more than removal of unnecessary variable assignments.

Revisiting part 1, here's the full solution that was written in that post (with a few less variable reassignments):

numbers ← ⍎¨ ⊃ {(@\s≠⍵) ⊂ ⍵}¨ io:read "part01.txt"
(a b) ← ⊂[0] numbers
a ← ∧a
b ← ∧b
+/ | a - b

Solution to part 2

Let's keep the first two lines from the solution to part 1. Here we have the two arrays in a and b.

The problem statement explains that for each number in the left column (array a), we want to multiply that value with the number of times that number occurs in the right column (array b).

This begs the question: How do we count the number of occurrences of a value in an array?

Let's say we have the following array: 1 2 10 2 2 1 5, and we want to count the number of 2's.

We take advantage of the fact that the values for false and true in Kap is 0 and 1 respectively. So all we need to do is to compare the array with the value we're looking for:

    2 = 1 2 10 2 2 1 5
┌→────────────┐
│0 1 0 1 1 0 0│
└─────────────┘

And then we can take the sum of the result:

    +/ 2 = 1 2 10 2 2 1 5
3

We can now write a function that takes the value to search for on the left, and the full array of values to search for on the right:

{⍺ × +/⍺=⍵}

Since the left and right arguments are called and respectively, this multiplies the left argument with the number of occurrences of that value in the right argument.

All we have to do now is to call this function once for each value in a, passing in the entire argument b to the function, and then sum the result:

+/ a {⍺×+/⍺=⍵}¨ ⊂b

We use to enclose the array b into a scalar in order to ensure that the entire array is passed to each invocation, instead of mapping over all elements in the array.

The final code looks like this:

numbers ← ⍎¨ ⊃ {(@\s≠⍵) ⊂ ⍵}¨ io:read "part01.txt"
(a b) ← ⊂[0] numbers
+/ a {⍺×+/⍺=⍵}¨ ⊂b

One-line solutions

One-line solution to part 1

We can make the code a bit less verbose by not breaking out the two arrays into variables and instead keep it as a 2-element list. And once we did that, there is no need for the variable numbers, which turns the solution into a single line:

+/ | ⊃-/ ∧¨ ⊂[0] ⍎¨ ⊃ {(@\s≠⍵) ⊂ ⍵}¨ io:read "part01.txt"

One-line solution to part 2

There are shorter solutions to this that I was considering including here, but they take advantage of more advanced concepts, so instead I show this solution, which should be understandable without knowing how compositional operators work.

We're simply creating an outer function to automatically bind the two arrays as function arguments instead of explicit variable assignment.

{ +/ ⍺ {⍺×+/⍺=⍵}¨ ⊂⍵ }/ ⊂[0] ⍎¨ ⊃ {(@\s≠⍵) ⊂ ⍵}¨ io:read "part01.txt"

So Advent of Code is done, and I made an actual effort this time. Of course I used Kap, and the final tally ended up being 32 out of 50 stars. I may solve some more later if I feel bored.

The Advent of Code problems tend to be very good fits for array languages, and array language solutions are often much shorter than solutions in other languages. In particular, the parsing which is often annoying in many languages often reduce down to a few characters in Kap.

Let's take a look at day 1 and solve it in Kap. I will attempt to describe the steps like one would solve it interactively in the REPL, rather than show a program and explaining how it works.

In practice, most of the you use Kap to solve problems, it is used as an interactive calculator. Compared to most other languages, the point at which one has to leave the REPL and open an editor instead is much later. Many of the early Advent of Code problems can be solved this way, for example. Day 1 is certainly one such example.

The description for day 1 basically boils down to the following problem:

Given a text file containing a 2-column list of numbers, match up the smallest number in each column and take the difference between them, then do the same with the second-smallest pair and so on. Finally sum up all the differences.

What this means is that we want to do the following:

  • Parse the input file into two lists of numbers
  • Sort each list individually
  • For each pair of numbers, subtract one from the other and take the absolute value
  • Sum the resulting list

Let's write the Kap code to do each of these steps:

Parse in the input file

We'll store the input in a file called part01.txt, which has the following form. I'll include 3 lines of input in order to make the examples small, but the actual input data contains 1000 pairs of numbers.

3   4
15  3
2   10

Kap provides a handy function called io:read which accepts a filename and returns a list of strings, where each string is one line from the input file. Thus, we can type the following:

    io:read "part01.txt"
┌→────────────────────────┐
│"3   4" "15   3" "2   10"│
└─────────────────────────┘

For each line, we want to split the string on spaces. This can be done using a Regex, but since this is such a simple split, we'll use a more traditional approach.

The function takes a bitmap on the left, and an array on the right, and returns a new array where the elements are grouped based on the values on the right (the function can do more than that, but for example, this is all we need). Here's an example:

    1 1 0 1 1 ⊂ "abcde"
┌→────────┐
│"ab" "de"│
└─────────┘

To use this, all we need to do is to somehow create a bitmap that contains 0 wherever there is a space, and 1 otherwise. This is achieved by