Array languages vs. the curse of the spreadsheet
It has been said that the most popular programming language in the world is the spreadsheet (and in particular these days, Excel).
There is a valid argument that the spreadsheet embodies an array programming model, making Excel the most popular array programming language in the world.
As the author of an array programming language based on the ideas of APL, I see the popularity of the spreadsheet as both a problem (because it's so error-prone) and an opportunity (because while the fans of imperative languages like Python are probably too set in their ways to consider an alternative model of computation, users of spreadsheets are already thinking in terms of matrices of values, and some of those users may be more open to the idea of performing the same computation in a more reliable way).
Before discussing how to leverage the power of spreadsheets with a proper array based programming language, there is a need to discuss how spreadsheets are used today, and some of the problems with them.
The different uses of spreadsheets
When people use a spreadsheet, it's to do one of several things. In short, spreadsheets are used to perform many different activities, which can roughly be grouped into 3 main categories:
- Data entry
- Computation
- Presentation
Data entry
Ever since VisiCalc was released in 1979, the spreadsheet has presented the user with a hugely inviting blank sheet of empty cells when the program was started.
What do you want to calculate today? |
The page almost calls out to the user to start inputting numbers. Our innate desire to organise things in categories is what has made this design so successful. The VisiCalc initial screen looked like this, as does the latest version of LibreOffice some 45 years later.
All you have to do is to move the cursor to the cell you want to edit, type the values and things will just work. While data entry specialist is still a job title in use today, there was a time when any data input was a complicated process that required specialist knowledge. The spreadsheet changed this to allow anyone to record anything they wanted in digital form.
Computation
Once the data is in the spreadsheet, you inevitably want to work with the numbers to produce some output.
Spreadsheets makes it very easy to take the step from simple data entry to computation of results. All you have to do is to add a =sum(a1:a5)
to the sheet you'll have the sum in the cell you want.
Presentation
Spreadsheets aren't just about entering data and computing some results based on that data. It's also about presenting it in an appealing manner.
This is really where the you can see the genius of the invention. From something as simple as looking at a set of numbers, with the sum of those numbers neatly displayed as “Total:” below the values, to a complex form with coloured highlighting and different text styles to emphasise the results.
These sheets also become interactive without any extra work, since any change in the source data will automatically propagate to the sheet that contains the results.
The problems with spreadsheets as a programming environment
The spreadsheet paradigm hides what's important (the formulas) in favour of the superficial (the data)
Once you've taken that first step to compute a sum in a spreadsheet, it's not a very long road until you find yourself in one of those multi-sheet monsters most people who work in a large company have seen. I've certainly spent more time than I want to admit trying to untangle one of those bowls of spaghetti trying to find out why a certain number is computed the way it is.
Here's a silly quiz: Can you spot the error in the above spreadsheet?
I don't think is possible to tell just by looking at the screenshot. The problem is that cell D5 has the actual value entered instead of a formula (this could have happened due to some simple copy&paste error).
And this is the true danger of spreadsheets. These kinds of mistakes are completely invisible when they happen. Instead, the results start to deviate (probably slowly) once the input data start to change, and you could be looking at invalid numbers for quite some time before anyone notices the problem. And if the dataset is large, it may be a lot of work to actually find the underlying mistake.
And this bring us to my proposition: Spreadsheets are doing a great job at data entry and presentation, but it's severely lacking as a computation tool.
However, very few tools allow you to have a seamless workflow using a proper language for computation while still using spreadsheets for the other aspects of the workflow. Once a spreadsheet has grown too large, someone decides to write a Python application to replicate it, which inevitably means that they have to build a webapp around it, which is probably going to be much less user-friendly since spreadsheets have had 40+ years of UI refinements, while the webapp was put together in an afternoon.
Spreadsheets intentionally hides errors in an attempt to do what the user thinks they want
Since this section could become very large if I listed all the examples where spreadsheets (and Excel in particular) fails to highlight problems, so here's a sample:
- Numbers tend to be floating point, which means that the moment a value is entered, there are precision issues. This is actually something that could be improved simply by supporting rational arithmetic, although I don't think there are any spreadsheets that do.
- If a number is entered as a string, it will look identical to an actual number (except that the alignment will be different by default). If you're lucky, you'll get an error later which will manifest as an #ERR field, and then you'll have a fun time finding the source of that issue.
- The system invites the user to write long single-line formulas in a language that is not suited for long, single-line formulas. It all ends up being crammed into a space that is nowhere near large enough to hold it. This makes editing the code an absolutely painful experience.
- Various errors, such as exceeding limits tend to be ignored instead of invalidating the entire document. This is something the UK government was made painfully aware of.
Using Kap as the default tool to solve various computation problems
The Kap programming language provides an alternative. Or rather, it's a work in progress so it's probably better to say it aims to provide an alternative. That said, I already use it for many tasks where a spreadsheet would be the go-to tool for many people.
Like most APL derivatives, the way you use it is often not by opening a test editor and start writing a program. Instead, you start it, you are presented with a prompt, and you start using it more as a calculator than a traditional programming language. This makes sense, since a large number of programming tasks that would take a lot of code in another language, can be solved in just a few characters in Kap.
When given a problem to solve, you typically look at the problem and then reach for the tool that is the most efficient way to solve that problem, given tool availability and the person's preferences. For example:
- You need to compute the square of the number 54321. Most people would likely reach for their desktop environment's calculator here.
- You want to compute the sum of some set of numbers. A natural choice here would be to use a spreadsheet.
- You're given a dictionary and asked to find the longest word which contains no more than 2 vowels. I suspect a lot of people would open an editor and start writing Python code to solve this.
For me, in all of these cases I open the Kap interactive tool, type a few characters and the problem is solved.
This is how I solved the third problem, by the way:
{ ⊃ ⍵ ⊇⍨ ↑⍒≢¨ ⍵ /⍨ 2≥ (⊂"aeiou") (+/∊⍨)¨ ⍵ } io:read "dict.txt"
At this point, comments suggesting the code is “unreadable” are bound to be thrown around, but once you learn a handful of symbols, this is in fact significantly more readable than the kinds of formulas I see on a regular basis in Excel. This is proof that one is not more difficult to learn than the other, and that the problem is one of marketing rather than technological.
In case anyone is curious about the above code, the next blog post will be a breakdown of the code above to explain how it works. Check this blog in the coming days.
The point I am trying to make by showing this example at all is to try to evoke the same kind of response as beginners get when faced with any code of any kind.
The Kap approach to spreadsheet integration
The array editor
The array editor is a graphical tool that allows the user to edit arrays in a way that is similar to a spreadsheet. The goal of this tool is to be as familiar as possible to someone who has used spreadsheets in the past.
The tool also allows you to type Kap expressions to manipulate values in the sheet. This is not like formulas, but rather direct modification of the data.
Funnily enough, this is actually something that is not obvious how to do in Excel. If you have a set of numbers, and you want to add some constant to them, the way you'd do that is to create a formula that adds the constant to a cell and put that into a different location, usually on the same sheet, and then paste that formula to all the cells in an area with the same size as the original data, and hope that you got the dimensions right.
The array editor allows you to easily edit the content of an array. Of course, you can always do it programmatically, but I think the prevalence of spreadsheets has shown that it's convenient to do it in a visual manner.
Data import
I have been participating in discussion forums and chat channels about APL and other array languages for a while now, and when a beginner starts learning one of these languages, after they have gone through all the tutorials, there comes a time where they want to work with real data and the first question is: How do I get my data into the system?
After all, that's what I asked myself and was frustrated when I realised that preparing the data was harder than actually working with it.
So, when I started working on my own implementation, one of the most important goals was that loading pre-existing data into the interpreter was one of the most important features.
What's the most common format of array-based data today? It's probably not very far fetched to assume it's Excel. For this reason, being able to read spreadsheets into Kap has been a priority, and at the time of writing the following methods are supported:
- Using functions such as
msoffice:read
function in the language itself. There are similar functions to load data in other formats, such as CSV and JSON. - Open the a spreadsheet file (Excel or LibreOffice) in the gui-based array editor.
- Copy&paste directly from the spreadsheet.
By the way, you can also paste any table data into the array editor. Here's an example of using Wikipedia content:
Presentation
This is the part of the Kap UI tool that needs to most work. The vision is to provide a user interface that makes it just as easy to create a presentation of results as it is in Excel. However, it would only be used for presentation purposes, and not for data entry nor computation.
While there is some actual code there that can be tested, it's still not usable except as a demonstration of what the current ideas in this direction are.
The goal of Kap is to be a complement to spreadsheets
When someone needs to compute something, they typically look at the problem and then decide to open a tool that helps them solve said problem. While Kap is a great tool to solve a variety of problems, I tend to believe that it particularly excels in the kinds of problems where the solution would otherwise involve a spreadsheet.
Kap is a fully open source project available under the MIT license. If you want to learn more about Kap, visit the Kap website. Feedback is always welcome.