Basic interpreters

The Basic idea

Growing the concept

When I wrote my first Basic implementation, the trickiest part was making it so that GOTO and friends would work correctly, surprisingly enough.

With my second one, FOR loops turned out to be a pain — another surprise — but dealing with user-defined functions, especially alongside built-ins, was the biggest mess. Quite simply, proper variable scoping (if only dynamic) is an absolute necessity if the body of a DEF FN is to see anything outside of its own arguments, and calls to builtins must be resolved through another mechanism, otherwise it takes an ugly kludge to tell them apart.

I thought for a while how to solve these problems without creating yet another block-structured Basic, because everyone is doing that, and they become kind of same-y after a while. Here's what I came up with.

Let's take it from the basics (no pun intended). What are the essential Basic statements? INPUT, for one, so we have some data to work on. Then PRINT, to get the results back out. LET to store intermediary calculations. DEF FN, to further reduce code duplication and improve clarity. REM is useful too, if we plan to run code from a file, not just interactively.

That's five statements, and it's already enough to make a calculator. But it's not yet a programming language. For that we need control structures, at the very least IF-ELSE and WHILE:

IF condition THEN statements... ELSE statements...
WHILE condition DO statements...

Yep... we'll be keeping them single-line to avoid complications. That way we can handle them like any other statement: type it, press Enter, it's handled and forgotten. Except for one detail: for these two to work, we need to pre-parse statements into an abstract syntax tree; direct interpretation no longer cuts it. But more about this later.

For now, there's a bit of a problem: if we handle each line of code as it appears, with no way to go back, then we can't have subroutines, now can we?

As a matter of fact, we can, with a trick borrowed from an unexpected place: Forth. Let's assume a statement of the form:

SUB name([args...])

This SUB statement creates a new subroutine, and enters compilation mode (just a flag in the interpreter, really). In this mode, any entered line isn't run but instead adds to the current subroutine's definition. This ends when an END SUB statement is encountered — which of course means it has to be handled specially, but that's no big deal. Note how this works even if we store the subroutine body as a plain list of strings. It's really that simple!

Last but not least, now that we have subroutines and functions, a LOAD statement would also be welcome, to allow for libraries. Well, there's also CALL: even if you make it so you don't need the keyword, calling subroutines is still a statement itself.

The punchline? We still have one statement fewer than Tiny Basic, perhaps the smallest known dialect, and ours is already more powerful!

Now, the problem with all that is twofold:

  1. Once the interpreter uses an intermediate representation, it's no longer quite trivial to implement; hopefully, the small number of simple statements will help.
  2. Since this particular dialect won't do loops very well, it will have to rely on matrix operations and other ways to perform implicit repetition.

Of course, point 2 will only make point 1 worse. But then again, it's not like the interpreter needs to be embeddable, and separation of concerns will also keep code simple.

But radical simplicity by itself isn't enough of a killer feature. Batch Basic (as I dubbed my new dialect) needs a reason to exist. And it occurred to me that with a minor change the INPUT statement can be made to read CSV files! Which would mean the language could be used for automatic processing of flat databases. Seemingly pointless in a world of ubiquitous spreadsheet software, but some things are simply better expressed as a computer program as opposed to a series of mouse clicks.

Moreover, the fundamentally line-oriented nature of the language reminds me of something... That's right, operating system shells! And Basic has been used for a shell in the past. Why not revive the tradition? It would only take a few built-in subroutines for navigating directories, copying and renaming files, and of course running other programs. Which in turn would even enable GUIs via dialog and similar utilities. Add facilities for handling signals and environment variables, and the whole thing starts having purpose.

Trimming it down

The more I thought of it, however, the more obvious it became that the mix of static typing and dynamic scoping made arrays a tricky proposition. The clever scheme I had in mind just wouldn't look workable from any angle. Matrix operations I wasn't even sure where to start implementing. As for CSV file handling, a better solution would involve a different language entirely — such as a modern xBase dialect.

While pondering these issues, I started work on a much simpler prototype, little more than an expression calculator named Bacal. My hope was to get a solid foundation and figure out at least some of the implementation while giving the rest more thought. But this prototype turned out to have a life of its own, and an ability to accomodate much of what was initially planned to go into its bigger brother, while not requiring any fancy tricks to get working.

Even better, relentlessly calling it an expression calculator made it easier to decide what should and shouldn't be included. For instance, once subroutines were out, deciding to make the LET keyword optional like in most dialects was natural. The choice to focus on features for reading data in over printing it out again was also important for limiting complexity. And of course functions remain easier to add than statements — though it's not nearly as easy to avoid adding statements as originally hoped.