This is the fourth in a 5-part tutorial on how to implement a programming language. It is intended for people with some programming experience, who want to know how their compiler, interpreter or virtual machine works. Hint: it's not magic.
In Part 3 you've learned how to extend Scratch from within, with user-defined words. The only thing left (and I'm sure you're wondering why I waited so long) are control structures.
The thing with stack-based languages is, they give you so much choice on how to achieve a particular result. We've seen two distinct ways of implementing variables, two implementations for user-defined words (and we'll discover a third in this installment), and we've barely scratched the surface of what is possible. Similarly, two main styles of control structures are possible.
First, there is the Forth way:
( if-then-else statement ) condition IF do when true ELSE do when false THEN ( while loop ) BEGIN condition WHILE do various things REPEAT
Leaving aside the backwardedness of the syntax (which is more than can be blamed on the reverse polish notation), implementing these words would involve manipulating the compilation buffer (which isn't even available outside of a word definition), with the help of a second stack and some really low-level words. But before we turn this tutorial into a full-blown compiler course, let's see how modern stack languages do it.
condition { do when true } { do when false } ifelse % Postscript
[ condition ] [ do when true ] [ do when false ] if ( Factor )
[condition] [do when true] [do when false] ifte ( Joy )
As you can see, they are all variations on a common theme, which
can be summed up to "gather several words in a list, then
interpret them as needed". Which, in turn, sounds a lot like
what DEF does. With one little exception: lists must be
able to contain other lists.
var ListWords = {
"[": function (terp) {
var list = [];
var old_stack = terp.stack;
terp.stack = list;
do {
var next_word = terp.lexer.nextWord();
if (next_word == null) throw "Unexpected end of input";
if (next_word == "]") break;
next_word = terp.compile(next_word);
if (next_word.immediate)
terp.interpret(next_word);
else
terp.stack.push(next_word);
} while (true);
terp.stack = old_stack;
terp.stack.push(list);
}
};
ListWords["["].immediate = true;
Note how [ uses what is effectively a compilation
buffer, like the one built into the interpreter, except it's a
private one. Using Javascript's own stack saves us from the
trouble of dealing with yet another data structure ourselves,
which is just fine for our purposes.
(With apologies to Andrew Plotkin.)
Now you can enter things like [ 1 2 3 ] and
[ 1 2 3 [ 4 5 6 ] ] and check that the interpreter
does the right thing. Except, of course, you'll have to use the
browser's address bar for that, as there are no words in Scratch to
actually manipulate lists. Fortunately, that's easily fixed.
ListWords["LENGTH"] = function (terp) {
if (terp.stack.length < 1) throw "Not enough items on stack";
var temp = terp.stack.pop();
terp.stack.push(temp.length);
};
ListWords["ITEM"] = function (terp) {
if (terp.stack.length < 2) throw "Not enough items on stack";
var key = terp.stack.pop();
var obj = terp.stack.pop();
if (typeof obj == "object") {
terp.stack.push(obj[key]);
} else {
throw "Object expected";
}
}
Now you can easily check the length of — and extract items from — a list. Feel free to develop the concept.
[ 1 2 3 ] length // Yields 3 [ 1 2 [ 3 4 5 ] ] 2 item // Yields 3, 4, 5 [ 1 2 [ 3 4 5 ] ] 2 item 2 item // Yields 5 3 length // Yields undefined
(By the way, having a simplistic interpreter eases understanding,
but there's a price: I always forget to type a space before and after
square brackets. You may want to use begin/end
keywords instead.)
Now we can finally do what we've set out to achieve in this installment.
If you don't see the connection yet, try entering the following
in the interpreter: [ 1 " hello" print ]. On my machine
it yields the following: 1,hello,function (terp) { if (terp.stack.length < 1) throw "Not enough items on stack"; var tos = terp.stack.pop(); alert(tos); }.
The dictionary word PRINT has been added to the list
like an ordinary data item! Indeed, Scratch is what they call a
homoiconic
language. This helps a lot, because if code is data, then data
is code, and we can run a list just as if it was a word.
var ControlWords = {
"RUN": function (terp) {
if (terp.stack.length < 1) throw "Not enough items on stack";
var temp = terp.stack.pop();
if (temp.constructor != Array) throw "List expected";
terp.interpret(makeWord(temp));
}
};
This one's straight out of Logo. It looks like a good fit, so let's continue in the same vein.
[ do some stuff ] 5 times condition [ do some stuff ] iftrue condition [ do different stuff ] iffalse condition [ do some things ] [ do other things ] ifelse [ condition ] [ repeat actions ] while
Look at the
source code for this installment
to see how they're implemented. While you're there, note how
complex is WHILE. It's a consequence of trying to
shoehorn in control structures from a different kind of language.
But a better suited loop will require additional language support.
The astute reader might have noticed I didn't bother to mention
comparison and logical operations. They are, of course pretty much
required for control structures to function, but they are also
trivially implemented. Something we don't have — and can't
have, with the current implementation — is a way to perform
jumps. That's because the variable code_pointer is
local to makeWord(). Can you think of a better place?
function makeWord(code) {
return function (terp) {
var old_pointer = terp.code_pointer;
terp.code_pointer = 0;
while (terp.code_pointer < code.length) {
terp.interpret(code[terp.code_pointer]);
terp.code_pointer ++;
}
terp.code_pointer = old_pointer;
};
}
Now the code pointer lives in the interpreter, it's trivial to
implement the traditional continue, as a word that
sets it to Infinity. Throw in an extra flag, and you
can have break as well. But wait, there's a trick to
them. You can't just call them as you would in Javascript
(condition [ break ] iftrue) because it would just
leave the innermost list. Not exactly useful. What you need is a
conditional version of each, that checks the value on top of the
stack first. Let's call them ?break and ?continue.
Add a looping construct that minds the aforementioned flag and
you're good to go.
var ControlWords = {
...
"?CONTINUE": function (terp) {
if (terp.stack.length < 1) throw "Not enough items on stack";
var cond = terp.stack.pop();
if (cond) terp.code_pointer = Infinity;
},
"?BREAK": function (terp) {
if (terp.stack.length < 1) throw "Not enough items on stack";
var cond = terp.stack.pop();
if (cond) {
terp.code_pointer = Infinity;
terp.break_state = true;
}
},
"LOOP": function (terp) {
if (terp.stack.length < 1) throw "Not enough items on stack";
var code = terp.stack.pop();
if (code.constructor != Array) throw "List expected";
var code_word = makeWord(code);
var old_break_state = terp.break_state;
terp.break_state = false;
do { code_word(terp); } while (!terp.break_state);
terp.break_state = old_break_state;
}
}
With these you can simulate any conditional loop. For example:
1 [ dup 3 > ?break dup print 1 + ] loop
A little cryptic, perhaps, but it works. Feel free to choose betters-sounding words.
There are many more possibilities, but I'm running out of easily explained techniques. Let's call it a day. In Part 5 we're going to try and draw some conclusions, as well as outline future directions.
