This is the third in a 5-part tutorial on how to implement a programming language. It is intended for people with some programming experience, who want to know how their compiler, interpreter or virtual machine works. Hint: it's not magic.
In Part 2 you've learned how to give the Scratch language features such as variables, constants, strings and comments. In this installment, I'll focus on what makes a programming language... programmable: user-defined words.
This is begging for a remark. I keep calling Scratch subroutines "words". It seems appropriate — since they live in a dictionary — but really, they are no different from functions or procedures in other languages. Just call them what you like.
What exactly are we trying to achieve? In keeping with the spirit of the language so far, a word definition might look like this:
DEF HYPOT DUP * SWAP DUP * + SQRT END 3 4 HYPOT // Should leave 5 on the stack
It does start like a variable definition, doesn't it? Indeed, it is the same basic idea: create a hardcoded word that first reads an identifier. Then it begins collecting (as opposed to executing) the words that make up the definition. In the end — appropriately signaled by the word END — it stores everything in the dictionary, in a form that will allow the new word to be executed when encountered later.
Quite a mouthfull, eh? But it's actually straightforward.
function Scratch () { ... this.lookup = function (word) { return dictionary[word]; }; ... } var CompilingWords = { "DEF": function (terp) { var new_word = terp.lexer.nextWord(); if (new_word == null) throw "Unexpected end of input"; var code = []; do { var word = terp.lexer.nextWord(); if (word == null) throw "Unexpected end of input"; word = word.toUpperCase(); if (word == "END") { var finished = true; continue; } var num_val = parseFloat(word); var found_word = terp.lookup(word); if (found_word) { code.push(found_word); } else if (!isNaN(num_val)) { code.push(num_val); } else { throw "Unknown word"; } } while (!finished); terp.define(new_word, function (terp) { var code_pointer = 0; while (code_pointer < code.length) { var word = code[code_pointer]; if (typeof (word) == 'function') { word(terp); } else { terp.stack.push(word); } code_pointer ++; } }); } };
Cool. We now have a word that defines other words!
And we barely touched the interpreter proper. But wait...
Not everything works inside a DEF
. Specifically, none of the
things we've learned about in Part 2. That's because string-
and variable-making words need to read ahead in the input
stream as soon as they are encountered. But DEF
has no way of knowing that.
Well, let's tell the poor function which words are special.
Feel free to do that in any way that works for you. This being
Javascript, I chose to give each word-implementing function a
custom property called immediate
. Where absent, it
evaluates to undefined
, therefore false
.
VariableWords["VAR"].immediate = true; ConstantWords["CONST"].immediate = true; StringWords["\""].immediate = true; CommentWords["/*"].immediate = true; CommentWords["("].immediate = true; CommentWords["//"].immediate = true;
With a small change, DEF
is ready to take advantage
of this new data.
... if (found_word) { if (found_word.immediate) { found_word(terp); } else { code.push(found_word); } } else if (!isNaN(num_val)) { this.stack.push(num_val); } else { throw "Unknown word"; } ...
Unfortunately, it still isn't enough to make strings work.
That's because "
works by depositing on the stack
the data it collected. This needs to happen at runtime, and
until now that was the only time. Except with DEF
we've introduced a brand new step: compilation (not in the sense of
creating an .exe file, but it's still compilation
*). That means strings
will be pushed onto the stack at the wrong time.
This is where our naive implementation breaks down. We could
check after each immediate word if it modified the stack, and
collect its result from there. But this would add further
complication to code that is already quite long, and redundant to
boot. (Notice how DEF
duplicates some code from
Scratch.run()
? Well, we can't reuse it cleanly the way
it is now.) So we'll do the right thing: acknowledge this is a
major change and give the interpreter an explicit notion of the
compilation step.
function Scratch () { var dictionary = {}; var data_stack = []; var compile_buffer = []; this.stack = data_stack; // Was: this.stack = []; this.immediate = false; ... this.compile = function (word) { var word = word.toUpperCase(); var num_val = parseFloat(word); if (dictionary[word]) { this.immediate = dictionary[word].immediate; return dictionary[word]; } else if (!isNaN(num_val)) { return num_val; } else { throw "Unknown word"; } }; ... this.startCompiling = function () { this.stack = compile_buffer; }; this.stopCompiling = function () { this.stack = data_stack; }; this.isCompiling = function () { return this.stack == compile_buffer; }; }
What we've done here is ensure that whenever the interpreter
enters compilation mode, compile_buffer
(which will
play the role of the code
variable in DEF
)
replaces the data stack transparently; now words such as
"
can push data at any time, and it will go to
the right place. What about words that aren't immediate, then?
Those aren't supposed to be executed while we're compiling. Let's
modify run()
accordingly.
function Scratch() { ... this.run = function (text) { this.lexer = new ScratchLexer(text); var word; while (word = this.lexer.nextWord()) { word = this.compile(word); if (this.immediate) { this.interpret(word); this.immediate = false; } else if (this.isCompiling()) { this.stack.push(word); } else { this.interpret(word); } } }; this.interpret = function (word) { if (typeof (word) == 'function') { word(this); } else { this.stack.push(word); } }; ... }
Um, what about DEF
, now? You'd be surprised.
function makeWord(code) { return function (terp) { var code_pointer = 0; while (code_pointer < code.length) { terp.interpret(code[code_pointer]); code_pointer ++; } }; } var CompilingWords = { "DEF": function (terp) { var new_word = terp.lexer.nextWord(); if (new_word == null) throw "Unexpected end of input"; terp.latest = new_word; terp.startCompiling(); }, "END": function (terp) { var new_code = terp.stack.slice(0); // Clone compile_buffer. terp.stack.length = 0; // Clear compile_buffer. terp.define(terp.latest, makeWord(new_code)); terp.stopCompiling(); } }; CompilingWords["DEF"].immediate = true; CompilingWords["END"].immediate = true;
Not only END
becomes an ordinary word (one less
special case to worry about), but both it and the new
DEF
are absolutely trivial.
There is only one little problem left, namely that Scratch doesn't support recursive definitions. Granted, they would be useless in the absence of control structures. Still, they would be nice to have. But that would require going back and change a good deal of the code I've just written. I'll give you a hint: you need to be able to put a new word in the dictionary first, and set its code afterwards.
Wait, did I just say control structures? That's exactly what we're going to do in Part 4.
*
To compile is a pretentious-sounding word, but it simply
means "To put together; to assemble; to make by gathering things
from various sources." —
Wiktionary.
In computer science, on the other hand, it is a synonym for
"to translate". That's exactly what DEF
is doing:
it turns Scratch source code from plain text into Javascript
values: numbers, strings and functions.