This is the second in a 5-part tutorial on how to implement a programming language. It is intended for people with some programming experience, who want to know how their compiler, interpreter or virtual machine works. Hint: it's not magic.
In Part 1 you've learned to make yourself a simple programming language based around two data structures: a stack and a dictionary. Scratch — as I named it — is useful, but needs a few more things. In this installment, I will show you how to do the following:
Interestingly enough, all these features depend on exactly one modification to the interpreter.
How would you go about declaring a variable in Scratch?
The interpreter doesn't yet know its name, so you can't just
say X VAR: it would choke on the X
in the first place. So it has to be VAR X
(like in Javascript... fitting), except
VAR can't be a word, because it would have to
read ahead in the input stream, and words can't even see
the lexer. Or can they? Let's look at the run() method again.
this.run = function (text) {
var lexer = new ScratchLexer(text);
var word;
var num_val;
while (word = lexer.nextWord()) {
word = word.toUpperCase();
num_val = parseFloat(word);
if (dictionary[word]) {
dictionary[word](this);
} else if (!isNaN(num_val)) {
this.stack.push(num_val);
} else {
throw "Unknown word";
}
}
};
As you can see, my first instinct was to make
lexer local, as per the principle
of least privilege. But this is really hampering me
now. I could make variable declarations
a special case (another else if...) and go
downhill from there. Or I could simply
change two lines of code.
this.lexer = new ScratchLexer(text);
...
while (word = this.lexer.nextWord()) {
Now I'm free to make VAR an ordinary word.
Openness for the win!
function makeVariable(terp) {
var me = {value: 0};
return function () { terp.stack.push(me); };
}
var VariableWords = {
// Read next word from input and make it a variable.
"VAR": function (terp) {
var var_name = terp.lexer.nextWord();
if (var_name == null) throw "Unexpected end of input";
terp.define(var_name, makeVariable(terp));
},
};
// This goes into the definition of Scratch()
this.define = function (word, code) {
dictionary[word.toUpperCase()] = code;
}
OK, that was a little complicated: each new variable is hidden inside a word that, when executed, puts a reference to it on the stack. And once we have a reference... well, you guessed it.
var VariableWords = {
...
// Store value of 2OS into variable given by TOS.
"STORE": function (terp) {
if (terp.stack.length < 2) throw "Not enough items on stack";
var reference = terp.stack.pop();
var new_value = terp.stack.pop();
reference.value = new_value;
},
// Replace reference to variable on TOS with its value.
"FETCH": function (terp) {
if (terp.stack.length < 1) throw "Not enough items on stack";
var reference = terp.stack.pop();
terp.push(reference.value);
}
}
Come on, check that it works:
var a 10 a store a fetch print
As for constants (I promised, didn't I), it should be easy now. All they need to do is push a fixed value on the stack.
function makeConstant(value, terp) {
return function () { terp.stack.push(value); }
}
var ConstantWords = {
// Read next word from input and make it a constant with TOS as value.
"CONST": function (terp) {
if (terp.stack.length < 1) throw "Not enough items on stack";
var const_name = terp.lexer.nextWord();
if (const_name == null) throw "Unexpected end of input";
var const_value = terp.stack.pop();
terp.define(const_name, makeConstant(const_value, terp));
}
}
Except right now they aren't really constants (they can be redefined), so they could just as well serve as the variables of Scratch. But this is only because we're coding in Javascript here; other languages may not be so flexible.
Now it should be clear how to make strings: define a word (a double quote should do nicely) that reads ahead until it meets an end-of-string marker. A second double quote is best IMO. Then we push the collected string onto the stack.
var StringWords = {
"\"": function (terp) {
var collector = "";
var done = false;
do {
var next_word = terp.lexer.nextWord();
if (next_word == null) throw "Unexpected end of input";
if (next_word.substr(-1, 1) == "\"") {
next_word = next_word.slice(0, -1);
collector += next_word;
done = true;
} else {
collector += next_word;
collector += " ";
}
} while (!done);
terp.stack.push(collector);
}
}
Now you can finally say:
" Hello, world" print
And yes, that's a space after the first double quote. It's a word like any other, remember?
Comments are similar, except they don't need to keep the content.
var CommentWords = {
"/*": function (terp) {
do {
var next_word = terp.lexer.nextWord();
if (next_word == null) throw "Unexpected end of input";
} while (next_word.substr(-2, 2) != "*/");
}
};
It is interesting to note that Forth (my source of inspiration, as noted in Part 1) uses parantheses to delimit comments. Did you notice they're not needed for anything else? Forth also has a word called "*/". You can still implement it.
We're almost done here, with one little exception: what if you want to preserve whitespace inside strings? Or — shudder — implement C++ style comments that run to the end of line? As before, both problems have the same solution, and it lies in the lexer.
Trouble is, I've written my little lexer to read word by word. That was in order to keep things as simple as possible. The bad news is, I have to rewrite it. The good news is, I don't need to touch anything else.
// Trying to avoid regular expressions here.
function isWhitespace(char) {
return (char == " ")
|| (char == "\t")
|| (char == "\r")
|| (char == "\n")
|| (char == "\v");
}
function ScratchLexer(text) {
var position = 0; // Beginning of TEXT.
this.nextWord = function() {
if (position >= text.length) return null;
while (isWhitespace(text.charAt(position))) {
position ++;
if (position >= text.length) return null;
}
var new_pos = position;
while (!isWhitespace(text.charAt(new_pos))) {
new_pos ++;
if (new_pos >= text.length) break;
}
var collector = text.substring(position, new_pos);
new_pos ++; position = new_pos; // Skip the delimiter.
return collector;
};
this.nextCharsUpTo = function (char) {
if (position >= text.length) return null;
var new_pos = position;
while (text.charAt(new_pos) != char) {
new_pos ++;
if (new_pos >= text.length)
throw "Unexpected end of input";
}
var collector = text.substring(position, new_pos);
new_pos ++; position = new_pos; // Skip the delimiter.
return collector;
};
}
Whoa, looks like we've gained a lot of complexity all of a sudden. But just look at our string-making word now.
var StringWords = {
"\"": function (terp) {
terp.stack.push(terp.lexer.nextCharsUpTo("\""));
}
};
Isn't that precious? You can even keep the old version around, with a different name; it will work just like before.
That concludes the second installment. In Part 3 you'll (finally) learn how to implement defining new words in Scratch itself.
P.S. Hiding variables in a function, to be accessed indirectly after that is called "making a closure". It's an advanced, and very powerful programming concept.
