, 2009-05-17 (DRAFT)

Make Your Own Programming Language
Part 3

This is the third in a 5-part tutorial on how to implement a programming language. It is intended for people with some programming experience, who want to know how their compiler, interpreter or virtual machine works. Hint: it's not magic.

In Part 2 you've learned how to give the Scratch language features such as variables, constants, strings and comments. In this installment, I'll focus on what makes a programming language... programmable: user-defined words.

This is begging for a remark. I keep calling Scratch subroutines "words". It seems appropriate — since they live in a dictionary — but really, they are no different from functions or procedures in other languages. Just call them what you like.

Syntax of user-defined words

What exactly are we trying to achieve? In keeping with the spirit of the language so far, a word definition might look like this:

DEF HYPOT  DUP * SWAP DUP * + SQRT  END

3 4 HYPOT // Should leave 5 on the stack

It does start like a variable definition, doesn't it? Indeed, it is the same basic idea: create a hardcoded word that first reads an identifier. Then it begins collecting (as opposed to executing) the words that make up the definition. In the end — appropriately signaled by the word END — it stores everything in the dictionary, in a form that will allow the new word to be executed when encountered later.

Quite a mouthfull, eh? But it's actually straightforward.

A naive implementation

function Scratch () {

        ...
        
        this.lookup = function (word) { return dictionary[word]; };
        
        ...
        
}

var CompilingWords = {
	"DEF": function (terp) {
		var new_word = terp.lexer.nextWord();
		if (new_word == null) throw "Unexpected end of input";

		var code = [];
		do {
			var word = terp.lexer.nextWord();
			if (word == null) throw "Unexpected end of input";
			
			word = word.toUpperCase();
			if (word == "END") { var finished = true; continue; }
			
			var num_val = parseFloat(word);
			var found_word = terp.lookup(word);
			if (found_word) {
				code.push(found_word);
			} else if (!isNaN(num_val)) {
				code.push(num_val);
			} else {
				throw "Unknown word";
			}
		} while (!finished);
		
		terp.define(new_word, function (terp) {
			var code_pointer = 0;
			while (code_pointer < code.length) {
				var word = code[code_pointer];
				if (typeof (word) == 'function') {
					word(terp);
				} else {
					terp.stack.push(word);
				}
				code_pointer ++;
			}
		});
	}
};
  

Cool. We now have a word that defines other words! And we barely touched the interpreter proper. But wait... Not everything works inside a DEF. Specifically, none of the things we've learned about in Part 2. That's because string- and variable-making words need to read ahead in the input stream as soon as they are encountered. But DEF has no way of knowing that.

Well, let's tell the poor function which words are special. Feel free to do that in any way that works for you. This being Javascript, I chose to give each word-implementing function a custom property called immediate. Where absent, it evaluates to undefined, therefore false.

        VariableWords["VAR"].immediate = true;
        ConstantWords["CONST"].immediate = true;
        StringWords["\""].immediate = true;

        CommentWords["/*"].immediate = true;
        CommentWords["("].immediate = true;
        CommentWords["//"].immediate = true;

With a small change, DEF is ready to take advantage of this new data.

        ...

        if (found_word) {
                if (found_word.immediate) {
                        found_word(terp);
                } else {
                        code.push(found_word);
                }
        } else if (!isNaN(num_val)) {
                this.stack.push(num_val);
        } else {
                throw "Unknown word";
        }
        
        ...

Unfortunately, it still isn't enough to make strings work. That's because " works by depositing on the stack the data it collected. This needs to happen at runtime, and until now that was the only time. Except with DEF we've introduced a brand new step: compilation (not in the sense of creating an .exe file, but it's still compilation *). That means strings will be pushed onto the stack at the wrong time.

This is where our naive implementation breaks down. We could check after each immediate word if it modified the stack, and collect its result from there. But this would add further complication to code that is already quite long, and redundant to boot. (Notice how DEF duplicates some code from Scratch.run()? Well, we can't reuse it cleanly the way it is now.) So we'll do the right thing: acknowledge this is a major change and give the interpreter an explicit notion of the compilation step.

function Scratch () {
	var dictionary = {};
	var data_stack = [];
	var compile_buffer = [];

	this.stack = data_stack; // Was: this.stack = [];
	this.immediate = false;

        ...

	this.compile = function (word) {
		var word = word.toUpperCase();
		var num_val = parseFloat(word);
		if (dictionary[word]) {
			this.immediate = dictionary[word].immediate;
			return dictionary[word];
		} else if (!isNaN(num_val)) {
			return num_val;
		} else {
			throw "Unknown word";
		}
	};

        ...

	this.startCompiling = function () {
		this.stack = compile_buffer;
	};

	this.stopCompiling = function () {
		this.stack = data_stack;
	};
        
	this.isCompiling = function () {
		return this.stack == compile_buffer;
	};
}

What we've done here is ensure that whenever the interpreter enters compilation mode, compile_buffer (which will play the role of the code variable in DEF) replaces the data stack transparently; now words such as " can push data at any time, and it will go to the right place. What about words that aren't immediate, then? Those aren't supposed to be executed while we're compiling. Let's modify run() accordingly.

function Scratch() {
        
        ...

	this.run = function (text) {
		this.lexer = new ScratchLexer(text);
		var word;

		while (word = this.lexer.nextWord()) {
			word = this.compile(word);
			if (this.immediate) {
				this.interpret(word);
				this.immediate = false;
			} else if (this.isCompiling()) {
				this.stack.push(word);
			} else {
				this.interpret(word);
			}
		}
	};

	this.interpret = function (word) {
		if (typeof (word) == 'function') {
			word(this);
		} else {
			this.stack.push(word);
		}
	};
        
        ...
        
}

Um, what about DEF, now? You'd be surprised.


function makeWord(code) {
	return function (terp) {
		var code_pointer = 0;
		while (code_pointer < code.length) {
			terp.interpret(code[code_pointer]);
			code_pointer ++;
		}
	};
}

var CompilingWords = {
	"DEF": function (terp) {
		var new_word = terp.lexer.nextWord();
		if (new_word == null) throw "Unexpected end of input";

		terp.latest = new_word;
		terp.startCompiling();
	},

	"END": function (terp) {
		var new_code = terp.stack.slice(0); // Clone compile_buffer.
		terp.stack.length = 0; // Clear compile_buffer.
		terp.define(terp.latest, makeWord(new_code));
		terp.stopCompiling();
	}
};
CompilingWords["DEF"].immediate = true;
CompilingWords["END"].immediate = true;

Not only END becomes an ordinary word (one less special case to worry about), but both it and the new DEF are absolutely trivial.

There is only one little problem left, namely that Scratch doesn't support recursive definitions. Granted, they would be useless in the absence of control structures. Still, they would be nice to have. But that would require going back and change a good deal of the code I've just written. I'll give you a hint: you need to be able to put a new word in the dictionary first, and set its code afterwards.

Wait, did I just say control structures? That's exactly what we're going to do in Part 4.

Try Scratch
Stack content:

* To compile is a pretentious-sounding word, but it simply means "To put together; to assemble; to make by gathering things from various sources." — Wiktionary. In computer science, on the other hand, it is a synonym for "to translate". That's exactly what DEF is doing: it turns Scratch source code from plain text into Javascript values: numbers, strings and functions.

Creative Commons License
Make Your Own Programming Language by Felix Pleşoianu is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.