Insights from programming language implementation



I've been making interpreters for over ten years now, and I'm finally starting to understand this stuff better. With ten of them on the site right this moment (not counting my book, or the tutorial that started it all) and two more on the way, it might just be a good time to write down a few things I've learned about this craft over the years.

For one: AST interpreters have abysmal performance. We're talking on the order of 100 times slower than the same algorithm implemented in the host language (before compiler optimizations, anyway). You're better off using direct interpretation, which in my experience is about twice as fast and easier to understand.
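To make the distinction concrete, here is a minimal sketch of direct interpretation, assuming the term means computing values while parsing instead of building a tree and walking it afterwards. The grammar (integer arithmetic with parentheses) and all the names (`tokenize`, `DirectInterpreter`, `evaluate`) are made up for illustration and don't come from any of the interpreters mentioned here.

```python
import re

# One token per match: an integer literal or a single operator/parenthesis.
TOKEN = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(text):
    """Split source text into a flat list of token strings."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected input at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class DirectInterpreter:
    """Recursive descent that returns values, never tree nodes."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        # expr := term (("+" | "-") term)*
        value = self.term()
        while self.peek() in ("+", "-"):
            if self.advance() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self):
        # term := factor (("*" | "/") factor)*
        value = self.factor()
        while self.peek() in ("*", "/"):
            if self.advance() == "*":
                value *= self.factor()
            else:
                value //= self.factor()  # integer division, to keep it simple
        return value

    def factor(self):
        # factor := NUMBER | "(" expr ")"
        token = self.advance()
        if token == "(":
            value = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        if token is None or not token.isdigit():
            raise SyntaxError(f"expected a number, got {token!r}")
        return int(token)


def evaluate(text):
    parser = DirectInterpreter(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError("trailing input")
    return value
```

An AST version would have the same three methods return nodes such as `("+", left, right)` and then need a second function to walk them. The catch with the direct style is that anything executed more than once (a loop body, a procedure) has to be re-parsed or kept around in some other form.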

For that matter, if you're going to implement a Lisp-like language, beware that the Tcl approach of making even control structures into ordinary procedures is a trap. Sure, in theory it's more elegant to not need any special forms in your eval function. But in practice, you end up writing lots of brittle, convoluted code to cover the various permutations, thus negating the advantage. Besides, what programmer in their right mind is going to redefine what "if" means in production code? Or at all, except for bragging rights?!

Life is full of special cases. Come down from your ivory tower.
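For reference, this is roughly what the special-case route looks like: a minimal sketch of an eval function for a Lisp-like language where "if" is a special form. It assumes programs are already parsed into nested Python lists, with strings standing in for symbols. The names `GLOBAL_ENV` and `evaluate` are illustrative only.

```python
import operator

# A handful of ordinary procedures; their arguments are always evaluated first.
GLOBAL_ENV = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "<": operator.lt,
    "=": operator.eq,
}


def evaluate(expr, env=GLOBAL_ENV):
    if isinstance(expr, str):        # a symbol: look it up
        return env[expr]
    if not isinstance(expr, list):   # a literal: evaluates to itself
        return expr

    head = expr[0]
    if head == "if":
        # The special form: only one of the two branches is ever evaluated.
        _, test, then_branch, else_branch = expr
        chosen = then_branch if evaluate(test, env) else else_branch
        return evaluate(chosen, env)

    # Everything else is an ordinary procedure call.
    func = evaluate(head, env)
    args = [evaluate(arg, env) for arg in expr[1:]]
    return func(*args)


# The untaken branch calls an undefined procedure, and nothing blows up,
# because it is never evaluated.
result = evaluate(["if", ["<", 1, 2], ["+", 1, 1], ["boom"]])
```

The whole special case is four lines in `evaluate`. Making "if" an ordinary procedure instead means every procedure has to be able to receive some arguments unevaluated and evaluate them on demand, and that machinery is where the extra code comes from.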

That said, a little thinking goes a long way. Stack-based languages are infamous for their backward syntax. And they do have syntax, make no mistake: you can't afford to mix up the order of "if", "else" and "then"! But does it have to be that way? Because even newer entries in the family, much higher level than Forth was forced to be, still sort of throw up their hands and go "whatever". And I recently proved it doesn't have to be the case, by taking inspiration from an unlikely source: Logo. Which, by the way, is full of those dreaded special cases, yet is one of the friendliest languages ever created (unlike Perl, which is the opposite).
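For anyone who hasn't met the backward syntax in question, here is a minimal sketch of a Forth-style conditional in a toy stack language: the condition comes first, "if" consumes it, and "then" closes the whole construct. This only illustrates the traditional convention, not the Logo-inspired alternative, which isn't described here. The word set and the names `run` and `find_branch_end` are invented for the example.

```python
import operator

BINARY_OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "<": operator.lt,
    "=": operator.eq,
}


def find_branch_end(tokens, pos, stop_words):
    """Scan forward from pos to the first stop word at nesting depth 0."""
    depth = 0
    while pos < len(tokens):
        token = tokens[pos]
        if depth == 0 and token in stop_words:
            return pos
        if token == "if":
            depth += 1
        elif token == "then":
            depth -= 1
        pos += 1
    raise SyntaxError("'if' without a matching 'then'")


def run(source):
    """Run a whitespace-separated program and return the final stack."""
    tokens = source.split()
    stack = []
    pos = 0
    while pos < len(tokens):
        token = tokens[pos]
        pos += 1
        if token == "if":
            # The condition is already on the stack by the time we see "if".
            if not stack.pop():
                # False: jump past the matching "else" (or "then").
                pos = find_branch_end(tokens, pos, {"else", "then"}) + 1
        elif token == "else":
            # We just finished the true branch: skip the false one.
            pos = find_branch_end(tokens, pos, {"then"}) + 1
        elif token == "then":
            pass  # only a marker; nothing to do when reached normally
        elif token in BINARY_OPS:
            right = stack.pop()
            left = stack.pop()
            stack.append(BINARY_OPS[token](left, right))
        else:
            stack.append(int(token))
    return stack
```

So `run("1 2 < if 10 else 20 then")` leaves 10 on the stack, while dropping the final "then" raises a SyntaxError: the words are just words, but their order is very much a grammar.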

As a programmer, it's a good idea to know several languages. As a programming language designer? You'd better know a lot of them, big shot!

Last but not least, my recent research seems to suggest that orthogonality in programming language design might be overrated. And before you yell at me:

  • syntax and semantics have a messy relationship;
  • the most theoretically pure languages also tend to be the fussiest;
  • take a look at HTML.

I'll be around. Cheers.


Tags: programming, philosophy