Syntax highlighting and text editors



While struggling to complete a web port of my new Babble programming language, I set some time aside to have a little fun.

Ever since the Turbo Pascal IDE introduced countless bedroom coders to syntax highlighting, it's been hard to imagine programming without this little feature. Modern text editors often come with support for dozens or even hundreds of languages. It helps the eye a lot, and makes the screen come to life. In fact, I've argued in the past that the now-common C syntax was at first a form of syntax highlighting, making it easy to tell variables from operators at a glance.

It's not the same without colors, however, and this is especially noticeable when using a new or lesser-known language, which text editors display as plain text. Luckily, in many editors it's easy enough to add support for your favorite language, and it makes a big difference. My little toy now feels real! But support for doing so differs considerably between editors, with equally varied results, so this ended up being a journey in itself.

Turns out I have no fewer than five programmer's editors installed (not counting a couple of Basic environments): Geany and Mousepad on the desktop, then Nano, Micro and mcedit in the console. Guess which set was easier to work with.

Geany is one of the most popular out there: an IDE that's as powerful as it is light. Unfortunately, programming language support is literally hardcoded; in theory you can add new languages based on some existing syntax, just with different keywords and such, but in practice... well, I think I got it to work once. Maybe.

Mousepad is my new favorite, an editor that has all the basics and nothing else. Even better, it uses the same syntax definition files as GEdit, an XML-based format. They're easy enough to come by and install, but trying to make my own was a failure: the file was recognized and loaded correctly, except nothing got highlighted, with no indication of what might be wrong. So much for that.

Nano was the next one I tried, even though I don't normally use it (all those weird keybindings). Syntax highlighting is even part of its normal configuration: simply store your definition files someplace safe, and add a line to your nanorc telling it to load them all, like this: include "~/.config/nano/*.nanorc".

As for the definition format, it's simple enough and well-documented: each rule pairs a color with a regular expression. There are no color schemes or any established conventions for using colors, so I based mine on the highlighting Nano applies to its own config files.
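
To make the idea concrete, here is a rough Python sketch of how rule-based highlighting of this kind can work. It is not Nano's actual code, and the keywords and colors are made up for an imaginary language. Each rule pairs an ANSI color code with a regular expression, rules are applied in order, and a later rule paints over an earlier one (which is why the comment rule goes last).

```python
import re

# Hypothetical rules for an imaginary language: (ANSI color code, regex).
RULES = [
    ("33", r"\b(if|else|while)\b"),  # keywords in yellow
    ("36", r"\b[0-9]+\b"),           # integers in cyan
    ("32", r"#.*$"),                 # comments in green, applied last
]

def highlight(line):
    """Return the line wrapped in ANSI escapes according to RULES."""
    # One color slot per character; later rules overwrite earlier ones.
    colors = [None] * len(line)
    for color, pattern in RULES:
        for match in re.finditer(pattern, line):
            for i in range(match.start(), match.end()):
                colors[i] = color

    out = []
    current = None
    for ch, color in zip(line, colors):
        if color != current:
            # Emit an escape only when the color changes.
            out.append("\033[0m" if color is None else f"\033[{color}m")
            current = color
        out.append(ch)
    if current is not None:
        out.append("\033[0m")  # reset at end of line
    return "".join(out)

if __name__ == "__main__":
    print(highlight("while x  # loop 10 times"))
```

Note how the keyword and the number inside the comment end up green: the last rule wins.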

Micro, on the other hand, is an editor I actually use, on a tilde server. It's good, and also the easiest of the bunch to extend: just drop a YAML file in ~/.config/micro/syntax/ (you'll probably have to create the folder).

Yep, Micro uses YAML, a well-known format. Otherwise it has a regular expression system similar to that of Nano. It also has color schemes, so instead of naming colors directly you need to file each kind of token under one of the predefined categories. Certain effects are easier to pull off in one editor or the other, so it all balances out.
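
The extra layer that color schemes add can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the scheme contents are invented, and the fallback from a dotted category to its parent is how I understand Micro to behave, not something copied from its source.

```python
# Hypothetical color schemes: category name -> color name.
SCHEMES = {
    "dark": {"statement": "yellow", "constant": "cyan", "comment": "green"},
    "light": {
        "statement": "blue",
        "constant": "magenta",
        "constant.number": "red",
        "comment": "gray",
    },
}

def resolve(group, scheme):
    """Look up a category, falling back from e.g. 'constant.number' to 'constant'."""
    while group:
        if group in scheme:
            return scheme[group]
        # Drop the last dotted component; yields "" when none is left.
        group = group.rpartition(".")[0]
    return "default"

if __name__ == "__main__":
    for name, scheme in SCHEMES.items():
        print(name, resolve("constant.number", scheme))
```

The syntax file only ever says "this is a number"; what color that turns out to be depends on the scheme the user picked.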

Last, mcedit is the text editor that ships with Midnight Commander. It's much more capable than it seems at first sight, and I used a similar one in the past with good results, so why not. It's just kind of quirky: you have to copy the whole Syntax file from /usr/share/mc/syntax/ into ~/.config/mc/mcedit/ and add to it using existing entries as examples. Your own files have to be referenced by their full path, too. But hey, it works.

Sure enough, mcedit uses a syntax definition format very similar to the other two, except tokens are matched in a more primitive way, without full regular expressions. That makes it hard if not impossible to highlight numbers, but conversely operators and delimiters are much easier to handle. It also maps tokens to colors directly, like Nano, but has strong conventions for color use, like Micro, so it's easier to pick a color.
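
The trade-off can be illustrated with a small Python sketch. It is a simplification of my own, not how mcedit really matches tokens, and the operator list is hypothetical. With literal matching, operators are simply listed as they are; with regular expressions they all need escaping, but a number becomes a one-line pattern instead of an impossible list.

```python
import re

# Hypothetical operators and delimiters for an imaginary language.
OPERATORS = ["+", "*", "(", ")", "->"]

def find_literals(line, tokens):
    """Find literal tokens in a line, returning (position, token) pairs."""
    # Try longer tokens first so "->" is not split into smaller pieces.
    ordered = sorted(tokens, key=len, reverse=True)
    hits = []
    i = 0
    while i < len(line):
        for tok in ordered:
            if line.startswith(tok, i):
                hits.append((i, tok))
                i += len(tok)
                break
        else:
            i += 1  # no token starts here, move on
    return hits

# The regex equivalent: every operator has to be escaped first...
OPERATOR_RE = re.compile(
    "|".join(re.escape(t) for t in sorted(OPERATORS, key=len, reverse=True))
)
# ...but numbers, which no finite list of literals can cover, are easy.
NUMBER_RE = re.compile(r"\b[0-9]+(\.[0-9]+)?\b")

if __name__ == "__main__":
    print(find_literals("a->(b+1)", OPERATORS))
    print([m.group() for m in NUMBER_RE.finditer("x = 42 + 3.14")])
```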

Speaking of which: if I'm going to keep doing this, it's probably a good idea to pick a text editor and stick with it. And that's going to be Micro, for the reasons mentioned above: it's the one I actually use, and the easiest to extend. But I'll keep the others in mind.


Tags: Linux, programming, links