Tiny documentation generators

Modern programming languages like Python or Nim, really anything created this side of 1990, ship with a documentation generator in the SDK, if not built right into the compiler. Such a tool is so useful that its absence is keenly felt when working with something older like C++ or Tcl. Third-party tools exist, but they either have gigantic dependencies (like Doxygen) or overly complicated usage (like Natural Docs). So what to do? For C++ there's not much of a choice, since parsing the language is non-trivial. But for Tcl? You can get surprisingly far with a command like grep ^proc make-sta.tcl | sort. No, really, look at this output:

proc format_tags tags {
proc get_datetime {d k} {
proc get_escaped {d k} { ::html::html_entities [dict get $d $k] }
proc get_rss_date {d k} {
proc load_entries {patterns parser quiet} {
proc load_templates dir {
proc make_slug data { string tolower [regsub {\W+} $data -] }
proc more_link {link content} {
proc page_list pages {
proc parse_meta data {

I picked a sample from the middle so you can see the one-liners that were captured in their entirety: a nice side-effect. But it's all fairly useful.

If a literal one-liner can do that, how far can I get with ten lines of code? Let's try and see:

#!/usr/bin/env tclsh

set re_decl {^(set|proc|namespace eval|namespace import|class create)\s+(\S+)}

# Read stdin line by line; gets returns -1 at end of input.
while {[gets stdin line] >= 0} {
    if {[regexp $re_decl $line all kind name]} {
        set decl($name) $line
        lappend index($kind) $name
    }
}

foreach i [lsort [array names index]] {
    puts ""
    puts "## $i"
    puts ""
    foreach j [lsort $index($i)] {
        puts "\t$decl($j)"
    }
}

Okay, make that twenty, but on the plus side it captures all kinds of declarations and groups them into sections. It's also easily confused (indented declarations are skipped, and two declarations sharing a name overwrite each other), but still manages to be useful enough most of the time. One thing is missing, however: comments. Capturing those placed before a definition, as most such systems do, would be too complicated. But what about putting them after, like this?

proc page_list pages {
    # Generate a list of label-link pairs for the given pages.
}

Turns out it only takes another ten lines of code:

#!/usr/bin/env tclsh

set re_decl {^(set|proc|namespace eval|namespace import|class create)\s+(\S+)}

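# Name of the declaration whose comments are being collected, if any.
set current ""

while {[gets stdin line] >= 0} {
    if {[regexp $re_decl $line all kind name]} {
        # A declaration: record it and start collecting its comments.
        set decl($name) $line
        set comments($name) [list]
        lappend index($kind) $name
        set current $name
    } elseif {$current ne "" && [regexp {^\s*#} $line]} {
        lappend comments($current) $line
    } else {
        # Any other line ends the comment block.
        set current ""
    }
}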

foreach i [lsort [array names index]] {
    puts "\n## $i"
    foreach j [lsort $index($i)] {
        puts "\n\t$decl($j)"
        foreach k $comments($j) {
            puts "\t$k"
        }
    }
}

Tcl has no established convention for marking documentation comments, with a double hash sign for example, so the script just grabs every comment that follows a declaration. In practice that's good enough.
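
If a project did settle on a marker of its own, narrowing the filter would probably be a small change. Here is a minimal sketch, assuming a made-up convention where documentation comments start with a double hash; the proc name doc_comments is hypothetical and not part of the script above:

```tcl
# Sketch only: the "##" marker is an assumed convention, not a Tcl feature.
proc doc_comments {lines} {
    set result [list]
    foreach line $lines {
        # Keep lines starting with "##" and drop the marker itself.
        if {[regexp {^\s*##\s?(.*)$} $line all text]} {
            lappend result $text
        }
    }
    return $result
}

set body {
    ## Generate a list of label-link pairs for the given pages.
    # TODO: sort the pages first.
}
puts [join [doc_comments [split $body "\n"]] "\n"]
```

Only the first comment line survives; the ordinary TODO comment is dropped.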

The rest is details, like the ability to handle multiple files at once, or to generate other documentation formats like web pages. Even then, this unnamed script wouldn't be much more complicated. And that, if anything, should be a lesson. What exactly are we doing with megabytes of code and hundreds of megabytes of dependencies?
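
To back up the claim about multiple files, here is one way it might look. This is a rough sketch, not a finished tool: the proc name scan_channel is made up for the example, and it assumes the files to scan are given as command-line arguments.

```tcl
#!/usr/bin/env tclsh

set re_decl {^(set|proc|namespace eval|namespace import|class create)\s+(\S+)}

# Hypothetical helper: scan one open channel for declarations and the
# comments that directly follow them, adding to the global arrays.
proc scan_channel {chan} {
    global re_decl decl comments index
    set current ""
    while {[gets $chan line] >= 0} {
        if {[regexp $re_decl $line all kind name]} {
            set decl($name) $line
            set comments($name) [list]
            lappend index($kind) $name
            set current $name
        } elseif {$current ne "" && [regexp {^\s*#} $line]} {
            lappend comments($current) $line
        } else {
            set current ""
        }
    }
}

# Assumption: every command-line argument is a path to a Tcl source file.
foreach path $argv {
    set chan [open $path r]
    scan_channel $chan
    close $chan
}

foreach i [lsort [array names index]] {
    puts "\n## $i"
    foreach j [lsort $index($i)] {
        puts "\n\t$decl($j)"
        foreach k $comments($j) {
            puts "\t$k"
        }
    }
}
```

Declarations are still keyed by name alone, so the same name appearing in two files would overwrite the earlier entry. Keying by file and name would fix that, at the cost of a few more lines.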