Tuesday, April 15, 2014

Real Jedi make their own Forth

In which our protagonist takes matters into his own hands, needlessly quotes Chaucer, and gets his Star Wars mythology mixed up.

It turns out that there is a problem with the "standard" Forth for the 65c02, FIG Forth -- it is old, as in, really old. Trying to learn modern Forth with it is like trying to learn modern English by studying The Canterbury Tales: Not impossible, but probably not a good idea.

But now is tyme to yow for to telle
How that we baren us that ilke nyght,
Whan we were in that hostelrie alyght;
And after wol I telle of our viage
And al the remenaunt of oure pilgrimage.

Strangely, nobody seems to be working on a modern version of a lesser-used programming language for a 30-year-old CPU. So I decided to write my own. Hence the headline.

And thus was born Tali Forth for the 65c02, which has now reached a late ALPHA stage (and is free to use, but at your own risk). The basics are there except for some more complicated words such as DOES> and ?DO, and some legacy words such as, yes, WORD. IF/ELSE/THEN and DO/LOOP work, and so does POSTPONE, one of those words that is really, really powerful but will give you nightmares the first time you try to understand it.

(About the name: I always liked it, and it seems we're not going to have more kids. If it sounds familiar, you're probably thinking of Tali'Zorah vas Normandy, a secondary character in the Mass Effect universe. For the record, and for EA's lawyers, this software has absolutely nothing to do with the game or the companies that made it.)


Screenshot of Tali Forth running on the py65mon emulator. Multiline formatting still sucks, and though DOES> is listed, it is currently just an empty dictionary entry. I should probably define KEELAH somehow.

I decided on four criteria for this implementation (see here for more detail):

Simple. Duh -- if I'm writing it, it will have to be simple. For this reason, I chose to use Subroutine Threaded Code (STC), where each word that is not native machine code is accessed by a JSR. Brad Rodriguez of CamelForth fame explains the differences between the various threading models nicely.

Specific. For the 65c02 only, and as close to the metal as possible for speed. Well, and because I really like programming in assembler.

Standardized. Roughly following the ANSI Forth standard. Actually, I used the ANS Forth 200x draft document. Ahead of the curve!

Speedy. When forced to choose between speed and size, go for speed (within reason). It would still be nice to stay under 8 kbyte for the core routines so they fit in the ROM chip that starts at E000.
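To make the STC model chosen under "Simple" more concrete, here is a minimal sketch in Python. It is neither 65c02 assembler nor Tali Forth's actual compiler. It assumes a colon definition is nothing more than a list of already-defined words, each compiled to a 65c02 JSR (opcode $20, followed by the target address, low byte first) and closed with an RTS ($60). The function compile_stc and the word addresses in DICTIONARY are hypothetical, chosen only for illustration; the ROM figure is simply the arithmetic $10000 - $E000.

```python
JSR = 0x20  # 65c02 opcode: JSR absolute (opcode, low byte, high byte)
RTS = 0x60  # 65c02 opcode: RTS

ROM_START = 0xE000               # the ROM chip mentioned above
ROM_BYTES = 0x10000 - ROM_START  # $E000-$FFFF: 8192 bytes (8 kbyte)

def compile_stc(body, dictionary):
    """Compile a colon definition (a list of word names) into STC machine code."""
    code = bytearray()
    for name in body:
        addr = dictionary[name]
        # The 6502 family is little-endian: low byte of the address comes first.
        code += bytes([JSR, addr & 0xFF, (addr >> 8) & 0xFF])
    code.append(RTS)  # return to whoever called this word
    return bytes(code)

# Hypothetical addresses for illustration only, not Tali Forth's real memory map.
DICTIONARY = {"DUP": 0xE123, "+": 0xE1A0}

# : DOUBLE  DUP + ;
double = compile_stc(["DUP", "+"], DICTIONARY)
print(" ".join(f"{b:02X}" for b in double))       # 20 23 E1 20 A0 E1 60
print(f"{len(double)} of {ROM_BYTES} ROM bytes")  # 7 of 8192 ROM bytes
```

Because the CPU runs the compiled word directly, there is no inner interpreter (NEXT) to pay for on every word, which is where the speed comes from. The cost is size: each call takes three bytes, where indirect- or direct-threaded code on an 8-bit machine stores a two-byte address, a trade-off in line with the speed-over-size criterion. If the ROM also holds the 65c02's NMI, RESET and IRQ/BRK vectors at $FFFA-$FFFF, six of those 8192 bytes are already spoken for.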


Current high-level commands. These will probably be joined by ?DO and +LOOP once I figure out how to code them. Long-term, I'll consider replacing the individual strings with a long one so you can just add further commands to the end and EVALUATE them without all this hassle.

So. What has writing 230 kbyte of (grotesquely over-commented) source code by hand for 5.9 kbyte (and counting) of assembled machine code taught me?

First, it is amazing how simple Forth really is under the hood. Once the basic interpreter loop is up and running, you just code word after word, and compiling stuff is a breeze. No wonder "Forthwrights" (a term said to have been coined by Al Kreever) look down on the complexity of other languages. Understanding POSTPONE is the problem, not coding it.
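To give a feel for how small that basic interpreter loop can be, here is a toy outer interpreter in Python. It is not Tali Forth, which is 65c02 assembler using STC. The class name ToyForth, its handful of words, and the choice to store a colon definition as a list of word names looked up at run time are all simplifications for illustration. The loop does just three things: read a word, then either execute it, compile it into the current definition, or treat it as a number.

```python
class ToyForth:
    """Toy outer interpreter: read a word, then run it, compile it, or push a number."""

    def __init__(self):
        self.stack = []
        self.out = []
        self.defining = None  # [name, body] while inside a colon definition
        s = self.stack
        self.words = {
            "+":    lambda: s.append(s.pop() + s.pop()),
            "*":    lambda: s.append(s.pop() * s.pop()),
            "-":    lambda: s.append(-s.pop() + s.pop()),  # ( a b -- a-b )
            "DUP":  lambda: s.append(s[-1]),
            "DROP": lambda: s.pop(),
            "SWAP": lambda: s.extend([s.pop(), s.pop()]),  # ( a b -- b a )
            ".":    lambda: self.out.append(str(s.pop())),
        }

    def run(self, body):
        """Execute a list of word names; anything not in the dictionary must be a number."""
        for tok in body:
            if tok in self.words:
                self.words[tok]()
            else:
                self.stack.append(int(tok))

    def evaluate(self, text):
        """Interpret a line of source, in the spirit of Forth's EVALUATE."""
        self.out = []
        tokens = iter(text.upper().split())
        for tok in tokens:
            if self.defining is not None:  # compile mode
                if tok == ";":
                    name, body = self.defining
                    self.words[name] = lambda body=body: self.run(body)
                    self.defining = None
                else:
                    self.defining[1].append(tok)
            elif tok == ":":               # start compiling a new word
                self.defining = [next(tokens), []]
            else:                          # interpret mode
                self.run([tok])
        return " ".join(self.out)


forth = ToyForth()
print(forth.evaluate(": SQUARE DUP * ; 7 SQUARE ."))  # 49
```

Looking up a definition's words only when it runs is a shortcut; a real Forth resolves them when the definition is compiled, and in an STC system each one would become a JSR.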

Second, I have given up trying to read anything but the most trivial Forth code without drawing stack diagrams. True as the previous point is, a lot of Forth's power lies in its implicit rules -- the way the stack works, for instance -- which take extra effort to understand. This fits with reports that Chuck Moore, the inventor of Forth, thinks things through before he codes:

His programming style is thoughtful. He thinks about the problem a lot and writes very little code. He thinks it through again and rewrites the code. He objects to letting too much code accumulate from historic reasons and likes to rewrite. He is the most productive programmer I have ever seen yet he has only written about 15K of code in the last fifteen years.

Which could lead us to a discussion about the definition of "productivity", and the time pressure modern-day programmers -- the article is from 1998 -- are under. But we'll skip that here.
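The stack diagrams mentioned above can also be produced mechanically, which helps when checking one drawn by hand. A minimal sketch in Python, assuming only integer literals and a few standard words whose effects are noted in the usual ( before -- after ) notation; the names WORDS and trace are hypothetical:

```python
# Stack effects of a few standard Forth words, written as Python functions.
# Each comment gives the conventional Forth stack diagram ( before -- after ).
WORDS = {
    "DUP":  lambda s: s + [s[-1]],                     # ( a -- a a )
    "DROP": lambda s: s[:-1],                          # ( a -- )
    "SWAP": lambda s: s[:-2] + [s[-1], s[-2]],         # ( a b -- b a )
    "OVER": lambda s: s + [s[-2]],                     # ( a b -- a b a )
    "ROT":  lambda s: s[:-3] + [s[-2], s[-1], s[-3]],  # ( a b c -- b c a )
    "+":    lambda s: s[:-2] + [s[-2] + s[-1]],        # ( a b -- a+b )
    "*":    lambda s: s[:-2] + [s[-2] * s[-1]],        # ( a b -- a*b )
}

def trace(source):
    """Run a line of Forth, returning (word, stack-after) for every step."""
    stack, steps = [], []
    for word in source.split():
        if word.upper() in WORDS:
            stack = WORDS[word.upper()](stack)
        else:
            stack = stack + [int(word)]  # anything else must be a number
        steps.append((word, stack))
    return steps

for word, stack in trace("2 3 OVER * +"):
    print(f"{word:>5}  ( {' '.join(map(str, stack))} )")
```

For 2 3 OVER * + the trace runs ( 2 ), ( 2 3 ), ( 2 3 2 ), ( 2 6 ), ( 8 ), which is exactly the column of stack states one would otherwise sketch on paper.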

In the end, the combination of power and complexity means the Jedi joke isn't quite right: It should probably read "Sith" in the title. I have a distinct feeling of having joined the Dark Side, where you can do amazing stuff with very little effort, but pay a definite price. So far, no cookies.

In the next entry, we'll discuss design details of Tali Forth. Some of them are, well, curious.