Pathological Programming by String Rewriting

Nov 24 2006 Published by under pathological programming

Todays tidbit of torture is a simple little language called [Leszek][leszek], with an implementation available [here][leszek-impl]. Leszek is based on the idea of *iterative string rewriting*, which is actually a useful and valuable concept. Of course, Leszek takes it to an extreme of insanity which takes a perfectly good idea, and turns it into a horror. But thats what makes it fun!
In Lezsek, there are no variables; no loops; no state. A program is just a string with
a collection of embedded rewriting operators. The way that things work is, the interpreter looks
at the string. It finds all of the rewriting commands in the string, and executes them, concatenating the *results* of those commands. When it's done all of
the rewrites in the entire program, it takes the *resulting* program string, and executes *that*. It keeps going until there is no possibility of any more rewrites generating output, and then it halts.

So... The guts of the language is, obviously, the set of rewrite operators. So let's dive in, and take a look:
### Cut
"`C`", the "cut" command. This takes two parameters: the first is an escaped character *c*, and
the second is an integer *i*. The integer parameter is written as a sequence of digits followed by a period. What is does is search for the *i*th occurence of the character
*c*, and return the string from the *i*th occurence through the *i+1*th occurence.
For example, if you had a program: "`fooCb2.flobbiebletch`", the result of the "`Cb2.`" operation is the string between the second and third non-escaped "b" character, that is the string "ie".
### Length
"`L`" takes the same parameters as "`C`", but instead of returning the string, it returns the *length* of the string, in period-terminated integer form. So for the same example as the "`C`", "`fooLb2.foobiebletch`" the result would be "2.".
### Concatenation
The simple concatenation operator "`T`" takes two rewrite expressions, and returns the concatenation of them. So for example, in "`foobarfiebletchTCb1.Lb1.`", the "`T`" command evaluates "`Cb1.`", getting "arfie", and then evaluates "`Lb1.`", getting "5.". So the overall result is "arfie5."
There's also a group concatenation operator, "`G`", which takes an integer parameter *n*, followed by *n* other expressions; it evaluates all of the expressions, and then concatenations them. So "`T`" is equivalent to "`G2.`".
### Selection
"`A`" and "`D`" each take two expressions as parameters and evaluates them. "`A`" returns the result of the first one. "`D`" returns the result of the second one.
"`E`" is a conditional selection operator. It takes three expressions as parameters and evaluates them. If the first expression returns a nonempty string, it returns the second one; otherwise, it returns the third.
For use in "`E`", there's a null command, "`N`" which takes no parameters, and always evaluates to an empty string.
### Comparisons and Logic
"`=`" takes two expressions. If the two evaluate to the same string, then it returns "1"; otherwise, it returns an empty string.
"`&`", "`|`", and "`!`" are logical operators: and, or, and not.
### Input and Output
"`I`" reads a character from input, and returns it. "`M`" reads an integer from input, and returns it in the period-terminated form.
"`O`" takes an expression, evaluates it, and prints the result to the standard output.
### Arithmetic
Arithmetic operators take two integers as parameters, and return the result of the appropriate arithmetic operation: "`+`" (add), "`-`" (subtract), , "`*`" (multiply), "`V`" (integer divide) and "`%`" (modulo).
Leszek Programs
As usual, we start with hello world:
OC.1.Hello World!.
"`O`" outputs the result of its parameter, "`C.1.`". The "`C`" is a bit tricky: it copies
from the first non-escaped "." to the next "."; the first non-escaped period is the one that terminates the "`C`s" integer parameter. So the copy command returns the string "Hello World!",
the output prints it, and then the program terminates.
And an alternative version which is more interesting:
E=%C|1.NTC|.C%3.E=%C|1.NOD|Hello World!%|
Let's tease that apart, to make it more comprehensible. What it's going to try to do is print
out the characters between the "|"s, one at a time, and it's going to use the "`%`" after the `"Hello World!"` string to recognize when it's done.
* It starts is with the "E" conditional. We can rewrite that into a slightly more conventional way:
if (=%C|1. != "") then N
else TC|.C%3.
* If the contents of the string between the "`|`"s is just the "%" character, it's done. So the conditional will just return an empty string if it's done.
* If there are more characters between the "`|`"s than just "%", then it does the "`TC|.C%3.`". This is a simple concatenation of "`C|.`" and "`C%3.`".
* "`C|.`" copies from the 0th to the first "|". The 0th mention of any character is always treated as the character *before* the first character of the string. So it copies the full program
up to the character before the first "|": "`E=%C`"
* "`C%3.`" copies from the third to the fourth "%" character ("`C|1.NOD|Hello World!`"). That's quite a clever little bit, which basically finishes quineing the the beginning of the program.
* So the result of the first conditional is either nothing, or it reproduces the program for the next iteration.
* Next is another conditional, which is very much like the original. If there's nothing
between the "`|`"s then it returns nothing; otherwise, it goes on to the rest.
* "`OD|Hello World!%|`". This a clever bit. It's *really* "`OD|H`", followed by "`ello World!%`". "`OD|H`" prints the *second* of the two expressions that follow it. The first is the "`|`" that's quoting the string; the second is the first character of the string. So it prints "H". The rest of the program has no rewrite commands, so it just returns *itself* as a string.
* So in this first iteration, it's output the first character of the string, and *very nearly* quines itself. In fact, the almost-quine is missing one character: the *first* character of the quoted string.
From there, you should be able to see what's going on. It's going to repeat the process, only the second time, the first character of the quoted string is "e". And so on.
I'll close with another clever Leszek program, which you can work your way through by yourself. It's a fibonacci generator, which inputs a number *N*, and outputs the *N*th fibonacci number.
+C|2.C|4.|C|3.|1.|DG+2.L|4.|0.|C|5.C%1.+T1A.|.|OG14.Input number:

No responses yet

  • wintermute says:

    For example, if you had a program: "fooCb2.foobiebletch", the result of the "Cb2." operation is the string between the second and third non-escaped "b" character, that is the string "ie".

    Ummm.... Aren't they the first and second non-escaped b's?
    Or is something happening that I missed?

  • bmurray says:

    Oh god now all my days administering senmail configurations by hand are coming back in a flood.

  • Mark C. Chu-Carroll says:

    Yes, you're right. I messed up the example. I'll go fix it. Thanks!