## A Neat Trick with Partial Evalutors

This is something I was talking about with a coworker today, and I couldn't quickly find a friendly writeup of it online, so I decided to write my own. (Yes, we do have a lot of people who enjoy conversations like this at foursquare, which is one of the many reasons that it's such a great place to work. And we are hiring engineers! Feel free to get in touch with me if you're interested, or just check our website!)

Anyway - what we were discussing was something called partial evaluation (PE). The idea of PE is almost like an eager form of currying. Basically, you have a two parameter function, f(x, y). You know that you're going to invoke it a bunch of times with the same value of x, but different values of y. So what you want to do is create a new function, fx(y), which evaluates as much as possible of f using the known parameter.

It's often clearer to see it in programmatic form. Since these days, I pretty much live Scala, I'll use Scala syntax. We can abstractly represent a program as an object which can be run with a list of arguments, and which returns some value as its result.

trait Program {
def run(inputs: List[String]): String
}


If that's a program, then a partial evaluator would be:

object PartialEvaluator {
def specialize(prog: Program, knownInputs: List[String]): Program = {
...
}
}


What it does is take and program and a partial input, and returns a new program, which is the original program, specialized for the partial inputs supplied to the partial evaluator. So, for example, imagine that you have a program like:

object Silly extends Program {
def run(inputs: List[String]): String = {
val x: Int = augmentString(inputs[0]).toInt
val y: Int = augmentString(inputs[0]).toInt
if (x % 2 == 0) {
(y * y).toString
} else {
((y+1)*(y+1)).toString
}
}
}


This is obviously a stupid program, but it's good for an example, because it's so simple. If we're going to run Silly a bunch of times with 1 as the first parameter, but with different second parameters, then we could generate a specialized version of Silly:

   val sillySpecialOne = PartialEvaluator.specialize(Silly, List("1"))


Here's where the interesting part comes, where it's really different from
currying. A partial evaluator evaluates everything that it
possibly can, given the inputs that it's supplied. So the value produced by the specializer would be:

object Silly_1 extends Program {
def run(inputs: List[String]): String = {
val y: Int = augmentString(inputs[0]).toInt
((y+1)*(y+1)).toString
}
}


This can be a really useful thing to do. It can turn in to a huge
optimization. (Which is, of course, why people are interested in it.) In compiler terms, it's sort of like an uber-advanced form of constant propagation.

But the cool thing that we were talking about is going a bit meta. Suppose that you run a partial evaluator on a programming language interpreter?

object MyInterpreter extends Program {
def run(inputs: List[String]): String = {
val code = inputs[0]
val inputsToCode = inputs.tail
interpretProgram(code, inputsToCode)
}

def interpretProgram(code: String, inputs: List[String]): String = {
...
}
}


We can run our partial evaluator to generate a version of the interpreter that's specialized for the code that the interpreter is supposed to execute:

1. We have an interpreter for language M, written in language L.
2. We have a partial evaluator for language L.
3. We run the partial evaluator on the interpreter, with a program
written in M that we want to interpret:
PartialEvaluator.specialize(M_Interpreter, M_Code).
4. The partial evaluator produces a program written in L.

So the partial evaluator took a program written in M, and transformed it into a program written in L!

We can go a step further - and generate an M to L compiler. How? By running the partial evaluator on itself. That is, run the partial
evaluator like this: PartialEvaluator.specialize(PartialEvaluator, M_Interpreter). The result is a program which takes an M program as
input, and generates an L program: that is, it's an M to L compiler!

We can go yet another step, and turn the partial evaluator into a
compiler generator: PartialEvaluator(PartialEvaluator, List(source(PartialEvalutor))). What we get is a program which takes an interpreter written in L, and generates a compiler from it.

It's possible to actually generate useful tools this way. People have actually implemented Lisp compilers this way! For example, you can see someone do a simple version here.

## Introducing Algebraic Data Structures via Category Theory: Monoids

Since joining foursquare, I've been spending almost all of my time writing functional programs. At foursquare, we do all of our server programming in Scala, and we have a very strong bias towards writing our scala code very functionally.

This has increased my interest in category theory in an interesting way. As a programming language geek, I'm obviously fascinated by data structures. Category theory provides a really interesting handle on a way of looking at a kind of generic data structures.

Historically (as much as that word can be used for anything in computer science), we've thought about data structures primarily in a algorithmic and structural ways.

For example, binary trees. A binary tree consists of a collection of linked nodes. We can define the structure recursively really easily: a binary tree is a node, which contains pointers to at most two other binary trees.

In the functional programming world, people have started to think about things in algebraic ways. So instead of just defining data structures in terms of structure, we also think about them in very algebraic ways. That is, we think about structures in terms of how they behave, instead of how they're built.

For example, there's a structure called a monoid. A monoid is a very simple idea: it's an algebraic structure with a set of values S, one binary operation *, and one value i in S which is an identity value for *. To be a monoid, these objects must satisfy some rules called the monad laws:

There are some really obvious examples of monoids - like the set of integers with addition and 0 or integers with multiplication and 1. But there are many, many others.

Lists with concatenation and the empty list are a monoid: for any list,
l ++ [] == l, [] + l == l, and concatenation is associative.

Why should we care if data structures like are monoids? Because we can write very general code in terms of the algebraic construction, and then use it over all of the different operations. Monoids provide the tools you need to build fold operations. Every kind of fold - that is, operations that collapse a sequence of other operations into a single value - can be defined in terms of monoids. So you can write a fold operation that works on lists, strings, numbers, optional values, maps, and god-only-knows what else. Any data structure which is a monoid is a data structure with a meaningful fold operation: monoids encapsulate the requirements of foldability.

And that's where category theory comes in. Category theory provides a generic method for talking about algebraic structures like monoids. After all, what category theory does is provide a way of describing structures in terms of how their operations can be composed: that's exactly what you want for talking about algebraic data structures.

The categorical construction of a monoid is, alas, pretty complicated. It's a simple idea - but defining it solely in terms of the composition behavior of function-like objects does take a bit of effort. But it's really worth it: when you see a monoidal category, it's obvious what the elements are in terms of programming. And when we get to even more complicated structures, like monads, pullbacks, etc., the advantage will be even clearer.

A monoidal category is a category with a functor, where the functor has the basic properties of a algebraic monoid. So it's a category C, paired with a bi-functor - that is a two-argument functor ⊗:C×C→C. This is the categorical form of the tensor operation from the algebraic monoid. To make it a monoidal category, we need to take the tensor operation, and define the properties that it needs to have. They're called its coherence conditions, and basically, they're the properties that are needed to make the diagrams that we're going to use commute.

So - the tensor functor is a bifunctor from C×C to C. There is also an object I∈C, which is called the unit object, which is basically the identity element of the monoid. As we would expect from the algebraic definition, the tensor functor has two basic properties: associativity, and identity.

Associativity is expressed categorically using a natural isomorphism, which we'll name α. For any three object X, Y, and Z, α includes a component αX,Y,Z (which I'll label α(X,Y,Z) in diagrams, because subscripts in diagrams are a pain!), which is a mapping from (X⊗Y)⊗Z to X⊗(Y⊗Z). The natural isomorphism says, in categorical terms, that the the two objects on either side of its mappings are equivalent.

The identity property is again expressed via natural isomorphism. The category must include an object I (called the unit), and two natural isomorphisms, called &lamba; and ρ. For any arrow X in C, &lamba; and ρ contain components λX and ρX such that λX maps from I⊗X→X, and ρX maps from X⊗I to X.

Now, all of the pieces that we need are on the table. All we need to do is explain how they all fit together - what kinds of properties these pieces need to have for this to - that is, for these definitions to give us a structure that looks like the algebraic notion of monoidal structures, but built in category theory. The properties are, more or less, exact correspondences with the associativity and identity requirements of the algebraic monoid. But with category theory, we can say it visually. The two diagrams below each describe one of the two properties.

The upper (pentagonal) diagram must commute for all A, B, C, and D. It describes the associativity property. Each arrow in the diagram is a component of the natural isomorphism over the category, and the diagram describes what it means for the natural isomorphism to define associativity.

Similarly, the bottom diagram defines identity. The arrows are all components of natural isomorphisms, and they describe the properties that the natural isomorphisms must have in order for them, together with the unit I to define identity.

Like I said, the definition is a lot more complicated. But look at the diagram: you can see folding in it, in the chains of arrows in the commutative diagram.

## The Basics of Software Transactional Memory

As promised, it's time for software transactional memory!

A bit of background, first. For most of the history of computers, the way that we've built software is very strongly based on the fact that a computer has a processor - a single CPU, which can do one thing at a time, in order. Until recently, that was true. But now it really isn't anymore. There's the internet - which effectively means that no computer is ever really operating in isolation - it's part of a huge computational space shared with billions of other computers. And even ignoring the internet, we're rapidly approaching the point where tiny devices, like cellphones, will have more than one CPU.

The single processor assumption makes things easy. We humans tend to think very sequentially - that is, the way that we describe how to do things is: do this, then do that. We're not so good at thinking about how to do lots of things at the same time. Just think about human language. If I want to bake a cake, what I'll do is: measure out the butter and sugar, put them in the mixer, mix it until they're creamed. Then add milk. Then in a separate bowl, measure out and sift the flour and the baking powder. Then slowly pour the dry stuff into the wet, and mix it together. Etc.

I don't need to fully specify that order. If I had multiple bakers, they could do many of steps at the same time. But how, in english, can I clearly say what needs to be done? I can say it, but it's awkward. It's harder to say, and harder to understand than the sequential instructions.

What I need to do are identifying families of things that can be done at the same time, and then the points at which those things will merge.

All of the measurements can be done at the same time. In any mixings step, the mixing can't be done until all of the ingredients are ready. Ingredients being ready could mean two things. It could mean that the ingredients were measured out; or it could mean that one of the ingredients for the mix is the product of one of the previous mixing steps, and that that previous step is complete. In terms of programming, we'd say that the measurement steps are independent; and the points at which we need to wait for several things to get done (like "we can't mix the dry ingredients in to the wet until the wet ingredients have all been mixed and the dry ingredients have been measured"), we'd call synchroniation points.

It gets really complicated, really quickly. In fact, it gets even more complicated than you might think. You see, if you were to write out the parallel instructions for this, you'd probably leave a bunch of places where you didn't quite fully specify things - because you'd be relying on stuff like common sense on the part of your bakers. For example, you'd probably say to turn on the over to preheat, but you wouldn't specifically say to wait until it reached the correct temperature to put stuff into it; you wouldn't mention things like "open the over door, then put the cake into it, then close it".

When we're dealing with multiple processors, we get exactly those kinds of problems. We need to figure out what can be done at the same time; and we need to figure out what the synchronization points are. And we also need to figure out how to do the synchronization. When we're talking about human bakers "don't mix until the other stuff is ready" is fine. But in software, we need to consider things like "How do we know when the previous steps are done?".

And it gets worse than that. When you have a set of processors doing things at the same time, you can get something called a race condition which can totally screw things up!

For example, imagine that we're counting that all of the ingredients are measured. We could imagine the mixer process as looking at a counter, waiting until all five ingredients have been measured. Each measuring process would do its measurement, and then increment the counter.

  val measureCount = 0
process Mixer() {
wait until measureCount == 5
}

process Measurer() {
do_measure
measureCount = measureCount + 1
}


What happens if two measurer finish at almost the same time? The last statement in Measurer actually consists of three steps: retrieve the value of measureCount; add one; store the incremented value. So we could wind up with:

Time Measurer1 Measurer2 measureCount
0 1
1 Fetch measureCount(=1) 1
2 Increment(=2) Fetch measurecount(=1) 1
3 Store updated(=2) Increment(=2) 2
4 Store updated(=2) 2

Now, Mixer will never get run! Because of the way that the two Measurers overlapped, one of the increments effectively got lost, and so the count will never reach 5. And the way it's written, there's absolutely no way to tell whether or not that happened. Most of the time, it will probably work - because the two processes have to hit the code that increments the counter at exactly the same time in order for there to be a problem. But every once in a while, for no obvious reason, the program will fail - the mixing will never get done. It's the worst kind of error to try to debug - one which is completely unpredictable. And if you try to run it in a debugger, because the debugger slows things down, you probably won't be able to reproduce it!

This kind of issue always comes down to coordination or synchronization of some kind - that is, the main issue is how do the different processes interact without stomping on each other?

The simplest approach is to use something called a lock. A lock is an object which signals ownership of some resource, and which has one really important property: in concept, it points at at most one process, and updating it is atomic meaning that when you want to look at it and update it, nothing can intervene between the read and write. So if you want to use the resource managed by the lock, you can look at the lock, and see if anyone is using it; and if not, set it to point to you. That process is called acquiring the lock.

In general, we wrap locks up in libraries to make them easier to use. If "l" was a lock, you'd take a lock by using a function something like "lock(l)", which really expanded to something like:

def take(L: Lock) {
while (L != me)
atomically do if L==no one then L=me
}


So the earlier code becomes:

val measureCount = 0
val l = new Lock()

process Mixer() {
wait until measureCount == 5
}

process Measurer() {
do_measure
lock(l)
measureCount = measureCount + 1
unlock(l)
}


In a simple example like that, locks are really easy. Unfortunately, real examples get messy. For instance, there's a situation called deadlock. A classic demonstration is something called the dining philosophers. You've got four philosophers sitting at a table dinner table. Between each pair, there's a chopstick. In order to eat, a philosopher needs two chopsticks. When they get two chopsticks, they'll use them to take a single bite of food, and then they'll put down the chopsticks. If Each philosopher starts by grabbing the chopstick to their right, then no one gets to each. Each has one chopstick, and there's no way for them to get a second one.

That's exactly what happens in a real system. You lock each resource that you want to share. For some operations, you're going to use more than one shared resource, and so you need two locks. If you have multiple tasks that need multiple resources, it's easy to wind up with a situation where each task has some subset of the locks that they need.

Things like deadlock mean that simple locks get hairy really quickly. Not that any of the more complex coordination strategies make deadlocks impossible; you can always find a way of creating a deadlock in any system - but it's a lot easier to create accidental deadlocks using simple locks than, say, actors.

So there's a ton of methods that try to make it easier to do coordination between multiple tasks. Under the covers, these ultimately rely on primitives like locks (or semaphores, another similar primitive coordination tool). But they provide a more structured way of using them. Just like structured control flow makes code cleaner, clearer, and more maintanable, structured coordination mechanism makes concurrency cleaner, clearer, and more maintainable.

Software transactional memory is one approach to this problem, which is currently very trendy. It's still not entirely clear to me whether or not STM is really quite up to the real world - current implementations remain promising, but inefficient. But before getting in to any of that, we need to talk about just what it is.

As I see it, STM is based on two fundamental concepts:

1. Optimism. In software terms, by optimism, we mean that we're going to plow ahead and assume that there aren't any errors; when we're done, we'll check if there was a problem, and clean up if necessary. A good example of this from my own background is source code control systems. In the older systems like RCS, you'd lock a source file before you edited it; then you'd make your changes, and check them in, and release the lock. That way, you know that you're never going to have two people making changes to the same file at the same time. But the downside is that you end up with lots of people sitting around doing nothing, waiting to get the lock on the file they want to change. Odds are, they weren't going to change the same part of the file as the guy who has the lock. But in order to make sure that they can't, the locks also block a lot of real work. Eventually, the optimistic systems came along, and what they did was say: "go ahead and edit the file all you want. But before I let you check in (save) the edited file to the shared repository, I'm going to make sure that no one changed it in a way that will mess things up. If I find out that someone did, then you're going to have to fix it."
2. Transactions. A transaction is a concept from (among other places) databases. In a database, you often make a collection of changes that are, conceptually, all part of the same update. By making them a transaction, they become one atomic block - and either the entire collection all succeedd, or the entire collection all fail. Transactions guarantee that you'll never end up in a situation where half of the changes in an update got written to the database, and the other half didn't.

What happens in STM is that you have some collection of special memory locations or variables. You're only allowed to edit those variables in a transaction block. Whenever a program enters a transaction block, it gets a copy of the transaction variables, and just plows ahead, and does whatever it wants with its copy of the transaction variables. When it gets to the end of the block, it tries to commit the transaction - that is, it tries to update the master variables with the values of its copies. But it checks them first, to make sure that the current value of the master copies haven't changed since the time that it made its copy. If they did, it goes back and starts over, re-running the transaction block. So if anyone else updated any of the transaction variables, the transaction would fail, and then get re-executed.

In terms of our baking example, both of the measurers would enter the transaction block at the same time; and then whichever finished first would commit its transaction, which would update the master count variable. Then when the second transaction finished, it would check the count variable, see that it changed, and go back and start over - fetching the new value of the master count variable, incrementing it, and then committing the result. In terms of code, you'd just do something like:

transactional val measureCount = 0

process Mixer() {
wait until measureCount == 5
}

process Measurer() {
do_measure
atomically {
measureCount = measureCount + 1
}
}


It's really that simple. You just mark all the shared resources as transactional, and then wrap the code that modifies them in a transaction block. And it just works. It's a very elegant solution.

Of course there's a lot more to it, but that's the basic idea. In the code, you identify the transactional variables, and only allow them to be updated inside of a transaction block. At runtime, when you encounter a transaction block, charge ahead, and do whatever you want. Then when you finish, make sure that there isn't an error. If there was, try again.

So what's it look like in a non-contrived programming language? These days, I'm doing most of my coding in Scala. There's a decent STM implementation for Scala as a part of a package called Akka.

In Akka, the way that you define a transactional variable is by using a Ref type. A Ref is a basically a cell that wraps a value. (It's almost like a pointer value in C.) So, for example, in our Baker example:

var count :Ref[Int] = Ref(0)


Then in code, to use it, you literally just wrap the code that modifies the Refs in "atomic". Alas, you don't quite get to treat the refs like normal variables - to access the value of a ref, you need to call Ref.get; to change the value, you need to use a method alter, which takes a function that computes the new value in terms of the old.

class Measurer {
def doMeasure() {
// do the measuring stuff
atomic {
ref.alter(_ + 1)
}
}
}


The "(_ + 1)" probably needs a quick explanation. In Scala, you can define a single expression function using "_" to mark the slot where the parameter should go. So "(_ + 1)" is equivalent to the lambda expression { x => x + 1}.

You can see, just from this tiny example, why STM is such an attractive approach. It's so simple! It's very easy to read and write. It's a very simple natural model. It's brilliant in its simplicity. Of course, there's more to it that what I've written about here - error handling, voluntary transaction abort, retry management, transaction waits - but this is the basics, and it really is this simple.

What are the problems with this approach?

1. Impurity. If not all variables are transactional, and you can modify a non-transactional variable inside of a transaction block, then you've got a mess onp your hands. Values from transactionals can "leak" out of transaction blocks.
2. Inefficiency. You've got to either copy all of the transactional variables at the entry to the transaction block, or you've got to use some kind of copy-on-write strategy. However you do it, you've got grief - aggressive copying, copy-on-write, memory protect - they've all got non-trivial costs. And re-doing the entire transaction block every time it fails can eat a lot of CPU.
3. Fairness. This is a fancy term for "what if one guy keeps getting screwed?" You've got lots of processes all working with the same shared resources, which are protected behind the transactional variables. It's possible for timing to work out so that one process keeps getting stuck doing the re-tries. This is something that comes up a lot in coordination strategies for concurrency, and the implementations can get pretty hairy trying to make sure that they don't dump all of the burden of retries on one process.

## The Wrong Way To Write Concurrent Programs: Actors in Cruise

I've been planning to write a few posts about some programming stuff that interests me. I've spent a good part of my career working on systems that need to support concurrent computation. I even did my PhD working on a system to allow a particular style of parallel programming. It's a really hard problem - concurrency creates a host of complex issues in how systems behave, and the way that you express concurrency in a programming language has a huge impact on how hard it is to read, write, debug, and reason about systems.

So, like I've said, I've spent a lot of time thinking about these issues, and looking at various different proposed solutions, as well as proposing a couple of my own. But I really don't know of any good writeup describing the basics of the most successful approaches for beginners. So I thought I could write one.

But that's not really today's post. Todays post is my version of a train-wreck. Long-time readers of the blog know that I'm fascinated with bizarre programming languages. So today, I'm going to show you a twisted, annoying, and thoroughly pointless language that I created. It's based on one of the most successful models of concurrent programming, called Actors, which was originally proposed by Professor Gul Agha of UIUC. There've been some really nice languages built using ideas from Actors, but this is not one of them.

The language is called "Cruise". Why? Because it's a really bad Actor language. And what name comes to mind when I think of really, really bad actors with delusions of adequacy? Tom Cruise.

You can grab my implementation from github. Just so you know, the code sucks. It's something I threw together in my spare time a couple of years ago, and haven't really looked at since. So it's sloppy, overcomplicated, probably buggy, and slow as a snail on tranquilizers.

### Quick overview of the actor model

Actors are a theoretical model of computation, which is designed to describe completely asynchronous parallel computation. Doing things totally asynchronously is very strange, and very counter-intuitive. But the fact of the matter is, in real distributed systems, everything *is* fundamentally asynchronous, so being able to describe distributed systems in terms of a simple, analyzable model is a good thing.

According to the actor model, a computation is described by a collection of things called, what else, actors. An actor has a mailbox, and a behavior. The mailbox is a uniquely named place where messages sent to an actor can be queued; the behavior is a definition of how the actor is going to process a message from its mailbox. The behavior gets to look at the message, and based on its contents, it can do three kinds of things:

1. Create other actors.
2. Send messages to other actors whose mailbox it knows.
3. Specify a new behavior for the actor to use to process its next message.

You can do pretty much anything you need to do in computations with that basic mechanism. The catch is, as I said, it's all asynchronous. So, for example, if you want to write an actor that adds two numbers, you can't do it by what you'd normally think of as a function call. In a lot of ways, it looks like a method call in something like Smalltalk: one actor (object) sends a message to another actor, and in response, the receiver takes some action specified by them message.

But subroutines and methods are synchronous, and nothing in actors is synchronous. In an object-oriented language, when you send a message, you stop and wait until the receiver of the message is done with it. In Actors, it doesn't work that way: you send a message, and it's sent; that's it, it's over and done with. You don't wait for anything; you're done. If you want a reply, you need to send the the other actor a reference to your mailbox, and make sure that your behavior knows what to do when the reply comes in.

It ends up looking something like the continuation passing form of a functional programming language: to do a subroutine-like operation, you need to pass an extra parameter to the subroutine invocation; that extra parameter is the *intended receiver* of the result.

You'll see some examples of this when we get to some code.

### Tuples - A Really Ugly Way of Handling Data

This subtitle is a bit over-the-top. I actually think that my tuple notion is pretty cool. It's loosely based on how you do data-types in Prolog. But the way that it's implemented in Cruise is absolutely awful.

Cruise has a strange data model. The idea behind it is to make it easy to build actor behaviors around the idea of pattern matching. The easiest/stupidest way of doing this is to make all data consist of tagged tuples. A tagged tuple consists of a tag name (an identifier starting with an uppercase letter), and a list of values enclosed in the tuple. The values inside of a tuple can be either other tuples, or actor names (identifiers starting with lower-case letters).

So, for example, Foo(Bar(), adder) is a tuple. The tag is "Foo". It's contents are another tuple, "Bar()", and an actor name, "adder".

Since tuples and actors are the only things that exist, we need to construct all other types of values from some combination of tuples and actors. To do math, we can use tuples to build up Peano numbers. The tuple "Z()" is zero; "I(n)" is the number n+1. So, for example, 3 is "I(I(I(Z())))".

The only way to decompose tuples is through pattern matching in messages. In an actor behavior. message handlers specify a *tuple pattern*, which is a tuple where some positions may be filled by{em unbound} variables. When a tuple is matched against a pattern, the variables in the pattern are bound to the values of the corresponding elements of the tuple.

A few examples:

• matching I(I(I(Z()))) with I($x) will succeed with $x bound to I(I(Z)).
• matching Cons(X(),Cons(Y(),Cons(Z,Nil()))) with Cons($x,$y) will succeed with $x bound to X(), and $y bound to Cons(Y(),Cons(Z(),Nil())).
• matching Cons(X(),Cons(Y(),Cons(Z(),Nil()))) with Cons($x, Cons(Y(), Cons($y, Nil()))) will succeed with $x bound to X(), and $y bound to Z().
• ### Code Examples!

Instead of my rambling on even more, let's take a look at some Cruise programs. We'll start off with Hello World, sort of.

actor !Hello {
behavior :Main() {
on Go() { send Hello(World()) to out }
}
initial :Main
}

instantiate !Hello() as hello
send Go() to hello



This declares an actor type "!Hello"; it's got one behavior with no parameters. It only knows how to handle one message, "Go()". When it receives go, it sends a hello world tuple to the actor named "out", which is a built-in that just prints whatever is sent to it.

Let's be a bit more interesting, and try something using integers. Here's some code to do a greater than comparison:

actor !GreaterThan {
behavior :Compare() {
on GT(Z(),Z(), $action,$iftrue, $iffalse) { send$action to $iffalse } on GT(Z(), I($x), $action,$iftrue, $iffalse) { send$action to $iffalse } on GT(I($x), Z(), $action,$iftrue, $iffalse) { send$action to $iftrue } on GT(I($x), I($y),$action, $iftrue,$iffalse) {
send GT($x,$y,$action,$iftrue,$iffalse) to$self
}
}
initial :Compare
}

actor !True {
behavior :True() {
on Result() { send True() to out}
}
initial :True
}

actor !False {
behavior :False() {
on Result() { send False() to out}
}
initial :False
}

instantiate !True() as true
instantiate !False() as false
instantiate !GreaterThan() as greater
send GT(I(I(Z())), I(Z()), Result(), true, false) to greater
send GT(I(I(Z())), I(I(I(Z()))), Result(), true, false) to greater
send GT(I(I(Z())), I(I(Z())), Result(), true, false) to greater



This is typical of how you do "control flow" in Cruise: you set up different actors for each branch, and pass those actors names to the test; one of them will receive a message to continue the execution.

What about multiple behaviors? Here's a trivial example of a flip-flop:

actor !FlipFlop {
behavior :Flip() {
on Ping($x) { send Flip($x) to out
}
on Pong($x) { send Flip($x) to out
}
}
behavior :Flop() {
on Ping($x) { send Flop($x) to out
}
on Pong($x) { send Flop($x) to out
}
}
initial :Flip
}

instantiate !FlipFlop() as ff
send Ping(I(I(Z()))) to ff
send Ping(I(I(Z()))) to ff
send Ping(I(I(Z()))) to ff
send Ping(I(I(Z()))) to ff
send Pong(I(I(Z()))) to ff
send Pong(I(I(Z()))) to ff
send Pong(I(I(Z()))) to ff
send Pong(I(I(Z()))) to ff



If the actor is in the ":Flip" behavior, then when it gets a "Ping", it sends "Flip" to out, and switches behavior to flop. If it gets point, it just sents "Flip" to out, and stays in ":Flip".

The ":Flop" behavior is pretty much the same idea, accept that it switches behaviors on "Pong".

An example of how behavior changing can actually be useful is implementing settable variables:

actor !Var {
behavior :Undefined() {
on Set($v) { adopt :Val($v) }
on Get($target) { send Undefined() to$target }
on Unset() { }
}
behavior :Val($val) { on Set($v) { adopt :Val($v) } on Get($target) { send $val to$target }
on Unset() { adopt :Undefined() }
}
initial :Undefined
}
instantiate !Var() as v
send Get(out) to v
send Set(I(I(I(Z())))) to v
send Get(out) to v



Two more programs, and I'll stop torturing you. First, a simple adder:

actor !Adder {
on Plus(Z(),$x,$target) {
send $x to$target
}
on Plus(I($x),$y, $target) { send Plus($x,I($y),$target) to $self } } initial :Add } actor !Done { behavior :Finished() { on Result($x) { send Result($x) to out } } initial :Finished } instantiate !Adder() as adder instantiate !Done() as done send Plus(I(I(I(Z()))),I(I(Z())), out) to adder  Pretty straightforward - the only interesting thing about it is the way that it sends the result of invoking add to a continuation actor. Now, let's use an addition actor to implement a multiplier actor. This shows off some interesting techniques, like carrying auxiliary values that will be needed by the continuation. It also shows you that I cheated, and added integers to the parser; they're translated into the peano-tuples by the parser. actor !Adder { behavior :Add() { on Plus(Z(),$x, $misc,$target) {
send Sum($x,$misc) to $target } on Plus(I($x), $y,$misc, $target) { send Plus($x,I($y),$misc, $target) to$self
}
}
}

actor !Multiplier {
behavior :Mult() {
on Mult(I($x),$y, $sum,$misc, $target) { send Plus($y, $sum, MultMisc($x, $y,$misc, $target),$self) to adder
}
on Sum($sum, MultMisc(Z(),$y, $misc,$target)) {
send Product($sum,$misc) to $target } on Sum($sum, MultMisc($x,$y, $misc,$target)) {
send Mult($x,$y, $sum,$misc, $target) to$self
}
}
initial :Mult
}

instantiate !Multiplier() as multiplier
send Mult(32, 191, 0, Nil(), out) to multiplier



So, is this Turing complete? You bet: it's got peano numbers, conditionals, and recursion. If you can do those three, you can do anything.

## A Taste of Specification with Alloy

In my last post (which was, alas, a stupidly long time ago!), I talked a bit about software specification, and promised to talk about my favorite dedicated specification tool, Alloy. Alloy is a very cool system, designed at MIT by Daniel Jackson and his students.

Alloy is a language for specification, along with an environment which allows you to test your specifications. In a lot of ways, it looks like a programming language - but it's not. You can't write programs in Alloy. What you can do is write concise, clear, and specific descriptions of how something else works.

I'm not going to try to really teach you Alloy. All that I'm going to do is give you a quick walk-though, to try to show you why it's worth the trouble of learning. If you want to learn it, the Alloy group's website has a really good the official Alloy tutorial. which you should walk through.

## The Value of Tests: It's more than just testing!

Since I have some free time, I've been catching up on some of the stuff
I've been meaning to read. I've got a reading list of stuff that I've wanted
to look at that were written by other authors with my publisher. Yesterday, I started looking at Cucumber, which is an interesting behavior-driven development tool. This post isn't really about Cucumber, but about something that Cucumber reminded me of.

When a competent programmer builds software, they write tests. That's just
a given. But why do we do it? It seems like the answer is obvious: to make sure that our software works. But I'd argue that there's another reason, which in the long run is as important as the functional one. It's to describe what the software does. A well-written test doesn't just make sure that the software does the right thing - it tells other programmers what the code is supposed to do.

A test is an executable specification. Specifications are a really good thing; executable specifications are even better.

## Stuff Everyone Should Do (part 2): Coding Standards

Another thing that we did at Google that I thought was surprisingly effective and useful was strict coding standards.

Before my time at Google, I was sure that coding standards were pointless. I had absolutely no doubt that they were the kind of thing that petty bureaucrats waste time writing and then use to hassle people who are actually productive.

I was seriously wrong.

At Google, I could look at any piece of code, anywhere in Google's codebase, and I could read it. The fact that I was allowed to do that was pretty unusual in itself. But what was surprising to me was just how much the standardization of style - indents, names, file structures, and comment conventions - made it dramatically easier to look at a piece of unfamiliar code and understand it. This is still surprising to me - because those are all trivial things. They shouldn't have much impact - but they do. It's absolutely shocking to realize how much of the time you spend reading code is just looking for the basic syntactic structure!

There's a suite of common objections to this, all of which I used to believe.

It wastes time!
I'm a good coder, and I don't want to waste time on stupidity. I'm good enough that when I write code, it's clear and easy to understand. Why should I waste my time on some stupid standard? The answer is: because there is a value in uniformity. As I alluded to earlier - the fact that every piece of code that you look at -- whether it was written by you, by one of your closest coworkers, or by someone 11 timezones away -- will always demarcate structures in the same way, will always use the same naming conventions - it really, genuinely makes a big difference. You need so much less effort to read code that you haven't looked at in a while (or at all), because you can immediately recognize the structure.
I'm an artist!
This is phrased facetiously, but it does reflect a common complaint. We programmers have a lot of pride in our personal style. The code that I write really does reflect something about me and how my mind works. It's a reflection of my skill and my creativity. If I'm forced into some stupid standard, it seems like it's stifling my creativity. The thing is, the important parts of your style, the important reflections of your mind and your creativity aren't in trivial syntactic things. (If it is, then you're a pretty crappy programmer.) The standard actually makes it easier for other people to see your creativity - because they can actually see what you're doing, without being distracted by the unfamiliar syntactic quirks.
One size fits all actually fits none!
If you have a coding standard that wasn't designed specifically for your project, then it's probably non-optimal for your project. That's fine. Again, it's just syntax: non-optimal doesn't mean bad. The fact that it's not ideal for your project doesn't mean that it's not worth doing. Yeah, sure, it does reduce the magnitude of the benefit for your project, but at the same time, it increases the magnitude of the benefit for the larger organization. In addition, it frequently makes sense to have project-specific code styles. There's nothing wrong with having a project-specific coding standard. In fact, in my experience, the best thing is to have a very general coding standard for the larger organization, and then project-specific extensions of that for the project-specific idioms and structures.
I'm too good for that!
This is actually the most common objection. It's sort-of a combination of the others, but it gets at an underlying attitude in a direct way. This is the belief on the part of the complainer that they're a better programmer than whoever wrote the standard, and lowering themselves to following the standard written by the inferior author will reduce the quality of the code. This is, to put it mildly, bullshit. It's pure arrogance, and it's ridiculous. The fact of the matter is that no one is so good that any change to their coding style will damage the code. If you can't write good code to any reasonable coding standard, you're a crappy programmer.

When you're coding against a standard, there are inevitably going to be places where you disagree with the standard. There will be places where your personal style is better than the standard. But that doesn't matter. There will, probably, also be places where the standard is better than your style. But that doesn't matter easier. As long as the standard isn't totally ridiculous, the comprehension benefits are significant enough to more than compensate for that.

But what if the coding standard is totally ridiculous?

Well, then, it's rough to be you: you're screwed. But that's not really because of the ridiculous coding standard. It's because you're working for idiots. Screwing up a coding standard enough to really prevent good programmers from writing good code is hard. It requires a sort of dedicated, hard-headed stupidity. If you're working for people who are such idiots that they'd impose a broken coding standard, then they're going to do plenty of other stupid stuff, too. If you're working for idiots, you're pretty much screwed no matter what you do, coding standard or no. (And I don't mean to suggest that software businesses run by idiots are rare; it's an unfortunate fact, but there's no shortage of idiots, and there are plenty of them that have their own businesses.)

## Things Everyone Should Do: Code Review

As I alluded to in my last post (which I will be correcting shortly), I no longer work for Google. I still haven't decided quite where I'm going to wind up - I've got a couple of excellent offers to choose between. But in the interim, since I'm not technically employed by anyone, I thought I'd do a bit of writing about some professional things that are interesting, but that might have caused tension with coworkers or management.

Google is a really cool company. And they've done some really amazing things - both outside the company, where users can see it, and inside the company. There are a couple of things about the inside that aren't confidential, but which also haven't been discussed all that widely on the outside. That's what I want to talk about.

The biggest thing that makes Google's code so good is simple: code review. That's not specific to Google - it's widely recognized as a good idea, and a lot of people do it. But I've never seen another large company where it was such a universal. At Google, no code, for any product, for any project, gets checked in until it gets a positive review.

Everyone should do this. And I don't just mean informally: this should really be a universal rule of serious software development. Not just product code - everything. It's not that much work, and it makes a huge difference.

What do you get out of code review?

There's the obvious: having a second set of eyes look over code before it gets checked in catches bugs. This is the most widely cited, widely recognized benefit of code review. But in my experience, it's the least valuable one. People do find bugs in code review. But the overwhelming majority of bugs that are caught in code review are, frankly, trivial bugs which would have taken the author a couple of minutes to find. The bugs that actually take time to find don't get caught in review.

The biggest advantage of code review is purely social. If you're programming and you know that your coworkers are going to look at your code, you program differently. You'll write code that's neater, better documented, and better organized -- because you'll know that people who's opinions you care about will be looking at your code. Without review, you know that people will look at code eventually. But because it's not immediate, it doesn't have the same sense of urgency, and it doesn't have the same feeling of personal judgement.

There's one more big benefit. Code reviews spread knowledge. In a lot of development groups, each person has a core component that they're responsible for, and each person is very focused on their own component. As long as their coworkers components don't break their code, they don't look at it. The effect of this is that for each component, only one person has any familiarity with the code. If that person takes time off or - god forbid - leaves the company, no one knows anything about it. With code review, you have at least two people who are familiar with code - the author, and the reviewer. The reviewer doesn't know as much about the code as the author - but they're familiar with the design and the structure of it, which is incredibly valuable.

Of course, nothing is every completely simple. From my experience, it takes some time before you get good at reviewing code. There are some pitfalls that I've seen that cause a lot of trouble - and since they come up particularly frequently among inexperienced reviewers, they give people trying code reviews a bad experience, and so become a major barrier to adopting code review as a practice.

The biggest rule is that the point of code review is to find problems in code before it gets committed - what you're looking for is correctness. The most common mistake in code review - the mistake that everyone makes when they're new to it - is judging code by whether it's what the reviewer would have written.

Given a problem, there are usually a dozen different ways to solve it. Andgiven a solution, there's a million ways to render it as code. As a reviewer, your job isn't to make sure that the code is what you would have written - because it won't be. Your job as a reviewer of a piece of code is to make sure that the code as written by its author is correct. When this rule gets broken, you end up with hard feelings and frustration all around - which isn't a good thing.

The thing is, this is such a thoroughly natural mistake to make. If you're a programmer, when you look at a problem, you can see a solution - and you think of what you've seen as the solution. But it isn't - and to be a good reviewer, you need to get that.

The second major pitfall of review is that people feel obligated to say something. You know that the author spent a lot of time and effort working on the code - shouldn't you say something?

No, you shouldn't.

There is never anything wrong with just saying "Yup, looks good". If you constantly go hunting to try to find something to criticize, then all that you accomplish is to wreck your own credibility. When you repeatedly make things to criticize just to find something to say, then the people who's code you review will learn that when you say something, that you're just saying it to fill the silence. Your comments won't be taken seriously.

Third is speed. You shouldn't rush through a code review - but also, you need to do it promptly. Your coworkers are waiting for you. If you and your coworkers aren't willing to take the time to get reviews done, and done quickly, then people are going to get frustrated, and code review is just going to cause frustration. It may seem like it's an interruption to drop things to do a review. It shouldn't be. You don't need to drop everything the moment someone asks you to do a review. But within a couple of hours, you will take a break from what you're doing - to get a drink, to go to the bathroom, to talk a walk. When you get back from that, you can do the review and get it done. If you do, then no one will every be left hanging for a long time waiting on you.

## The Perils of Premature Optimization

A reader who's known me for a while just sent me a note asking me to say something about premature optimization. The reason that I mention that he's known me for a while (in fact, someone who used to be a coworker) is because this is a topic that I'm prone to rant about in person, but I've never mentioned on the blog. So he knows that it's something that I've got a strong opinion about. Basically, he's having trouble dealing with an annoying coworker doing this, and he wants a rant-by-proxy. I'm OK with that. :-).

When you're writing a new piece of software, particularly on a modern computer, one of the unintuitive things that frequently happens is that the performance of your system is really quite different from what you'd expect.And even when everything is as expected, most people don't have a particularly good sense of tradeoffs - if I make this thing faster, what effect will it have on that? and if it does really improve the performance, how much will it improve it, and at what cost? If you can't answer that - and answer it precisely, with supporting evidence - then you have no business optimizing.

So when you sit down to write some new code, what you should really do is write code that's algorithmically efficient - that is, you pick an algorithm that has good performance in asymptotic time - and implement it in a straightforward way.

But what many people do - in fact, what pretty much all of us do at least some of the time - is try to be clever. We try to find opportunities to change the code to make it faster. Doing that before you understand where the computer actually spends its time when it's running the program is what we call premature optimization.

Why not?