A Taste of Specification with Alloy

Sep 24 2011 Published by under Program Specification, Programming

In my last post (which was, alas, a stupidly long time ago!), I talked a bit about software specification, and promised to talk about my favorite dedicated specification tool, Alloy. Alloy is a very cool system, designed at MIT by Daniel Jackson and his students.

Alloy is a language for specification, along with an environment which allows you to test your specifications. In a lot of ways, it looks like a programming language - but it's not. You can't write programs in Alloy. What you can do is write concise, clear, and specific descriptions of how something else works.

I'm not going to try to really teach you Alloy. All that I'm going to do is give you a quick walk-though, to try to show you why it's worth the trouble of learning. If you want to learn it, the Alloy group's website has a really good the official Alloy tutorial. which you should walk through.

We'll walk through Alloy using a canonical example, most of which is taken from the tutorial - a filesystem. Program specification tools are constantly described using a filesystem, because it makes a great example. The basic concepts of filesystems are familiar to just about everyone.They're complex enough to be interesting, but simple enough to be tractable. On a personal note, I really like this example, because the filesystem model is also closely related to the example that motivated me to first try Alloy out.

Alloy is based, ultimately, on set theory. It provides you with a very constrained way of talking about things in terms of sets. It can be a bit tricky at times: many typical Alloy statements sort-of look like they're statements in a normal programming language that are performing operations on objects. But they're not. They're expressions describing sets and relationships between sets. It's important to always keep that in mind when you're reading Alloy specifications.

The most basic element of a spec in Alloy is a signature. A signature defines a set of atoms and a collection of properties that exist for those objects. A signature is almost like a class in an object-oriented programming language - but that's really because the object-oriented notion is based on set theory, rather than Alloy specifications being object-oriented per se.

Here's a basic definition of file system objects in Alloy:

sig FSObject { parent : lone Dir }
sig Dir extends FSObject {
	children: set FSObject
}

sig FileContents { }

sig File extends FSObject {
	 contents: one FileContents
}

This says that there's a set of objects that we call filesystem objects. Every filesystem object has a property, "parent", and a filesystem object's parent can be either 0 or 1 other filesystem objects.

To harp on a point: the definition of parent looks like it defines a field of a class. It doesn't. It defines a relation between atoms. Taken together with the signature definition that encloses it, it asserts that there is a set of objects, which we're calling FSOBject; and for each filesystem object in that set, there's a predicate that can associate it with another FSObject. In alloy, you are not defining objects. You can't "create" a filesystem object with a particular parent. You can just assert that filesystem objects exist, and that they are related to other objects that we call their parents.

After the base FSObject set is defined, we can define subsets of it. We define two: the Dir objects also have a children property; and the File object, which has contents. We don't care what the contents of a file are - we're not going to talk about the properties of the contents of the file. We say that by giving them an empty declaration.

Once we've defined objects, we can describe the way they behave by using facts. A fact is an assertion - something that we assert must always be true. For example:

  fact { all d: Dir, o: d.children | o.parent = d }

This says that for any directory d, if a file o is in the children of d, then the parent of o must be d. So we've just stated a really important property of our filesystem. It's not like a Unix filesystem: in Unix, a file can be a member of many directories; in our model filesystem, we've said that it can't. That's the kind of thing that really matters, and that a specification helps you clarify. If you don't think about that, you can make some really dreadful errors in your filesystem by sometimes assuming that a file can't be in more than one directory! Even at this ridiculously simple level, there's already a potential benefit to this specification stuff. (This used to contain an error; I copied an earlier version of the assertion, where I had used the name "contents" for both the relation between a directory and its children, and the relation between a file and its contents object. Thanks to reader Paul Batum for catching it!)

Now, we can move on to more facts:

one sig Root extends Dir { } { no parent }

fact { FSObject in Root.*contents }

The first of these uses an alternative way of handling facts. You can state a fact about a sig by putting a second brace-block after the signature - each line in that is a fact that's true for all members of the set. The fact "no parent" is Alloy's notation for saying that a relation is empty - it's equivalent to something like \(forall x : FSObject | x notin Root.parent\), or cardinality(Root.parent) = 0. So what this says is that there's exacty one special directory, called the Root, which has no parent. Taken together, this facts and the "one" annotation mean that there is only one root - we're saying that the subset of Dir that has no parent contains exactly one element.

The second fact, in the more typical fact form, says that the filesystem is completely connected. Root.*contents is equivalent to Root.contents unioned with Root.contents.contents, unioned with Root.contents.contents.contents. In other words, it's the transitive closure of the elements of the contents relation.

So now we have a basic Alloy specification that describes one of the salient characteristics of a filesystem. What have we gained by doing this? One of the things that makes Alloy particularly amazing as a specification tool is that it isn't just static. When you give it a specification, you can also make statements that you believe should be true. Alloy can then check to see if they are in fact true; and if they aren't, it can generate a specific counterexample. So if you're wrong, Alloy doesn't just tell you you're wrong: it shows you a specific example demonstrating why.

Let's try an example of that. Suppose we accidentally forgot use the "one" annotation on root. But one of the key properties of a filesystem is that it only has one root. So we could write an assertion:

// File system has one root
assert oneRoot { one d: Dir | no d.parent }
check oneRoot for 5

This says that we believe that there should be only one directory which has no parent - that is, that there is one root. Then we tell it to check that for a scope of 5. That means that we want it to try to build instances where there are 5 distinct instances of each signature. In general, you can't compute examples for large scopes; but again, speaking in general, that's not a problem. Most of the time, when you've got an error in the spec, it doesn't take that many objects to reveal it.

Let's give it a shot here. When I plug this into Alloy, I get several counterexamples. The most obvious is this one:

It's got three directories, two of which are roots. That's a nice minimal counter-example. It's not surprising; that's exactly what I had in mind when I initially removed the "one" from the declaration of "Root". But it also produces another counterexample, which surprised me.

My specification, as written, also permits a case where there are no root directories; in fact, there are no files, but there are some "FileContents" not associated with files. For this latter example, in the specification, I forgot to say that a file-contents must be associated with a file. I can correct this by adding a fact:

fact { all fc : FileContents | some f : File | f.contents = fc }

This says that all filecontents are associated with files. The way that I wrote it, it allows multiple files to share the same contents. To write that, I needed to decide just what I wanted them to mean. The contents could be something like the chunk of data on disk that's stored in the file. If that were the case, then for a simple filesystem, there should be a one-to-one relationship between files and contents. On the other hand, if the contents of a file were the data stored in the file, then there's no reason that two different files wouldn't have the same contents.

This kind of decision process is very typical of the process of specification. Specification is about finding the places where your understanding of the system that you're building is incomplete - exactly the way that Alloy just helped us here.

With that fixed, and the "one" added back to the root declaration, Alloy finds no counterexamples. My specification is correct, at least within the scope size that I tested.

The specification so far is completely static. It doesn't allow for any concept of time or change. But a real filesystem does change. You can create files, delete files, move files, etc. To do that in allow, you use predicates. Predicates are, roughly, the Alloy equivalent of functions. A predicate lets you abstract a collection of facts over a set of parameters. For example, we could write a predicate that tests if an FSObject is a child of a particular directory:

pred IsChild[f : FSObject, d : Dir] {
   f in d.*children
}

The way that we capture time in Alloy is to talk about the states of an object before and after an operation. Before I get to that, I'm going to introduce a variation of the filesystem specification. The original spec just said that there was one root directory in the universe. But now, that's not going to work - because we need to be able to talk about the root directory both before and after some operation, and we're going to talk about them as two different objects. So we need to have a notion of a filesystem as an object itself - then, we can have multiple filesystems, each of which has exactly one root directory.

// File system objects
abstract sig FSObject { }
sig File, Dir extends FSObject { }

// A File System
sig FileSystem {
  live: set FSObject,
  root: Dir & live,
  parent: (live - root) ->one (Dir &live),
  contents: Dir -> FSObject
}{
  // live objects are reachable from the root
  live in root.*contents
  // parent is the inverse of contents
  parent = ~contents
}

So, we now have a filesystem object. The directory-containment relationship is now not part of the individual directories, but it's part of the filesystem object itself. So when we change the filesystem, we only need to change the state of that one object.

The way that we do that is by saying that the "parent" relation is a mapping from each member of the set of live filesystem objects excluding the root to exactly one live directory in the filesystem. Then in the facts for the signature, we also say the parent relation and the contents relation are inverses - so when we later describe a change in either one, the other is automatically updated to match. This illustrates one of the nice things about specification: we never have to say how to do something. A specification isn't about how; it's about what. So we need to say what the effect of an operation is, but unless we want to, we don't need to say how it's performed.

Now, within this, we can write a "move" operation, to move a file from one directory to another:

pred move [fs, fs': FileSystem, x: FSObject, d: Dir] {
  (x + d) in fs.live
  fs'.parent = fs.parent - x->(x.(fs.parent)) + x->d
}

"fs" is the state of the filesystem before the operation; "fs'" is the state of the filesystem after. So this predicate says that "I successfully moved x to d in fs resulting in fs' if and only if the following facts are true:"

  1. Both the file and the target directory are part of the filesystem.
  2. The parent relation in "fs'" has the old relationship for x removed (so that x was removed from its old directory), and the new parentage of "x" added (so that it's now a child of the target directory).

Again, we're exploiting the fact that we don't need to say how move works - we're just saying what it does.

There's a whole lot more to Alloy. I don't want to go into more detail here, because I don't think I can do it justice. Alloy is an amazing tool, but it's not a simple one. The last time that I did serious work in Alloy was in Alloy version 3 (I think; it might even have been v2). In the transition from to the current version, the language has changed dramatically for the better - but I spent nearly enough time learning the new version. If enough readers are interested, I'll spend some time re-building my old SCM specification in Alloy4, and then write a series of posts about it.

11 responses so far

  • Paul Batum says:

    fact { all d: Dir, o: d.contents | o.parent = d }

    Should this be:
    fact { all d: Dir, o: d.children | o.parent = d }
    ?

    I'm looking at your set definition for Dir and it doesn't look like it has a thing called "contents".

    • MarkCC says:

      Yes, you're right. I pasted the wrong version of the spec into the post. In an early draft, I used "contents" as the name of both the relationship between a directory and its children, and a file and its content object.

      Thanks for the catch!

  • Joe English says:

    > If enough readers are interested, I'll spend some time re-building my old
    > SCM specification in Alloy4, and then write a series of posts about it.

    That would be *hugely* interesting. Yes, please!

    • I'll add to this - one reason I've never used a formal specification language in anger is that all the examples I've see are for toy projects (like the Alloy tutorial).

      Seeing a formal specification of a useful system would interest me greatly.

      • Harald Korneliussen says:

        You can find them in the literature surrounding Alloy. I remember there was one paper that modelled a cryptographic protocol, and managed to find an error in it.

        The first thing Daniel Jackson did in his Alloy book, was also to model a very simple system of groups and aliases, such as one might find in an email app - and then proceeded to find a bug that was actually present in Mac OS' email program at the time.

        Pity the Alloy book is outdated, they've changed the syntax. Also a pity that I misplaced that book somewhere 🙂

  • Michael Robin says:

    Have you compared to any other ontological frameworks / spec. languages?
    The software seems a little stale - 2008 for the last update - are there other tools that use this format? Do you have a toolchain that makes use of Alloy to create some artifact?
    Is there value in the Software Abstractions book? - there seems to be mixed optinions on this in the reviews.
    thanks

  • Ian Oliver says:

    I was very pleasantly surprised to see a posting about Alloy. I had the pleasure of working with Daniel and his team at MIT for a short while when there was some cooperation with Nokia.

    We used Alloy (along with UML) to develop the basis of a system supporting certain aspects of "Semantic Web" with some very good results regarding the overall quality of the system.

    I wrote an experience paper a few years back for the BCS Christmas Workshop about this. If anyone is interested please contact me.

  • James Matta says:

    I recently finished writing a big multi-threaded data analysis program for some of my physics research and I would like to check the Bounding Universal B-Link tree I use for correctness.

    Looking at this article it seems to me that alloy might be the right tool for the job, am I correct?

    Whether or not it is I would love to see more posts on Alloy, it has captured my interest.

  • Harald Korneliussen says:

    I just rediscovered your blog, MarkCC, and I must say I'm very pleased to see something about Alloy in here. My brother gave me a book about it some years ago, and I used it for some simple stuff, but then I lost that book (and they changed the Alloy syntax so the book is out of date anyway- annoying!).

    Always seemed to me it had potential way beyond the limited attention it has received.

    A series on it, a la your old series on Haskell, would be utterly great. I'm adding your RSS feed just in case you decide to do it.

  • This is a good forum to point out Pamela Zave's interesting and approachable work on lightweight modeling of network protocols. She uses Alloy for prototyping and debugging protocols like Chord.

    http://www2.research.att.com/~pamela/model.html

Leave a Reply