## Turing Machines - what they are, what they aren't.

It's the anniversary of the birth of Alan Turing. So there's a ton of people writing commemorative stories. And naturally, a ton of those stories get it wrong. And they get it wrong in a very sad way.

Of course, when you talk about Turing, you talk about Turing machines. Despite the fact that Turing did lots of stuff besides just that machine, it's always the machine that people focus on. Partly that's laziness - the machine has his name after all, so it's one of the first things you find if you Google Turing. It's also the easiest thing to write about.

What do they say about the Turing machine? It' "the simplest computing device". It's the "basis for modern computers". It's "the theoretical model for the microchips in your laptop". It's the "mathematical description of your computer". None of those things are true. And they all both over and under-state what Turing really did. In terms of modern computers, the Turing machine's contribution to the design of real computers is negligible if not non-existent. But his contrubition to our understanding of what computers can do, and our understanding of how mathematics really works - they're far, far more significant than the architecture of any single machine.

The Turing machine isn't a model of real computers. The computer that you're using to read this has absolutely nothing to do with the Turing machine. As a real device, the turing machine is absolutely terrible.

The turing machine is a mathematical model not of computers, but of computation. That's a really important distinction. The Turing machine is an easy to understand model of a computing device. It's definitely not the simplest model. There are simpler computing devices (for example, I think that the rule 111 machine is simpler) - but their simplicitly makes them harder to understand.

The reason that the Turing machine is so important comes down to two important facts. First, which machine you use to talk about computation doesn't matter. There's a limit to what a mechanical device can do. There are lots of machines out there - but ultimately, no machine can go past the limit. Any machine that can reach that limit is, for the purposes of the theory of computation, pretty much the same. When we talk about studying computation, what we're talking about is the set of things that can be done by a machine - not by a specific machine, but by any machine. The specific choice of machine isn't important. And that's the point: computation is computation. That's what Turing figured out.

The Turing machine is a brilliant creation. It's a simple machine. It's really easy to understand. And it's easy to tweak - that is, it's easy to do experiments where you can modify the machine, and see what effect it has.

So let's take a step back, and see: what is a Turing machine?

The Turing machine is a very simple kind of theoretical computing device. In fact, it's almost downright trivial. But according to everything that we know and understand about computation, this trivial device is capable of any computation that can be performed by any other computing device.

The basic idea of the Turing machine is very simple. It's a machine that runs on top of a tape, which is made up of a long series of little cells, each of which has a single character written on it. The machine is a read/write head that moves over the tape, and which can store a little bit of information. Each step, the machine looks at the symbol on the cell under the tape head, and based on what it sees there, and whatever little bit of information it has stored, it decides what to do. The things that it can do are change the information it has store, write a new symbol onto the current tape cell, and move one cell left or right.

That's really it. People who like to make computing sound impressive often have very complicated explanations of it - but really, that's all there is to it. The point of it was to be simple - and simple it certainly is. And yet, it can do anything that's computable.

To really understand how that trivial machine can do computations, it helps to be a bit formal. In formal terms, we talk about a turing machine as a tuple: (S, s0, A, T), where:

• S is a finite list of possible states that the machine can be in. The state is the information that the machine can store in the head to make decisions. It's a very limited amount of information; you can think of it as either a specific list of states, or a small set of small numbers. For most Turing machines, we'll use it as a specific list of states. (You'll see what I mean in a minute.)
• s0S is the initial state - the state that the machine will be in when it starts a computation.
• A is the machine's alphabet, which is the set of symbols which can appear on the machine's tape.
• T is the machines transition function. This is the real heart of the machine, where the computation is defined. It's a function from the machine state and the alphabet character on the current tape cell to the action that the machine should take. The action is a triple consisting of a new value for the machine's state, a character to write onto the current tape cell, and a direction to move the tape head - either left or right.

So, for example, let's look at a simple machine. This is one of the classic examples: a Turing machine that does subtraction using unary numbers. A unary number "N" is written as a series of N 1s. For the program, we'll give the machine an input in the format "N-M=" written onto the tape; after running the machine, the tape will contain the value of M subtracted from N. So, for example, we could use "1111-11=" as an input; the output would be "11".

The alphabet is the characters "1", " " (blank space), "-" (minus sign), and "=" (equal sign). The machine has four states: {"scanright", "eraseone", "subone", "skip"}. It starts in the state "scanright". It's transition function is given by the following table:

FromState Symbol ToState WriteChar Dir
scanright space scanright space right
scanright 1 scanright 1 right
scanright minus scanright minus right
scanright equal eraseone space left
eraseone 1 subone equal left
eraseone minus HALT space n/a
subone 1 subone 1 left
subone minus skip minus left
skip space skip space left
skip 1 scanright space right

What this machine does is move to the right until it sees the equal sign; it erases the equal sign and moves to the left, erases one digit off of the second number, and replaces it with the equal sign (so the second number is reduced by one, and the equal sign is moved over one position). Then it scans back to the minus sign (which separates the two numbers), and erases one digit of the first number, and then switches back to scanning to the right for the equal. So one at a time, it erases one digit from each of the two numbers. When it reaches the equal sign, and starts going back to erase a digit from the second number, and hits the "-" before it finds a digit, that means its done, so it stops. Let's look at a simple execution trace. In the trace, I'll write the machine state, followed by a colon, followed by the tape contents surrounded by "[]", with the current tape cell surrounded by "{}".

	scanright:  [ {1}1111111-111= ]"
scanright:  [ 1{1}111111-111= ]"
scanright:  [ 11{1}11111-111= ]"
scanright:  [ 111{1}1111-111= ]"
scanright:  [ 1111{1}111-111= ]"
scanright:  [ 11111{1}11-111= ]"
scanright:  [ 111111{1}1-111= ]"
scanright:  [ 1111111{1}-111= ]"
scanright:  [ 11111111{-}111= ]"
scanright:  [ 11111111-{1}11= ]"
scanright:  [ 11111111-1{1}1= ]"
scanright:  [ 11111111-11{1}= ]"
scanright:  [ 11111111-111{=} ]"
eraseone :  [ 11111111-11{1}  ]"
subone   :  [ 11111111-1{1}=  ]"
subone   :  [ 11111111-{1}1=  ]"
subone   :  [ 11111111{-}11=  ]"
skip     :  [ 1111111{1}-11=  ]"
scanright:  [ 1111111 {-}11=  ]"
scanright:  [ 1111111 -{1}1=  ]"
scanright:  [ 1111111 -1{1}=  ]"
scanright:  [ 1111111 -11{=}  ]"
eraseone :  [ 1111111 -1{1}   ]"
subone   :  [ 1111111 -{1}=   ]"
subone   :  [ 1111111 {-}1=   ]"
skip     :  [ 1111111{ }-1=   ]"
skip     :  [ 111111{1} -1=   ]"
scanright:  [ 111111 { }-1=   ]"
scanright:  [ 111111  {-}1=   ]"
scanright:  [ 111111  -{1}=   ]"
scanright:  [ 111111  -1{=}   ]"
eraseone :  [ 111111  -{1}    ]"
subone   :  [ 111111  {-}=    ]"
skip     :  [ 111111 { }-=    ]"
skip     :  [ 111111{ } -=    ]"
skip     :  [ 11111{1}  -=    ]"
scanright:  [ 11111 { } -=    ]"
scanright:  [ 11111  { }-=    ]"
scanright:  [ 11111   {-}=    ]"
scanright:  [ 11111   -{=}    ]"
eraseone :  [ 11111   {-}     ]"
Halt:       [ 11111  { }-     ]"


See, it works!

One really important thing to understand here is that we do not have a program. What we just did was define a Turing machine that does subtraction. The machine does not take any instructions: the states and the transition function are an intrinsic part of the machine. So the only thing this machine can do is to subtract.

The real genius of Turing wasn't just defining this simple computing machine. It was realizing the concept of the program: you could design a Turing machine whose input tape contained a description of a Turing machine - that is a program - followed by an input to the program. This single machine - the Universal Turing machine - could simulate any other Turing machine; one machine which could perform any computation.

Turing machines are a lot of fun to play with. Figuring out how to do things can be fascinating. Figuring out how to define a Universal turing machine's transition function is an amazing thing to do; it's astonishing how simple the universal machine can be!

As I said earlier - you can't make a mechanical computing device that does anything that a Turing machine can't do. One of the beauties of the Turing machine is that it lets you explore that. You can try adding and removing features to the basic machine, and see what happens.

For example: if you can do lots of great stuff with a Turing machine with one tape, what if you had a two-tape turing machine? That is, take the basic turing machine, and say that it's got two tapes, each with a read/write head. Each state transition rule on this machine depends on the pair of values found on the two tapes. For now, we'll say that the tapes move together - that is, the transition rule says "move the heads right" or "move the heads left".

Seems like this should represent a real increase in power, right? No. Take a single-tape turing machine. Take the alphabet for tape one, and call it A1; take the alphabet for tape 2, and call it A2. We can create a single-tape turing machine whose alphabet is the cross-product of A1 and A2. Now each symbol on the tape is equivalent of a symbol on tape 1 and a symbol on tape 2. So we've got a single-tape machine which is equivalent to the two-tape machine. Bingo.

We can lift the restriction on the heads moving together, but it's a lot more work. A two-tape machine can do things a lot faster than a one-tape, and the simulation will necessarily adapt to that. But it's entirely doable. How about a two-dimensional tape? We can simulate that pretty easily with a two-tape machine, which means we can do it with a one-tape machine. For a two tape machine, what we do is map the two-D tape onto the one-D-tape, as seen in the diagram below - so that cell 0 on the one-D tape corresponds to cell (0,0) on the two tape; cell (0,1) on the two-D corresponds to cell 1 on the one-D; cell (1,1) on the 2-D is cell 2 on the 1-D; etc. Then we use the second tape for the book-keeping necessary to do the equivalent of T-D tape moves. And we've got a two-D turing machine simulated with a two-tape one-D; and we know that we can simulate a two-tape one-D with a one-tape one-D.

This is, to me, the most beautiful thing about the Turing machine. It's not just a fundamental theoretical construction of a computing device; it's a simple construction of a computing device that's really easy to experiment with. Consider, for a moment, lambda calculus. It's more useful that a Turing machine for lots of purposes - we write real programs in lambda calculus, when no one would build a real application using a Turing machine program. But imagine how you'd try things like the alternate constructions of the Turing machine? It's a whole lot harder to build experiments like those in lambda calculus. Likewise for Minsky machines, Markov machines, etc.

For your enjoyment, I've implemented a Turing machine programming language. You feed it a Turing machine description, and an input string, and it will give you a trace of the machines execution like the one above. Here's the specification of the subtraction machine written in my little Turing language:

states { "scanright" "eraseone" "subtractOneFromResult" "skipblanks" } initial "scanright"
alphabet { '1' ' ' '=' '-' } blank ' '
trans from "scanright" to "scanright" on (' ') write ' ' move right
trans from "scanright" to "scanright" on ('1') write '1' move right
trans from "scanright" to "scanright" on ('-') write '-' move right
trans from "scanright" to "eraseone" on ('=') write ' ' move left
trans from "eraseone" to "subtractOneFromResult" on ('1') write '=' move left
trans from "eraseone" to "Halt" on ('-') write ' ' move left
trans from "subtractOneFromResult" to "subtractOneFromResult" on ('1') write '1' move left
trans from "subtractOneFromResult" to "skipblanks" on ('-') write '-' move left
trans from "skipblanks" to "skipblanks" on (' ') write ' ' move left
trans from "skipblanks" to "scanright" on ('1') write ' ' move right


I think it's pretty clear as a syntax, but it still needs explanation.

• The first line declares the possible states of the machine, and what state it starts in. This machine has four possible states - "scanright", "eraseone", "subtractOneFromResult", and "skipblanks". When the machine starts, it will be in the state "skipright".
• The second line declares the set of symbols that can appear on the tape, including what symbol will initially appear on any tape cell whose value wasn't specified by the input. For this machine the symbols are the digit 1, a blank space, the equal sign, and the subtraction symbol; the blank symbol is on any tape cell whose initial value wasn't specified.
• After that is a series of transition declarations. Each declaration specifies what the machine will do for a given pair of initial state and tape symbol. So, for example, if the machine is in state "scanright", and the current tape cell contains the equal-to sign, then the machine will change to state "eraseone", write a blank onto the tape cell (erasing the "=" sign), and move the tape cell one position to the left.

Finally, the code. You'll need GHCI to run it; at the moment, it won't work in Hugs (I don't have the parsing library in my version of Hugs), and I haven't worked out the linkage stuff to make it work under the GHC compiled mode.

--
-- A Simple Turing machine interpreter
-- Copyright 2007 by Mark C. Chu-Carroll
--    markcc@gmail.com
--   http://scienceblogs.com/goodmath
--
-- You can do anything you want with this code so long as you
-- leave this copyright notice intact.
--
module Turing where
import Text.ParserCombinators.Parsec
import qualified Text.ParserCombinators.Parsec.Token as P
import Text.ParserCombinators.Parsec.Language
import System.Environment

data Motion = MoveLeft  | MoveRight deriving (Show)

-- Transition from to on move write
data Transition = Transition String String [Char] Motion  Char deriving (Show)

-- TuringMachine states initial alphabet blank transitions
data TuringMachine = Machine [String] String   [Char]    Char    [Transition] deriving (Show)

data MachineState = TMState [Char] Char [Char]  String  deriving (Show)
--                           tape-left curcell  curstate

getBlankSym :: TuringMachine -> Char
getBlankSym (Machine _ _ _ bl _) = bl

findMatchingTransition :: [Transition] -> String -> Char -> Maybe Transition
findMatchingTransition [] _ _ =  Nothing
findMatchingTransition translist state c =
let matches = filter (transitionMatches state c) translist
in case matches of
[] -> Nothing
t:[] -> Just t
_ -> error "Ambiguous transition"
where transitionMatches state c (Transition from to on move write) =
(from == state) && (elem c on)

runTransition :: TuringMachine -> [Char] -> Char -> [Char] -> String -> Transition -> MachineState
runTransition m (l:left) c right state (Transition from tostate on MoveLeft write) =
TMState left l (write:right) tostate
runTransition m left c [] state (Transition from tostate on MoveRight write) =
TMState (write:left) (getBlankSym m) [] tostate
runTransition m left c (r:right) state (Transition from tostate on MoveRight write) =
TMState (write:left) r right tostate

stepMachine :: TuringMachine -> MachineState -> MachineState
stepMachine machine@(Machine _ _ _ _ translist) st@(TMState tapeleft c taperight state) =
let trans = findMatchingTransition translist state c
in case trans of
(Just t) -> runTransition machine tapeleft c taperight state t
Nothing -> error "No applicable transition from state"

getMachineStateString (TMState left c right state) =
(state ++ ":[" ++ (reverse left) ++ "{" ++ [c] ++ "}" ++ right ++ "]")

runTracedMachine :: TuringMachine -> [Char] -> [String]
runTracedMachine tm@(Machine states initial alphabet blank transitions) (t:tape) =
runTracedMachineSteps tm (TMState [] t tape initial) where
runTracedMachineSteps machine state =
let newstate@(TMState left c right st) = stepMachine machine state
in if st == "Halt"
then [getMachineStateString newstate]
else let trace=runTracedMachineSteps machine newstate
in ((getMachineStateString newstate):trace)

runMachine :: TuringMachine -> [Char] -> [Char]
runMachine tm@(Machine states initial alphabet blank transitions) (t:tape) =
runMachineSteps tm (TMState [] t tape initial) where
runMachineSteps machine state =
let newstate@(TMState left c right st) = stepMachine machine state
in if st == "Halt"
then (concat [(reverse left), [c], right])
else runMachineSteps machine newstate

---------------------------------------------------------------------------
-- Parsing code - implemented using the Parsec monadic parser library.
---------------------------------------------------------------------------

-- Basic setup stuff - use a standard haskell style lexer; set up the reserved words
-- and symbols in the lexer.
lexer :: P.TokenParser ()
{ P.reservedNames = ["states","alphabet","trans","from","to","on","write","move","left","right","initial","blank"] })

reserved = P.reserved lexer
symbol = P.symbol lexer
braces = P.braces lexer
parens = P.parens lexer
charlit = P.charLiteral lexer
strlit = P.stringLiteral lexer
ident = P.identifier lexer
whitespace = P.whiteSpace lexer

states = reserved "states"
alphabet = reserved "alphabet"
trans = reserved "trans"
from = reserved "from"
to = reserved "to"
on = reserved "on"
write = reserved "write"
move = reserved "move"
initial = reserved "initial"
blank = reserved "blank"

left = do { reserved "left"
; return MoveLeft
}

right = do { reserved "right"
; return MoveRight
}

-- statesDecl ::= "states" "{"  strlit+  "}" "initial" strlit
statesDecl = do { states
; strlst <- braces (many1 strlit)
; initial
; i <- strlit
; return (i,strlst)
}

-- alphaDecl ::= "alphabet" "{" charlit+  "}" "blank" charlit
; charlst <- braces (many1 charlit)
; blank
; bl <- charlit
; return (bl, charlst)
}

-- transDecl ::= "trans" "from" strlit "to" strlit "on" "(" charlit+ ")" "write" charlit "move" ("left" | "right")
transDecl = do { trans
; from
; fromState <- strlit
; to
; targetState <- strlit
; on
; chrs <- parens (many1 charlit)
; write
; wrchar <- charlit
; move
; dir <- ( left  right)
; return (Transition fromState targetState chrs dir wrchar)
}

-- machine ::= statesDecl alphaDecl transDecl+
machine = do { (i,sts) <- statesDecl
; trans  Parser a -> String -> IO a
run p input
= case (parse p "" input) of
Left err -> do{ putStr "parse error at "
; print err
; error "Parse error"
}
Right x -> return x

runTParser ::  String -> IO TuringMachine
runTParser input =
run (do { whitespace
; x  String -> IO ([String])
runTracedMachineOnString m str =
do
tm  String -> IO String
runMachineOnString m str =
do
tm <- runTParser m
return (runMachine tm str)

sampleInput = " 11111111-111= "

------------------------------------------------------------------------
-- Main program execution scaffolding
-- main still needs a bit of work so that ghci will link correctly;
-- runs fine in GHCI, but linkage errors in GHC. For now, just load
-- this file, and then execute "runFromFile" from the command line.
------------------------------------------------------------------------
main =
do
[file] <- getArgs
m <- parseFromFile (do { whitespace
; x  do
print "Enter input for parser:"
s <- getLine
result  do
print (concat ["Parse error"])

runFromFile :: String -> IO ()
runFromFile filename =
do
m <- parseFromFile (do { whitespace
; x  do
print "Enter input for parser:"
s <- getLine
result  do
print (concat ["Parse error"])



## Categorical Computation Characterized By Closed Cartesian Categories

One of my favorite categorical structures is a thing called a closed cartesian category, or CCC for short. Since I'm a computer scientist/software engineer, it's a natural: CCCs are, basically, the categorical structure of lambda calculus - and thus, effectively, a categorical model of computation. However, before we can talk about the CCCs, we need - what else? - more definitions.

### Cartesian Categories

A cartesian category (note not cartesian closed category) is a category:

1. With a terminal object , and
2. , the objects and arrows of the categorical product .

So, a cartesian category is a category closed with respect to product. Many of the common categories are cartesian: the category of sets, and the category of enumerable sets, And of course, the meaning of the categorical product in set? Cartesian product of sets.

### Categorical Exponentials

To get from cartesian categories to cartesian closed categories, we also need to define categorical exponentials. Like categorical product, the value of a categorical exponential is not required to included in a category. The exponential is a complicated definition, and it's a bit hard to really get your head around, but it's well worth the effort. If categorical products are the categorical generalization of set products, then the categorical exponential is the categorical version of a function space. It gives us the ability to talk about structures that are the generalized version of "all functions from A to B".

Given two objects x and y from a category C, their categorical exponential xy, if it exists in the category, is defined by a set of values:

• An object ,
• An arrow , called an evaluation map.
• , an operation . (That is, an operation mapping from arrows to arrows.)

These values must have the following properties:

1. :

To make that a bit easier to understand, let's turn it into a diagram.

As I alluded to earlier, you can also think of it as a generalization of a function space. is the set of all functions from y to x. The evaluation map is simple description in categorical terms of an operation that applies a function from a to b (an arrow) to a value from a, resulting in an a value from b.

So what does the categorical exponential mean? I think it's easiest to explain in terms of sets and functions first, and then just step it back to the more general case of objects and arrows.

If X and Y are sets, then is the set of functions from Y to X.

Now, look at the diagram:

1. The top part says, basically, that is a function from to to : so takes a member of , and uses it to select a function from to .
2. The vertical arrow says:
1. given the pair , maps to a value in .
2. given a pair , we're going through a function. It's almost like currying:
1. The vertical arrow going down is basically taking , and currying it to .
2. Per the top part of the diagram, selects a function from to . (That is, a member of .)
3. So, at the end of the vertical arrow, we have a pair .
3. The "eval" arrow maps from the pair of a function and a value to the result of applying the function to the value.

### Cartesian Closed Categories

Now - the abstraction step is actually kind of easy: all we're doing is saying that there is a structure of mappings from object to object here. This particular structure has the essential properties of what it means to apply a function to a value. The internal values and precise meanings of the arrows connecting the values can end up being different things, but no matter what, it will come down to something very much like function application.

With exponentials and products, we can finally say what the cartesian closed categories (CCCs). A Cartesian closed category is a category that is closed with respect to both products and exponentials.

Why do we care? Well, the CCCs are in a pretty deep sense equivalent to the simply typed lambda calculus. That means that the CCCs are deeply tied to the fundamental nature of computation. The structure of the CCCs - with its closure WRT product and exponential - is an expression of the basic capability of an effective computing system. So next, we'll take a look at a couple of examples of what we can do with the CCCs as a categorical model of computation.

## What if it's not Regular? Pump it!

At this point, we've seen a fair bit about regular languages, and we've gone through the introduction to context free languages. We know one way of showing that a language is regular or context free: if you can write a (regular/context free) grammar for a language, then that language is necessarily (regular/context free). But... what if we have a language that we suspect is not regular?

## The Next Step in Computation: Level-2 Languages

Time to move on to Chomsky level 2! Level two languages are also known as context free languages, or CFLs. Level 2 languages are wonderful things. They're simple enough to be parsed easily, but expressive enough to support a very wide range of useful languages. Pretty much every programming language that's widely used can have its syntax specified with a level-2 language.

### Grammars for Context Free Languages

In terms of the grammar, a CFL is easy to describe: it's a language where the left-hand side of every grammar rule consists of exactly one non-terminal symbol. That's it: the right hand side of a rule in a CFL grammar can be anything at all. Unlike the regular grammars, there are no restrictions on the position or the number of NTSs on the right hand side.

This change makes a huge difference in what you can do. In a CFL, you can count. You can have distant relationships - things like a sub-string that can occurs at the end of a string only if a match for it occurs at the beginning. The canonical example of this is paren matching: you can write a language that makes sure that you have matching parens: the same number of open and close parens.

• A ::= '(' ')'
•  A ::= A A

This language includes ()((())()()(())), but not ()((())()()(()) (the same thing, but with one trailing paren omitted - 8 opens, 7 closes), or ()((())()()(()))) (the same thing, but with an extra close paren at the end - 8 opens, 9 closes).

As a quick aside: this also illustrates what we mean when we say that a language supports counting. When we say that a language requires counting, what we mean is that is that some feature of a string in the language has to have a number of repetitions dictated by another feature in the same string. In a paren-matching grammar, we require that the number of close parens must be the same as the number of open parens. We don't just make sure that that's true for 1, or 2, or 3 open parens, we have matching close parens. For any number of parens, the number of closes must be the same as the number of opens.

We can look at a much less trivial example of a simple grammar. As I've written about at other times, in computer science, there's a formal language that we use for a ton of valuable things called lambda calculus. Lambda calculus is the formal mathematical basis of the Haskell language, and the basic tool used for specifying formal semantics of computational systems. The complete grammar of the simply typed lambda calculus is:

• ident ::= "A" | "B" | "C" | ... | "Z"
• expr ::= "" ident "." expr
• expr ::= ident
• expr ::= expr expr
• expr ::= "(" expr ")"

You can see a practical example of counting in this grammar. It guarantees that expressions in the lambda calculus are well-formed. We couldn't do that in a regular language. That's a huge boost in capability.

So why do we call this a context free language? In terms of the grammar, when we're constructing a string, we're always doing it by replacing non-terminal symbols with sequences of other symbols. When we do one of those replacements, if there's an NTS "S" in the current string, then we can replace it. We can't look at anything but the individual non-terminals in the string. We can't consider the context in which a non-terminal occurs before we decide whether or not we're allowed to replace it.

### Capability of Context Free Languages

As we've seen, CFLs give us some pretty significant additional capabilities. That abilities to count and to describe non-consecutive relationships between different parts of a string in a language are a huge upgrade in computational capability. But CFLs are still pretty limited in many ways. You can't write a grammar for a spoken language using CFGs - natural languages aren't context free. We use CFLs and CFGs all the time for compilers and such in real programs, but there's always an extra step involved, because there are aspects of real computer languages that can't be captured in context-free form.

So what can't you do in a CFL? I'm not going to try to formally characterize the limits of CFLs, but here are two examples:

Complex counting/Arithmetic
if you want a language of strings with the same number of Xs and Ys, that language is a CFL. If you want a string with the same number of Xs, Ys, and Zs - that isn't.
Constrained relationships
We can have some distant relationships in a context grammar - but it's a very limited capability. You can't specify that a particular symbol can only occur in one place in the string if it already occured in a prior part of the string. For example, in the lamba calculus example, it really should say that you can only use the "expr ::= ident" rule if the ident occured in an enclosing LAMBA ident". CFLs can't do that.

### Computing CFLs: the PushDown Automaton

So - now that we know what a CFL is, and what it can express: what's
the basic computational model that underlies them? CFLs are computable using a kind of machine called a pushdown automaton (PDA). Pushdown automata are relatively simple: take a finite state machine, and add a stack.

For the non-CS folks out there, a stack is a last in first out (LIFO) storage system. What that means is that you can store something on the top of the stack whenever you want to (called pushing), look at what's on top of the stack (peeking), and removing the element on top (popping). For a PDA, the transitions look like:

• The top-of-stack in the transition can be either a symbol from the machine's alphabet, or it can be "*". If it's a symbol, then the transition can only be taken if both the machine state and the top-of-stack match. If it's "*", then the transition can be taken regardless of the value on top of the stack.
• The stack-action can be "push(symbol)"; "pop", or "none".
• The machine accepts the input if it reaches a final state with the stack empty. (There are alternate formulations that don't require an empty stack, or that only require an empty stack but don't use final states. They're exactly equivalent to empty stack + final state.)

As usual, it's easiest to understand this with an example. So let's look at a simple language consisting of parens and identifiers, where the parens have to match. So, for example, "((I)(((I)I)(II)))" would be accepted, but "(I)(((I)I)(II)))" (the same string, but with the first open removed) would be rejected.

Our machine has an alphabet of "(", ")", and "I". It has two states: 0, and 1. 0 is both the initial state, and the only final state. The available transitions are:

• : in state 0, if you see an open paren, push it onto the stack, and stay in state 0 It doesn't matter what's on the stack - if there's an open-paren in state 0, you can do this.
• : in state 0, if you see a close paren and there's an open-paren on top of the stack, then you've matched a pair, so you can pop the top open off of the stack.
• : if you're in state 0, and you see a close paren, but the stack in empty, then go to state one. State one is an error state: it means that you saw a close paren without a corresponding open paren. That's not allowed in this grammar. Once you're in state 1, you're stuck.
• : in state zero, if you see an "I", then you consume it and continue in state zero. It doesn't matter what's on the stack, and you don't change the stack.

Or graphically:

So - if it sees a "(", it pushes a "(" on the stack. If it sees an identifier, it just keeps going past it. If it sees a ")", and there's an "(" on top of the stack, it pops the stack; if it sees a ")" and there's no "(" on the stack, then it switches into state 1. Since state 1 is not a final state, and there is no transitions that can be taken from state 1, the machine rejects the string if it sees an extra ")". If there's a "(" without a matching close, then when the machine finishes, it will have a non-empty stack, and so it will reject the string.

Finally, one nifty little note. The pushdown automaton is a very limited kind of machine. It can't do complex arithmetic, or process complex grammatical constructions. There's a lot that it can't do. So what happens if we take this very limited machine, and give it a second stack?

It becomes maximally powerful - Turing complete. In fact, it becomes a Turing machine. We'll see more about Turing machines later, but a Turing machine is equivalent to a two-stack PDA, and a Turing
machine can perform any computation computable by any mechanized computing process. So one stack is extremely constrained; two stacks is as un-constrained as any computing device can ever be.

## Regular Expressions and Derivatives

When you're working with regular languages specified in regular expression form, there's a really cool idea that you can use for building regular expression matchers, and for describing how to convert from a regular expression to a NFA. It's called the Brzozozwksi derivative of a regular expression - or just simply the derivative of a regexp.

The basic idea of the derivative is that given a regular expression, , you can derive a new regular expression called the derivative with respect to symbol , . is a regular expression describing the string matched by after it's matched an .