Time to move on to Chomsky level 2! Level 2 languages are also known as context free languages, or CFLs. Level 2 languages are wonderful things. They're simple enough to be parsed easily, but expressive enough to support a very wide range of useful languages. Pretty much every programming language that's widely used can have its syntax specified with a level 2 language.
Grammars for Context Free Languages
In terms of the grammar, a CFL is easy to describe: it's a language where the left-hand side of every grammar rule consists of exactly one nonterminal symbol. That's it: the right-hand side of a rule in a CFL grammar can be anything at all. Unlike the regular grammars, there are no restrictions on the position or the number of NTSs on the right-hand side.
This change makes a huge difference in what you can do. In a CFL, you can count. You can have distant relationships: things like a substring that can occur at the end of a string only if a match for it occurs at the beginning. The canonical example of this is paren matching: you can write a language that makes sure that you have matching parens: the same number of open and close parens.

A ::= '(' ')'

A ::= A A
This language includes "()((())()()(()))", but not "()((())()()(())" (the same thing, but with one trailing paren omitted: 8 opens, 7 closes), or "()((())()()(())))" (the same thing, but with an extra close paren at the end: 8 opens, 9 closes).
As a quick aside: this also illustrates what we mean when we say that a language supports counting. When we say that a language requires counting, what we mean is that some feature of a string in the language has to have a number of repetitions dictated by another feature in the same string. In a paren-matching grammar, we require that the number of close parens must be the same as the number of open parens. We don't just make sure that that's true for 1, or 2, or 3 open parens: for any number of open parens, the number of closes must be the same as the number of opens.
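To make the counting idea concrete, here's a quick Python sketch (my own illustration, not part of the original grammar discussion) of a recognizer for matched parens. It tracks exactly the one quantity the language requires us to count: the number of currently unmatched open parens.

```python
def balanced(s: str) -> bool:
    """Check that parens in s are matched: every close paren matches an
    earlier open paren, and the total counts of opens and closes are equal."""
    depth = 0
    for ch in s:
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth < 0:          # a close paren with no matching open
                return False
    return depth == 0              # every open paren was closed
```

Running it on the three example strings above, `balanced` accepts the first and rejects both the version with the omitted close paren and the version with the extra one.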
We can look at a much less trivial example of a simple grammar. As I've written about at other times, in computer science, there's a formal language that we use for a ton of valuable things called lambda calculus. Lambda calculus is the formal mathematical basis of the Haskell language, and the basic tool used for specifying formal semantics of computational systems. The complete grammar of the simple untyped lambda calculus is:

ident ::= "A" | "B" | "C" | ... | "Z"

expr ::= "lambda" ident "." expr

expr ::= ident

expr ::= expr expr

expr ::= "(" expr ")"
You can see a practical example of counting in this grammar. It guarantees that expressions in the lambda calculus are well-formed. We couldn't do that in a regular language. That's a huge boost in capability.
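To show that this grammar really is easy to parse, here's a rough recursive-descent recognizer in Python. It's a sketch under a couple of assumptions of mine: the input is already tokenized into a list of strings, and the left-recursive rule `expr ::= expr expr` is handled by parsing one "atom" and then greedily consuming further atoms as applications.

```python
def parse(tokens):
    """Recursive-descent recognizer for the lambda calculus grammar above.
    `tokens` is a list like ["lambda", "X", ".", "X"]. Returns True if the
    whole token list is one expr; raises SyntaxError on malformed input."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def is_ident(t):
        # ident ::= "A" | "B" | ... | "Z"
        return t is not None and len(t) == 1 and "A" <= t <= "Z"

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {peek()!r}")
        pos += 1

    def atom():
        nonlocal pos
        t = peek()
        if t == "lambda":           # expr ::= "lambda" ident "." expr
            pos += 1
            if not is_ident(peek()):
                raise SyntaxError(f"expected identifier, got {peek()!r}")
            pos += 1
            expect(".")
            expr()
        elif t == "(":              # expr ::= "(" expr ")"
            pos += 1
            expr()
            expect(")")
        elif is_ident(t):           # expr ::= ident
            pos += 1
        else:
            raise SyntaxError(f"unexpected token {t!r}")

    def expr():
        # expr ::= expr expr is left-recursive, so parse one atom, then
        # treat each atom that follows as another application argument.
        atom()
        while peek() in ("lambda", "(") or is_ident(peek()):
            atom()

    expr()
    if pos != len(tokens):
        raise SyntaxError("trailing tokens")
    return True
```

For example, `parse(["lambda", "X", ".", "X", "X"])` accepts the expression lambda X. X X, while an unclosed paren like `["(", "X"]` raises a SyntaxError.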
So why do we call this a context free language? In terms of the grammar, when we're constructing a string, we're always doing it by replacing nonterminal symbols with sequences of other symbols. When we do one of those replacements, if there's an NTS "S" in the current string, then we can replace it. We can't look at anything but the individual nonterminals in the string. We can't consider the context in which a nonterminal occurs before we decide whether or not we're allowed to replace it.
Capability of Context Free Languages
As we've seen, CFLs give us some pretty significant additional capabilities. The abilities to count and to describe non-consecutive relationships between different parts of a string in a language are a huge upgrade in computational capability. But CFLs are still pretty limited in many ways. You can't write a grammar for a spoken language using CFGs: natural languages aren't context free. We use CFLs and CFGs all the time for compilers and such in real programs, but there's always an extra step involved, because there are aspects of real computer languages that can't be captured in context-free form.
So what can't you do in a CFL? I'm not going to try to formally characterize the limits of CFLs, but here are two examples:
- Complex counting/arithmetic: if you want a language of strings with the same number of Xs and Ys, that language is a CFL. If you want a language of strings with the same number of Xs, Ys, and Zs, that isn't.
- Constrained relationships: we can have some distant relationships in a context free grammar, but it's a very limited capability. You can't specify that a particular symbol can only occur in one place in the string if it already occurred in a prior part of the string. For example, in the lambda calculus example, it really should say that you can only use the "expr ::= ident" rule if the ident occurred in an enclosing "lambda" ident. CFLs can't do that.
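As a concrete illustration of the first point, here's one standard textbook-style grammar (my example, not from the original discussion) for the language of strings with equal numbers of Xs and Ys:

```
S ::= ""
S ::= "X" S "Y" S
S ::= "Y" S "X" S
```

Every rule adds one X and one Y, so any derived string has equal counts, and it turns out every equal-count string is derivable. The three-symbol version, with equal numbers of Xs, Ys, and Zs, provably has no such grammar (this is a standard application of the pumping lemma for context free languages).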
Computing CFLs: the PushDown Automaton
So, now that we know what a CFL is, and what it can express: what's the basic computational model that underlies them? CFLs are computable using a kind of machine called a pushdown automaton (PDA). Pushdown automata are relatively simple: take a finite state machine, and add a stack.
For the non-CS folks out there, a stack is a last in, first out (LIFO) storage system. What that means is that you can store something on the top of the stack whenever you want to (called pushing), look at what's on top of the stack (peeking), and remove the element on top (popping). For a PDA, the transitions look like:
\[(\mbox{state}, \mbox{top-of-stack}, \mbox{input}) \rightarrow (\mbox{state}, \mbox{stack-action})\]
- The top-of-stack in the transition can be either a symbol from the machine's alphabet, or it can be "*". If it's a symbol, then the transition can only be taken if both the machine state and the top-of-stack match. If it's "*", then the transition can be taken regardless of the value on top of the stack.
- The stack-action can be "push(symbol)", "pop", or "none".
- The machine accepts the input if it reaches a final state with the stack empty. (There are alternate formulations that don't require an empty stack, or that only require an empty stack but don't use final states. They're exactly equivalent to empty stack + final state.)
As usual, it's easiest to understand this with an example. So let's look at a simple language consisting of parens and identifiers, where the parens have to match. So, for example, "((I)(((I)I)(II)))" would be accepted, but "(I)(((I)I)(II)))" (the same string, but with the first open removed) would be rejected.
Our machine has an alphabet of "(", ")", and "I". It has two states: 0, and 1. 0 is both the initial state, and the only final state. The available transitions are:
- \((0, *, "(") \rightarrow (0, push("("))\): in state 0, if you see an open paren, push it onto the stack, and stay in state 0. It doesn't matter what's on the stack: if there's an open paren in state 0, you can do this.
- \((0, "(", ")") \rightarrow (0, pop)\): in state 0, if you see a close paren and there's an open paren on top of the stack, then you've matched a pair, so you can pop the top open off of the stack.
- \((0, \epsilon, ")") \rightarrow (1, none)\): if you're in state 0, and you see a close paren, but the stack is empty, then go to state 1. State 1 is an error state: it means that you saw a close paren without a corresponding open paren. That's not allowed in this grammar. Once you're in state 1, you're stuck.
- \((0, *, "I") \rightarrow (0, none)\): in state 0, if you see an "I", then you consume it and continue in state 0. It doesn't matter what's on the stack, and you don't change the stack.
Or graphically:
So: if it sees a "(", it pushes a "(" on the stack. If it sees an identifier, it just keeps going past it. If it sees a ")" and there's an "(" on top of the stack, it pops the stack; if it sees a ")" and there's no "(" on the stack, then it switches into state 1. Since state 1 is not a final state, and there are no transitions that can be taken from state 1, the machine rejects the string if it sees an extra ")". If there's a "(" without a matching close, then when the machine finishes, it will have a nonempty stack, and so it will reject the string.
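The machine is simple enough that a direct simulation takes only a few lines. Here's a Python sketch of my own translation of the machine above, with the alphabet and transitions hard-coded:

```python
def run_pda(s: str) -> bool:
    """Simulate the paren-matching PDA described above. State 0 is both the
    initial state and the only final state; state 1 is a dead error state.
    The machine accepts iff it ends in state 0 with an empty stack."""
    state = 0
    stack = []
    for ch in s:
        if state == 1:              # error state: no transitions out
            break
        if ch == '(':               # (0, *, "(") -> (0, push("("))
            stack.append('(')
        elif ch == 'I':             # (0, *, "I") -> (0, none)
            pass
        elif ch == ')':
            if stack:               # (0, "(", ")") -> (0, pop)
                stack.pop()
            else:                   # (0, empty stack, ")") -> (1, none)
                state = 1
        else:
            return False            # not in the machine's alphabet
    return state == 0 and not stack
```

On the examples above, `run_pda("((I)(((I)I)(II)))")` accepts, while the same string with the first open paren removed is rejected via the error state.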
Finally, one nifty little note. The pushdown automaton is a very limited kind of machine. It can't do complex arithmetic, or process complex grammatical constructions. There's a lot that it can't do. So what happens if we take this very limited machine, and give it a second stack?
It becomes maximally powerful: Turing complete. In fact, it becomes a Turing machine. We'll see more about Turing machines later, but a Turing machine is equivalent to a two-stack PDA, and a Turing machine can perform any computation computable by any mechanized computing process. So one stack is extremely constrained; two stacks is as unconstrained as any computing device can ever be.
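To see why two stacks are enough, here's a sketch of the standard trick: represent a Turing machine's tape as two stacks, one holding everything to the left of the head and one holding the cell under the head plus everything to its right. Moving the head is just a pop from one stack and a push onto the other. (The class and method names here are my own illustration, not a standard API.)

```python
class TwoStackTape:
    """A Turing machine tape built from two stacks. `left` holds the cells
    to the left of the head (top = nearest cell), `right` holds the cell
    under the head and everything to its right (top = head cell)."""
    BLANK = '_'

    def __init__(self, contents: str):
        self.left = []
        self.right = list(reversed(contents)) or [self.BLANK]

    def read(self) -> str:
        return self.right[-1]       # the cell under the head

    def write(self, sym: str) -> None:
        self.right[-1] = sym

    def move_right(self) -> None:
        self.left.append(self.right.pop())
        if not self.right:
            self.right.append(self.BLANK)   # extend the tape with blanks

    def move_left(self) -> None:
        if not self.left:
            self.left.append(self.BLANK)    # extend the tape with blanks
        self.right.append(self.left.pop())
```

With this in place, a Turing machine's finite control plus these two stacks does everything the machine's tape does, which is the core of the equivalence.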
"It becomes maximally powerful: Turing complete"
Aren't there more powerful systems of computing than Turing machines? I believe Turing oracles (a Turing machine with a source of random bits, for example) are considered more powerful. Quantum computers should be more powerful too.
Spencer: it's usually accepted as gospel that even quantum computers will not violate the Church-Turing Thesis. The proof of CTT is classic computer science: three or four guys all tried to come up with the most general definition of "computable" and it turned out they were all equivalent, so that must mean it's the most general definition possible.
Turing machines are maximal for true computable systems.
An oracle machine is a Turing machine which can consult an external source of noncomputable information. It's not a true computable system, because it can only work with access to something that it can't compute itself.
Quantum machines can't compute anything that isn't computable by a non-quantum system. In fact, a quantum machine is a strict subclass of a nondeterministic machine. The only thing that quantum adds is speed: a quantum machine can do some things faster than a standard machine.
Back when I was still a mathematician working in TQC, I never saw a proof of that last assertion that didn't basically depend on ChurchTuring.
You've left off that you have left the realm of the finite with the pushdown automata. There is no limit on the size of the stack. So infinity has come into the models at this point.
Well, nothing requires you to keep the stack in the machine; it can be on external storage that you extend as desired, so really, we are no more infinite than our input and output are. But yes, that's not how pushdown automata are usually presented.
The CFL you gave appears to only accept strings like "()()()...".
Did you mean:
A ::= null
A ::= '(' A ')'
A ::= A A
or perhaps
A ::= null
A ::= '(' A ')' A
(which I think is unambiguous).
Are there any convenient tools for proving that a given language isn't a CFL? For regular languages we have the Myhill-Nerode theorem; is there a CFL equivalent?
The main tools that I know of are the pumping lemma and its variations, like Ogden's lemma.
I'll probably write something about that.
Some people argue that there should really be an extra level between regular and context-free for languages that do have arbitrarily deep nesting (so they're not regular) but always make it explicit in their expressions (so they become easy to parse). Many languages (e.g. most XML-based ones) fit this description. See http://www.cis.upenn.edu/~alur/nw.html for details.
Your picture for your PDA is a bit off. The "Push (" transition says it requires a ( on the top of the stack. (Your description above that doesn't.)
Mark, you might want to clarify that PDAs as you describe them here are deterministic PDAs, and in general the only thing you can say about CFLs is that they are recognizable by nondeterministic PDAs.
As an example of a CFL that a deterministic PDA can't recognize, take the language of all strings of the form $a^i b^j c^k d^l$ where either ( $i=l$ and $j=k$ ) or ( $i=j$ and $k=l$ ). It's fairly straightforward to write a CFG that generates this language, but it can't be parsed by a deterministic PDA.
Mark, you should also mention that your description above is of a deterministic PDA, and that in general all you can say about CFLs is that they're recognizable by nondeterministic PDAs.
(If you want an example of a CFL that's not recognizable by a deterministic PDA, just take all strings of the form "aaaaa...bbbbbb....ccccc..." where the number of "a"s is equal to either the number of "b"s or the number of "c"s. It's easy to write a CFG that generates that language, but a deterministic PDA can't recognize it)
Quickly implemented it, seems to be quite easy. http://codepad.org/aE0OQMNC
"So what can't you do in a CFL? I'm not going to try to formally characterize the limits of CFLs, but here are two examples:
(...)
Constrained relationships
We can have some distant relationships in a context free grammar, but it's a very limited capability. You can't specify that a particular symbol can only occur in one place in the string if it already occurred in a prior part of the string. For example, in the lambda calculus example, it really should say that you can only use the "expr ::= ident" rule if the ident occurred in an enclosing "lambda" ident. CFLs can't do that."
Of course they can! Just make another copy of "expr", and use "expr2 ::= ident":
ident ::= "A" | "B" | "C" | ... | "Z"
expr ::= "lambda" ident "." expr2
expr ::= expr expr
expr ::= "(" expr ")"
expr2 ::= "lambda" ident "." expr2
expr2 ::= expr2 expr2
expr2 ::= "(" expr2 ")"
expr2 ::= ident
That is not very straightforward, though.