In the last post, I mentioned the fact that regular expressions specify the same set of languages as regular grammars, and that finite state machines are the computational device that can recognize those languages.

It's even pretty easy to describe how to convert from regular expressions to FSMs. But before we do that, to make it a bit easier, we'll extend our finite state machines. Doing that is interesting in itself: What we're going to do is create *non-deterministic* finite state machines - NFA (for nondeterministic finite automata) for short.

NFA are interesting things. They're a variation on the deterministic state machines (DFA) that we looked at before. An NFA is pretty similar to the DFA, except that the machine is allowed to make *choices*.

In the formulation that I'm using, there are two key differences between DFAs and NFAs:

- *Multiple transitions/choices*: There can be more than one transition from a given state on a given input symbol.
- *Empty transitions*: Two states in an NFA can be connected by an edge labelled \(\epsilon\). This is called a *null transition*.

The non-deterministic machine runs in pretty much the same way as a deterministic machine *except*:

- If there is a null transition from a state, then a machine in that state can take that transition at any time, without consuming any input.
- If there is more than one transition from a state on a particular input symbol, the machine can choose any of the available transitions.

A non-deterministic machine accepts an input string if there is *any* possible way of running the machine on a string that ends in a final state. For example:

This machine contains one transition labelled with an \(\epsilon\), meaning that the transition can be taken from \(S_0\) without consuming any input.
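That "any possible way" rule sounds expensive to check, but a simulator doesn't have to explore one path at a time: it can track the whole *set* of states the machine might currently be in. Here's a minimal sketch in Python (the transition-table encoding is my own assumption, since the post doesn't specify one):

```python
def accepts(start, accept, delta, eps, word):
    """Return True if ANY run of the NFA on `word` ends in a final state.

    delta maps (state, symbol) -> set of next states;
    eps maps state -> set of states reachable by a null transition."""

    def closure(states):
        # Follow null (epsilon) transitions until nothing new appears.
        result, stack = set(states), list(states)
        while stack:
            s = stack.pop()
            for t in eps.get(s, ()):
                if t not in result:
                    result.add(t)
                    stack.append(t)
        return result

    current = closure({start})
    for symbol in word:
        moved = set()
        for s in current:
            moved |= delta.get((s, symbol), set())
        current = closure(moved)
    return bool(current & accept)
```

For instance, a three-state NFA for the language a\*b, with a null transition from state 0 to state 1, would be queried as `accepts(0, {2}, {(0, "a"): {0}, (1, "b"): {2}}, {0: {1}}, "aab")`.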

So - the interesting question that NFAs bring up is: does changing the machine this way do anything? Does it give us any more power? Unfortunately, no. Given a non-deterministic FSM, we can easily convert it to a deterministic FSM.

- The DFA will have one state for each member of the *powerset* of the states of the NFA.
- The initial state of the DFA is the same as the initial state of the NFA.
- A state in the DFA is final if any of its members is a final state in the NFA.
- For each state \(S = \{s_1, ..., s_n\}\) and each input symbol \(k\), there is a transition on \(k\) from \(S\) to the state \(T\) consisting of every NFA state reachable from some member of \(S\) by a single non-empty transition on \(k\), possibly preceded and followed by sequences of empty transitions.

Basically, what that means is that you can create a DFA which superimposes every possible state that the NFA could be in. After any input string \(s\) has been consumed, the state of the DFA is the set of states that the NFA could have been in after seeing that string.
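In practice you never enumerate the whole powerset; you only build the subsets that are actually reachable from the start state. Here's a sketch of the subset construction, using the same kind of dictionary encoding as an assumption of mine:

```python
from itertools import chain

def epsilon_closure(states, eps):
    """All states reachable from `states` using only empty transitions."""
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in eps.get(s, ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def nfa_to_dfa(start, accept, delta, eps, alphabet):
    """Subset construction: each DFA state is a frozenset of NFA states."""
    start_state = epsilon_closure({start}, eps)
    dfa_delta, seen, todo = {}, {start_state}, [start_state]
    while todo:
        S = todo.pop()
        for a in alphabet:
            # One non-empty transition on `a`, surrounded by empty ones.
            moved = set(chain.from_iterable(delta.get((s, a), ()) for s in S))
            T = epsilon_closure(moved, eps)
            dfa_delta[(S, a)] = T
            if T not in seen:
                seen.add(T)
                todo.append(T)
    # A DFA state is final if any of its members is final in the NFA.
    dfa_accept = {S for S in seen if S & accept}
    return start_state, dfa_accept, dfa_delta
```

Running the resulting DFA is then just dictionary lookups: start from `start_state`, follow `dfa_delta[(state, symbol)]` for each input symbol, and accept if you end in a member of `dfa_accept`.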

So, for example, the NFA shown above can be converted into the following DFA:

We start with \(S_0\). From \(S_0\), if we see an "a", there are three things that could happen. Either we'd take the a-transition to \(S_2\); or we'd take the empty transition to \(S_1\), and then take the a-transition from \(S_1\) back to \(S_1\); or we'd take the empty transition to \(S_1\), and then take the a-transition to \(S_f\). So in the DFA, we'd have an a-transition from \(\{S_0\}\) to \(\{S_1, S_2, S_f\}\).

If we see a "b", there are no transitions shown. By convention, we assume some "invisible" states: any time there's no transition listed for a given input symbol from a state, we assume there's a transition from that state to an "error" state. That convention works in the DFA as well as the NFA, so we just won't show the error state or its transitions; that makes our machines much easier to read.

Analyzing state \(S_0\) has given us the state \(\{S_1, S_2, S_f\}\). If we see an input of "a", that could bring us to \(S_1\) or \(S_f\) (if the NFA was in state \(S_1\)), or it could bring us to \(S_f\) (if the NFA was in state \(S_2\)). So for an input of "a", we'd add a transition to state \(\{S_1, S_f\}\). If we see an input of "b", we could either go from \(S_2\) to \(S_2\), or from \(S_2\) to \(S_f\). So for "b", we'd add a transition to \(\{S_2, S_f\}\).

And so on - the final result being:

What does this *mean?*

Well, an NFA isn't more powerful than a deterministic machine (a *DFA*): there's nothing that an NFA can do that a DFA can't. But the NFA can do it with a *lot* fewer states. The equivalent DFA can be much more complicated, with exponentially more states than the NFA it was built from.

Most importantly... it means that with a fixed, finite amount of state, non-determinism doesn't buy you any computational power. With a finite state machine, non-determinism doesn't even make it possible to do anything faster! The only benefit that non-determinism has at this level is space: the amount of state that you need (that is, the number of states in the machine) can be smaller with a non-deterministic machine.

And every real physical computer in the world is a FA at heart ... so I guess these are pretty nice machines. I always feel funny when people say things about FA's not recognizing pairs of parentheses and such. No other model does either for real physical computers. This is the end for everything we can do as humans and anything we know how to ever build.

Stack Machines, Turing Machines, context free grammars, and such all don't have any real physical examples, just finite models. So I always felt a little sorry for FA's cause we always used them as the weak sibling in computational theory, yet they are really all we work with 🙂

I've always found that argument about FSMs versus other computing devices to be rather silly.

In theory, the computer that I'm typing on is a finite state machine.

But... if it's a finite state machine, it's a finite state machine with something considerably more than \(2^{16000000000}\) states, and with a correspondingly larger set of transitions. And in terms of the way that the machine behaves... it makes a *lot* more sense to model a real computer as a von Neumann machine, even though the vNM has an unbounded amount of storage, and the real computer has bounded finite storage. And these days, with on-the-fly extensible storage, it makes more sense than ever.

Anyone who disrespects the noble FSM [1] has never tried to design complex hardware. Down at the RTL level, the state space gets manageable and the cost of errors gets astronomical. Formal methods may be severely limited, but I've seen them catch too many gotchas to have anything but love for them.

[1] No, not that one. Noodly appendages are trouble waiting to bite you.

A somewhat trivial point, but both NFAs and DFAs can be extended with an arbitrary number of states, so it is not terribly interesting to say that "[the DFA] can have exponentially more states [than an equivalent NFA]". This can be done in a number of ways, such as adding states that can never reach an accepting state, or adding loop unrollings.

Of significantly more interest is the statement that "sometimes the DFA must have exponentially more states than an equivalent NFA". Consider the language of strings of 0s and 1s whose k-th symbol from the end is a 0, for a fixed natural number k. This language has a (k + 1)-state NFA, but the smallest DFA for it has 2^k states. The intuition is that the DFA must "remember" the last k inputs, while the NFA can simply guess which 0 is k symbols from the end. (This example was shamelessly stolen from Sipser's book.)
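That blowup is easy to check by brute force: build the small NFA, run the subset construction on it, and count how many DFA states are actually reachable. A sketch, using my own encoding of the NFA for "the k-th symbol from the end is a 0" (states are the integers 0 through k, with k accepting):

```python
def dfa_states_for_kth_from_end(k):
    """Determinize the (k+1)-state NFA for "the k-th symbol from the
    end is 0" and return how many DFA states are reachable."""

    def step(state, symbol):
        # State 0 loops on everything; on a "0" it can also guess that
        # this 0 is exactly k symbols from the end. States 1..k-1 just
        # count off the remaining input; state k is the accepting state.
        if state == 0:
            return {0, 1} if symbol == "0" else {0}
        return {state + 1} if state < k else set()

    # Subset construction, keeping only reachable subsets (no epsilons here).
    start = frozenset({0})
    seen, todo = {start}, [start]
    while todo:
        S = todo.pop()
        for symbol in "01":
            T = frozenset(t for s in S for t in step(s, symbol))
            if T not in seen:
                seen.add(T)
                todo.append(T)
    return len(seen)
```

For each k this reports exactly 2^k reachable DFA states (for example, 32 DFA states against the 6-state NFA at k = 5): each reachable subset encodes which of the last k inputs were 0s, which is precisely the "remember the last k inputs" intuition.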

So when are we going to get to quantum finite state machines?

There are two details worth noting about NFAs. First, when trying to construct a DFA that recognizes some language, it is sometimes conceptually easier to write down an NFA first. Second, explicit versions of the pumping lemma are easier to write down for DFAs than for NFAs.