As promised, I'm finally going to get to the theory behind monads. As a quick review, the basic idea of the monad in Haskell is a hidden transition function  a monad is, basically, a state transition function.
The theory of monads comes from category theory. I'm going to assume you know a little bit about category theory  if you have trouble with it, go take a look at my introductory posts here.
Fundamentally, in category theory a monad is a category with a particular kind of structure. It's a category with one object. That category has a collection of arrows which (obviously) are from the single object to itself. That oneobject category has a functor from the category to itself. (As a reminder, a functoris an arrow between categories in the category of (small) categories.)
The first trick to the monad, in terms of theory, is that it's fundamentally about the functor: since the functor maps from a category to the same category, so you can almost ignore the category; it's implicit in the definition of the functor. So we can almost treat the monad as if it were just the functor  that is, a kind of transition function.
The other big trick is closely related to that. For the programming language application of monads, we can think of the single object in the category as the set of all possible states. So we have a category object, which is essentially the collection of all possible states; and there are arrows between the states representing possible state transitions. So the monad's functor is really just a mapping from arrows to different arrows  which basically represents the way that changing the state causes a change in the possible transitions to other states.
So what a monad gives us, in terms of category theory, in a conceptual framework that captures the concept of a state transition system, in terms of transition functions that invisibly carry a state. When that's translated into programming languages, that becomes a value that implicitly takes an input state, possibly updates it, and returns an output state. Sound familiar?
Let's take a moment and get formal. As usual for category theory, first there are some preliminary definitions.
 Given a category, C, 1_{C} is the identity functor from C to C.
 Given a category C with a functor T : C → C, T^{2} = T º T.
 Given a functor T, 1_{T} : T → T is the natural transformation from T to T.
Now, with that out of the way, we can give the complete formal definition of a monad. Given a category C, a monad on C is a triple: (T:C→C, η:1_{C}→T, μ:T^{2} → T), where T is a functor, and η and μ are natural transformations. The members of the triple must make the following two diagrams commute.
Commutativity of composition with μ
Commutativity of composition with η
What these two diagrams mean is that successive applications of the statetransition functor over C behave associatively  that any sequence of composing monadic functors will result in a functor with full monadic structure; and that the monadic structure will always preserve. Together, these mean that any sequence of operations (that is, applications of the monad functor) are themselves monad functors  so the building a sequence of monadic state transformers is guaranteed to behave as a proper monadic state transition  so what happens inside of the monadic functor is fine  to the rest of the universe, the difference between a sequence and a single simple operation is indistinguishable: the state will be consistently passed from application to application with the correct chaining behavior, and to the outside world, the entire monadic chain looks like like a single atomic monadic operation.
Now, what does this mean in terms of programming? Each element of a monadic sequence in Haskell is an instantiation of the monadic functor  that is, it's an arrow between states  a function, not a simple value  which is the basic trick to monads. They look like a sequence of statements; in fact, each statement in a monad is actually a function from state to state. And it looks like we're writing sequence code  when what we're actually doing is writing function compositions  so that when we're done writing a monadic sequence, what we've actually done is written a function definition in terms of a sequence of function compositions.
Understanding that, we can now clearly understand why we need the return function to use a nonmonad expression inside of a monadic sequence  because each step in the sequence needs to be an instance of the monadic functor; an expression that isn't an instance of the monadic functor couldn't be composed with the functions in the sequence. The return function is really nothing but a function that combines a nonmonadic expression with the id functor.
In light of this, let's go back and look at the definition of Monad
in the Haskell standard prelude.
class Functor f where fmap :: (a > b) > f a > f b class Monad m where (>>=) :: m a > (a > m b) > m b (>>) :: m a > m b > m b return :: a > m a fail :: String > m a  Minimal complete definition:  (>>=), return m >> k = m >>= _ > k fail s = error s
The declaration of monad is connected with the definition of functor  if you look, you can see the connection. The fundamental operation of Monad
is ">>=
"  the chaining operation, which is type m a > (a > m b) > m b
is deeply connected with the fmap
operation from Functor
's fmap
operation  the a
in m a
is generally going to be a type which can be a Functor
.
So the value type wrapped in the monad is a functor  in fact, the functor from the category definition! And the ">>=
" operation is just the functor composition operation from the monad definition.
A proper implementation of a monad needs to follow some fundamental rules  the rules are, basically, just Haskell translations of the structurepreserving rules about functors and natural transformations in the categorytheoretic monad. There are two groups of laws  laws about the Functor
class, which should hold for the transition function wrapped in the monad class; and laws about the monadic operations in the Monad
class. One important thing to realize about the functor and monad laws is that they are not enforced  in fact, cannot be enforced!  but monadbased code using monad implementations that do not follow them may not work correctly. (A compiletime method for correctly verifying the enforcement of these rules can be shown to be equivalent to the halting problem.)
There are two simple laws for Functor
, and it's pretty obvious why they're fundamentally just strucurepreservation requirements. The functor class only has one operation, called fmap
, and the two functor laws are about how it must behave.

fmap id = id
(Mapping ID over any structured sequence results in an unmodified sequence) 
fmap (f . g) = (fmap f) . (fmap g)
("." is the function composition operation; this just says that fmap preserves the structure to ensure that that mapping is associative with composition.)
The monad laws are a bit harder, but not much. The mainly govern how monadic operations interact with nonmonadic operations in terms of the "return
" and ">>=
" operations of the Monad
class.

return x >>= f = f x
(injecting a value into the monad is basically the same as passing it as a parameter down the chain  return is really just the identity functor passing its result on to the next step.) 
f >>= return = f
(If you don't specify a value for a return, it's the same as just returning the result of the previous step in the sequence  again, return is just identity, so passing something into return shouldn't affect it.) 
seq >>= return . f = fmap f seq
(composing return with a function is equivalent to invoking that function on the result of the monad sequence to that point, and wrapping the result in the monad  in other words, it's just composition with identity.) 
seq >>= (x > f x >>= g) = (seq >>= f) >>= g
(Your implementation of ">>=
" needs to be semantically equivalent to the usual translation; that is, it must behave like a functor composition.)
Oh, thanks for this post! I've been reading a lot about comonads the last week or so, and while I can see how easy the monad laws are, and how I can use them to functionally program certain things, I could not understand the motivation to invent the monad in the first place. The fact you stated that a sequence of monadic operations is itself a monadic operation somehow made things click! I had not seen it put quite like that before.
A compiletime method for correctly verifying the enforcement of these rules can be shown to be equivalent to the halting problem.
Erm, by this I assume you mean that there is no recursive procedure that can prove a given set of ostensibly monadic operations in fact form a monad? Because surely a monadic library writer could provide his own computercheckable validity proof for his set of operations.
I love your blog!
Just a couple of comments.
First off, the >>= operator is usually pronounced "bind". This is just a disclaimer in case I call it "bind" in what follows.
Secondly, you can connect the category theory definition and the Haskell definition if you introduce the join operation:
join :: (Monad m) => m (m a) > m a
join m = m >>= return
A bit of calculation shows that you can represent bind:
m >>= k = join (fmap k m)
"Join" is sometimes easier to reason about than "bind". For example, "join" on linked lists concatenates a listoflists into a list.
Now look at the category theory definition again: (T:C→C, η:1C→T, μ:T2 → T)
T is a functor. That's "fmap".
η is a natural transformation from 1C to T. That is, for all objects A, it is an arrow from 1(A) (which is just A) to T(A), which satisfies an additional mapping property that I'll get to in a moment. This is just return, which has the type a > m a.
μ is a natural transformation from T2 to T. That is, for all objects A, it is an arrow from T (T A) to T A, with an additional mapping property. Compare this with join, with type m (m a) > m a.
The additional mapping property for a natural transformation is, basically, that natural transformations respect the structure of a functor. That is, if ν:F→G is a natural transformation, then for all arrows f, G(f)⋅ν = ν⋅F(f)
For join, this means that for all f:
fmap f . join = join . fmap (fmap f)
Here we use the fact that T2 applied to f is T (T f).
And for return, this means that for all f:
fmap f . return = return . f
Here, we use the fact that 1C applied to f is always f; it's the identity, after all.
Similarly, we can express commutativity of composition with μ as (assuming the types are correct):
join . join = join . fmap join
And commutativity of composition with η as:
join . fmap return = join . return = id
You will recall that id :: a > a is the identity function in haskell.
Once you've got this machinery, it's not difficult to see where the monad laws come from. The laws for fmap, as Mark says, are just the category theory definition of a functor. But example, for example, a law like this:
f >>= return = f
Expanding bind:
join (fmap return f) = f
This is true for all f, so:
join . fmap return = id
Which you will recognise as one of the category theory properties above.
Most interestingly of all is that there's a deep connection between polymorphism and natural transformations. Consider the type of return for some monad:
return :: a > M a
This is true for all possible types that can be used in place of a. It's true for types like functions, for example, which can't be tested for equality. It's even true for types like void, which don't have any "elements" in it at all. That means that "return" can't really look inside its argument; all it can do is pass it along.
So return must, in a sense, be completely oblivious to the types that it operates on. If A and B are types, then return :: A > M A should behave exactly like return :: B > M B, regardless of what A and B are. More formally, return should commute with any substitution of type B for type A.
We can express this in Haskell. If you have a function f :: A > B for some types A and B, then return should commute with f:
return . f = fmap f . return
Does this look familiar? We're just saying that return is a natural transformation. Remarkably, the same argument also holds true of join; the fact that join is polymorphic means that it must be a natural transformation. And similar arguments hold for any polymorphic function in Haskell.
This remarkable property of polymorphic functions known as the parametricity theorem. It states that a type which uses parametric polymorphism can be mechanically turned into a theorem, such that any function with that type must satisfy the associated theorem. In other words, a function, by virtue of being welltyped, has a theorem which you don't have to prove. And often (as is the case with join and return), these "free theorems" are actually quite useful.
The gory details are mentioned in Phil Wadler's classic paper with the wonderful title "Theorems for Free!"
Now a few nits: The "remarkable" property of, say, return, is actually only true in a language without the fixpoint operator and without facilities for forcing strict evaluation, both of which Haskell has. Recent research has shown that there are still "free theorems" even in the presence of these operations, but they're not nearly so succinct. For most wellbehaved Haskell programs, however, the free theorems are still true.
It may seem that all I do here is nitpick, but:
I think you confused monads with monoids in that paragraph. Certainly the category C defined further below does not need to have just one object.
Although someone said the word monad is from monoid + triad, so I would like to understand what monads really have to do with monoids. Are they maybe the 2categorical generalization, replacing objects with categories? After all the definition does have just one category in it.
Ørjan: You're right to nitpick.
First off, you're right that a monad is not "a category with one object" (plus structure). A monad over a category C can support lots of objects in C.
The obvious example is Haskell itself. You can think of Haskell as a category whose objects are types and whose morphisms are functions. The category Haskell clearly has more than one object, but as you can see, monads are fairly straightforward to define.
A monad is a generalisation of a monoid. You can think of a monoid as a category with a single object, but you can also think of it as an algebra over some carrier.
Think of it this way: A monoid is a triple (A,⊗,e) consisting of:
a carrier set, A,
a binary associative operation, ⊗, and
a specified left and right identity for ⊗, called e.
You can describe this as the set A plus two functions, μ:A×A→A and η:1→A, where 1 is the singleton set (or the terminal object in the category Set), with the following laws satisfied, for all x, y, z ∈ A and for all e ∈ 1:
μ(x,η(e)) = μ(η(e),x) = x
μ(x,μ(y,z)) = μ(μ(x,y),z)
Of course, in category theory, we don't use elements, so instead we say (using ⋅ as category composition and ≡ to denote equality up to isomorphism):
μ ⋅ idA×η ≡ μ ⋅ η×idA ≡ idA
μ ⋅ idA×μ = μ ⋅ μ×idA
If you draw a diagram (sorry, don't know how to do it in comments), the comparison with monads should be obvious. But for completeness, in Haskell, you can see that this looks a bit like an associative law:
(p >>= q) >>= r = p >>= (λx. q x >>= r)
And this looks like an identity law:
m >>= return = return a >>= λx. m = m
So in a sense, bind is associative and return is its identity. Just like a monoid.
Ah yes. I remember now the curious fact that you can generalize the usual proofs that identity in a monoid is unique if it exists to a proof that return is determined from bind:
If
return1 x >>= f = f x
for allf
andx
, thenreturn1 x = return1 x >>= return = return x
Alternatively, if
x >>= return1 = x
for allx
, thenreturn x = return x >>= return1 = return1 x
Unfortunately, like with monoids this does not seem to turn into an actual expression for return, so you still need to supply both in the instance definition.
Actually my last comment and the associative law for monads are probably better summed up in the Kleisli category whose arrows/morphisms are functions of type a > T b, identity arrows are return and which has composition
(f >>> g) x = f x >>= g
(For Haskell, see the Kleisli type in the Control.Arrow module.)
Off topic, I'll try once again to ask if there is a way to list more than the last ten recent comments in the blog without checking each article manually? Although the fact noone noticed the last time I asked probably means there isn't.
On what monads and monoids have to do with each other, there's a nice formal way to see that they're the "same thing". A monoid is a monoidobject in Set (with tensor being Cartesian product), and a monad over a category C is a monoidobject in the category of endofunctors on C (with tensor being composition of functors).
For the details, see http://en.wikipedia.org/wiki/Monoid_(category_theory) and http://en.wikipedia.org/wiki/Monoidal_category
This was the best "connect the dots" style post on haskell and category theory I've read  thanks.
The "datahiding" ( or objecthiding) aspect
of monads makes me think of Leibniz's "windowless monads".