A few of my recent posts here appear to have struck some nerves, and I've been
getting lots of annoying email containing the same questions, over and over again. So rather than reply individually, I'm going to answer them here in the hope that either (a) people will see the answers before send the question to me, and therefore not bother me; or (b) conclude that I'm an obnoxious asshole who isn't worth the trouble of writining to, and therefore not bother me. I suspect that (b) is more likely than (a), but hey, whatever works.
Answers beneath the fold.
With continuity under our belts (albeit with some bumps along the way), we can look at something that many people consider *the* central concept of topology: homeomorphisms.
A homeomorphism is what defines the topological concept of *equivalence*. Remember the clay mug/torus metaphor from from my introduction: in topology, two topological spaces are equivalent if they can be bent, stretched, smushed, twisted, or glued to form the same shape *without* tearing.
The rest is beneath the fold.
*(Note: in the original version of this, I made an absolutely **huge** error. One of my faults in discussing topology is scrambling when to use forward functions, and when to use inverse functions. Continuity is dependent on properties defined in terms of the *inverse* of the function; I originally wrote it in the other direction. Thanks to commenter Dave Glasser for pointing out my error. I'll try to be more careful in the future!)*
Since I'm back, it's time to get back to topology!
I'm going to spend a bit more time talking about what continuity means; it's a really important concept in topology, and I don't think I did a particularly good job at explaining it in my first attempt.
Continuity is a concept of a certain kind of *smoothness*. In non-topological mathematics, we define continuity with a very straightforward algebraic idea of smoothness. A standard intuitive definition of a *continuous function* in algebra is "a function whose graph can be drawn without lifting your pencil". The topological idea of continuity is very much the same kind of thing - but since a topological space is just a set with some additional structure, the definition of continuity has to be generalized to the structure of topologies.
The closest we can get to the algebraic intuition is to talk about *neighborhoods*. We'll define them more precisely in a moment, but first we'll just talk intuitively. Neighborhoods only exist in topological metric spaces, since they end up being defined in terms of distance; but they'll give us the intuition that we can build on.
Let's look at two topological spaces, **S** and **T**, and a function f : **S** → **T** (that is, a function from *points* in **S** to *points* in **T**). What does it mean for f to be continuous? What does *smoothness* mean in this context?
Suppose we've got a point, *s* ∈ **S**. Then f(*s*) ∈ **T**. If f is continuous, then for any point p in **T** *close to f(s)*, f-1(p) will be *close to* *s*. What does close to mean? Pick any distance - any *neighborhood* N(f(s)) in **T** - no matter how small; there will be a corresponding neighborhood of M(*s*) around s in **S** so that for all points p in N(f(s)), f-1 will be in M(*s*). If that's a bit hard to follow, a diagram might help:
To be a bit more precise: let's define a neighborhood. A neighborhood N(p) of a point p is a set of points that are *close to* p. We'll leave the precise definition of *close to* open, but you can think of it as being within a real-number distance in a metric space. (*close to* for the sake of continuity is definable for any topological space, but it can be a strange concept of close to.)
The function f is continuous if and only if for all points f(s) ∈ **T**, for all neighborhoods N(f(s)) of f(s), there is some neighborhood M(s) in **S** so that f(M(s)) ⊆ N(f(s)). Note that this is for *all* neighborhoods of *all* points in **T** mapped to by f - so no matter how small you shrink the neighborhood around f(s), the property holds - and it implies that as the neighborhood in **T** shrinks, so does the corresponding neighborhood in **S**, until you reach the single points f(s) and s.
Why does this imply *smoothness*? It means that you can't find a set of points in the range of f in **T** that are close together, but that weren't close together in **S** before being mapped by f. f won't put things together that weren't together originally. And it won't pull things apart that weren't
close together originally. *(This paragraph was corrected to be more clear based on comments from Daniel Martin.)*
For a neat exercise: go back to the category theory articles, where we defined *initial* and *final* objects in a category. There are corresponding notions of *initial* and *final* topologies in a topological space for a set. The definitions are basically the same as in category theory - the arrows from the initial object are the *continuous functions* from the topological space, etc.
While I was on vacation, I got some email from Chris Noble pointing me towards a discussion with some thoroughly innumerate HIV-AIDS denialists. It's really quite shocking what passes for a reasonable argument among true believers.
The initial stupid statement is from one of Duesberg's papers, [AIDS Acquired by Drug Consumption and Other Noncontagious Risk Factors][duesberg], and it's quite a whopper. During a discussion of the infection rates shown by HIV tests of military recruits, he says:
>(a) "AIDS tests" from applicants to the U.S. Army and the U.S. Job
>Corps indicate that between 0.03% (Burke et al.,1990) and 0.3% (St
>Louis et al.,1991) of the 17- to 19-year-old applicants are HIV-infected
>but healthy. Since there are about 90 million Americans under the age
>of 20, there must be between 27,000 and 270,000(0.03%-0.3% of 90
>million) HIV carriers. In Central Africa there are even more, since 1-2%
>of healthy children are HIV-positive (Quinn et al.,1986).
>Most, if not all, of these adolescents must have acquired HIV from
>perinatal infection for the following reasons: sexual transmission of
>HIV depends on an average of 1000 sexual contacts, and only 1in 250
>Americans carries HIV (Table 1). Thus, all positive teenagers would
>have had to achieve an absurd 1000 contacts with a positive partner, or
>an even more absurd 250,000 sexual contacts with random Americans
>to acquire HIV by sexual transmission. It follows that probably all of
>the healthy adolescent HIV carriers were perinatally infected, as for
>example the 22-year-old Kimberly Bergalis (Section 3.5.16).
Now, I would think that *anyone* who reads an allegedly scientific paper like this would be capable of seeing the spectacular stupidity in this quotation. But for the sake of pedantry, I'll explain it using small words.
If the odds of, say, winning the lottery are 1 in 1 million, that does *not* mean that if I won the lottery, that means I must have played it one million times. Nor does it mean that the average lottery winner played the lottery one million times. It means that out of every one million times *anyone* plays the lottery, *one* person will be expected to win.
To jump that back to Duesberg, what he's saying is: if the transmission rate of HIV/AIDS is 1 in 1000, then the average infected person would need to have had sex with an infected partner 1000 times.
Nope, that's not how math works. Not even close.
Suppose we have 1000 people who are infected with HIV, and who are having unprotected sex. *If* we follow Duesberg's lead, and assume that the transmission rate is a constant 0.1%, then what we would expect is that if each of those 1000 people had sex with one partner one time, we would see one new infected individual - and that individual would have had unprotected sex with the infected partner only one time.
This isn't rocket science folks. This is damned simple, high-school level statistics.
Where things get even sadder is looking at the discussion that followed when Chris posted something similar to the above explanation. Some of the ridiculous contortions that people go through in order to avoid admitting that the great Peter Duesberg said something stupid is just astounding. For example, consider [this][truthseeker] from a poster calling himself "Truthseeker":
>If Duesberg had said that, he would indeed be foolish. The foolishness,
>however, is yours, since you misintepret his reasoning. He said, as you note
>>Most, if not all, of these adolescents must have acquired HIV from perinatal
>>infection for the following reasons: sexual transmission of HIV depends on an
>>average of 1000 sexual contacts, and only 1 in 250 Americans carries HIV
>>(Table 1). Thus, all positive teenagers would have had to achieve an absurd
>>1000 contacts with a positive partner, or an even more absurd 250,000 sexual
>>contacts with random Americans to acquire HIV by sexual transmission.
>This states the average transmission requires 1000 contacts, not every
>transmission. With such a low transmission rate and with so few Americans
>positive - you have to engage with 250 partners on average to get an average
>certainty of 100% for transmission, if the transmission rate was 1. Since it is
>1 in 1000, the number you have to get through on average is 250,000. Some might
>do it immediately, some might fail entirely even at 250,000. But the average
>indicates that all positive teenagers would have had to get through on average
Truthseeker is making exactly the same mistake as Duesberg. The difference is that he's just had it explained to him using a simple metaphor, and he's trying to spin a way around the fact that *Duesberg screwed up*.
But it gets even worse. A poster named Claus responded with [this][claus] indignant response to Chris's use of a metaphor about plane crashes:
>You would fare so much better if you could just stay with the science
>points and refrain from your ad Duesbergs for more than 2 sentences at
>a time. You know there's a proverb where I come from that says 'thief thinks
>every man steals'. I've never seen anybody persisting the way you do in
>calling other people 'liars', 'dishonest' and the likes in spite of the
>fact that the only one shown to be repeatedly and wilfully dishonest
>here is you.
>Unlike yourself Duesberg doesn't deal with matters on a case-by-case only basis
>in order to illustrate his statistical points. precisely as TS says, this shows
>that you're the one who's not doing the statistics, only the misleading.
>In statistics, for an illustration to have any meaning, one must assume that
>it's representative of an in the context significant statistical average no?
>Or perphaps in CN's estimed opinion statistics is all about that once in a
>while when somebody does win in the lottery?
Gotta interject here... Yeah, statistics *is* about that once in a while when someone wins the lottery, or when someone catches HIV, or when someone dies in a plane crash. It's about measuring things by looking at aggregate numbers for a population. *Any* unlikely event follows the same pattern, whether it's catching HIV, winning the lottery, or dying in a plane crash, and that's one of the things that statistics is specifically designed to talk about: that fundamental probabilistic pattern.
>But never mind we'll let CN have the point; the case in question was that odd
>one out, and Duesberg was guilty of the gambler's fallacy. ok? You scored one
>on Duesberg, happy now? Good. So here's the real statistical point abstracted,
>if you will, from the whole that's made up by all single cases, then applied to
>the single case in question:
>>Thus, all positive teenagers would have had to achieve an absurd 1000 contacts
>>with a positive partner, or an even more absurd 250,000 sexual contacts with
>>random Americans to acquire HIV by sexual transmission.
>This is the statistical truth, which is what everybody but CN is interested in.
Nope, this is *not* statistical truth. This is an elementary statistical error which even a moron should be able to recognize.
>Reminder: Whenever somebody shows a pattern of pedantically reverting to single
>cases and/or persons, insisting on interpreting them out of all context, it's
>because they want to divert your attention from real issues and blind you to
>the overall picture.
Reminder: whenever someone shows a pattern of pedantically reverting to a single statistic, insisting on interpreting it in an entirely invalid context, it's because they want to divert your attention from real issues and blind you to the overall picture.
The 250,000 average sexual contacts is a classic big-numbers thing: it's so valuable to be able to come up with an absurd number that people will immediately reject, and assign it to your opponents argument. They *can't* let this go, no matter how stupid it is, no matter how obviously wrong. Because it's so important to them to be able to say "According to *their own statistics*, the HIV believers are saying that the average teenage army recruit has had sex 250,000 times!". As long as they can keep up the *pretense* of a debate around the validity of that statistic, they can keep on using it. So no matter how stupid, they'll keep defending the line.
While I was away on vacation, my family made a stop in Corning NY to see the Corning Glass Museum. I had to snap this photo for PZ. Alas, all I had was the camera in my cellphone, so the resolution leaves something to be desired, but it's the thought that counts, right?
Lambda calculus started off with the simple, untyped lambda calculus that we've been talking about so far. But one of the great open questions about lambda calculus was: was it sound? Did it have a valid model?
Church found that it was easy to produce some strange and non-sensical expressions using the simple lambda calculus. In order to try to work around those problems, and end up with a consistent system, Church introduced the concept of *types*, producing the *simply typed lambda calculus*. Once types hit the scene, things really went wild; the type systems for lambda calculi have never stopped developing: people are still finding new things to do by extending the LC type system today! Most lambda calculus based programming languages are based on the Hindley-Milner lambda calculus, which is a simplification of one of the standard sophisticated typed lambda calculi called *SystemF*. There's even a [Lambda Cube][cube], though it's not related to the Time Cube. Once people really started to understand types, they realized that the *untyped* lambda calculus was really just a pathologically simple instance of the simply typed lambda calculus: a typed LC with only one base type.
The semantics of lambda calculus are easiest to talk about in a typed version. For now, I'll talk about the simplest typed LC, known as the *simply typed lambda calculus*. One of the really amazing things about this, which I'll show, is that a simply typed lambda calculus is completely semantically equivalent to an intuitionistic propositional logic: each type in the program is a proposition in the logic; each β reduction corresponds to an inference step; and each complete function corresponds to a proof! Look below the fold for how.
So in the last few posts, I've been building up the bits and pieces that turn lambda calculus into a useful system. We've got numbers, booleans, and choice operators. The only thing we're lacking is some kind of repetition or iteration.
In lambda calculus, all iteration is done by recursion. In fact, recursion is a pretty natural way of expressing iteration. It takes a bit of getting used to, but anyone who's programmed in a language like Scheme, ML, or Haskell for a while gets very used to idea, and feels frustrated coming back to a language like Java, where you need to write loops.
It can be a bit difficult if you're not used to thinking recursively. I wrote [an explanation of recursion][recursion] which you can go read if you're not used to what recursion is or how it works.
be found here). But since functions in lambda calculus don't have names, that means that we resort to something tricky. It's called the Y combinator, aka the lambda fixed point operator.
Let's start by looking at a simple recursive function outside of the lambda calculus. The factorial function, n!, is the standard example:
factorial(n) = 1 if n = 0,
or factorial(n) = n*factorial(n-1) if n > 0
If we want to start trying to write that in lambda calculus, we'd need a couple of tools... We need a test for equality to zero, and we need a way of multiplying numbers; and we need a way of subtracting one.
For testing equality to zero, we'll use a function named *IsZero*, which takes three parameters: a number, and two values. If the number is 0, it returns the first value; if it's not 0, then it returns the second value.
For multiplication - multiplication is an iterative algorithm, so we can't write multiplication until we work out recursion. But we'll just handwave that for now, and have a function *Mult x y*.
And finally, for subtracting one, we'll use *Pred x* for the predecessor of x - that is, x - 1.
So - a first stab at factorial, written with the recursive call left with a blank in it, would be:
*λ n . IsZero n 1 (Mult n (**something** (Pred n)))*
Now, the question is, what kind of "something" can we plug in to there? What we're really like to do is plug in a copy of the function itself:
*Fact ≡ λ n . IsZero n 1 (Mult n (Fact (Pred n)))*
How can we do that? Well, the usual way of plugging something in to a lambda calculus function is through a parameter:
*Fact ≡ (λ f n . IsZero n 1 (Mult n (f (Pred n)))) Fact*
Of course, we can't plug in a copy of the function as its own parameter that way: the name *Fact* doesn't exist in the expression in which we're trying to use it. You can't use an undefined name - and in lambda calculus, the *only* way to bind a name is by passing it as a parameter to a λ expression. So what can we do?
The answer is to use something called a *combinator*. A combinator is a special kind of *higher order function* which can be defined without reference to anything but function applications. (A higher order function is a function which takes functions as parameters and returns functions as results). The Y combinator is the special, almost magical function that makes recursion possible. Here's what it looks like:
Y ≡ λ y . (λ x . y (x x)) (λ x . y (x x))
If you look at it, the reason for calling it Y is because it is *shaped* like a Y. To show you that more clearly, sometimes we write lambda calculus using trees. Here's the tree for the Y combinator:
Why is the Y combinator an answer to our problem in defining the factorial function? The Y combinator is something called a *fixed point* combinator. What makes it special is the fact that for any function *f*, *Y f* evaluates to *f Y f*; which evaluates to *f (f Y f)*; which evaluates to *f (f (f Y f))*. See why it's called Y?
Let's try walking through "*Y f*":
1. Expand Y: "*(λ y . (λ x . y (x x)) (λ x . y (x x))) f*"
2. β: "*(λ x . f (x x)) (λ x . f (x x))*
3. β again: "*f (λ x . f (x x)) (λ x . f (x x))*"
4. Since "*Y f = (λ x . f (x x)) (λ x . f (x x))*", what we just got in step three is "f Y f".
See, there's the magic of "Y". No matter what you do, you can't make it consume itself. Evaluating "*Y f*" will produce another copy of *f*, and leave the "Y f" part untouched.
So how do we use this crazy thing?
Remember our last attempt at defining the factorial function? Let's look at it again:
*Fact ≡ (λ f n . IsZero n 1 (Mult n (f (Pred n)))) Fact*
Let's rewrite it just a tiny bit to make it easier to talk about:
*Metafact ≡ (λ f n . IsZero n 1 (Mult n (f (Pred n))))*
*Fact ≡ Metafact Fact*
Now - the trick is, "Fact" is not an identifier defined inside of "Fact". How do we let "Fact" reference "Fact"? Well, we did a lambda abstraction to let us pass the "Fact" function as a parameter; so what we needed to do is to find a way to write "Fact" that lets us pass it to itself as a parameter.
What does "Y f" do? It expands into a call to "f" with "Y f" as its first parameter. And "Y f" will expand into "f Y f" - that is, a call to "f" with "Y f" as its parameter. In other words, Y f turns "f" into a recursive function with *itself* as its first parameter.
So the factorial function is:
*Fact ≡ Y Metafact*
*(Y metafact)* is the parameter value of "f" in the metafact lambda; when we do β on the function, if n is zero, then it just returns 1. If it's not zero, then we get the call to *f (Pred n)*. *f* betas to *Y metafact*. Which does that funky magic copying thing, giving us *metafact (Y metafact) (Pred n)*.
Voila, recursion. s
I learned about the Y combinator back in my undergrad days, which would place it around 1989 - and I still find it rather mystifying. I do understand it now, but I can't imagine how on earth anyone ever figured it out!
If you're interested in this, then I highly recommend getting a copy of the book [The Little Schemer][schemer]. It's a wonderful little book - set up like a childrens' book, where the front of each page is a question; and the back of each page is the answer. It's written in a delightfully playful style, it's very fun and engaging, and it will not only teach you to program in Scheme.
As an important side-note there are actually a couple of different versions of the Y combinator. There are different ways of evaluating lambda calculus: given an expression like *(λ x y . x * y) 3 ((λ z. z * z) 4)*"
we can do it in two different orders: we can first do the beta on "*(λ x y . x * y)*",which would give us: "*3 * ((λ z . z * z) 4)*".
Or, we could beta "*((λ z . z * z) 4)*" first: "*(λ x y . x * y) 3 (4 * 4)*". Nn this case, the two orders end up with the same result; but that's not always the case. Sometimes the order of evaluation matters - and the way that the Y combinator works is one of those times. One order will result in expanding the Y infinitely; the other will result in a clean recursive function.
The first order is what we call *lazy evaluation*: don't evaluate the parameters to a function until they're needed. (This is also pretty much the same thing as what we sometime call *by name* parameter passing.) The second is called *eager evaluation* : always evaluate parameters *before* the functions that they're passed to. (In real programming languages, Lisp, Scheme, and ML are lambda-calculus based languages that use eager evaluation; Haskell and Miranda are lambda calculus based languages that use lazy evaluation.) The Y combinator I described above is the Y for *lazy* evaluation. If we used eager evaluation, then Y combinator above wouldn't work - in fact, it would copy Ys forever. There is another version of Y which works for eager evaluation - in fact, it's the one described in "The Little Schemer", and they explain it so much better than I do that I'll just recommend again that you head for whatever bookstore you prefer, and buy yourself a copy.
I'm on vacation this week, so I'm posting reruns of some of the better articles from when Goodmath/Badmath was on Blogger. Todays is a combination of two short posts on numbers and control booleans in λ calculus.
So, now, time to move on to doing interesting stuff with lambda calculus. To
start with, for convenience, I'll introduce a bit of syntactic sugar to let us
name functions. This will make things easier to read as we get to complicated
To introduce a *global* function (that is a function that we'll use throughout our lambda calculus introduction without including its declaration in every expression), we'll use a definition like the following:
*square ≡ λ x . x × x*
This declares a function named "square", whose definition is "*λ x . x×x*". If we had an expression "square 4", the definition above means that it would effectively be treated as if the expression were: "*(λ square . square 4)(λ x . x×x)*".
Numbers in Lambda Calculus
In some of the examples, I used numbers and arithmetic operations. But numbers don't really exist in lambda calculus; all we really have are functions! So we need to invent some way of creating numbers using functions. Fortunately, Alonzo Church, the genius who invented the lambda calculus worked out how to do that. His version of numbers-as-functions are called Church Numerals.
In Church numerals, all numbers are functions with two parameters:
* Zero ≡ *λ s z . z*
* One ≡ *λ s z . s z*
* Two ≡ *λ s z . s (s z)*
* Three ≡ *λ s z . s (s (s z))*
* Any natural number "n", is represented by a Church numeral which is a function which applies its first parameter to its second parameter "n" times.
A good way of understanding this is to think of "z" as being a a name for a zero-value, and "s" as a name for a successor function. So zero is a function which just returns the "0" value; one is a function which applies the successor function once to zero; two is a function which applies successor to the successor of zero, etc. It's just the Peano arithmetic definition of numbers transformed into lambda calculus.
But the really cool thing is what comes next. If we want to do addition, x + y, we need to write a function with four parameters; the two numbers to add; and the "s" and "z" values we want in the resulting number:
add ≡ *λ s z x y . x s (y s z)*
Let's curry that, to separate the two things that are going on. First, it's taking two parameters which are the two values we need to add; second, it needs to normalize things so that the two values being added end up sharing the same binding of the zero and successor values.
add_curry ≡ λ x y. (λ s z . (x s (y s z)))
Look at that for a moment; what that says is, to add x and y: create the church numeral "y" using the parameters "s" and "z". Then **apply x** to that new church numeral y. That is: a number is a function which adds itself to another number.
Let's look a tad closer, and run through the evaluation of 2 + 3:
add_curry (λ s z . s (s z)) (λ s z . s (s (s z)))
To make things easier, let's alpha 2 and 3, so that "2" uses "s2" and "z2", and 3 uses "s3" and "z3";
add_curry (λ s2 z2 . s2 (s2 z2)) (λ s3 z3 . s3 (s3 (s3 z3)))
Now, let's do replace "add_curry" with its definition:
(λ x y .(λ s z. (x s y s z))) (λ s2 z2 . s2 (s2 z2)) (λ s3 z3 . s3 (s3 (s3 z3)))
Now, let's do a beta on add:
λ s z . (λ s2 z2 . s2 (s2 z2)) s (λ s3 z3 . s3 (s3 (s3 z3)) s z)
And now let's beta the church numeral for three. This basically just "normalizes" three: it replaces the successor and zero function in the definition of three with the successor and zero functions from the parameters to add.
λ s z . (λ s2 z2 . s2 (s2 z2)) s (s (s (s z)))
Now.. Here comes the really neat part. Beta again, this time on the lambda for two. Look at what we're going to be doing here: two is a function which takes two parameters: a successor function, and zero function. To add two and three, we're using the successor function from add function; and we're using the result of evaluating three *as the value of the zero!* for two:
λ s z . s (s (s (s (s z))))
And we have our result: the church numeral for five!
Choice in Lambda Calculus
Now that we have numbers in our Lambda calculus, there are only two things missing before we can express arbitrary computations: a way of expressing choice, and a way of expressing repetition. So now I'll talk about booleans and choice; and then next post I'll explain repetition and recursion.
We'd like to be able to write choices as if/then/else expressions, like we have in most programming languages. Following the basic pattern of the church numerals, where a number is expressed as a function that adds itself to another number, we'll express true and false values as functions that perform an if-then-else operation on their parameters. These are sometimes called *Church booleans*. (Of course, also invented by Alonzo Church.)
* TRUE ≡ λ t f . t
* FALSE ≡ λ t f . f
So, now we can write an "if" function, whose first parameter is a condition expression, second parameter is the expression to evaluate if the condition was true, and third parameter is the expression to evaluate if the condition is false.
* IfThenElse ≡ *λ cond t f . cond t f*
For the boolean values to be useful, we also need to be able to do the usual logical operations:
* BoolAnd ≡ *λ x y .x y FALSE*
* BoolOr ≡ *λ x y. x TRUE y*
* BoolNot ≡ *λ x . x FALSE TRUE*
Now, let's just walk through those a bit. Let's first take a look at BoolAnd. We'll try evaluating "*BoolAnd TRUE FALSE*":
* Expand the TRUE and FALSE definitions: "*BoolAnd (λ t f . t) (λ t f . f)*"
* Alpha the true and false: "*BoolAnd (λ tt tf . tt) (λ ft ff . ff)*"
* Now, expand BoolAnd: *(λ t f. t f FALSE) (λ tt tf . tt) (λ ft ff . ff)*
* And beta: *(λ tt tf.tt) (λ ft ff. ff) FALSE*
* Beta again: *(λ xf yf . yf)*
And we have the result: *BoolAnd TRUE FALSE = FALSE*. Now let's look at "*BoolAnd FALSE TRUE*":
* *BoolAnd (λ t f . f) (λ t f .t)*
* Alpha: *BoolAnd (λ ft ff . ff) (λ tt tf . tt)*
* Expand BoolAnd: *(λ x y .x y FALSE) (lambda ft ff . ff) (lambda tt tf . tt)*
* Beta: *(λ ft ff . ff) (lambda tt tf . tt) FALSE
* Beta again: FALSE
So *BoolAnd FALSE TRUE = FALSE*. Finally, *BoolAnd TRUE TRUE*:
* Expand the two trues: *BoolAnd (λ t f . t) (λ t f . t)*
* Alpha: *BoolAnd (λ xt xf . xt) (λ yt yf . yt)*
* Expand BoolAnd: *(λ x y . x y FALSE) (λ xt xf . xt) (λ yt yf . yt)*
* Beta: *(λ xt xf . xt) (λ yt yf . yt) FALSE*
* Beta: (λ yt yf . yt)
So *BoolAnd TRUE TRUE = TRUE*.
The other booleans work in much the same way.
So by the genius of Alonzo church, we have *almost* everything we need in order to write programs in λ calculus.
I thought that for a followup to yesterday's repost of my takedown of Berlinksi, that today I'd show you a digested version of the debate that ensued when Berlinksi showed up to defend himself. You can see the original post and the subsequent discussion here.
It's interesting, because it demonstrates the kinds of debating tactics that people like Berlinski use to avoid actually confronting the genuine issue of their dishonesty. The key thing to me about this is that Berlinski is a reasonably competent mathematician - but watch how he sidesteps to avoid discussing any of the actual mathematical issues.
Berlinksi first emailed his response to me rather than posting it in a comment. So I posted it, along with my response. As I said above, this is a digest of the discussion; if you want to see the original debate in all its gory detail, you can follow the link above.
The way the whole thing started was when Berlinski emailed me after my original post criticizing his sloppy math. He claimed that it was "impossible for him to post comments to my site"; although he never had a problem after I posted this. His email contained several points, written in Berlinski's usual arrogant and incredibly verbose manner:
1. He didn't make up the numbers; he's quoting established literature: "As I have indicated on any number of occasions, the improbabilities that I cited are simply those that are cited in the literature"
2. His probability calculations were fine: "The combinatorial calculations I
made were both elementary and correct."
3. Independence is a valid assumption in his calculations: "Given the chemical
structure of RNA, in which nucleotides are bound to a sugar phosphate backbone
but not to one another, independence with respect to template formation is not
only reasonable as an assumption, but inevitable."
4. He never actually said there could be only one replicator: "There may be many sequences in the pre-biotic environment capable of carrying out various chemical activities."
Of course, he was considerably more verbose than that.
This initial response is somewhat better at addressing arguments than the later ones. What makes that a really sad statement is that even this initial response doesn't *really* address anything:
* He didn't address the specific criticisms of his probability calculations, other than merely asserting that they were correct.
* He doesn't address questions about the required sizes of replicating molecules, other than asserting that his minimum length is correct, and attributing the number to someone else. (While neglecting to mention that there are *numerous* predictions of minimum length, and the one he cited is the longest.)
* He doesn't explain why, even though he doesn't deny that there may have been many potential replicators, his probability calculation is based on the assumption that there is *exactly one*. As I said in my original response to him: In your space of 1060 alleged possibilities, there may be 1 replicator; there may be 1040. By not addressing that, you make your probability calculation utterly worthless.
* He doesn't address his nonsensical requirement for a "template" for a replicator. The idea of the "template" is basically that for a replicator to copy itself, it needs to have a unique molecule called a "template" that it can copy itself onto. It can't replicate onto anything *but* a template, and it can't create the template itself. The "template" is a totally independent chemical, but there is only one possible template that a replicator can copy itself onto. He doesn't address that point *at all* in his response.
* He doesn't address the validity of his assumption that all nucleotide chains of the same length are equally likely.
Berlinksi responded, in absolutely classic style. On of the really noteworthy things about Berlinksi's writing style is its incredible pomposity. This was no disappointment on that count; however, it was quite sad with respect to content. I have to quote the first several lines verbatim, to give you a sense of what I'm talking about when I say he's pompous:
>6 April, 2006
>I have corrected a few trivial spelling errors in your original posting, and I
>have taken the liberty of numbering comments:
>I discuss these points seriatim:
You see, we *really* needed to know that he was in Paris. Crucially important to his arguments to make sure that we realize that! And I'm a bad speller, which is a very important issue in the discussed. And look, he can use latin words for absolutely no good reason!
He spends fifty lines of prose on the issue of whether or no 100 bases is the correct minimum possible length for a replicator. Those 50 lines come down to "I have one source that says that length, so nyah!" No acknowledgment that there are other sources; no reason for *why* this particular source must be correct. Just lots and lots of wordy prose.
He response to the question about the number of replicators by sidestepping. No math, no science; just evading the question:
>2a) On the contrary. Following Arrhenius, I entertain the possibility that
>sequence specificity may not, after all, be a necessary condition for
>demonstrable ligase activity -- or any other biological function, for that
>matter. I observed -- correctly, of course -- that all out evidence is against
>it. All evidence - meaning laboratory evidence; all evidence - meaning our
>common experience with sequence specificity in linguistics or in any other
>field in which an alphabet of words gives rise to a very large sample space in
>which meaningful sequences are strongly non-generic - the space of all
>proteins, for example.
The template stuff? More sidestepping. He rambles a bit, cites several different sources, and asserts that he's correct. The basic idea of his response is: the RNA-world hypothesis assumes Watson-Crick base pairing replication, which needs a template. And the reason that it needs to be Watson-Crick is because anything else is too slow and too error prone. But why does speed matter, if there's no competition? And *of course* the very first replicator would be error prone! Error correction is not something that we would suppose would happen spontaneously and immediately as the first molecule started self-replicating. Error correction is something that would be *selected for* by evolution *after* replication and competition had been established.
Then he sidesteps some more, by playing linguistic games. I referred to the chemicals from which an initial self-replicator developed as "the precursors of a replicator"; he criticizes that phrasing. That's his entire response.
And finally, we get to independence. It's worth quoting him again, to show his tactics:
>There remains the issue of independence. Independence is, of course, the de
>facto hypothesis in probability calculations; and in the case of pre-biotic
>chemistry, strongly supported by the chemical facts. You are not apt to
>dismiss, I suppose, the hypothesis that if two coins are flipped the odds in
>favor if seeing two heads is one in four on the grounds that, who knows?, the
>coins might be biased. Who knows? They might be. But the burden of
>demonstrating this falls on you.
One other quote, to give you more of the flavor of a debate with Berlinski:
>5a) There are two issues here: The first is the provenance of my argument; the
>second, my endorsement of its validity. You have carelessly assumed that
>arguments I drew from the literature were my own invention. This is untrue. I
>expect you to correct this misunderstanding as a matter of scholarly probity.
>As for the second point, it goes without saying that I endorsed the arguments
>that I cited. Why on earth would I have cited them otherwise?
I really love that quote. Such delightful fake indignance; how *dare* I accuse him of fabricating arguments! Even though he *did* fabricate them. The fake anger allows him to avoid actually *discussing* his arguments.
After that, it descends into seeming endless repetition. It's just more of the "nyah nyah I'm right" stuff, without actually addressing the criticism. There's always a way to sidestep the real issue by either using excess wordiness to distract people, or fake indignance that anyone would dare to question anything so obvious!
My response to that is short enough that I'll just quote it, rather than redigesting it:
>As I've said before, I think that there are a few kinds of fundamental errors
>that you make repeatedly; and I don't think your comments really address them
>in a meaningful way. I'm going to keep this as short as I can; I don't like
>wasting time rehashing the same points over and over again.
>With regard to the basic numbers that you use in your probability calculations:
>no probability calculation is any better than the quality of the numbers that
>get put into it. As you admit, no one knows the correct length of a minimum
>replicator. And you admit that no one has any idea how many replicators of
>minimum or close to minimum length there are - you make a non-mathematical
>argument that there can't be many. But there's no particular reason to believe
>that the actual number is anywhere close to one. A small number of the possible
>patterns of minimum length? No problem. *One*? No way, sorry. You need to make
>a better argument to support eliminating 10^60 - 1 values. (Pulling out my old
>favorite, recursive function theory: the set of valid turing machine programs
>is a space very similar to the set of valid RNA sequences; there are numerous
>equally valid and correct universal turing machine programs at or close to the
>minimum length. The majority of randomly generated programs - the *vast*
>majority of randomly generated programs - are invalid. But the number of valid
>ones is still quite large.)
>Your template argument is, to be blunt, silly. No, independence is not the de
>facto hypothesis, at least not in the sense that you're claiming. You do not
>get to go into a probability calculation, say "I don't know the details of how
>this works, and therefore I can assume these events are independent." You need
>to eliminate dependence. In the case of some kind of "pool" of pre-biotic
>polymers and fragments (which is what I meant by precursors), the chemical
>reactions occuring are not occuring in isolation. There are numerous kinds of
>interactions going on in a chemically active environment. You don't get to just
>assume that those chemical interactions have no effect. It's entirely
>reasonable to believe that there is a relationship between the chains that form
>in such an environment; if there's a chance of dependence, you cannot just
>assume independence. But again - you just cook the numbers and use the
>assumptions that suit the argument you want to make.
The rest of the debate was more repetition. Some selected bits:
>No matter how many times I offer a clear and well-supported answers to certain
>criticisms of my essays, those very same criticisms tend to reappear in this
>discussion, strong and vigorous as an octopus.
>1 No one knows the minimum ribozyme length for demonstrable replicator
>activity. The figure of the 100 base pairs required for what Arrhenius calls
>"demonstrable ligase activity," is known. No conceivable purpose is gained from
>blurring this distinction.
>Does it follow, given a sample space containing 1060 polynucleotides of 100
>NT's in length, that the odds in favor of finding any specific polynucleotide
>is one in 1060?
>Of course it does. It follows as simple mathematical fact, just as it follows
>as simple mathematical fact that the odds in favor of pulling any particular
>card from a deck of cards is one in fifty two.
>Is it possible that within a random ensemble of pre-biotic polynucleotides
>there may be more than one replicator?
>Of course it is possible. Whoever suggested the contrary?
This is a great example of Berlinksi's argument style. Very arrogant argument by assertion, trying to throw as much text as possible at things in order to confuse them.
The issues we were allegedly "discussing" here was whether or not the space of nucleotide chains of a given length could be assumed to be perfectly uniform; and whether or not it made sense to assert that there was only *one* replicator in that space of 1060 possible chains.
As you can see, his response to the issue of distribution is basically shouting: "**Of course** it's uniform, any moron can see that!".
Except that it *isn't* uniform. In fact, quite a number of chains of length 100 *are impossible*. It's a matter of geometry: the different chains take different shapes depending on their constituents. Many of the possible chains are geometrically impossible in three dimensions. How many? No one is sure: protein folding is still a big problem: given our current level of knowledge, figuring out the shape of a protein that we *know* exists is still very difficult for us.
And his response to his claim that there is exactly one replicator in that space? To sidestep it by claiming that he never said that. Of course, he calculated his probability using that as an assumption, but he never explicitly *said* it.
His next response opened with a really wonderful example of his style: pure pedantry that avoids actually *discussing* the criticisms of his points.
>>The point is that we're talking about some kind of pool of active chemicals
>>reacting with one another and forming chains ....
>What you are talking about is difficult to say. What molecular biologists are
>talking about is a) a random pool of beta D-nucleotides; and b) a random
>ensemble of polynucleotides. The polynucleotides form a random ensemble because
>chain polymerization is not sequence-specific.
>>The set of very long chains that form is probably not uniform ....
>Sets are neither uniform nor non-uniform. It is probability distributions that
>are uniform. Given a) and b) above, one has a classical sampling with
>replacement model in the theory of probability, and thus a uniform and discrete
Ok. Anyone out there who could read this argument, and *not* know what I was talking about when I said "some kind of pool of active chemicals reacting with one another and forming chains"?
How about anyone who thinks that my use of the word "set" in the quote above is the least bit unclear? Anyone who thinks that "the set of very long chains that form is probably not uniform" is the *least* bit ambiguous?
No, I thought not. The point, as usual, is to avoid actually *addressing* difficult arguments. So when confronted with something hard to answer, you look for a good distraction, like picking on grammar or word choice, so that you can pretend that the reason you can't answer an argument is because the argument didn't make sense. So he focuses on the fact that I didn't use the word "set" in its strict mathematical sense, and then re-asserts his argument.
Another example. At one point in the argument, disputing his assertion of independent probabilities for his "template" and his "replicator", I said the following:
>I'm *not* saying that in general, you can't make assumptions of independence.
>What I'm saying is what *any* decent mathematician would say: to paraphrase my
>first semester probability book: "independence between two events is a valid
>assumption *if and only if* there is no known interaction between the events."
>That is the *definition* of independence...
Berlinksi's response? Again, pure distractive pedantry:
>If you are disposed to offer advice about mathematics, use the language, and
>employ the discipline, common to mathematics itself. What you have offered is
>an informal remark, and not a definition. The correct definition is as follows:
>Two events A and B are independent if P(AB) = P(A)P(B). As a methodological
>stricture, the remark you have offered is, moreover, absurd inasmuch as some
>interaction between events can never be ruled out a priori, at least in the
Does this address my criticism? No.
The Bayesian rules for combining probabilities say "If A and B are independent, then the probability of AB is the probability of A times the probability of B". You *can* invert that definition, and use it to show that two events are independent, by showing that the probability of their occurring together is
the product of their individual probabilities. What he's doing up there is a pedantic repetition of a textbook definition in the midst of some arrogant posturing. But since he's claiming to be talking mathematically, let's look at what he says *mathematically*. I'm asserting that you need to show that events are independent if you want to treat them as independent in a probability calculation. He responds by saying I'm not being mathematical; and spits out the textbook definition. So let's put the two together, to see what Berlinksi is arguing mathematically:
>We can assume that the probability of two events, A and B are independent and
>can be computed using P(AB) = P(A)×P(B) if and only if the probability
>P(AB) = P(A)×P(B).
Not a very useful definition, eh?
And his response to my criticism of that?
>I am quite sure that I have outstayed my welcome. I'm more than happy to let
>you have the last words. Thank you for allowing me to post my own comments.
And that was the end of the debate.
Sad, isn't it?
I'm on vacation this week, so I'm recycling some posts that I thought were particularly interesting to give you something to read.
In computer science, especially in the field of programming languages, we tend to use one particular calculus a lot: the Lambda calculus. Lambda calculus is also extensively used by logicians studying the nature of computation and the structure of discrete mathematics. Lambda calculus is great for a lot of reasons, among them:
1. It's very simple.
2. It's Turing complete.
3. It's easy to read and write.
4. It's semantics are strong enough that we can do reasoning from it.
5. It's got a good solid model.
6. It's easy to create variants to explore the properties of various alternative ways of structuring computations or semantics.
The ease of reading and writing lambda calculus is a big deal. It's led to the development of a lot of extremely good programming languages based, to one degree or another, on the lambda calculus: Lisp, ML, and Haskell are very strongly lambda calculus based.
The lambda calculus is based on the concept of *functions*>. In the pure lambda calculus, *everything* is a function; there are no values *at all* except for functions. But we can build up anything we need using functions. Remember back in the early days of this blog, I talked a bit about how to build mathematics? We can build the entire structure of mathematics from nothing but lambda calculus.
So, enough lead-in. Let's dive in a look at LC. Remember that for a calculus, you need to define two things: the syntax, which describes how valid expressions can be written in the calculus; and a set of rules that allow you to symbolically manipulate the expressions.
Lambda Calculus Syntax
The lambda calculus has exactly three kinds of expressions:
1. Function definition: a function in lambda calculus is an expression, written: "*λ param . body*" which defines a function with one parameter.
2. Identifier reference: an identifier reference is a name which matches the name of a parameter defined in a function expression enclosing the reference.
3. Function application: applying a function is written by putting the function value in front of its parameter, as in "*x y*" to apply the function "*x*" to the value "*y*".
There's a trick that we play in lambda calculus: if you look at the definition above, you'll notice that a function (lambda expression) only takes one parameter. That seems like a very big constraint - how can you even implement addition with only one parameter?
It turns out to be no problem, because of the fact that *functions are values*. So instead of writing a two parameter function, you can write a one parameter function that returns a one parameter function - in the end, it's effectively the same thing. It's called currying, after the great logician Haskell Curry.
For example, suppose we wanted to write a function to add x and y. We'd like to write something like: "*λ x y . x + y*". The way we do that with one-parameter functions is: we first write a function with one parameter, which returns *another* function with one parameter.
Adding x plus y becomes writing a one-parameter function with parameter x, which returns another one parameter function which adds x to its parameter:
*λ x . (λ y . x + y)*
Now that we know that adding multiple parameter functions doesn't *really* add anything but a bit of simplified syntax, we'll go ahead and use them when it's convenient.
### Free vs Bound Identifiers
One important syntactic issue that I haven't mentioned yet is *closure* or *complete binding*. For a lambda calculus expression to be evaluated, it cannot reference any identifiers that are not *bound*. An identifier is bound if it a parameter in an enclosing lambda expression; if an identifier is *not* bound in any enclosing context, then it is called a *free* variable.
1. *λ x . plus x y*: in this expression, "y" and "plus" are free, because they're not the parameter of any enclosing lambda expression; x is bound because it's a parameter of the function definition enclosing the expression "plus x y" where it's referenced.
2. *λ x y.y x*: in this expression both x and y are bound, because they are parameters of the function definition.
3. *λ y . (λ x . plus x y*: In the inner lambda, "*λ x . plus x y*", y and plus are free and x is bound. In the full expression, both x and y are bound: x is bound by the inner lambda, and y is bound by the other lambda. "plus" is still free.
We'll often use "free(x)" to mean the set of identifiers that are free in the expression "x".
A lambda calculus expression is completely valid only when all of its variables are bound. But when we look at smaller subexpressions of a complex expression, taken out of context, they can have free variables - and making sure that the variables that are free in subexpressions are treated right is very important.
## Lambda Calculus Evaluation Rules
There are only two real rules for evaluating expressions in lambda calculus; they're called α and beta. α is also called "conversion", and beta is also called "reduction".
### α Conversion
α is a renaming operation; basically it says that the names of variables are unimportant: given any expression in lambda calculus, we can change the name of the parameter to a function as long as we change all free references to it inside the body.
So - for instance, if we had an expression like:
*λ x . if (= x 0) then 1 else x^2*
We can do α to replace X with Y (written "α[x/y]" and get):
*λ y . if (= y 0) then 1 else y^2*
Doing α does *not* change the meaning of the expression in any way. But as we'll see later, it's important because it gives us a way of doing things like recursion.
### β Reduction
β reduction is where things get interesting: this single rule is all that's needed to make the lambda calculus capable of performing *any* computation that can be done by a machine.
Beta basically says that if you have a function application, you can replace it with a copy of the body of the function with references to the parameter identifiers replaced by references to the parameter value in the application. That sounds confusing, but it's actually pretty easy when you see it in action.
Suppose we have the application expression: "*(λ x . x + 1) 3*". What beta says is that we can replace the application by taking the body of the function (which is "*x + 1*"); and replacing references to the parameter "*x*" by the value "*3*"; so the result of the beta reduction is "*3 + 1*".
A slightly more complicated example is the expression:
*λ y . (λ x . x + y)) q*
It's an interesting expression, because it's a lambda expression that when applied, results in another lambda expression: that is, it's a function that creates functions. When we do beta reduction in this, we're replacing all references to the parameter "y" with the identifier "q"; so, the result is "*λ x. x+q*".
One more example, just for the sake of being annoying:
"*(λ x y. x y) (λ z . z × z) 3*". That's a function that takes two parameters, and applies the first one to the second one. When we evaluate that, we replace the parameter "*x*" in the body of the first function with "*λ z . z × z*"; and we replace the parameter "*y*" with "3", getting: "*(λ z . z × z) 3*". And we can perform beta on that, getting "*3 × 3*".
Written formally, beta says:
λ x . B e = B[x := e] if free(e) ⊂ free(B[x := e]
That condition on the end, "if free(e) subset free(B[x := e]" is why we need α: we can only do beta reduction *if* doing it doesn't create any collisions between bound identifiers and free identifiers: if the identifier "z" is free in "e", then we need to be sure that the beta-reduction doesn't make "z" become bound. If there is a name collision between a variable that is bound in "B" and a variable that is free in "e", then we need to use α to change the identifier names so that they're different.
As usual, an example will make that clearer: Suppose we have a expression defining a function, "*λ z . (λ x . x+z)*". Now, suppose we want to apply it:
*(λ z . (λ x . x + z)) (x + 2)*
In the parameter "*(x + 2)*", x is free. Now, suppose we break the rule and go ahead and do beta. We'd get "*λ x . x + x + 2*".
The variable that was *free* in "x + 2" is now bound. Now suppose we apply that function: "*(λ x . x + x + 2) 3*". By beta, we'd get "3 + 3 + 2", or 8.
What if we did α the way we were supposed to?
By α[x/y], we would get "*λ z . (λ y . y + z) (x+2)*".
lambda z . (lambda y . y+z)) (x + 2). Then by β, we'd get "*λ y . y + x + 2*". If we apply this function the way we did above, then by β, we'd get "*3+x+2*".
"*3+x+2*" and "*3+3+2*" are very different results!
And that's pretty much it. There's another *optional* rule you can add called η-conversion. η is a rule that adds *extensionality*, which provides a way of expressing equality between functions.
η says that in any lambda expression, I can replace the value "f" with the value "g" if/f for all possible parameter values x, *f x = g x*.
What I've described here is Turing complete - a full effective computation system. To make it useful, and see how this can be used to do real stuff, we need to define a bunch of basic functions that allow us to do math, condition tests, recursion, etc. I'll talk about those in my next post.
We also haven't defined a model for lambda-calculus yet. (I discussed models here and here.) That's actually quite an important thing! LC was played with by logicians for several years before they were able to come up with a complete model for it, and it was a matter of great concern that although LC looked correct, the early attempts to define a model for it were failures. After all, remember that if there isn't a valid model, that means that the results of the system are meaningless!