Update your links! The new RSS is here.

All of the existing posts and comments have been moved over. I'm shutting down comments here, but you're welcome to come on over to the new site and continue any open conversations!

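```asm
.global _start

_start:
    MOV R7, #4          @ which OS function: 4 is "write" (print) on ARM Linux
    MOV R0, #1          @ where to write: file descriptor 1, standard output
    MOV R2, #12         @ how many bytes to write
    LDR R1, =string     @ address of the first byte of the string
    SWI 0               @ software interrupt: ask the OS to do it
    MOV R7, #1          @ which OS function: 1 is "exit"; R0 is the exit status
    SWI 0

.data
string:
    .ascii "Hello World\n"
```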

It's definitely a bit more cryptic than most languages, but it doesn't look all that bad, now does it? Before I can explain how that works, we'll need to talk a bit about what we're programming, and how we can program it. We're going to go through a bunch of introductory material here; everything that we touch on, I'll come back to in more detail in a later post.

In the diagram to the right, you can see my attempt to illustrate the important parts of an ARM CPU from our initial perspective. As we learn more about it, we'll gradually refine this picture, adding more details - but for now, this is what things look like.

For now, we'll say that the CPU has 5 main parts:

- A collection of 16 *registers*. A register is a memory cell that's built into the CPU. On an ARM processor, any time that you want to do any kind of operation - arithmetic, logic, comparison, you name it! - you'll need to have the values in a register. The first thirteen registers are available to you, to use for whatever you want. The last three are special: `R13` is called the *stack pointer* (`SP`), `R14` is called the *link register* (`LR`), and `R15` is called the *program counter* (`PC`). We'll talk about what those three mean as we learn to program.
- An arithmetic/logic unit (ALU). This is where the CPU does integer arithmetic and logic. Most of our programs will work exclusively with the ALU. (Floating point is important, but it's possible to do an awful lot of programming without it.)
- A floating point unit (FPU). This is where the CPU does floating point arithmetic.
- A *status register*. This is, like the other registers, a chunk of internal storage. But you don't normally manipulate it directly: it's automatically updated by the ALU/FPU. Individual bits of the status register get updated to reflect various conditions about the current status of the CPU, and the results of the previous instruction. For example, the way that you can compare two values in the ARM is to subtract one from the other. If the two values were equal, then the ZERO flag in the status register will be set to 1; otherwise it will be set to 0. There's a branch instruction that only actually branches if the ZERO flag is set.
- A data channel, called the *bus*. The bus connects the CPU to the rest of the computer. Memory, other storage devices, and input and output devices are all connected to the CPU via the bus. Doing anything that involves communicating through the bus is *slow* compared to doing anything that doesn't. For now, we'll say that memory is the only thing on the bus.
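To make the ZERO flag concrete, here's a minimal Python sketch of what a 32-bit ARM-style compare does to the status register's four condition flags (N, Z, C, V). It assumes the standard ARM rule that a compare is a subtraction whose result is thrown away. The function names `compare_flags` and `branch_if_equal_taken` are hypothetical. `BEQ`, the "branch if equal" instruction, is the branch that only jumps when Z is set.

```python
MASK32 = 0xFFFFFFFF  # ARM registers hold 32-bit values


def compare_flags(a: int, b: int) -> dict[str, bool]:
    """Flags an ARM-style compare of a and b would set (computes a - b, discards it)."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    return {
        "N": bool(result >> 31),                    # result is negative as a signed value
        "Z": result == 0,                           # ZERO flag: the two values were equal
        "C": a >= b,                                # carry: no borrow in unsigned subtraction
        "V": bool(((a ^ b) & (a ^ result)) >> 31),  # signed overflow happened
    }


def branch_if_equal_taken(flags: dict[str, bool]) -> bool:
    """A conditional branch like BEQ only jumps when the ZERO flag is set."""
    return flags["Z"]


print(compare_flags(5, 5))
print(branch_if_equal_taken(compare_flags(3, 5)))
```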

Now that we have a bit of a clue about the basic pieces of this thing we're going to program, we can start looking at our hello world program. We still need to talk about one other bit of background before we can get started.

For a computer, on the lowest level, a "program" is just a chunk of numbers. It's not even a chunk of instructions - it's *just* numbers. The numbers can be instructions, data, or both at the same time! That last bit might sound strange, but you'll see instructions like `MOV R0, #4`. That's saying load the literal value 4 into register `R0`. The 4 is a value encoded as a part of an instruction. So that 4 is both literal data sitting in the middle of a collection of instructions, and it's also a part of an instruction. The actual instruction doesn't really say "load the value 4"; it says "load the data value that's at this position in the instruction sequence".
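To see that the literal really is sitting inside the instruction, here's a small Python sketch that pulls the fields back out of the 32-bit number an assembler produces for `MOV R0, #4`. In the classic 32-bit ARM encoding for a data-processing instruction with an immediate operand, that number is `0xE3A00004`. The function name `decode_mov_immediate` is hypothetical, and the sketch handles only that one instruction format.

```python
def decode_mov_immediate(word: int) -> tuple[int, int]:
    """Split a 32-bit ARM 'MOV Rd, #imm' instruction word into (Rd, imm)."""
    opcode = (word >> 21) & 0xF        # bits 21-24 select the operation
    is_immediate = (word >> 25) & 1    # bit 25: operand is a literal, not a register
    if opcode != 0b1101 or not is_immediate:
        raise ValueError("not a MOV with an immediate operand")
    rd = (word >> 12) & 0xF            # destination register number
    rotate = (word >> 8) & 0xF         # literal is 8 bits, rotated right by 2*rotate
    imm8 = word & 0xFF
    shift = 2 * rotate
    value = ((imm8 >> shift) | (imm8 << (32 - shift))) & 0xFFFFFFFF
    return rd, value


print(decode_mov_immediate(0xE3A00004))  # (0, 4): MOV R0, #4
print(decode_mov_immediate(0xE3A07004))  # (7, 4): MOV R7, #4
```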

We're not going to program the ARM using the numeric instructions directly. We're going to program the ARM using *assembly language*. Assembly language is a way of writing that chunk of numbers that is your program, but doing it with a syntax that's easy for a human being to read. Then a program called an *assembler* will translate from that readable format into the raw numeric format used by the computer. Conceptually, the assembler sounds a lot like the compiler that you'd use with a higher level language. But it's quite different: compilers take your code, and *change it*. Frequently, if you look at code that your compiler generates, you'd have a hard time recognizing code that was generated for a program that you wrote! But an assembler doesn't change anything. There's no restructuring, no optimization, no changes at all. In an assembly language program, you're describing how to lay out a bunch of instructions and data in memory, and the assembler does nothing but generate that exact memory layout.
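An assembler only translates and never restructures, so even a toy version is mostly table lookup and bit-packing. Here's a minimal Python sketch that handles just two instruction forms from the program above: `MOV Rd, #imm` (for literals under 256) and `SWI n`. It lays the resulting 32-bit words out in exactly the order they were written, little-endian as on a typical ARM Linux system. The function names and the restricted syntax are assumptions for illustration. A real assembler like GNU `as` handles far more.

```python
import re
import struct

COND_ALWAYS = 0xE  # condition field meaning "always execute"


def assemble_line(line: str) -> int:
    """Translate one instruction into its 32-bit ARM encoding (two forms only)."""
    mov = re.fullmatch(r"\s*MOV\s+R(\d+),\s*#(\d+)\s*", line)
    if mov:
        rd, imm = int(mov.group(1)), int(mov.group(2))
        if not (0 <= rd <= 15 and 0 <= imm <= 0xFF):
            raise ValueError(f"unsupported operands: {line!r}")
        # cond | immediate flag | MOV opcode (1101) | Rd | 8-bit literal, no rotation
        return (COND_ALWAYS << 28) | (1 << 25) | (0b1101 << 21) | (rd << 12) | imm
    swi = re.fullmatch(r"\s*SWI\s+(\d+)\s*", line)
    if swi:
        # cond | 1111 | 24-bit immediate field
        return (COND_ALWAYS << 28) | (0xF << 24) | (int(swi.group(1)) & 0xFFFFFF)
    raise ValueError(f"unsupported instruction: {line!r}")


def assemble(lines: list[str]) -> bytes:
    """Emit each instruction word in order, one after another: no reordering."""
    return b"".join(struct.pack("<I", assemble_line(line)) for line in lines)


program = ["MOV R7, #1", "SWI 0"]
print(assemble(program).hex())  # 0170a0e3000000ef
```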

Ok. That said, finally, we can get to the program!

Programming in assembly is quite different from programming in any reasonable programming language. There are no abstractions to make your life easier. You need to be painfully explicit about everything. It really brings home just how many abstractions you generally use in your code.

For example, in assembly language, you don't really have variables. You can store values anywhere you want in the computer's memory, but you have to decide where to put them, and how to lay them out, by yourself. But as I said before - all of the arithmetic and logic that makes up a program has to be done on values *in registers*. So a value in memory is only good if you can move it from memory into a register. It's almost like programming in a language with a total of 16 variables - only you're only really allowed to use 13 of them!
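Here's a minimal Python model of that restriction. It assumes a made-up machine with 16 registers and a dictionary standing in for memory. Adding two numbers that live in memory takes two loads, an add that works only on registers, and a store. The names `LoadStoreMachine`, `ldr`, `add`, and `str_` are hypothetical stand-ins named after the ARM instructions, not a real API.

```python
class LoadStoreMachine:
    """Toy model: arithmetic only touches registers; memory is reached by load/store."""

    def __init__(self) -> None:
        self.regs = [0] * 16               # R0..R15 (R13-R15 are special on a real ARM)
        self.memory: dict[int, int] = {}   # address -> 32-bit word

    def ldr(self, rd: int, address: int) -> None:
        """Copy a word from memory into register rd."""
        self.regs[rd] = self.memory.get(address, 0)

    def str_(self, rs: int, address: int) -> None:
        """Copy register rs back out to memory."""
        self.memory[address] = self.regs[rs]

    def add(self, rd: int, rn: int, rm: int) -> None:
        """Registers only: there is no form that adds two memory cells directly."""
        self.regs[rd] = (self.regs[rn] + self.regs[rm]) & 0xFFFFFFFF


m = LoadStoreMachine()
m.memory[0x100] = 20
m.memory[0x104] = 22
m.ldr(0, 0x100)    # R0 <- memory[0x100]
m.ldr(1, 0x104)    # R1 <- memory[0x104]
m.add(2, 0, 1)     # R2 <- R0 + R1
m.str_(2, 0x108)   # memory[0x108] <- R2
print(m.memory[0x108])  # 42
```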

Not only do you not have variables, but you don't really have parameters. In a high level programming language, you can just pass things to subroutines. You don't need to worry about *how*. Maybe they're going onto a stack; maybe they're doing some kind of fancy lambda calculus renaming thing; maybe there are some magic variables. You don't need to know or care. But in assembly, there is no built-in notion of parameter-passing. You need to use the computer's registers and memory to build a parameter passing system. In the simplest form of that, which is what we're using here, you designate certain registers as carrying certain parameters. There's nothing in assembly to enforce that: if your program puts something into register `R3`, and a function was expecting it to be in `R4`, you won't get any kind of error.

In our "Hello world" program above, the first three instructions are loading specific values into registers expected by the operating system "print" function. For example, `MOV R7, #4` means move the specific number 4 into register `R7`.

Loading literal values into registers is done using the `MOV` instruction. It's got two operands: the register to move the data into, and the source of the data. The source of the data can be either a literal value, or another register. If you want to load data from memory, you need to use a different instruction: `LDR`.

With the `LDR` instruction, we can see one of the conveniences of using assembly language. We want to print the string "Hello world". So we need to have that string in memory somewhere. The assembler lets us do that using a `.ascii` directive. The directive isn't an ARM instruction; it's an instruction to the assembler telling it "I want to put *this string data* into a block in memory". The `.ascii` directive is prefaced with a *label*, which allows us to refer to the beginning of the memory block populated by the directive. Now we can use "string" to refer to the memory block. So the instruction `LDR R1, =string` loads `R1` with the *address* of that block: the memory location where the first byte of the string is stored.
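Here's a rough Python sketch of the bookkeeping the assembler does for the `.data` section. It appends the bytes of each `.ascii` string, and records each label as the address where its bytes start. The base address `0x20000` is an arbitrary assumption: the real address is chosen by the assembler and linker. `DataSection` is a hypothetical name. Note that `.ascii` doesn't add a terminating zero byte, so "Hello World\n" occupies exactly the 12 bytes the program passes in `R2`.

```python
class DataSection:
    """Toy model of laying out .ascii data and resolving labels to addresses."""

    def __init__(self, base_address: int) -> None:
        self.base = base_address
        self.contents = bytearray()
        self.labels: dict[str, int] = {}

    def ascii(self, label: str, text: str) -> None:
        # The label names the address of the first byte that follows it.
        self.labels[label] = self.base + len(self.contents)
        self.contents += text.encode("ascii")

    def address_of(self, label: str) -> int:
        """The value that `LDR R1, =label` ends up putting into R1."""
        return self.labels[label]


data = DataSection(base_address=0x20000)  # assumed base address
data.ascii("string", "Hello World\n")
print(hex(data.address_of("string")), len(data.contents))  # 0x20000 12
```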

These four instructions have been preparation for calling a function provided by the operating system. `R7` tells the operating system which function we want to call: 4 is its "print" function (on Linux, the `write` system call). `R0`, `R1`, and `R2` are being used to pass parameters to that function. The print function expects `R0` to contain the file to write to (1 means standard output, the terminal), `R1` to contain the memory location of the first byte in the string we want to print, and `R2` to contain the number of characters in the string.

We call the function using `SWI 0`. `SWI` is the *software interrupt* instruction. We can't call the operating system directly! One of the purposes of the operating system is to provide a safe environment, where different programs can't accidentally interfere with one another. If you could just branch into an OS function directly, any program would be able to do anything it wanted! But we don't allow that, so the program can't directly call anything in the OS. Instead, what it does is send a special kind of signal called an *interrupt*. Before it runs our program, the operating system has already told the CPU that any time it gets an interrupt, control should be handed to the OS. So the operating system gets called by the interrupt. It looks at the registers we set up, recognizes that the interrupt is a request to run the "print" function, and does that. Then it returns from the interrupt - and execution continues at the first instruction after the `SWI` call.
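Here's a rough Python simulation of what happens on the other side of `SWI 0`. It assumes the Linux convention for 32-bit ARM:

- the call number goes in `R7`, where 4 is write and 1 is exit;
- the parameters go in `R0`, `R1`, `R2`;
- the result comes back in `R0`.

It models the dispatch logic only. The function name `handle_swi`, the `ProgramExit` exception, and the memory layout are all hypothetical. In this model, `R0` still holds the write result when exit is called, so that value becomes the exit status.

```python
import io


class ProgramExit(Exception):
    """Raised when the simulated program makes the exit call."""

    def __init__(self, status: int) -> None:
        super().__init__(status)
        self.status = status


def handle_swi(regs: list[int], memory: bytes, stdout: io.BytesIO) -> None:
    """Dispatch a software interrupt based on the call number in R7."""
    call = regs[7]
    if call == 4:  # write(fd=R0, buffer=R1, length=R2)
        fd, start, length = regs[0], regs[1], regs[2]
        if fd != 1:
            raise ValueError("this sketch only simulates standard output")
        stdout.write(memory[start:start + length])
        regs[0] = length  # the result comes back in R0
    elif call == 1:  # exit(status=R0)
        raise ProgramExit(regs[0])
    else:
        raise NotImplementedError(f"call {call} not simulated")


memory = b"Hello World\n"  # assume the string starts at address 0
regs = [0] * 16
regs[7], regs[0], regs[1], regs[2] = 4, 1, 0, 12
out = io.BytesIO()
handle_swi(regs, memory, out)  # the first SWI 0: print
print(out.getvalue())
regs[7] = 1
try:
    handle_swi(regs, memory, out)  # the second SWI 0: exit
except ProgramExit as e:
    print("exit status", e.status)
```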

Now it's returned from the print, and we don't want to do anything else. If we didn't put something here to tell the operating system that we're done, the CPU would just proceed to the next memory address after our `SWI`, and interpret whatever it found there as an instruction! We need to specifically say "We're done", so that the operating system takes control away from our program. The way we do that is with another `SWI` call. This `SWI` is the operating system "exit" call. To exit a program and kill the process, you call `SWI` with `R7=1`; whatever value is in `R0` becomes the program's exit status.

And that's it. That's hello-world in assembly.

See, these guys built a new programming language which solves all the problems! I mean, just look how daft all of us programming language implementors are!

Today’s languages were each designed with different goals in mind. Matlab was built for matrix calculations, and it’s great at linear algebra. The R language is meant for statistics. Ruby and Python are good general purpose languages, beloved by web developers because they make coding faster and easier. But they don’t run as quickly as languages like C and Java. What we need, Karpinski realized after struggling to build his network simulation tool, is a single language that does everything well.

See, we've been wasting our time, working on languages that are only good for one thing, when if only we'd had a clue, we would have just been smart, and built one perfect language which was good for everything!

How did they accomplish this miraculous task?

Together they fashioned a general purpose programming language that was also suited to advanced mathematics and statistics and could run at speeds rivaling C, the granddaddy of the programming world.

Programmers often use tools that translate slower languages like Ruby and Python into faster languages like Java or C. But that faster code must also be translated — or compiled, in programmer lingo — into code that the machine can understand. That adds more complexity and room for error.

Julia is different in that it doesn’t need an intermediary step. Using LLVM, a compiler developed by University of Illinois at Urbana-Champaign and enhanced by the likes of Apple and Google, Karpinski and company built the language so that it compiles straight to machine code on the fly, as it runs.

Ye bloody gods, but it's hard to know just where to start ripping that apart.

Let's start with that last paragraph. Apparently, the guys who designed Julia are geniuses, because they used the LLVM backend for their compiler, eliminating the need for an intermediate language.

That's clearly a revolutionary idea. I mean, no one has *ever* tried to do that before - no programming languages except C and C++ (the original targets of LLVM). Except for Ada. And D. And Fortran. And Pure. And Objective-C. And Haskell. And Java. And plenty of others.

And those are just the languages that specifically use the LLVM backend. There are others that use different code generators to generate true binary code.

But hey, let's ignore that bit, and step back.

Let's look at what they say about how other people implement programming languages, shall we? The problem with other languages, they allege, is that their implementations don't actually generate machine code. They translate from a slower language into a faster language. Let's leave aside the fact that speed is an attribute of an implementation, not a language. (I can show you a CommonLisp interpreter that's slow as a dog, and I can show you a CommonLisp interpreter that'll knock your socks off.)

What do the Julia guys actually do? They write a front-end that generates LLVM intermediate code. That is, they *don't* generate machine code directly. They translate code written in their programming language into code for an abstract virtual machine. And then they take the virtual machine code, and pass it to the LLVM backend, which translates from virtual code to actual true machine code.

In other words, they're not doing anything different from pretty much any other compiled language. It's incredibly rare to see a compiler that actually doesn't do the intermediate code generation. The only example I can think of at the moment is one of the compilers for Go - and even it uses some intermediates internally.

Even if Julia never displaces the more popular languages — or if something better comes along — the team believes it’s changing the way people think about language design. It’s showing the world that one language can give you everything.

That said, it isn’t for everyone. Bezanson says it’s not exactly ideal for building desktop applications or operating systems, and though you can use it for web programming, it’s better suited to technical computing. But it’s still evolving, and according to Jonah Bloch-Johnson, a climate scientist at the University of Chicago who has been experimenting with Julia, it’s more robust than he expected. He says most of what he needs is already available in the language, and some of the code libraries, he adds, are better than what he can get from a seasoned language like Python.

So, our intrepid reporter tells us, the glorious thing about Julia is that it's one language that can give you everything! This should completely change the whole world of programming language design - because us idiots who've worked on languages weren't smart enough to realize that there should be one language that does everything!

And then, in the very next paragraph, he points out that Julia, the great glorious language that's going to change the world of programming language design by being good at everything, isn't good at everything!

Jeebus. Just shoot me now.

I'll finish with a quote that pretty much sums up the idiocy of these guys.

“People have assumed that we need both fast and slow languages,” Bezanson says. “I happen to believe that we don’t need slow languages.”

This sums up just about everything that I hate about what happens when idiots who don't understand programming languages pontificate about how languages should be designed/implemented.

At the moment, in my day job, I'm doing almost all of my programming in Python. Now, I'm not exactly a huge fan of Python. There's an awful lot of slapdash and magic about it that drives me crazy. But I can't really dispute the decision to use it for my project, because it's a very good choice.

What makes it a good choice? A certain kind of flexibility and dynamism. It's a great language for splicing together different pieces that come from different places. It's not the fastest language in the world. But for my purposes, that's completely irrelevant. If you took a super-duper brilliant, uber-fast language with a compiler that could generate perfectly optimal code every time, it wouldn't be any faster than my Python program. How can that be?

Because my Python program spends most of its time idle, waiting for something to happen. It's talking to a server out on a datacenter cluster, sending it requests, and then waiting for them to complete. When they're done, it looks at the results, and then generates output on a local console. If I had a fast compiler, the only effect it would have is that my program would spend more time idle. If I were pushing my CPU anywhere close to its limits, using less CPU before going idle might be helpful. But it's not.
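One way to see this for yourself is to measure wall-clock time and CPU time separately. Here's a small Python sketch that uses `time.sleep` as a stand-in for waiting on a remote server. That's an assumption: a real program would be blocked on the network instead. The run takes about a fifth of a second of wall time, but almost none of that is CPU time, so a faster compiler would have almost nothing to speed up.

```python
import time


def wait_for_server(seconds: float) -> str:
    """Stand-in for a network request: the process just sits idle."""
    time.sleep(seconds)
    return "done"


wall_start = time.perf_counter()   # elapsed real time
cpu_start = time.process_time()    # CPU time used by this process
wait_for_server(0.2)
wall = time.perf_counter() - wall_start
cpu = time.process_time() - cpu_start
print(f"wall: {wall:.3f}s, cpu: {cpu:.3f}s")
```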

The speed of the language doesn't matter. But by making my job easier - making it easier to write the code - it saves something much more valuable than CPU time. It saves *human* time. And a human programmer is vastly more expensive than another 100 CPUs.

We don't specifically need *slow* languages. But no one sets out to implement a slow language. People implement *useful* languages. And they make intelligent decisions about where to spend their time. You could implement a machine code generator for Python. It would be an extremely complicated thing to do - but you could do it. (In fact, someone is working on an LLVM front-end for Python! It's not for Python code like my system, but there's a whole community of people who use Python for implementing numeric processing code with NumPy.) But what's the benefit? For most applications, absolutely nothing.

According to the Julia guys, the perfectly rational decision to *not* dedicate effort to optimization when optimization won't actually pay off is a bad, stupid idea. And that should tell you all that you need to know about their opinions.

For those who don't know, there's a complete horse's ass named Henry Gee. Henry is an editor at the science journal Nature. Poor Henry got into some fights with DrIsis (a prominent science blogger), and DrIsis was mean to him. The poor little guy was so hurt that he decided that he needed to get back at her - and so, Henry went ahead and he outed her, announcing her real name to the world.

This was a thoroughly shitty thing to do.

It's not that I think Isis didn't do anything wrong. We've got history, she and I. My experience with her led me to conclude that she's a petty, vicious bully that takes great pleasure in inflicting pain and anguish on other people. She's also someone who's done a lot of good things for her friends, and if you want to find out about any of it, go read another blog - plenty of people have written about her in the last couple of days.

If she's so awful, why do I care that someone outed her?

Because it's not just about her.

The community that we're a part of isn't something which has been around for all that long. There's still a lot of fudging around, figuring out the boundaries of our online interactions. When people play games like outing someone who's using a pseudonym, they're setting a precedent: they're declaring to the community that "I know X's real name, and here it is". But beyond that, they're *also* declaring to the community that "I believe that our community standards should say that this is an appropriate way to deal with conflict".

I don't want that to be something that people in my community do.

People use pseudonyms for a lot of different reasons. Some people do it for bad reasons, like separating unethical online behavior from their professional identity. But some people do it to avoid professional retaliation for perfectly reasonable behaviors - there are tenure committees at many universities that would hold blogging against a junior faculty member; there are companies that won't allow employees to blog under their real names; there are people who blog under a pseudonym in order to protect themselves from physical danger and violence!

Once you say "If someone makes me angry enough, it's all right for me to reveal their real identity", what you're saying is that none of those reasons matter. Your hurt feelings take precedence. You've got the right to decide whether their reasons for using a pseudonym are important enough to protect or not.

Sorry, but no. People's identities belong to them. I don't care how mean someone is to you online: you don't have the right to reveal their identity. Unless someone is doing something *criminal*, their identity isn't yours to reveal. (And if they *are* doing something criminal, you should seriously consider reporting them to the appropriate legal authorities, rather than screwing around online!)

But to be like Mr. Gee, and just say "Oh, she hurt my feelings! I'm going to try to hurt her back"! That's bullshit. That's childish, kindergarten level bullshit. And frankly, for someone who's an editor at a major scientific journal, who has access to all sorts of information about anonymous referees and authors? It's seriously something that crosses the line of professional ethics to the point where if *I* were in the management at Nature, I'd probably fire him for it.

But Henry didn't stop there: no! He also went ahead and - as an editor of Nature! - told people who criticized him for doing this that he was "adding them to the list".

What kind of list do you think Henry is adding them to? This guy who's shown how little he cares about ethics - what do you think he's going to do to the people who he's adding to his list?

I think that if Nature doesn't fire this schmuck, there's something even more seriously wrong over there than any of us expected.

A couple of caveats before I start:

- This is the area of math where I'm at my worst. I am *not* good at analysis. I'm struggling to understand this stuff well enough to explain it. If I screw up, please let me know in the comments, and I'll do my best to update the main post promptly.
- This is way more complicated than most of the stuff I write on this blog. Please be patient, and try not to get bogged down. I'm doing my best to take something that requires a whole lot of specialized knowledge, and explain it as simply as I can.

What I'm trying to do here is to get rid of some of the mystery surrounding this kind of thing. When people think about math, they frequently get scared. They say things like "Math is *hard*, I can't hope to understand it.", or "Math produces weird results that make no sense, and there's no point in my trying to figure out what it means, because if I do, my brain will explode. Only a super-genius geek can hope to understand it!"

That's all rubbish.

Math *is* complicated, because it covers a whole lot of subjects. To understand the details of a particular branch of math takes a lot of work, because it takes a lot of special domain knowledge. But it's not fundamentally different from many other things.

I'm a professional software engineer. I did my PhD in computer science, specializing in programming languages and compiler design. Designing and building a compiler is *hard*. To be able to do it well and understand everything that it does takes years of study and work. But anyone should be able to understand the basic concepts of what it does, and what the problems are.

I've got friends who are obsessed with baseball. They talk about ERAs, DIERAs, DRSs, EQAs, PECOTAs, Pythagorean expectations, secondary averages, UZRs... To me, it's a huge pile of gobbledygook. It's complicated, and to understand what any of it means takes some kind of specialized knowledge. For example, I looked up one of the terms I saw in an article by a baseball fan: "Peripheral ERA is the expected earned run average taking into account park-adjusted hits, walks, strikeouts, and home runs allowed. Unlike Voros McCracken's DIPS, hits allowed are included." I have no idea what that means. But it seems like everyone who loves baseball - including people who think that they can't do their own income tax return because they don't understand how to compute percentages - understands that stuff. They care about it, and since it means something in a field that they care about, they learn it. It's not beyond their ability to understand - it just takes some background to be able to make sense of it. Without that background, someone like me feels lost and clueless.

That's the way that math is. When you go to look at a result from complex analysis without knowing what complex analysis *is*, it looks like terrifyingly complicated nonsensical garbage, like "A meromorphic function is a function on an open subset of the complex number plane which is holomorphic on its domain except at a set of isolated points where it must have a Laurent series".

And it's definitely *not* easy. But understanding, in a very rough sense, what's going on and what it means is *not* impossible, even if you're not a mathematician.

Anyway, what the heck is the Riemann zeta function?

It's not easy to give even the simplest answer to that in a meaningful way.

Basically, Riemann Zeta is a function which describes fundamental properties of the prime numbers, and therefore of our entire number system. You can use the Riemann Zeta to prove that there's no largest prime number; you can use it to talk about the expected frequency of prime numbers. It occurs in various forms all over the place, because it's fundamentally tied to the structure of the realm of numbers.

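The starting point for defining it is an infinite series over the complex numbers. Technically it's a *Dirichlet series*: its terms are powers \(n^{-s}\) of the integers, rather than powers of the variable as in a power series. Note that the parameter we use is \(s\) instead of a more conventional \(x\). This is a way of highlighting the fact that this is a function over the complex numbers, not over the reals.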

\[\zeta(s) = \sum_{n=1}^{\infty} n^{-s}\]

This function \(\zeta\) is *not* the Riemann function!
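For real \(s > 1\) you can watch the series converge numerically. Here's a minimal Python sketch that adds up the first \(N\) terms. For \(s = 2\) the sum approaches \(\pi^2/6\), a classical result of Euler's, while at \(s = 1\) the partial sums keep growing without bound. The function name and the default \(N\) are arbitrary choices. Plain partial sums like this converge slowly, so this is for intuition, not serious computation.

```python
import math


def zeta_partial_sum(s: complex, terms: int = 100_000) -> complex:
    """Sum n^(-s) for n = 1..terms. Only meaningful where the series converges (Re(s) > 1)."""
    return sum(n ** (-s) for n in range(1, terms + 1))


approx = zeta_partial_sum(2)
print(approx, math.pi ** 2 / 6)
print(zeta_partial_sum(1, 10 ** 5))  # the harmonic series: grows like ln(N)
```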

The Riemann function is something called the *analytic continuation* of \(\zeta\). We'll get to that in a moment. Before doing that: why the heck should we care? I said it talks about the structure of numbers and primes, but how?

The zeta function actually has a *lot* of meaning. It tells us something fundamental about properties of our number system - in particular, about the properties of prime numbers. Euler proved that zeta is deeply connected to the prime numbers, using what's now called the Euler product formula. It says that for every \(s\) whose real part is greater than 1:

\[\sum_{n=1}^{\infty} n^{-s} = \prod_{p \in \textbf{Primes}} \frac{1}{1-p^{-s}}\]

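Which is a way of saying that the zeta function encodes information about how the prime numbers are distributed. The identity works because every positive integer factors into primes in exactly one way. Expand each factor as \(\frac{1}{1-p^{-s}} = 1 + p^{-s} + p^{-2s} + \cdots\) and multiply all the factors together: each term \(n^{-s}\) comes out exactly once.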
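Here's a quick numeric check of the product formula, as a sketch. It multiplies \(\frac{1}{1-p^{-s}}\) over the primes up to some bound and compares the result with \(\pi^2/6\) at \(s = 2\). The prime sieve and the bound of 10,000 are arbitrary choices for illustration.

```python
import math


def primes_up_to(limit: int) -> list[int]:
    """Sieve of Eratosthenes: all primes <= limit."""
    is_prime = [True] * (limit + 1)
    is_prime[0:2] = [False, False]
    for n in range(2, math.isqrt(limit) + 1):
        if is_prime[n]:
            # cross off every multiple of n starting at n*n
            is_prime[n * n :: n] = [False] * len(range(n * n, limit + 1, n))
    return [n for n, flag in enumerate(is_prime) if flag]


def euler_product(s: float, prime_limit: int = 10_000) -> float:
    """Product of 1 / (1 - p^(-s)) over primes p <= prime_limit."""
    result = 1.0
    for p in primes_up_to(prime_limit):
        result *= 1.0 / (1.0 - p ** (-s))
    return result


print(euler_product(2), math.pi ** 2 / 6)
```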

To really understand the Riemann Zeta, you need to know how to do analytic continuation. And to understand that, you need to learn a lot of number theory and a lot of math from the specialized field called complex analysis. But we can describe the basic concept without getting *that* far into the specialized stuff.

What is an analytic continuation? This is where things get really sticky. Basically, there are places where one way of solving a problem produces a diverging infinite series. When that happens, you say there's no solution: the point where you're trying to solve it isn't in the domain of the problem. But if you solve it in a different way, you can sometimes find a solution that works. You're using an analytic process to extend the domain of the problem, and get a solution at a point where the traditional way of solving it wouldn't work.
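The standard first example of this is the geometric series \(\sum_{n=0}^{\infty} z^n\). This is a swapped-in illustration, not the zeta function itself. Summing the series term by term only works when \(|z| < 1\), where it equals \(\frac{1}{1-z}\). But the formula \(\frac{1}{1-z}\) is perfectly happy at, say, \(z = 2\), where the series diverges. \(\frac{1}{1-z}\) is the analytic continuation of the series to every point except \(z = 1\). The Python sketch below (function names hypothetical) shows both behaviors.

```python
def geometric_partial_sum(z: complex, terms: int) -> complex:
    """Add up z^0 + z^1 + ... + z^(terms-1) the 'obvious' way."""
    return sum(z ** n for n in range(terms))


def geometric_closed_form(z: complex) -> complex:
    """1 / (1 - z): agrees with the series where it converges, defined everywhere but z = 1."""
    return 1 / (1 - z)


# Inside the disk |z| < 1 the two agree.
print(geometric_partial_sum(0.5, 60), geometric_closed_form(0.5))
# Outside it, the partial sums blow up, but the closed form still gives a value.
print(geometric_partial_sum(2, 60), geometric_closed_form(2))
```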

A nice way to explain what I mean by that requires taking a diversion, and looking at a metaphor. What we're talking about here isn't analytical continuation; it's a *different* way of extending the domain of a function, this time in the realm of the real numbers. But as an example, it illustrates the concept of finding a way to get the value of a function in a place where it doesn't seem to be defined.

In math, we like to play with limits. One example of that is in differential calculus. What we do in differential calculus is look at continuous curves, and ask: at one specific location on the curve, what's the slope?

If you've got a line, the slope is easy to determine. Take any two points on the line: \((x_1, y_1), (x_2, y_2)\), where \(x_1 < x_2\). Then the slope is \(\frac{y_2 - y_1}{x_2 - x_1}\). It's easy, because for a line, the slope never changes.

If you're looking at a curve more complex than a line, then slopes get harder, because they're constantly changing. If you're looking at \(y=x^2\), and you zoom in and look at it very close to \(x=0\), it looks like the slope is very close to 0. If you look at it close to \(x=1\), it looks like it's around 2. If you look at it at \(x=10\), it looks like it's around 20. But there are no two points where it's exactly the same!

So how can you talk about the slope at a particular point \(x=k\)? By using a limit. You pick a point really close to \(x=k\), and call it \(x=k+\epsilon\). Then an approximate value of the slope at \(k\) is:

\[\frac{(k+\epsilon)^2 - k^2}{k+\epsilon - k}\]

The smaller \(\epsilon\) gets, the closer your approximation gets. But you *can't* actually get to \(\epsilon=0\), because if you did, that slope equation would have 0 in the denominator, and it wouldn't be defined! But it *is* defined for *all* non-zero values of \(\epsilon\). No matter how small, no matter how close to zero, the slope is defined. But *at* zero, it's no good: it's undefined.
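You can watch that happen numerically. Here's a small Python sketch of the difference quotient for \(y = x^2\) at \(x = k = 3\). As \(\epsilon\) shrinks, the slope approaches 6 (that is, \(2k\)), while \(\epsilon = 0\) itself is a division by zero. The function name is hypothetical.

```python
def slope_estimate(k: float, epsilon: float) -> float:
    """Slope of y = x^2 between x = k and x = k + epsilon."""
    return ((k + epsilon) ** 2 - k ** 2) / ((k + epsilon) - k)


for epsilon in (1.0, 0.1, 0.01, 0.001):
    print(epsilon, slope_estimate(3.0, epsilon))

try:
    slope_estimate(3.0, 0.0)
except ZeroDivisionError:
    print("epsilon = 0: undefined (division by zero)")
```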

So we take a limit. As \(epsilon\) gets smaller and smaller, the slope gets closer and closer to some value. So we say that the slope at the point - at the exact place where the denominator of that fraction becomes zero - is defined as:

\[ \lim_{\epsilon \rightarrow 0} \frac{(k+\epsilon)^2 - k^2}{k+\epsilon - k} =\]

\[ \lim_{\epsilon \rightarrow 0} \frac{ k^2 + 2k\epsilon + \epsilon^2 - k^2}{\epsilon} = \]

*(Note: the original version of the previous line had a missing "-". Thanks to commenter Thinkeye for catching it.)*

\[ \lim_{\epsilon \rightarrow 0} \frac{ 2k\epsilon + \epsilon^2}{\epsilon} = \]

As long as \(\epsilon\) isn't zero, we can divide it out of the top and the bottom of the fraction, which leaves \(2k + \epsilon\). As \(\epsilon\) gets closer and closer to zero, that leftover \(\epsilon\) disappears:

\[ \lim_{\epsilon \rightarrow 0} (2k + \epsilon) = 2k\]

So at any point \(x=k\), the slope of \(y=x^2\) is \(2k\). Even though computing that involves dividing by zero, we've used an analytical method to come up with a meaningful and useful value at \(\epsilon=0\). This doesn't mean that you can divide by zero. You cannot conclude that \(\frac{2 \cdot 0}{0} = 2\). But for this particular analytical setting, you can come up with a meaningful solution to a problem that involves, in some sense, dividing by zero.

The limit trick in differential calculus is *not* analytic continuation. But it's got a tiny bit of the flavor.

Moving on: the idea of analytic continuation comes from the field of complex analysis. Complex analysis studies a particular class of functions in the complex number plane. It's not one of the easier branches of mathematics, but it's *extremely* useful. Complex analytic functions show up all over the place in physics and engineering.

In complex analysis, people focus on a particular group of functions that are called analytic, holomorphic, and meromorphic. (Those three are closely related, but *not* synonymous.)

A holomorphic function is a function over complex variables, which has one important property. The property is almost like a kind of abstract smoothness. In the simplest case, suppose that we have a complex function in a single variable, and the domain of this function is \(D\). Then it's holomorphic if, and only if, for every point \(d \in D\), the function is complex differentiable in some neighborhood of points around \(d\).

(Differentiable means, roughly, that using a trick like the one we did above, we can take the slope (the *derivative*) around \(d\). In the complex number system, "differentiable" is a much stronger condition than it would be in the reals. In the complex realm, if something is differentiable, then it is *infinitely* differentiable. In other words, given a complex equation, if it's differentiable, that means that I can create a curve describing its slope. That curve, in turn, will *also* be differentiable, meaning that you can derive an equation for its slope. And *that* curve will be differentiable. Over and over, forever: the derivative of a differentiable curve in the complex number plane will always be differentiable.)

If you have a differentiable curve in the complex number plane, it's got one really interesting property: it's representable as a power series. (This property is what it means for a function to be called *analytic*; all holomorphic functions are analytic.) That is, a function \(f\) is analytic on a set \(S\) if, for all points \(s \in S\), you can represent the value of the function as a power series on a disk of values around \(s\):

\[ f(z) = \sum_{n=0}^{\infty} a_n (z-c)^n \]

In the simplest case, the constant \(c\) is 0, and it's just:

\[ f(z) = \sum_{n=0}^{\infty} a_n z^n \]

*(Note: In the original version of this post, I miswrote the basic pattern of a power series, and put both \(z\) and \(s\) in the base. Thanks to John Armstrong for catching it.)*
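As a concrete instance of that power-series form, here's a sketch using the exponential function, a standard example of a function that's holomorphic on the whole complex plane. Its power series is centered at \(c = 0\) with \(a_n = \frac{1}{n!}\). The choice of function and the helper name `exp_power_series` are my own illustration; the code just checks partial sums against Python's `cmath.exp` at a complex point.

```python
import cmath
import math


def exp_power_series(z: complex, terms: int = 30) -> complex:
    """Partial sum of sum_{n>=0} z**n / n!, the power series of exp centered at c = 0."""
    total = 0j
    for n in range(terms):
        total += z ** n / math.factorial(n)
    return total


z = 1 + 2j
print(exp_power_series(z))   # partial sum of the power series
print(cmath.exp(z))          # library value; the two agree to ~15 digits
```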

The function that we wrote, above, for the base of the zeta function is exactly this kind of power series. Zeta is an analytic function for a particular set of values. Not all values in the complex number plane; just for a specific subset.

If a function \(f\) is holomorphic, then the strong differentiability of it leads to another property: any extension of it to a larger domain, if one exists, is *unique*. The extension always produces the same value for all points that are within the domain of \(f\). It also produces exactly the same differentiability properties. But it's also defined on a larger domain than \(f\) was. It's essentially what \(f\) *would be* if its domain weren't so limited. If \(D\) is the domain of \(f\), then for any connected domain \(D'\) where \(D \subset D'\), there's at most one function with domain \(D'\) that's an analytic continuation of \(f\).

Computing analytic continuations is not easy. This is heavy enough already, without getting into the details. But the important thing to understand is that if we've got a function \(f\) with an interesting set of properties, we've got a method that might be able to give us a *new function* \(g\) that:

- Everywhere that \(f(s)\) was defined, \(f(s) = g(s)\).
- Everywhere that \(f(s)\) was differentiable, \(g(s)\) is also differentiable.
- Everywhere that \(f(s)\) could be computed as a sum of an infinite power series, \(g(s)\) can also be computed as a sum of an infinite power series.
- \(g(s)\) is defined in places where \(f(s)\) and the power series for \(f(s)\) are not.
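The geometric series is the textbook example of this, and it's small enough to check numerically. The power series \(f(z) = \sum_{n=0}^{\infty} z^n\) converges only for \(|z| < 1\), where it equals \(\frac{1}{1-z}\); the closed form \(g(z) = \frac{1}{1-z}\) is defined for every \(z \neq 1\), and it is the analytic continuation of \(f\). This Python sketch (my example, not from the original post; the function names are hypothetical) compares the two inside and outside the disk.

```python
def geometric_partial_sum(z: complex, terms: int) -> complex:
    """Partial sum 1 + z + z**2 + ... + z**(terms - 1) of the power series f."""
    total = 0j
    power = 1 + 0j
    for _ in range(terms):
        total += power
        power *= z
    return total


def g(z: complex) -> complex:
    """Closed form 1 / (1 - z): agrees with the series wherever |z| < 1,
    but is also defined everywhere else except z = 1."""
    return 1 / (1 - z)


inside = 0.5 + 0.25j
print(abs(geometric_partial_sum(inside, 200) - g(inside)))  # tiny: the two agree

outside = 2 + 0j
print([geometric_partial_sum(outside, n).real for n in (5, 10, 20)])  # [31.0, 1023.0, 1048575.0]
print(g(outside).real)  # -1.0
```

Outside the disk, the partial sums just keep growing, while \(g(2) = -1\). That doesn't mean \(1 + 2 + 4 + 8 + \cdots = -1\); it means the series has stopped being a valid way of computing \(g\).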

So, getting back to the Riemann Zeta function: we don't have a proper closed form equation for zeta. What we have is the power series of the function that zeta is the analytic continuation of:

\[\zeta(s) = \sum_{n=1}^{\infty} n^{-s}\]

If \(s=-1\), then the series for that function expands to:

\[\sum_{n=1}^{\infty} n^1 = 1 + 2 + 3 + 4 + 5 + \cdots\]

The power series is undefined at this point; the base function that we're using, that zeta is the analytic continuation of, is undefined at \(s=-1\). The power series is an *approximation* of the zeta function, which works over some specific range of values. But it's a *flawed* approximation. It's *wrong* about what happens at \(s=-1\). The approximation says that the value at \(s=-1\) should be a non-converging infinite sum. It's *wrong* about that. The Riemann zeta function *is* defined at that point, even though the power series is not. If we use a different method for computing the value of the zeta function at \(s=-1\) - a method that doesn't produce an incorrect result! - the zeta function has the value \(-\frac{1}{12}\) at \(s=-1\).
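Here's a sketch of that contrast in Python. The series' partial sums at \(s=-1\) are just the triangular numbers, and they grow without bound. For the value of zeta itself, I'm assuming the standard closed form for zeta at non-positive integers, \(\zeta(-n) = (-1)^n \frac{B_{n+1}}{n+1}\), where the \(B_k\) are Bernoulli numbers (with the convention \(B_1 = -\frac{1}{2}\)). That's one well-known "different method", not necessarily the one the post has in mind, and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def bernoulli_numbers(count: int) -> list[Fraction]:
    """B_0 .. B_{count-1} (convention B_1 = -1/2), from the recurrence
    sum_{k=0}^{m} C(m+1, k) * B_k = 0 for every m >= 1."""
    b = [Fraction(1)]
    for m in range(1, count):
        b.append(-sum(comb(m + 1, k) * b[k] for k in range(m)) / (m + 1))
    return b


def zeta_at_negative_integer(n: int) -> Fraction:
    """zeta(-n) = (-1)**n * B_{n+1} / (n + 1), valid for integers n >= 0."""
    b = bernoulli_numbers(n + 2)
    return (-1) ** n * b[n + 1] / (n + 1)


def series_partial_sum(s: int, terms: int) -> Fraction:
    """Partial sum of sum_{n>=1} n**(-s): the series, not the zeta function."""
    return sum(Fraction(n) ** (-s) for n in range(1, terms + 1))


print([int(series_partial_sum(-1, t)) for t in (10, 100, 1000)])  # [55, 5050, 500500]
print(zeta_at_negative_integer(1))                                 # -1/12
```

The two computations never meet: no partial sum of the series is anywhere near \(-\frac{1}{12}\).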

Note that this is a very different statement from saying that the sum of that power series is \(-\frac{1}{12}\) at \(s=-1\). We're talking about fundamentally different functions! The Riemann zeta function at \(s=-1\) *does not* expand to the power series that we used to approximate it.

In physics, if you're working with some kind of system that's described by a power series, you can come across the power series that produces the sequence that looks like the sum of the natural numbers. If you do, and if you're working in the complex number plane, and you're working in a domain where that power series occurs, what you're actually using isn't really the power series - you're playing with the analytic zeta function, and that power series is a flawed approximation. It works most of the time, but if you use it in the wrong place, where that approximation doesn't work, you'll see the sum of the natural numbers. In that case, you get rid of that sum, and replace it with the correct value of the actual analytic function, *not* with the incorrect value of applying the power series where it won't work.

*Ok, so that warning at the top of the post? Entirely justified. I screwed up a fair bit at the end. The series that defines the value of the zeta function for some values, the series for which the Riemann zeta is the analytical continuation? It's not a power series. It's a series alright, but not a power series, and not the particular kind of series that defines a holomorphic or analytical function.*

*The underlying point, though, is still the same. That series (not power series, but series) is a partial definition of the Riemann zeta function. It's got a limited domain, where the Riemann zeta's domain doesn't have the same limits. The series definition still doesn't work at \(s=-1\). The series is still undefined at \(s=-1\). At \(s=-1\), the series expands to \(1 + 2 + 3 + 4 + 5 + 6 + ...\), which doesn't converge, and which doesn't add up to any finite value, -1/12 or otherwise. That series does not have a value at \(s=-1\). No matter what you do, that equation - the definition of that series - does not work at \(s=-1\). But the Riemann Zeta function is defined in places where that equation isn't. Riemann Zeta at \(s=-1\) is defined, and its value is \(-1/12\).*

*Despite my mistake, the important point is still that last sentence. The value of the Riemann zeta function at \(s=-1\) is not the sum of the set of natural numbers. The equation that produces the sequence doesn't work at \(s=-1\). The definition of the Riemann zeta function doesn't say that it should, or that the sum of the natural numbers is \(-1/12\). It just says that the first approximation of the Riemann zeta function for some, but not all values, is given by a particular infinite sum. In the places where that sum works, it gives the value of zeta; in places where that sum doesn't work, it doesn't.*

Attn @MarkCC: http://t.co/ijzQZpM2lm (Sum(NatNums)= -1/12 bullshit) h/t @NeuroPolarbear @BadAstronomer Shame on you, @Slate.

— Dr24hours (@Dr24hours) January 17, 2014

And indeed, he was right. Phil Plait, the Bad Astronomer, of all people, got taken in by a bit of mathematical stupidity, which he credulously swallowed and chose to stupidly expand on.

Let's start with the argument from his video.

We'll consider three infinite series:

\[\begin{aligned}
S_1 &= 1 - 1 + 1 - 1 + 1 - 1 + \cdots \\
S_2 &= 1 - 2 + 3 - 4 + 5 - 6 + \cdots \\
S_3 &= 1 + 2 + 3 + 4 + 5 + 6 + \cdots
\end{aligned}\]

\(S_1\) is something called Grandi's series. According to the video, taken to infinity, Grandi's series alternates between 0 and 1. So to get a value for the full series, you can just take the average - so we'll say that \(S_1 = 1/2\). *(Note, I'm not explaining the errors here - just repeating their argument.)*

Now, consider \(S_2\). We're going to add \(S_2\) to itself. When we write it, we'll do a bit of offset:

\[\begin{aligned}
S_2 &= 1 - 2 + 3 - 4 + 5 - 6 + \cdots \\
S_2 &= \phantom{1 - {}} 1 - 2 + 3 - 4 + 5 - \cdots \\
2S_2 &= 1 - 1 + 1 - 1 + 1 - 1 + \cdots
\end{aligned}\]

So \(2S_2 = S_1\); therefore \(S_2 = S_1/2 = 1/4\).

Now, let's look at what happens if we take \(S_3\), and subtract \(S_2\) from it:

\[\begin{aligned}
S_3 &= 1 + 2 + 3 + 4 + 5 + 6 + \cdots \\
-S_2 &= -1 + 2 - 3 + 4 - 5 + 6 - \cdots \\
S_3 - S_2 &= 0 + 4 + 0 + 8 + 0 + 12 + \cdots = 4(1 + 2 + 3 + \cdots)
\end{aligned}\]

So, \(S_3 - S_2 = 4S_3\), and therefore \(3S_3 = -S_2\), and \(S_3 = -1/12\).

So what's wrong here?

To begin with, \(S_1\) does *not* equal 1/2. \(S_1\) is a non-converging series. It doesn't converge to 1/2; it doesn't converge to *anything*. This isn't up for debate: it doesn't converge!

In the 19th century, a mathematician named Ernesto Cesaro came up with a way of *assigning* a value to this series. The assigned value is called the *Cesaro summation* or *Cesaro sum* of the series. The sum is defined as follows:

Let \(A = a_1 + a_2 + a_3 + \cdots\). In this series, \(s_k = \sum_{n=1}^{k} a_n\). \(s_k\) is called the *kth partial sum* of \(A\).

The series \(A\) is *Cesaro summable* if the average of its partial sums converges towards a value \(C(A) = \lim_{n \rightarrow \infty} \frac{1}{n}\sum_{k=1}^{n} s_k\).

So: take the first partial sum; then the average of the first two partial sums; then the average of the first three, then the first four, and so on. If *that* sequence converges towards a specific value, then the series is Cesaro summable.

Look at Grandi's series. It produces the partial sum averages of 1, 1/2, 2/3, 2/4, 3/5, 3/6, 4/7, 4/8, 5/9, 5/10, ... That series clearly converges towards 1/2. So Grandi's series is Cesaro summable, and its Cesaro sum value is 1/2.
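The Cesaro definition translates directly into code. This is a minimal Python sketch (the helper name `cesaro_means` is my own) that averages partial sums using exact fractions, applied to the first ten terms of Grandi's series. `Fraction` prints reduced forms, so 2/4 shows up as 1/2.

```python
from fractions import Fraction
from itertools import accumulate


def cesaro_means(terms):
    """Yield C_n = (s_1 + s_2 + ... + s_n) / n, where s_k is the k-th partial sum."""
    running_total = Fraction(0)
    for n, s_n in enumerate(accumulate(terms), start=1):
        running_total += s_n
        yield running_total / n


grandi = [(-1) ** k for k in range(10)]          # 1, -1, 1, -1, ...
print(list(accumulate(grandi)))                  # [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
print([str(m) for m in cesaro_means(grandi)])
# ['1', '1/2', '2/3', '1/2', '3/5', '1/2', '4/7', '1/2', '5/9', '1/2']
```

The partial sums themselves never settle down; only their running average does.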

The important thing to note here is that we are *not* saying that the Cesaro sum is *equal to* the series. We're saying that there's a way of assigning a measure to the series.

And there is the first huge, gaping, glaring problem with the video. They assert that the Cesaro sum of a series is equal to the series, which isn't true.

From there, they go on to start playing with the infinite series in sloppy algebraic ways, and using the Cesaro summation value in their infinite series algebra. This is, similarly, not a valid thing to do.

Just pull out that definition of the Cesaro summation from before, and look at the series of natural numbers. The partial sums for the natural numbers are 1, 3, 6, 10, 15, 21, ... Their averages are 1, 4/2, 10/3, 20/4, 35/5, 56/6, ... = 1, 2, 3 1/3, 5, 7, 9 1/3, ... That's not a converging sequence, which means that the series of natural numbers *does not* have a Cesaro sum.
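The same computation as a self-contained Python sketch (again using a hypothetical `cesaro_means` helper) shows the averages for the natural numbers running off to infinity. The average of the first \(n\) partial sums works out to \(\frac{(n+1)(n+2)}{6}\), which grows without bound.

```python
from fractions import Fraction
from itertools import accumulate


def cesaro_means(terms):
    """Yield the average of the first n partial sums, for n = 1, 2, 3, ..."""
    running_total = Fraction(0)
    for n, s_n in enumerate(accumulate(terms), start=1):
        running_total += s_n
        yield running_total / n


naturals = range(1, 7)
print(list(accumulate(naturals)))                  # [1, 3, 6, 10, 15, 21]
print([str(m) for m in cesaro_means(naturals)])    # ['1', '2', '10/3', '5', '7', '28/3']

# The averages keep climbing: after 1000 terms the average is (1001 * 1002) / 6.
means = list(cesaro_means(range(1, 1001)))
print(means[-1])                                   # 167167
```

Every partial sum is positive and bigger than the one before, so every average is too; there's nothing for the sequence to converge to.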

What does that mean? It means that if we substitute the Cesaro sum for a series using equality, we get inconsistent results: we get one line of reasoning in which the series of natural numbers has a Cesaro sum; a second line of reasoning in which the series of natural numbers does *not* have a Cesaro sum. *If* we assert that the Cesaro sum of a series is equal to the series, we've destroyed the consistency of our mathematical system.

Inconsistency is death in mathematics: any time you allow inconsistencies in a mathematical system, you get garbage: *any* statement becomes mathematically provable. Using the equality of an infinite series with its Cesaro sum, I can prove that 0=1, that the square root of 2 is a natural number, or that the moon is made of green cheese.

What makes this worse is that it's *obvious*. There is *no mechanism* in real numbers by which addition of positive numbers can *roll over* into negative. It doesn't matter that infinity is involved: you can't follow a monotonically increasing trend, and wind up with something smaller than your starting point.

Someone as allegedly intelligent and educated as Phil Plait should know that.

First of all, folks, this isn't an article, it's a press release from Blacklight. The Financial Post just printed it in their online press-release section. It's an un-edited release written by Blacklight.

There's nothing new here. I continue to think that this is a scam. But what *kind* of scam?

To find out, let's look at a couple of select quotes from this press release.

Using a proprietary water-based solid fuel confined by two electrodes of a SF-CIHT cell, and applying a current of 12,000 amps through the fuel, water ignites into an extraordinary flash of power. The fuel can be continuously fed into the electrodes to continuously output power. BlackLight has produced millions of watts of power in a volume that is one ten thousandths of a liter corresponding to a power density of over an astonishing 10 billion watts per liter. As a comparison, a liter of BlackLight power source can output as much power as a central power generation plant exceeding the entire power of the four former reactors of the Fukushima Daiichi nuclear plant, the site of one of the worst nuclear disasters in history.

One ten-thousandth of a liter of water produces millions of watts of power.

Sounds impressive, doesn't it? Oh, but wait... how do we measure energy density of a substance? Joules per liter, or something equivalent - that is, energy per volume. But Blacklight is quoting energy density as *watts* per liter.

The joule is a unit of energy. A joule is a shorthand for \(\frac{\text{kilogram} \cdot \text{meter}^2}{\text{second}^2}\). Watts are a different unit, a measure of *power*, which is a shorthand for \(\frac{\text{kilogram} \cdot \text{meter}^2}{\text{second}^3}\). A watt is, therefore, one joule/second.

They're quoting a rather peculiar unit there. I wonder why?

Our safe, non-polluting power-producing system catalytically converts the hydrogen of the H2O-based solid fuel into a non-polluting product, lower-energy state hydrogen called “Hydrino”, by allowing the electrons to fall to smaller radii around the nucleus. The energy release of H2O fuel, freely available in the humidity in the air, is one hundred times that of an equivalent amount of high-octane gasoline. The power is in the form of plasma, a supersonic expanding gaseous ionized physical state of the fuel comprising essentially positive ions and free electrons that can be converted directly to electricity using highly efficient magnetohydrodynamic converters. Simply replacing the consumed H2O regenerates the fuel. Using readily-available components, BlackLight has developed a system engineering design of an electric generator that is closed except for the addition of H2O fuel and generates ten million watts of electricity, enough to power ten thousand homes. Remarkably, the device is less than a cubic foot in volume. To protect its innovations and inventions, multiple worldwide patent applications have been filed on BlackLight’s proprietary technology.

Water, in the alleged hydrino reaction, produces 100 times the energy of high-octane gasoline.

Gasoline contains, on average, about 11.8 kWh/kg. A milliliter of gasoline weighs about 7/10ths of a gram, compared to the 1 gram weight of a milliliter of water; therefore, a kilogram of gasoline should occupy around 1400 milliliters. So, let's take 11.8 kWh/kg, and convert that to an equivalent measure of energy per milliliter: about 8.4 watt-hours per milliliter (8.4 kWh per liter). How does that compare to hydrinos? Oh, wait... we can't convert those, now can we? Because they're using *power* density. And the *power density* of a substance depends not just on how much energy you can extract, but on how long it takes to extract it. Explosives have *fantastic* power density! Gasoline - particularly high octane gasoline - is formulated to try to burn as *slowly* as possible, because internal combustion engines are more efficient on a slower burn.
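Here's the unit arithmetic from that paragraph as a small Python sketch. The two gasoline constants are the figures quoted above (11.8 kWh/kg, and about 1400 mL per kg from a density of roughly 0.7 g/mL); the variable names are mine. The result is an *energy* per volume: there's no way to turn it into watts per liter without also knowing how fast the energy is released.

```python
GASOLINE_KWH_PER_KG = 11.8   # energy content quoted in the text
GASOLINE_ML_PER_KG = 1400    # ~0.7 g/mL, rounded as in the text
JOULES_PER_WH = 3600         # 1 watt-hour = 1 W sustained for 3600 s

wh_per_ml = GASOLINE_KWH_PER_KG * 1000 / GASOLINE_ML_PER_KG
kj_per_ml = wh_per_ml * JOULES_PER_WH / 1000

print(f"{wh_per_ml:.1f} Wh/mL, i.e. {wh_per_ml:.1f} kWh/L")   # 8.4 Wh/mL, i.e. 8.4 kWh/L
print(f"about {kj_per_ml:.0f} kJ per milliliter")              # about 30 kJ per milliliter
```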

To bring just a bit of numbers into it, TNT has a *much* higher power density than gasoline. You can easily knock down buildings with TNT, because of the way that it emits all of its energy in one super short burst. But its *energy density* is only about a tenth of gasoline's, measured by mass.

Hmm. I wonder why Mills is using the power density?

Here's my guess. Mills has some bullshit process where he spikes his generator with 12,000 amps, and gets a microsecond burst of energy out. If you can produce 100 joules from one milliliter in 1/1000th of a second, that's a power density of 100,000 watts per milliliter (100 million watts per liter), even though the *energy* involved is only 100 joules per milliliter.
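Here's that hypothetical as a Python sketch, showing how the same 100 joules produce wildly different power densities depending only on how fast they're released. The function name is mine; the numbers are the made-up ones from the paragraph above, plus the press release's own ratio of millions of watts from a ten-thousandth of a liter.

```python
def power_density(energy_joules: float, volume: float, seconds: float) -> float:
    """Average power per unit volume: energy / time / volume (watts per volume unit)."""
    return energy_joules / seconds / volume


# Hypothetical from the text: 100 J from 1 mL of fuel.
burst = power_density(100, 1, 1e-3)   # released in 1/1000 s
slow = power_density(100, 1, 1.0)     # the same energy over a full second
print(f"burst: {burst:,.0f} W/mL = {burst * 1000:,.0f} W/L")
print(f"slow:  {slow:,.0f} W/mL")

# The press release's ratio: a million watts from 1/10,000 of a liter.
print(f"{1e6 / 1e-4:,.0f} W/L")       # 10,000,000,000 W/L: "10 billion watts per liter"
```

Shrink the burst time and the power figure grows without limit, while the energy stays exactly the same.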

Suddenly, the amount of energy that's being generated isn't so huge - and there, I would guess, is the key to Mills's latest scam. If you're hitting your generating apparatus with 12,000 amperes of electric current, and you're producing microsecond bursts of energy, it's going to be very easy to produce that energy by consuming something in the apparatus, without that consumption being obvious to an observer who isn't allowed to independently examine the apparatus in detail.

Now, what about the "independent verification"? Again, let's look at the press release.

“We at The ENSER Corporation have performed about thirty tests at our premises using BLP’s CIHT electrochemical cells of the type that were tested and reported by BLP in the Spring of 2012, and achieved the three specified goals,” said Dr. Ethirajulu Dayalan, Engineering Fellow, of The ENSER Corporation. “We independently validated BlackLight’s results offsite by an unrelated highly qualified third party. We confirmed that hydrino was the product of any excess electricity observed by three analytical tests on the cell products, and determined that BlackLight Power had achieved fifty times higher power density with stabilization of the electrodes from corrosion.” Dr. Terry Copeland, who managed product development for several electrochemical and energy companies including DuPont Company and Duracell added, “Dr. James Pugh (then Director of Technology at ENSER) and Dr. Ethirajulu Dayalan participated with me in the independent tests of CIHT cells at The ENSER Corporation’s Pinellas Park facility in Florida starting on November 28, 2012. We fabricated and tested CIHT cells capable of continuously producing net electrical output that confirmed the fifty-fold stable power density increase and hydrino as the product.”

Who is the ENSER corporation? They're an engineering consulting/staffing firm that's located in the same town as Blacklight's offices. So, pretty much, what we're seeing is that Mills hired his next door neighbor to provide a data-free testimonial promising that the hydrino generator really did work.

Real scientists, doing real work, don't pull nonsense like this. Mills has been promising a commercial product within a year for almost 25 years. In that time, he's filed multiple patents, some of which have already expired! And yet, he's never actually allowed an independent team to do a public, open test of his system. He's never provided any actual data about the system!

He and his team have claimed things like "We can't let people see it, it's secret". But *they're filing patents*. You don't get to keep a patent secret. A patent application, under US law, must contain: "a description of how to make and use the invention that must provide sufficient detail for a person skilled in the art (i.e., the relevant area of technology) to make and use the invention.". In other words, if the patents that Mills and friends filed are legally valid, they *must* contain enough information for an interested independent party to build a hydrino generator. But Mills won't let anyone examine his supposedly working generators. Why? It's not to keep a secret!

Finally, the question that a couple of people, including one reporter for WiredUK, asked: If it's all a scam, why would Mills and company keep on making claims?

The answer is the oldest in the book: money.

In my email this morning, I got a new version of a 419 scam letter. It's from a guy who claims to be the nephew of Ariel Sharon. He claims that his uncle owned some farmland, including an extremely valuable grove of olive trees, in the occupied West Bank. Now, he claims, the family wants to sell that land - but as Sharon's relatives, they can't let their names get into the news. So, he says, he wants to "sell" the land to me for a pittance, and then I can sell it for what it's really worth, and we'll split the profits.

When you read about people who've fallen for 419 scams, you find that the scammers don't ask for all of the money up front. They start off small: "There is a $500 fee for the transfer". When they get that, they show you some "evidence" in the form of an official-looking transfer-clearance receipt. But then they say that there's a new problem, and they need money to get around it. "We were preparing to transfer, but the clerk became suspicious; we need to bribe him!", "There's a new financial rule that you can't transfer sums greater than $10,000 to someone without a Nigerian bank account containing at least $100,000". It's a continual process. They always show some kind of fake document at each step of the way. The fakes aren't particularly convincing unless you really want to be convinced, but they're enough to keep the money coming.

Mills appears to be operating in very much the same vein. He's getting investors to give him money, promising that whatever they invest, they'll get back manifold when he starts selling hydrino power generators! He promises they'll be on market within a year or two - five at most!

Then he comes up with either a demonstration, or the testimonial from his neighbor, or the self-publication of his book, or another press release talking about the newest version of his technology. It's *much better* than the old one! This time it's for real - just *look* at these amazing numbers! It's 10 billion watts per liter, a machine that fits on your desk can generate as much power as a nuclear power plant!! We just need some more money to fix that pesky problem with corrosion on the electrodes, and then we'll go to market, and you'll be rich, rich, rich!

It's been going on for almost 25 years, this constant cycle of press release/demo/testimonial every couple of years. (Seriously; in this post, I showed links to claims from 2009 claiming commercialization within 12 to 18 months; from 2005 claiming commercialization within months; and claims from 1999 claiming commercialization within a year.) But he always comes up with an excuse why those deadlines needed to be missed. And he always manages to find more investors, willing to hand over millions of dollars. As long as suckers are still willing to give him money, why *wouldn't* he keep on making claims?

Anyway, before getting started, I wanted to talk about a few things. First of all, why learn machine language? And then, just what the heck is the ARM thing anyway?

My answer might surprise you. Or, if you've been reading this blog for a while, it might not.

Let's start with the wrong reason. Most of the time, people say that you should learn machine language for speed: programming at the machine code level gets you right down to the hardware, eliminating any layers of junk that would slow you down. For example, one of the books that I bought to learn ARM assembly (Raspberry Pi Assembly Language RASPBIAN Beginners: Hands On Guide) said:

even the most efficient languages can be over 30 times slower than their machine code equivalent, and that’s on a good day!

This is pure, utter rubbish. I have no idea where he came up with that 30x figure, but it's got no relationship to reality. (It's a decent book, if a bit elementary in approach; this silly statement isn't representative of the book as a whole!)

In modern CPUs - and the ARM definitely does count as modern! - the fact is, for real world programs, writing code by hand in machine language will probably result in *slower* code!

If you're talking about writing a single small routine, humans can be very good at that, and they often *do* beat compilers. But once you get beyond that, and start looking at whole programs, any human advantage in machine language goes out the window. The constraints that actually affect performance have become incredibly complex - too complex for us to juggle effectively. We'll look at some of these in more detail, but I'll explain one example.

The CPU needs to fetch instructions from memory. But memory is *dead slow* compared to the CPU! In the best case, your CPU can execute a couple of instructions in the time it takes to fetch a single value from memory. This leads to an obvious problem: it can execute (or at least start executing) one instruction for each clock tick, but it takes several ticks to fetch an instruction!

To get around this, CPUs play a couple of tricks. Basically, they don't fetch single instructions, but instead grab entire blocks of instructions; and they start retrieving instructions before they're needed, so that by the time the CPU is ready to execute an instruction, it's already been fetched.

So the instruction-fetching hardware is constantly looking ahead, and fetching instructions so that they'll be ready when the CPU needs them. What happens when your code contains a conditional branch instruction?

The fetch hardware doesn't know whether the branch will be taken or not. It can make an educated guess by a process called branch prediction. But if it guesses wrong, then the CPU is stalled until the correct instructions can be fetched! So you want to make sure that your code is written so that the CPU's branch prediction hardware is more likely to guess correctly. Many of the tricks that humans use to hand-optimize code actually have the effect of confusing branch prediction! They shave off a couple of instructions, but by doing so, they also force the CPU to sit idle while it waits for instructions to be fetched. That branch prediction failure penalty frequently outweighs the cycles that they saved!
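To get a feel for why predictable branches are cheap and unpredictable ones aren't, here's a toy Python simulation of a classic textbook predictor: a single 2-bit saturating counter. Real ARM cores use far more elaborate predictors, so treat this purely as an illustrative model; the function name and the two branch patterns are my own.

```python
import random


def misprediction_rate(outcomes: list[bool]) -> float:
    """Simulate a single 2-bit saturating counter; return the fraction of wrong guesses.

    Counter values 0 and 1 predict "not taken"; 2 and 3 predict "taken".
    """
    counter = 2  # start in the "weakly taken" state
    misses = 0
    for taken in outcomes:
        if (counter >= 2) != taken:
            misses += 1
        # Step one state toward the actual outcome, saturating at 0 and 3.
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses / len(outcomes)


# A loop-closing branch: taken 99 times, then falls through once; repeated 100 times.
loop_branch = ([True] * 99 + [False]) * 100

# A data-dependent branch with no pattern (fixed seed so runs are repeatable).
rng = random.Random(42)
random_branch = [rng.random() < 0.5 for _ in range(10_000)]

print(f"loop branch:   {misprediction_rate(loop_branch):.1%} mispredicted")
print(f"random branch: {misprediction_rate(random_branch):.1%} mispredicted")
```

On the loop branch the counter only misses when the loop exits, 1% of the time; on the random branch it's wrong roughly half the time, and on a real CPU each of those misses would mean waiting for the right instructions to be fetched.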

That's one simple example. There are many more, and they're much more complicated. And to write efficient code, you need to keep all of those in mind, and fully understand every tradeoff. That's incredibly hard, and no matter how smart you are, you'll probably blow it for large programs.

If not for efficiency, then why learn machine code? Because it's how your computer really works! You might never actually use it, but it's interesting and valuable to know what's happening under the covers. Think of it like your car: most of us will never actually modify the engine, but it's still good to understand how the engine and transmission work.

Your computer is an amazingly complex machine. It's literally got billions of tiny little parts, all working together in an intricate dance to do what you tell it to. Learning machine code gives you an idea of just how it does that. When you're programming in another language, understanding machine code lets you understand what your program is *really* doing under the covers. That's a useful and fascinating thing to know!

As I said, we're going to look at machine language coding on the ARM processor. What is this ARM beast anyway?

It's probably not the CPU in your laptop. Most desktop and laptop computers today are based on a direct descendant of the first microprocessor: the Intel 4004.

Yes, seriously: the Intel CPUs that drive most PCs are, really, direct descendants of the first CPU designed for desktop calculators! That's not an insult to the Intel CPUs, but rather a testament to the value of a good design: they've just kept on growing and enhancing. It's hard to see the resemblance unless you follow the design path, where each step follows directly on its predecessors.

The Intel 4004, released in 1971, was a 4-bit processor designed for use in calculators. Nifty chip, state of the art in 1971, but not exactly what we'd call flexible by modern standards. Even by the standards of the day, they recognized its limits. So following on its success, they created an 8-bit version, which they called the 8008. And then they extended the instruction set, and called the result the 8080. The 8080, in turn, yielded successors in the 8088 and 8086 (and the Z80, from a rival chipmaker).

The 8086 family was the one chosen by IBM for its newfangled personal computers; the original IBM PC used the 8088, a version of the 8086 with an 8-bit external data bus. Chip designers kept making it better, producing the 80286, 386, Pentium, and so on - up to today's CPUs, like the Core i7 that drives my MacBook.

The ARM comes from a different design path. At the time that Intel was producing the 8008 and 8080, other companies were getting into the same game. From the PC perspective, the most important was the 6502, which was used by the original Apple, Commodore, and BBC microcomputers. The 6502 was, incidentally, the first CPU that I learned to program!

The ARM isn't a descendant of the 6502, but it is a product of the 6502-based family of computers. In the early 1980s, the BBC decided to create an educational computer to promote computer literacy. They hired a company called Acorn to develop a computer for their program. Acorn developed a beautiful little system that they called the BBC Micro.

The BBC Micro was a huge success. Acorn wanted to capitalize on its success, and try to move it from the educational market to the business market. But the 6502 was underpowered for what they wanted to do. So they decided to add a companion processor: they'd have a computer which could still run all of the BBC Micro programs, but which could do fancy graphics and fast computation with this other processor.

In a typical tech-industry NIH (Not Invented Here) moment, they decided that none of the other commercially available CPUs were good enough, so they set out to design their own. They were impressed by the work done by the Berkeley RISC (Reduced Instruction Set Computer) project, and so they adopted the RISC principles, and designed their own CPU, which they called the Acorn RISC Microprocessor, or ARM.

The ARM design was absolutely gorgeous. It was simple but flexible and powerful, able to operate on very low power and generating very little heat. It had lots of registers and an extremely simple instruction set, which made it a pleasure to program. Acorn built a lovely computer with a great operating system called RiscOS around the ARM, but it never really caught on. (If you'd like to try RiscOS, you can run it on your Raspberry Pi!)

But the ARM didn't disappear. It didn't catch on in the desktop computing world, but it rapidly took over the world of embedded devices. Everything from your cellphone to your dishwasher to your iPad runs on ARM CPUs.

Just like the Intel family, the ARM has continued to evolve: the ARM family has gone through 8 major design changes, and dozens of smaller variations. They're no longer just produced by Acorn - the ARM design is maintained by a consortium, and ARM chips are now produced by dozens of different manufacturers - Motorola, Apple, Samsung, and many others.

Recently, they've even started to expand beyond embedded platforms: the Chromebook laptops are ARM based, and several companies are starting to market server boxes for datacenters that are ARM based! I'm looking forward to the day when I can buy a nice high-powered ARM laptop.

Before I can answer what a compiler is, it's helpful to first answer a different question: what is a *program*?

And here we get to one of my pet peeves. The most common answer to that question is "a detailed step-by-step sequence of instructions". For example, here's what Wikipedia says:

A computer program, or just a program, is a sequence of instructions, written to perform a specified task with a computer.

This is *wrong*.

Back when people first started to study the idea of computing devices, they talked about computing machines as devices that performed a single, specific task. If you think about a basic Turing machine, you normally define Turing machines that perform a single computation. They've got a built-in sequence of states, and a built in transition table - the machine can only perform one computation. It took one kind of input, and performed its computation on that input, producing its output.

Building up from these specific machines, they came up with the idea of a *universal* computing device. A universal computer was a computing machine whose input was a description of a *different* computing machine. By giving the universal machine different inputs, it could perform different computations.

The point of this diversion is that looking at this history tells us what a program really is: it's a description of a computing machine. Our computers are universal computing machines; they take programs as input to describe the computing machines we want them to emulate. What we're doing when we program is describing a computing machine that we'd like to create. Then we feed it into our universal computing machine, and it behaves as if we'd built a custom piece of hardware to do our computation!
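Here's a small Python sketch of that idea. `run_turing_machine` simulates one-tape Turing machines, and each machine is nothing but a transition table passed in as data: feed it a different table and it behaves as a different machine. It's an informal stand-in for a universal machine, not a formal universal Turing machine, and the function name, the `_` blank symbol, and the two example machines are my own hypothetical choices.

```python
def run_turing_machine(rules, tape, state="start", halt="halt", max_steps=10_000):
    """Simulate a one-tape Turing machine whose behaviour is given entirely by `rules`.

    rules maps (state, symbol) -> (next_state, symbol_to_write, move),
    where move is -1 (left) or +1 (right). "_" is the blank symbol.
    Returns the non-blank portion of the tape once the machine halts.
    """
    cells = dict(enumerate(tape))  # sparse tape: position -> symbol
    head = 0
    steps = 0
    while state != halt:
        if steps >= max_steps:
            raise RuntimeError("machine did not halt within max_steps")
        symbol = cells.get(head, "_")
        state, cells[head], move = rules[(state, symbol)]
        head += move
        steps += 1
    return "".join(cells[i] for i in sorted(cells)).strip("_")


# Machine 1: add one to a binary number (head starts on the leftmost digit).
INCREMENT = {
    ("start", "0"): ("start", "0", +1),   # scan right to the end of the number
    ("start", "1"): ("start", "1", +1),
    ("start", "_"): ("carry", "_", -1),   # step back onto the last digit
    ("carry", "1"): ("carry", "0", -1),   # 1 + carry = 0, keep carrying left
    ("carry", "0"): ("halt", "1", -1),    # 0 + carry = 1, done
    ("carry", "_"): ("halt", "1", -1),    # ran off the left end: new leading 1
}

# Machine 2: flip every bit.
INVERT = {
    ("start", "0"): ("start", "1", +1),
    ("start", "1"): ("start", "0", +1),
    ("start", "_"): ("halt", "_", +1),
}

print(run_turing_machine(INCREMENT, "1011"))  # 1100
print(run_turing_machine(INCREMENT, "111"))   # 1000
print(run_turing_machine(INVERT, "1011"))     # 0100
```

The simulator never changes; only the description it's given does, which is exactly the relationship between a computer and a program.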

The problem is, our computers are simultaneously very primitive and overwhelmingly complex. They can only work with data expressed as fixed-length sequences of on/off values; to do anything else, we need to find a way of expressing it in terms of extremely simple operations on those on/off values. To make them operate efficiently, they've got a complex structure: many different kinds of storage (registers, L1 and L2 caches, addressable memory), complicated instruction sets, and a whole lot of tricky performance tweaks. It's *really* hard to program a computer in terms of its native instructions!
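As a small illustration of what "extremely simple operations on on/off values" means in practice, here's a Python sketch of adding two 8-bit numbers using nothing but bitwise AND, XOR, and shifts, which is roughly the logic an adder circuit implements. The 8-bit width is an arbitrary choice for the example, and real hardware does this with gates working in parallel rather than with a loop.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1  # keep every value to a fixed-length sequence of 8 bits

def add_with_bits(a: int, b: int) -> int:
    """Add two 8-bit values using only AND, XOR, and shifts (wrapping on overflow)."""
    a &= MASK
    b &= MASK
    while b:
        carry = (a & b) << 1   # positions where both bits are 1 produce a carry
        a = (a ^ b) & MASK     # per-position sum, ignoring carries
        b = carry & MASK       # carries move one position left; overflow falls off
    return a

print(add_with_bits(5, 3))      # 8
print(add_with_bits(200, 100))  # 44, because 300 doesn't fit in 8 bits
```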

In fact, it's so hard to program in terms of native instructions that we just don't do it. What we do is write programs in terms of different machines. That's the point of a programming language.

Looked at this way, a programming language is a way of describing computing machines. The difference between programming languages is how they describe those machines. A language like C describes von Neumann machines. Haskell describes machines that work via lambda calculus computations using something like a spineless G-machine. Prolog describes machines that perform computations in terms of intuitionistic logical inference like a Warren Abstract Machine.

So finally, we can get to the point: what is a compiler? A compiler is a program that takes a description of a computing device defined in one way, and translates it into the kind of machine description that can be used by our hardware. A programming language lets us ignore all of the complexities of how our actual hardware is built, and describe our computations in terms of a simple abstraction. A compiler takes that description, and turns it into the form that computer hardware can actually use.
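Here's a deliberately tiny Python sketch of that translation. The "source machine" is an arithmetic expression tree, and the "target machine" is a hypothetical stack machine with three made-up instructions (`PUSH`, `ADD`, `MUL`); neither is a real language or instruction set. The compiler only translates one description into another, and a separate function plays the role of the hardware that runs the result.

```python
# Source description: expression trees as nested tuples, e.g. ("+", 2, ("*", 3, 4)).
# Target description: a list of instructions for a hypothetical stack machine.

def compile_expr(expr):
    """Translate an expression tree into stack-machine instructions."""
    if isinstance(expr, int):
        return [("PUSH", expr)]
    op, left, right = expr
    opcode = {"+": "ADD", "*": "MUL"}[op]
    # Compute both operands first (leaving them on the stack), then combine them.
    return compile_expr(left) + compile_expr(right) + [(opcode, None)]

def run_stack_machine(program):
    """Stand-in for the hardware: execute the instructions, return the top of the stack."""
    stack = []
    for opcode, arg in program:
        if opcode == "PUSH":
            stack.append(arg)
        else:
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b if opcode == "ADD" else a * b)
    return stack.pop()

program = compile_expr(("+", 2, ("*", 3, 4)))  # 2 + 3 * 4
print(program)  # [('PUSH', 2), ('PUSH', 3), ('PUSH', 4), ('MUL', None), ('ADD', None)]
print(run_stack_machine(program))  # 14
```

Real compilers target real instruction sets and do far more work, but the shape is the same: one machine description in, another machine description out.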

For anyone who's read this far: I've gotten a few requests to talk about assembly language. I haven't programmed in assembly since the days of the Motorola 68000. This means that to do it, I'll need to learn something more up-to-date. Would you be more interested in seeing Intel, or ARM?

Let's start with why booting is a question at all. When a computer turns on, what happens? What we're used to seeing is that the disk drive turns on and starts spinning, and the computer loads something from the disk.

The question is: how does the computer know how to turn on the disk? As I said in the OS post, the CPU only really knows how to work with memory. To talk to a disk drive, it needs to do some very specific things: write to certain memory locations, wait for things to happen. Basically, in order to turn on that disk drive and load the operating system, it needs to run a program. But how does it know what program to run?

I'm going to focus on how modern PCs work. Other computers have used (and still use) similar processes. The details vary, but the basic idea is the same.

A quick overview of the process:

- CPU startup.
- Run BIOS initialization.
- Load bootloader.
- Run bootloader.
- Load and run OS.

As that list suggests, it's not a particularly simple process. We think of it as one step: turn on the computer, and it runs the OS. In fact, it's a complicated dance of many steps.

On the lowest level, it's all hardware. When you turn on a computer, some current gets sent to a clock. The clock is basically a quartz crystal; when you apply current to the crystal, it vibrates and produces a regular electrical pulse. That pulse is what drives the CPU. (When you talk about your computer's speed, you generally describe it in terms of the frequency of the clock pulse. For example, in the laptop that I'm using to write this post, I've got a 2.4 GHz processor: that means that the clock chip pulses 2.4 billion times per second.)
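The arithmetic behind that number: the time between pulses is the reciprocal of the frequency, so each tick of a 2.4 GHz clock lasts well under a nanosecond.

$$T = \frac{1}{f} = \frac{1}{2.4 \times 10^{9}\,\text{Hz}} \approx 4.17 \times 10^{-10}\,\text{s} \approx 0.42\,\text{ns}$$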

When the CPU gets a clock pulse, it executes an instruction from memory. It knows which instruction to execute because it's got a register (a special piece of memory built into the CPU) that holds the address of that instruction. When the computer is turned on, that register is set to point at a specific location. Depending on the CPU, that might be 0, or it might be some other magic location. It doesn't matter which: what matters is that the CPU is built so that when it's first turned on and receives the clock pulse that starts it running, that register will always point at the same place.
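Here's a toy Python model of that behavior. Everything about this "CPU" is a simplifying assumption for illustration: the made-up instructions (`LOAD`, `ADD`, `JUMP`, `HALT`), the single accumulator register, the reset address of 0, and the rule of one instruction per clock pulse. The key point it shows is that the program counter always starts at the same fixed address, so whatever is stored there runs first.

```python
RESET_VECTOR = 0  # hypothetical: where this toy CPU's program counter starts

def run_cpu(memory, max_cycles=100):
    """Toy fetch-execute loop: one instruction per simulated clock pulse."""
    pc = RESET_VECTOR  # on power-up, the program counter always holds this address
    acc = 0            # a single accumulator register
    for _ in range(max_cycles):       # each iteration stands in for one clock pulse
        opcode, operand = memory[pc]  # fetch the instruction the PC points at
        pc += 1
        if opcode == "LOAD":
            acc = operand
        elif opcode == "ADD":
            acc += operand
        elif opcode == "JUMP":
            pc = operand
        elif opcode == "HALT":
            return acc
        else:
            raise ValueError(f"unknown opcode {opcode!r} at address {pc - 1}")
    raise RuntimeError("program did not halt")

# The "read-only memory" at the reset address just jumps to the real program.
rom = {
    0: ("JUMP", 10),
    10: ("LOAD", 40),
    11: ("ADD", 2),
    12: ("HALT", None),
}
print(run_cpu(rom))  # 42
```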

The software part of the boot process starts there: the computer puts a chunk of read-only memory there - so when the computer turns on, there's a program sitting at that location, which the computer can run. On PCs, that program is called the BIOS (*Basic Input/Output System*).

The BIOS knows how to tell the hardware that operates your display to show text on the screen, and it knows how to read stuff on your disk drives. It doesn't know much beyond that. What it knows is extremely primitive. It doesn't understand things like filesystems - the filesystem is set up and controlled by the operating system, and different operating systems will set up filesystems in different ways. The BIOS can't do anything with a filesystem: it doesn't include any programming to tell it how to read a filesystem, and it can't ask the operating system to do it, because the OS hasn't loaded yet!

What the BIOS does is something similar to what the CPU did when it started up. The CPU knew to look in a special location in memory to find a program to run. The BIOS knows to look at a special section on a disk drive to find a program to run. Every bootable disk has a special chunk of data at its very beginning, called the *master boot record* (MBR). The MBR contains *another* program, called a *boot loader*. So the BIOS loads the boot loader and runs it, and the boot loader is what actually loads the operating system.
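For the curious, here's a Python sketch of the BIOS's side of this on a classic PC: read the first 512-byte sector of the disk, and only treat it as a boot record if its last two bytes are the boot signature 0x55, 0xAA. The function names and the fake disk image are made up for illustration; a real BIOS does this in firmware, copies the sector into memory (traditionally at address 0x7C00), and jumps to it.

```python
SECTOR_SIZE = 512
SIGNATURE = b"\x55\xaa"  # the last two bytes (offsets 510 and 511) of a bootable MBR

# Classic MBR layout:
#   bytes   0-445  boot loader code
#   bytes 446-509  partition table (four 16-byte entries)
#   bytes 510-511  boot signature 0x55, 0xAA

def load_mbr(disk_image: bytes) -> bytes:
    """Return sector 0 if it looks like a bootable MBR, else raise ValueError."""
    sector = disk_image[:SECTOR_SIZE]
    if len(sector) < SECTOR_SIZE or sector[510:512] != SIGNATURE:
        raise ValueError("sector 0 is not a bootable master boot record")
    return sector

def boot_code(mbr: bytes) -> bytes:
    """The boot loader's code occupies the first 446 bytes of the MBR."""
    return mbr[:446]

# A fake disk image: 446 placeholder code bytes, an empty partition table,
# the signature, and then two more (empty) sectors.
fake_disk = b"\x90" * 446 + bytes(64) + SIGNATURE + bytes(1024)
mbr = load_mbr(fake_disk)
print(len(mbr), len(boot_code(mbr)))  # 512 446
```

Notice that nothing here involves a filesystem: the BIOS just grabs raw bytes from a fixed place on the disk.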

This probably seems a bit weird. The computer starts up by looking in a specific location for a program to run (the BIOS), which loads something (the bootloader). The thing it loads (the bootloader) also just looks in a specific location for a program to run (the OS). Why the two layers?

Different operating systems are built differently, and the specific steps to actually load and run the OS are different. For example, on my laptop, I can run two operating systems: MacOS and Linux. On MacOS (aka Darwin), there's something called a microkernel that gets loaded. The microkernel is stored in a file named "mach_kernel" in the root directory of a type of filesystem called HFS. But in my installation of Linux, the OS is stored in a file named "vmlinuz" in the root directory of a type of filesystem called EXT4. The BIOS doesn't know what operating system it's loading, and it doesn't know what filesystem the OS uses, which means that it knows neither the name of the file to load nor how to find that file.

The bootloader was set up by the operating system. It's specific to the operating system - you can think of it as part of the OS. So it knows what kind of filesystem it's going to look at, and how to find the OS in that filesystem.
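The division of labor can be sketched in a few lines of Python. In this sketch, all of the OS-specific knowledge (which filesystem to read, which file to load) lives in the bootloader's table, and none of it would be available to the BIOS. The table format, function name, and toy partition are hypothetical; only the filesystem types and kernel file names come from the MacOS/Linux example above, and real bootloaders keep this knowledge in their own formats.

```python
# Hypothetical bootloader "knowledge": what each OS's bootloader knows how to find.
BOOTLOADERS = {
    "MacOS": {"filesystem": "HFS", "kernel": "mach_kernel"},
    "Linux": {"filesystem": "EXT4", "kernel": "vmlinuz"},
}

def bootloader_find_kernel(os_name, partition):
    """Return the kernel file's contents, using only what this OS's bootloader knows.

    `partition` is a toy stand-in for a disk partition: a dict holding the
    filesystem type and a mapping from root-directory file names to contents.
    """
    knowledge = BOOTLOADERS[os_name]
    if partition["filesystem"] != knowledge["filesystem"]:
        raise ValueError(f"the {os_name} bootloader can't read {partition['filesystem']}")
    return partition["root"][knowledge["kernel"]]

linux_partition = {"filesystem": "EXT4", "root": {"vmlinuz": b"<Linux kernel image>"}}
print(bootloader_find_kernel("Linux", linux_partition))  # b'<Linux kernel image>'
```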

So once the bootloader gets started, it knows how to load and run the operating system, and once it does that, your computer is up and running, and ready for you to use!

Of course, all of this is a simplified version of how it works. But for understanding the process, it's a reasonable approximation.

(To reply to commenters: I'll try to do a post like this about compilers when I have some time to write it up.)
