Learning Via Bullshit

December 6th, 2016

There are two ways to learn about something. One is to go in through the front door: you read the tutorial, you follow the instructions, and you work your way through it.

But the trouble with that approach is that you'll only learn what they want you to learn. Everyone has an agenda, and what they directly tell you about their product only reflects that agenda. If you want to really understand the strengths and tradeoffs of a system, you need to push past that and approach from a different point of view.

I saw a fantastic talk by Vyacheslav Egorov about LuaJIT and dynamic languages. I'll quote a big old chunk of it here because I think it deserves quoting.

"What I learned from LuaJIT", Vyacheslav Egorov

Some people, after seeing this kind of compiled code, they will ask, "how does it do it?", and they will try to go and learn this by reading the source.

And I will tell you that I'm not the kind of person who can learn things by reading the source from the beginning to the end, mostly because it's very hard to find a beginning and end in a pile of C code.

So instead, I do strange things to the source, like here for example I add a key to the table which... I say p[1]=1 and I just thread it through the whole loop iteration. Then I look at what kind of code the LuaJIT generates, and discover that suddenly there is a whole pile of assembly coming out. There is a table allocation here and some GC steps, and whatnot.

So I like to ask "why does it not do something?", and learning by fixing bugs, or at least trying to understand why something doesn't work. And I think this is the much better learning technique.

Learning why things don't work is often the most valuable way to really learn the tradeoffs involved in a system. The trouble with many modern systems, though, is that they pretend there are no tradeoffs and no flaws. Every system today pretends to be perfect.

The late, great Douglas Adams had this very pertinent quote on the matter:

Douglas Adams (from Mostly Harmless)

The major difference between a thing that might go wrong and a thing that cannot possibly go wrong is that when a thing that cannot possibly go wrong goes wrong it usually turns out to be impossible to get at or repair.

That's the trouble with a lot of systems. They pretend they're perfect when they're not. Any engineering project contains both intentional tradeoffs and unintentional fuckups. I've found over the years that by understanding the dark corners of things, you sometimes gain a greater understanding of the whole thing.

Don't look at what they're telling you; look at what they're not. If someone's trying to promote, oh I don't know, let's say JavaScript, and they say "look here at our benchmark, look how it can rival C in this test", don't listen to that. Don't look at the best benchmarks, look at the worst. Find where something performs most poorly, and use that to understand it.

Every large software system contains in equal proportions hubris, propaganda, and bullshit. And this is what we really need to study to understand the system. We need to get our hands dirty and poke around in the bullshit, as that's the only place where the truth isn't being hidden.

In Vyacheslav's dynamic-language example above, the worst of LuaJIT still holds up well against the best of most other JITs. It can almost compete with C in some tight loops, but only in those tight loops.

Vyacheslav's example, in case you didn't watch it, shows how LuaJIT can generate amazing code in one case, but then you add just one extra table key, and suddenly the output balloons into a pile of assembly full of table allocations and GC steps.

This is a good example of unexpected consequences: do you really want the performance of your code to change so drastically just because you added one extra field? C tends not to have that performance cliff; performance is directly related to your understanding of the system, which is why it has remained such a popular language for all these years.

Many modern systems don't have that property. Instead, the performance of your typical dynamic language is either a) slow always, or worse: b) slow sometimes. And the sometimes is the killer. The performance of a typical dynamic language is tied directly to Foo. Wait, what? What the hell is Foo? That's just it. It's an unknown. It's something you, as the user, can't predict. You can't design your software around it, or rely on it for critical operation, because it's an unknown that the JIT implementors try to pretend doesn't exist.

If you want to learn C, you might learn more by writing to element 201 of a 200-element array than by reading a thousand tutorials. Many C programmers remember that first time when they didn't initialize a variable and their program failed as a result. But they learned a lot about the language, and the machine, as a consequence of that.

The bugs reveal the truth of what the software is. One of the good things to be said about C++ is that many (not all!) of the bugs happen once when you try to compile something, not later when you run the code. If you're going to have unpredictability, it's better that the unpredictable parts happen to you while you're designing the software, not to some other guy at home using it.

It's fine for systems to have weaknesses, as long as people are honest and up-front about them, because that's how you can understand what a system is really doing. C doesn't pretend to be perfect; it's shitty and makes you take care of fiddly details a lot of the time. But it never hides these things from you. I see so many systems being developed that unfortunately take the approach Douglas Adams lamented: pretend your system has no faults, and prevent the user from ever being able to understand those faults.

Written by Richard Mitton, software engineer and travelling wizard.