codersnotes.com

Why Build?

Fri, 03 Jun 2022 07:00:00 -0000

"The Centre of the Universe is of course that marvellous land known as Fraggle Rock. It is thus called because it is A Rock and Fraggles Live There.

Fraggles are a noble race. Fearless, dignified, intellectual, they represent the very pinnacle of civilisation and culture. A Fraggle *is*, most assuredly, the best of all possible creatures."

A long time ago, there used to be a TV show called Fraggle Rock. It was about these strange scraggly creatures, Fraggles, that lived in caves below a lighthouse. The fantastic Fulton Mackay, better known from Porridge, played an old retired sea captain who looked after the lighthouse along with his faithful dog, Sprocket. Interestingly the show was localised with different segments filmed for different countries, so for example in the US version the setting was a quiet inventor's tinkering shed.

And there used to be these other funny little creatures too, Doozers, who were big into construction.

I dunno exactly what it was they built. It was some kind of scaffolding. Looked kinda like crystallized sugar. I seem to recall it was made of radishes. I don't recall the purpose of any of it all anyway, but it certainly seemed important to them if not to anyone else.

And they seemed happy, pushing their little wheelbarrows around their building sites, while wearing adorable tiny hardhats and other beautifully puppeteered safety equipment. Everything the Fraggles did was chaotic and messy. But not the Doozers. They were the complete opposite: ordered and industrious. Stuff got planned, it got built, and the Doozers were content.

The Fraggles take life advice from a giant heap of trash

But, I suppose like any great engineering project, there were hazards. If you were a Doozer, working hard day and night, trying to erect some great piece of scaffolding or complete some worthy project, you just had to accept that every now and again, a Fraggle was going to come through and eat some of it.

Strangely they never seemed to be that bothered about it. I mean sure, there'd be a few grumbles here and there. But it never stopped them from getting on with things. It just seemed any large-scale construction effort required a certain Fraggle tax to be accounted for as part of its budget.

In one episode, the Fraggles stopped eating the Doozers' creations. It was insensitive, they said, and they were made to utter a solemn vow not to destroy any more constructions. And so the Doozers could keep creating, until eventually their projects filled up the entire world. Once there was no space remaining for the Doozers to build in, they started packing up to go elsewhere. It seemed the Fraggles and the Doozers needed each other to co-exist.

The Doozers never really seemed to consider it much of a problem. Even though they'd spent ages working on something so clearly important to them, building something so beautiful, just to watch it get torn down. "Architecture is meant to be enjoyed," they'd say with a smile. I'm not sure what they meant by that either. To be honest, I feel the Fraggle analogy is possibly starting to fall apart at this point and I'm not sure if I'm the Fraggle or the Doozer in this essay. I suspect I may in fact be the 7ft tall grumpy giant that lives in the garden next to them. I'm starting to regret mentioning the Fraggles now.

The point is, people build. I don't know why. It seems important to them though. Presumably they have a process internally that at least they understand, even if we don't. And it makes them happy.

Which is nice. It's nice to be happy.

Bitcoin

Sun, 03 Oct 2021 07:00:00 -0000

Disclaimer

I don't know anything about Bitcoin and everything I say is usually wrong.

Imagine Alice grows turnips in a village. Everyone buys her turnips because hers are the best around. She trades them locally with money.

Then one day, she hears that there's a guy Bob in a village far away that wants to buy her turnips. She can't sell them there herself because it's too far, but hey old Eve there has a horse and cart, so you say "hey Eve, fill that cart up with turnips. Take them to the other village, sell them, you take 10% and I get the rest."

So off Eve goes to the other market. Eve sells your turnips on and takes a cut. Then after a while, because of all the new business other people start buying carts, and reselling your turnips elsewhere too.

After a while most of your business is exporting turnips. Then one day, you find out that Eve, and all the other resellers, have been forming a cartel to try and lie about how many turnips they actually sold, and you didn't get nearly the money you should have. And they got away with it because they all outnumber you, and no one saw it happen and no one seems to care.

This happens because the reselling happened far away, where you couldn't see it. The trade you do in person is all fine because you can check it, but once that horse and cart disappears over the horizon then the money kinda disappears with it you see.

What happened was Alice and Bob were trying to have a private transaction, but Eve somehow was able to eavesdrop on it, and siphon money out of the system for herself.

Cartels naturally arise in any system where all players cannot trade directly.

This always happens, because as long as there's a sink/source that can take money far enough so that no one person can see the whole system, scams can always be run. There'll always be an Eve, as long as you don't have direct contact with your customer. Local economies create middle men.

Bitcoin fixes this. With Bitcoin, Alice sells the turnips direct to Bob and Eve is simply hired as a delivery driver. Eve never handles the money. The reason Bitcoin is able to do this is because the blockchain creates, for the first time in history, a historical record of events that we can actually PROVE happened. History books can be inaccurate. If you show me a note saying you have $1000 you could pay me, I can go check with the bank to see if that's correct or not. But the bank could be in a cartel with Eve, you see. They could lie too. Or someone else down the chain could be. Because there's always an Eve, unless you remove all secrets from the world. Which Bitcoin does, because now we can make facts mathematically provable.
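To make the "record we can PROVE" idea a bit more concrete, here's a minimal sketch of hash chaining, where each entry's hash covers the hash of the entry before it, so quietly rewriting an old entry breaks every check after it. This is an illustration only and not Bitcoin's actual protocol: there's no proof-of-work, no signatures and no network, and the names `entry_hash`, `add_entry` and `verify` are made up for the example.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the very first entry


def entry_hash(prev_hash, payload):
    """Hash an entry together with the hash of the entry before it."""
    data = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def add_entry(ledger, payload):
    """Append a payload, chaining it to the previous entry's hash."""
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append({"prev": prev_hash, "payload": payload,
                   "hash": entry_hash(prev_hash, payload)})


def verify(ledger):
    """Return True only if every entry still matches its recorded hash."""
    prev_hash = GENESIS
    for entry in ledger:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


ledger = []
add_entry(ledger, "Bob pays Alice 10 for turnips")
add_entry(ledger, "Alice pays Eve 1 for delivery")
```

If Eve later edits the first payload to say Bob only paid 5, `verify(ledger)` stops returning True, because the stored hash no longer matches the contents.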

Bitcoin removes not just Eve but all middle men. From society. Forever. That's why it's cool.

What The Hell Was The Microsoft Network?

Wed, 29 Aug 2018 07:00:00 -0000

No, not that one. Or that other one. The letters "MSN" have meant so many things to so many people, a term that's been overloaded with a thousand different meanings. In the same way they throw the words "National Lampoon" onto any random movie to indicate that it claims to be a comedy, they slap the "MSN" label onto just about anything so that your poor brain goes "oh, a brand I've heard of" and you get fooled into thinking it must have a certain minimum level of goodness to it.

But I'm talking about the original MSN. Not the simple three-letter acronym we know today, but its full title: The Microsoft Network.

The Microsoft Network was a long forgotten experiment that started back during the Windows 95 Preview Program somewhere around 1994. It was an exciting time for me as a young lad. Having a copy of Windows 95 a year before it came out made me feel Special, Hip, and Cool, three properties which were much sought after by teenage computer nerds. For perhaps the only time in its life, Windows was cool, and so was I.

Microsoft had already released all this fancy new tech with NT 3.1 a few years earlier, but no one noticed so after getting good n' drunk they had another crack at it. Windows 95 replaced the tired old segmented memory model with a glorious flat one where the pointers actually worked how C claimed they did. It dumped the bug-prone co-operative task switcher in favor of a slightly less bug-prone version of pre-emptive multitasking, meaning your machine wouldn't lock up due to busy loops and would have to find exciting new 32-bit ways to lock up instead. It had a ~~plagiarized~~ visionary new shell that presented the file system as a truly window-based navigation model. It was a pretty exciting time for PC owners back then. After all, these new directions brought the PC almost up to the same spec that other computers had been enjoying ten years ago.

But it wasn't just the technological advancements. The Chicago betas made you feel like you were part of a secret club. A group of people who could all hang out together in the super-secret clubhouse. But what was this clubhouse, you ask? Well sit down little Timmy, and I'll explain to you what The Microsoft Network actually was.

The delights of WinCIM accessing CompuServe.

You see, back in 1994 Facebook hadn't yet invented the Internet, so if you wanted to go online and let your computer transfer viruses to and from other computers, you had to use one of the many private networks. There were lots of them. CompuServe was the biggie. AOL was another, and in the UK they had Cix. Each of them had their own users, their own content and forums, and formed a delightful set of walled gardens that couldn't communicate with each other.

You dialled their phone number on your modem and talked directly to them. Each had their own shitty little piece of software you ran which presented their service, usually in some god-awful MDI-based thing which was barely more than a GDI rewrite of their old BBS interface. CompuServe felt like a dinosaur at the time. They didn't even give you the decency of a username on CompuServe, they gave you a number like you were Patrick McGoohan. It'd be like 71477,134 or some rubbish. They always started with a 7, had a bunch of octal digits, and had a comma in the middle. Why? Because that's how the login on PDP-10s worked, which is what CompuServe was powered on. Unbelievably, it seems some poor suckers may still have been paying for CompuServe as recently as last year.

Microsoft looked upon this gluttony of power and figured they wanted a piece of that sweet action. They had big plans for their own Microsoft Network. It was going to be the CompuServe killer. It would have all its own content, and features like being able to make online purchases safely, just like today's App Store allows you to throw money at your phone until Apple's pockets rip open under the weight of that sweet 30% cut.

Of course it was doomed from the start. The raw, unfiltered mess of the Internet was quickly becoming the New Cool Thing, and no-one wanted to sign up for yet another Stupid Proprietary Thing, of which there was an endless supply. But Microsoft did one thing which got people using it (including myself), at least for a while. They gave it away free to people in the Super Secret Club. That's right, if you had the Windows 95 beta disc because you'd ~~pirated it from somewhere~~ paid the measly $20 to sign up, you could get free access to The Microsoft Network.

I've got kinda sidetracked here waffling on about the history of dial-up BBS systems. What I really want to be concerned with is what made The Microsoft Network unique and interesting: the interface.

The big thing in Windows 95 was their new shell. They wanted everything to go through this. They had this vision of every object in the computer being represented as a shell object, so there would be a seamless intermix between files, documents, system components, you name it. They had this project called Cairo that was supposed to throw out that scruffy old file-based filesystem and bring in a shiny new Object Based File System instead. It never happened, so we'll never know exactly how it might have turned out. But the brave lads at MS didn't give up that easily and so the idea stayed on, admittedly without the tech to back it up, and the principles wormed their way into such glorious developments as The Microsoft Network.

Browsing The Microsoft Network.

And so The Microsoft Network wasn't a program you loaded like CompuServe. It was part of the OS, with folder icons that looked just like real folders. It was a kind of version of the Web where you could browse online data the same way you browsed your file system. This is what made it cool.

Everything was an object, and everything appeared in its own Explorer window. You could click on the icon for -- oh I don't know, what do humans talk about? Let's say Goat Tickling. You could click on the Goat Tickling BBS, and it'd open up a new window where people talked about Goat Tickling and each message was listed exactly like the "Details" view of a file system directory. From the user's perspective, there was no difference between this and a regular file window. The magic of the extensible shell blended the online world and your own desktop seamlessly, with no need for web browsers or forum software. It all just fell together under the same shared interface.

It's impossible to access this service today though, in case you wanted to take a nostalgic little look, as the servers that ran it have long since been euthanized. But if you're interested, you can watch this delightful and certainly not-at-all dated Microsoft promotional video, where Chandler and Rachel from Friends learn how to look at cat pictures on The Microsoft Network.

Get yer data right here in Explorer folks!

I loved The Microsoft Network back then, even though I knew it was terrible. It felt like there was the germ of a new idea here, something that hadn't been tried before on our poor little consumer computers. It was as if the data was suddenly free of the shackles of being displayed in a program. Data wasn't just a web page, or a program showing its own internal databases. The Microsoft Network made it look like the data was right there, and you could click it and drag it around! For a brief time, back in 1995, it felt like we were on the verge of the true object-oriented web, a world filled with open data and free from the tyranny of the walled gardens.

Of course, like all such things, it turned out to just be a big fat lie.

Microsoft's shell extensions are such a walled garden that even the rest of the OS can't bluff its way in. Open up your Control Panel right now, and right-click on one of the things (e.g. Keyboard). You can select "Create Shortcut", and it'll magically plop a shortcut to that item onto your Desktop.

(At least, it will if you've got the classic Control Panel enabled. If you've got it on the defaults it'll probably just give you a list of Helpful Tools, like "Find and fix problems", "Save backup copies of your files with File History", or "Search for solutions to your problem online because you sure as hell ain't gonna find them in here". The new Microsoft UI strategy was conceived when the product manager happened to call the IRS one day, and was faced with a phone menu of vague, unnavigable choices which circle around the thing you really want but never lead to it, causing him to listen for hours while occasionally pressing '5' to hear his choices again. He was impressed by the amount of time he'd managed to waste, and wishing to inflict the same suffering on his own users he immediately wrote a memo ordering everything in Windows to follow the same pattern. Any direct access to information was thrown out and hidden from the user at all costs, in case they might actually manage to change something. I hear in the next Windows build they're actually going to replace Control Panel with a copy of Eliza that's been hooked up to a magic 8-ball. No really, it's true. But I digress.)

One of these two is a real shortcut.

So we've got a shortcut now on our desktop. It looks just like a real shortcut, right? Just like any other file. Now right-click on that shortcut you've made, and see where it points to. What's that, I hear you cry? It points to Control Panel\All Control Panel Items\Keyboard! Where the hell is that? Pop open a command prompt and try and run "dir" on that location. You sure as hell won't get a directory listing of it. And why, you say, is the text displayed as a fixed label? Why can't I change the shortcut's target to a different file, like I can with a regular shortcut?

Unfortunately, like most things involving the Windows shell, these objects aren't real objects. Well no, I mean presumably some part of the system thinks they are. But they're not to us. They're not things you can just link to like a regular file. They're not like web pages where you can share the link with someone else. They live in their own weird little inaccessible namespace. Explorer knows how to peer over the fence into these little gardens, but no one else does. It's a facade, one designed to make you think you actually know how your computer works.

It was the kind of thing other OSs like Plan 9 were trying to do, except the Plan 9 guys didn't bollocks it up. See what happened was a senior director at Microsoft came in one night around 3am, loaded on blow and Jack Daniels, and shouted to the room "I've got it lads," in a big booming voice that woke up the people struggling to ship Cairo for the third time, "we'll implement it all in COM!" A great sigh rippled around but they knew in their hearts that was how things would be from now on, there was no point fighting it, and thus the great triumph known as the Windows Shell Namespace was born.

Microsoft's shell system made all the same mistakes CompuServe did, just in different ways. They thought numbers were a good way to name things, so you ended up with every shell object being given a 32-digit GUID. And they made it so their content was only linkable (if at all) via these same GUIDs, thus only those knights brave enough to fight the GUID monster may be allowed to enter. So while there might be a way for someone to copy a link to something they saw on The Microsoft Network, good luck actually being able to paste that link into another program, access it via the command line, or send it to a friend.

How Microsoft tell you to select a file.

This is what makes the whole COM system really really stupid. It's like they don't want you to be able to work with it. They want to make it hard for you to access and share data, because code that were to do something reckless like call fopen is code that could run on anyone's operating system, not just theirs. They want to keep you on the phone selecting numbered options, because if you stop doing that you might actually achieve something. And if you go around recklessly inventing new things on your own, you might not need Microsoft any more.

It used to be if you wanted the user to select a file, you'd call GetOpenFileName. That was it, one swift function call and you were done. Now they've deprecated that and are telling you you gotta call CoCreateInstance, have a CLSID_FileOpenDialog, you'll need a CLSCTX_INPROC_SERVER too and probably some IID_PPV_ARGS. Then you'll need a CDialogEventHandler_CreateInstance, and maybe end up with an IShellItem. You'll need about a dozen or so calls to SUCCEEDED(hr) too, in case one of the many internal details you never asked to know about happens to go wrong. Think I'm exaggerating? Check Microsoft's own documentation on it.

It seems insane, but it's the same trap Microsoft just keep falling into. Hell, just go look at any DirectX 12 code to see how they've not learned their lesson.

Phew, I'm exhausted after that long, unnecessary and possibly incoherent COM ramble. Got a little carried away there. Where were we? Ah yes, MSN. The Microsoft Network failed because it was just one more walled garden. One that was walled-off in two ways. One was from a business standpoint, of trying to keep users locked into it and locked off from the rest of the world. But the other was the technical standpoint. It wanted to make it look like the Network was something you had control over, something you could interact with just like you could with your own data. And it was a lovely idea, but that's all it ever was. The thing itself was never more than a facade. Meanwhile those Internet nerds were off pasting hyperlinks around, doing reckless things like setting up their own email addresses and adding new things to the network without having to ask anyone first. And despite The Microsoft Network making every object look so real on the outside that you could just squeeze it and watch the AddRefs ooze out, the Web's simple idea of giving every thing a unique readable location won by a mile. It wasn't the Object Oriented dream we might have hoped for, but Open Always Wins.

The trouble with a walled garden is that if the thing outside the garden is bigger and better than the thing inside the garden, then the wall only serves to keep new customers out rather than lock them in.

So what happened to The Microsoft Network? It barely lasted a year after Win95 shipped before being thrown out and replaced with MSN 2.0, which was a much more web-based experience and opened the doors for the cuddly familiar MSN.com we know and love(?) today, and all still have bookmarked in that IE installation we only switch to when we need to test something.

AOL and CompuServe didn't like it much either. After all, it was eating away at their already rocky marketshare. So they put on their big hat marked "antitrust" and managed to force Microsoft to wedge a big AOL icon on your desktop too. That way the consumer had a choice of which shitty-little service they wished to lock themselves into.

But at least the lessons of The Microsoft Network have been learnt. We don't have that kind of locked-off data any more, it's all fully integrated under a unified banner. At least now I can happily use fopen to read data from a website, right? And backup my email just by dragging folders from Gmail to a local disk. Right guys?

...right?

In Search Of The Lost Program

Thu, 11 Jan 2018 08:00:00 -0000

The Lost Chord.

Programmers just can't seem to stop making new things. You only have to look at how many different unit-test frameworks and build systems there are out there to see that. We're drawn to keep reinventing software that already exists, adding little improvements and new approaches. It's like a disease sometimes, it infects us, attaches to our brains while we sleep and whispers "Code me..."

It results in this explosion of software, all of which does exactly the same thing that all the other software does, except This One's Written In Rust, or This One's Got Python Support. They're not descendants of each other, which might build on previous code, but separate creations that begin anew. And while each may add one new idea, they tend to have forgotten at least one feature that the previous ones already did.

But why? Why can't we just make, say, the perfect build system, and then everyone could just use that? Why do we make so many programming languages and libraries all doing the same thing over and over again?

The Lost Chord

I heard there was a secret chord, that David played and it pleased the Lord.

There's this lovely myth in music of the Lost Chord. A chord better than any other, considered the Holy Grail of music. And having perhaps heard it clearly in a dream, a musician might spend his whole life circling around it trying to discover the notes that make it up, hoping to recreate the perfection he once briefly tasted.

The thing about a chord, for the non-musicians out there, is it's just a thing composed of at least 3 notes played together. So any old keyboard or guitar could play it, if only you could discover all the right notes to use. It's merely a matter of composition, so surely it would be easy to just hit all the right notes and let the chord ring out? So why haven't we found the secret chord yet?

The answer, somewhat obviously, is because it doesn't exist. At least, it doesn't exist in our world, it can't. The Moody Blues once released an entire album inspired by it, 1968's "In Search Of The Lost Chord". This snippet from there perhaps gives some insight:

The Word (The Moody Blues, 1968)

Two notes of the chord, that's our poor scope
But to reach the chord is our life's hope
And to name the chord is important to some
So they give it a word, and the word is...
Om.

The daydreaming musicians recognize something here. We get two notes, not the three needed. It's not just that we haven't found it, it's that it's unfindable by us. It's a mirage that hangs just on the horizon, but vanishes as you approach it. It's a Rubik's cube where you can get one side done, but then getting the second side done messes up the first one.

The Mercator Projection.

Perhaps computers are the same. Are we trying to solve impossible problems? Like a cartographer trying to produce a flat 2D map of the 3D earth, some tasks just aren't possible without at least one compromise. If you try and unpeel the Earth and flatten it out, something's going to end up looking the wrong shape. The Mercator projection, perhaps the most commonly-used world map view, makes Greenland look bigger than Australia, when in fact Australia is over 3 times bigger.
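For the curious, the Greenland effect can be roughed out with a few lines of arithmetic. On a Mercator map the local area inflation at a given latitude is sec²(latitude), so the further from the equator, the bigger things get drawn. The sketch below is a back-of-the-envelope estimate only: it assumes approximate land areas and treats each landmass as if it sat entirely at one representative latitude, both of which are simplifications picked for the example.

```python
import math


def mercator_area_scale(latitude_deg):
    """Local area inflation on a Mercator map: sec^2(latitude)."""
    return 1.0 / math.cos(math.radians(latitude_deg)) ** 2


# Rough true areas in millions of km^2, and one representative latitude
# per landmass (both are approximations chosen for this sketch).
greenland_area, greenland_lat = 2.17, 72.0
australia_area, australia_lat = 7.69, -25.0

greenland_on_map = greenland_area * mercator_area_scale(greenland_lat)
australia_on_map = australia_area * mercator_area_scale(australia_lat)

print(f"true ratio (Australia / Greenland): {australia_area / greenland_area:.1f}")
print(f"map ratio  (Greenland / Australia): {greenland_on_map / australia_on_map:.1f}")
```

The point isn't the exact figures, just that the same formula which keeps angles correct is the one that blows up the areas near the poles. You can't have both.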

The Peirce Quincunxial Projection.

Or you could try the Peirce Quincunxial Projection, which has a lower overall distortion, but on the other hand splits Antarctica into four and makes Africa go funny.

The point is, there's no correct way to solve this particular problem, it's simply not possible. The only thing you can really do is to move into a higher dimension, where you can show a full 3D globe.

Maybe the inner truth of programming is something that can't be represented by our simpler world of variables and values, of bytes and registers. Are we just endlessly trying to find new ways to fit 9 bits into an 8-bit byte? Like the story of the blind men trying to describe an elephant, we're all just groping different parts of the perfect program we dreamt of, none of us able to get the full picture at once. Are we just circling around perfection, and never able to achieve it?

It seems to me sometimes that there will never be a One True build system, unit-test framework, or programming language. Every new programming language is destined to play just two notes of the chord, never the third. Perhaps programming is an impossible problem too, one we'll only ever truly overcome once we find a way to escape our current computing flatland and finally learn to move sideways.

Something Rotten In The Core

Tue, 24 Oct 2017 07:00:00 -0000

There's a key thought of UNIX philosophy which centers around the idea of linking programs together. You know, piping the output from grep into sed and then into sort, that kind of thing. It kinda works well, I guess. For text at least.

But one of the reasons it can work OK is because you, as the end-user writing this little script or command, have full knowledge of the pieces you're building it from. You understand grep, you understand sed, and if any of those pieces suddenly stop working then you can pick the piped command-line apart again and see why. It's a system made up of pieces that you have control over, and most importantly: all the pieces are exposed to you.
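As a small illustration of "all the pieces are exposed to you", here's the same grep-into-sed-into-sort shape written as plain Python stages. The function names `grep` and `sed` are just stand-ins written for this sketch, not bindings to the real tools, and the log lines are invented. The thing to notice is that every intermediate result is sitting right there to be inspected.

```python
import re


def grep(pattern, lines):
    """Keep only the lines that match the pattern (like `grep`)."""
    regex = re.compile(pattern)
    return (line for line in lines if regex.search(line))


def sed(pattern, replacement, lines):
    """Substitute within each line (like `sed 's/pattern/replacement/'`)."""
    regex = re.compile(pattern)
    return (regex.sub(replacement, line) for line in lines)


log = [
    "error: disk full",
    "info: started",
    "error: cable unplugged",
]

# Equivalent in spirit to: grep '^error' | sed 's/^error: //' | sort
stage1 = list(grep(r"^error", log))
stage2 = list(sed(r"^error: ", "", stage1))
stage3 = sorted(stage2)
```

If the final result looks wrong, you can print `stage1` or `stage2` and see exactly which piece misbehaved, which is the property the wrappers discussed next tend to lose.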

This idea spread throughout UNIX, but in ways it should never have. I'm referring here to the misfortune that is debugging.

You see, on UNIX there's GDB. That's the debugger. That's the only debugger. It's very old and has had a lot of work put into it, and as a result it usually works pretty well, at least in terms of functionality. But on every other metric you measure software by, it kinda sucks.

GDB's intended user interface.
(c) Arnold Reinhold, CC-BY-SA

Despite every computer made in the past 40 years having a graphical display, GDB lives in a parallel universe where the framebuffer was never invented and we all still use teletype printers. Teletypes work OK enough for getting output, but for interactive programs they just fall apart.

And so people who didn't want to deal with the pain of GDB invented the "GDB Wrapper" -- a separate piece of software that would show a nice user interface, but internally would call GDB to do the work.

We're not talking about calling out to a library here. We're talking about actually launching an instance of GDB, passing it commands, and parsing the results it prints out. And this is where we get led down a dangerous path.
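Here's a minimal sketch of what that launch-and-parse pattern looks like in practice. It doesn't touch GDB at all: the child process is just a Python one-liner standing in for the real tool, and the message format, `run_tool` and `parse_breakpoint` are all made up for the example.

```python
import re
import subprocess
import sys

# Stand-in for the real tool: a child process that prints one status line.
# The message format is invented for this sketch, not copied from GDB.
CHILD = 'print("Breakpoint 1 set at main.c:10")'

BREAKPOINT_RE = re.compile(r"Breakpoint (\d+) set at (\S+):(\d+)")


def run_tool(code):
    """Launch the child and return whatever text it printed."""
    result = subprocess.run([sys.executable, "-c", code],
                            capture_output=True, text=True, check=True)
    return result.stdout


def parse_breakpoint(text):
    """Scrape the fields back out of text that was meant for humans."""
    match = BREAKPOINT_RE.search(text)
    if match is None:
        return None
    return {"id": int(match.group(1)), "file": match.group(2),
            "line": int(match.group(3))}


parsed = parse_breakpoint(run_tool(CHILD))

# A harmless rewording in a later version of the tool silently breaks the wrapper.
reworded = parse_breakpoint(
    run_tool('print("Added breakpoint #1 (main.c, line 10)")'))
```

The first call gives back a tidy dictionary. The second gives back `None`, even though the tool did exactly the same job, because the only "API" the wrapper ever had was the wording of a sentence.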

APIs are hard to begin with. Good API design is very much an art, and it takes a lot of experience to come up with good ones. And the reason so many APIs are bad isn't because someone designed a bad API -- it's that they didn't even realize they were designing an API to begin with.

So much of our software world now is filled with wrappers -- programs that don't actually do the thing themselves, but 'outsource' their work to other programs. It's a stack of layers, and it's not a nice clean stack. I remember something Jeff Roberts once said to me -- the layers grind against each other, and you can feel each one chipping bits away as they collide.

I had the utter delight a few months back of trying to debug something using Qt Creator one day, except I couldn't. It just suddenly one morning refused to start debugging programs. It wouldn't say why of course, it would just sit there doing nothing.

Nothing had changed, at least so I thought. So why the failure? It turned out, after some experimentation, to be because Microsoft's symbol servers were down. That's right, a remote failure on someone else's part meant I couldn't debug locally.

Now of course errors happen in life, and are to be expected. But I didn't get an error. I didn't get a warning. I didn't get anything, except an unresponsive UI. And the reason for this, I think, is precisely because the wrapper wasn't fully aware of all the facts.

Jeff Goldblum said it best in that famous scene from Jurassic Park:

The problem with the scientific power you've used is it didn't require any discipline to attain it. You read what others had done and you took the next step. You didn't earn the knowledge yourselves, so you don't take the responsibility for it. You stood on the shoulders of geniuses to accomplish something as fast as you could, and before you knew what you had, you patented it, packaged it, slapped it on a plastic lunch box, and now you want to sell it.

We've seen it hundreds of times in all kinds of software. Functions that return bool instead of an error code. Where did the precise error vanish to? Poof, it's gone! What used to be a useful error message became false, and if you're lucky you'll get a generic "Unexpected error" appearing on screen. And that's if your program is using a library. If it's calling out to a command-line worker, the most likely case is it won't get checked at all and will just get printed out into a log file you'll never find, never to be seen again.
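A tiny sketch of the bool problem, with hypothetical function names (`load_config_bool` and `load_config` aren't from any real library). Both try to read the same missing file. One tells you "False". The other tells you what actually happened.

```python
import errno
import os
import tempfile


def load_config_bool(path):
    """The pattern being criticised: every failure collapses to False."""
    try:
        with open(path, "rb") as handle:
            handle.read()
        return True
    except OSError:
        return False  # which error was it? Poof, it's gone.


def load_config(path):
    """Same operation, but the precise failure reaches the caller."""
    with open(path, "rb") as handle:
        return handle.read()


# A path that shouldn't exist (the directory name is invented for this sketch).
missing = os.path.join(tempfile.gettempdir(),
                       "no-such-dir-for-this-sketch", "app.cfg")

ok = load_config_bool(missing)

try:
    load_config(missing)
    detail = None
except OSError as error:
    detail = (error.errno, error.filename)
```

With the second version the caller still knows it was a "no such file" error (`errno.ENOENT`) and which path was involved, so it has a fighting chance of showing the user something better than "Unexpected error".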

Or networking software that just sits there spinning a cursor when something went wrong. So much user-facing network software is built on top of other programs, like ssh or rsync, and when those things fail they just don't know what to do. And so much of the problem is precisely because they're not using them as libraries, they're using them as command-line utilities. They're using these things that have ill-defined interfaces to begin with, and because it's all based on outputting text, the programmers think they can just look at examples of the output and figure out an API from that.
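If you really must drive a command-line tool from a program, the least you can do is notice how it ended. The sketch below is one way to do that, under stated assumptions: `ToolError` and `run_checked` are names invented here, the ten-second default timeout is arbitrary, and the failing child is a Python one-liner standing in for something like ssh or rsync.

```python
import subprocess
import sys


class ToolError(Exception):
    """Carries the details a bare True/False would have thrown away."""


def run_checked(argv, timeout_seconds=10.0):
    """Run a command-line tool and refuse to ignore how it ended."""
    try:
        result = subprocess.run(argv, capture_output=True, text=True,
                                timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        raise ToolError(
            f"{argv[0]} gave no answer within {timeout_seconds}s") from None
    except OSError as error:
        raise ToolError(f"could not launch {argv[0]}: {error}") from error
    if result.returncode != 0:
        raise ToolError(f"{argv[0]} exited with status {result.returncode}: "
                        f"{result.stderr.strip()}")
    return result.stdout


# Stand-in for ssh/rsync: a child that fails the way a real tool might.
failing = [sys.executable, "-c",
           "import sys; sys.stderr.write('host unreachable\\n'); sys.exit(3)"]

try:
    run_checked(failing)
    message = None
except ToolError as error:
    message = str(error)
```

It's still a wrapper around text output, so it inherits all the problems above, but at least a dead server becomes an error message instead of a spinning cursor.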

There's a quote from the great Douglas Adams which I'm sure I've used many times before, but it's just so incredibly apt for most of today's software:

The major difference between a thing that might go wrong and a thing that cannot possibly go wrong is that when a thing that cannot possibly go wrong goes wrong it usually turns out to be impossible to get at and repair.

You've all seen those cheap Chinese toys that look like a PlayStation, but inside it's just a 6502 and 50 NES games. It's all fake, it's an illusion. It's a nice plastic finish on the outside, but if you were to open it up there's nothing in there. It's a rotten core, wrapped in layers of opaque complexity.

We're making systems that are fragile, because they're just glued on rather than bolted together. We're wrapping complex things up in wrappers that don't take the same responsibilities as the things they rely on. Like Homer's pecking bird in the Simpsons, they work just fine when everything is as expected, but when the slightest change in situation happens then everything breaks.

Little Lightmap Tricks

Tue, 10 Oct 2017 07:00:00 -0000

Just a quick post today to write down some lightmap lessons I've learned over the years, inspired by Ignacio Castaño's post on his iOS optimizations for The Witness. Many years ago I helped a little on the Wii port of "Call Of Duty: Modern Warfare", and it reminded me of the "fun" I had rewriting the lightmap system to fit into that. So I thought I'd write up some of the little tips and tricks I picked up along the way. Nothing special here, just some common mistakes to avoid.

Don't put gaps in!

You don't need pixel gaps between each chart.

It's surprisingly common to see people generating lightmaps with empty pixels in-between each chart. You don't need to do this! Put them right up against each other without any gaps. If you're worried that maybe you needed the black pixel to stop the charts bleeding into each other, then you're calculating your UVs wrong.

Squish blocks down

Many charts are all the same color -- squish them down.

If all the pixels in the chart are exactly the same color (or near-enough), you don't need to waste space storing all of them. Just shrink the entire chart down to a single pixel. And furthermore (see below), use the same single pixel for all of the shrunken charts.
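A sketch of the squish test, assuming a chart is a list of rows of (r, g, b) tuples and that "near-enough" means a small per-channel tolerance. Both the data layout and the tolerance value are assumptions of mine, not what the original tool used.

```python
def is_uniform(chart, tolerance=2):
    """True if every channel of every pixel is within `tolerance` of the first pixel."""
    first = chart[0][0]
    return all(
        abs(c - f) <= tolerance
        for row in chart
        for pixel in row
        for c, f in zip(pixel, first)
    )


def squish(chart, tolerance=2):
    """Shrink a (nearly) flat-colored chart to 1x1; leave anything else alone."""
    if is_uniform(chart, tolerance):
        return [[chart[0][0]]]
    return chart


flat = [[(10, 10, 10), (11, 10, 9)],
        [(10, 12, 10), (10, 10, 10)]]
busy = [[(0, 0, 0), (255, 255, 255)]]

print(squish(flat))
print(squish(busy) is busy)
```

Every triangle in a squished chart then just points all of its UVs at the centre of that one pixel.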

Share UVs for identical charts.

You might imagine that the majority of a lightmap's space is taken up by nice big chunky pieces, like open terrain areas, or the sides of buildings. But in fact, you'll find that probably the majority of the lightmap space is occupied by a thousand little tiny shards of rubbish. Many of these only occupy a single 2x2 or 3x3 block on the lightmap. Now if you think about it, there's only so many 2x2 blocks that can exist in the world. So:

For each chart, search for all previous charts that are the same size and have the same contents (within a given pixel error). If you find any, throw the new chart away and simply re-use the UVs of the old one.
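Roughly, in Python. This is a sketch rather than the code I actually shipped: charts here are lists of rows of single grayscale values, the error threshold is a guess, and the search is a plain linear scan.

```python
def charts_match(a, b, max_error=2):
    """Same dimensions, and every pixel within max_error of its counterpart."""
    if len(a) != len(b) or len(a[0]) != len(b[0]):
        return False
    return all(
        abs(pa - pb) <= max_error
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
    )


def dedupe_charts(charts, max_error=2):
    """Return (unique, remap): chart i should reuse the UVs of unique[remap[i]]."""
    unique = []
    remap = []
    for chart in charts:
        for idx, kept in enumerate(unique):
            if charts_match(chart, kept, max_error):
                remap.append(idx)      # throw the new chart away
                break
        else:
            unique.append(chart)       # genuinely new, keep it
            remap.append(len(unique) - 1)
    return unique, remap


charts = [
    [[10, 10], [10, 10]],
    [[200, 180], [190, 170]],
    [[11, 10], [9, 10]],     # near-duplicate of chart 0
    [[10, 10, 10]],          # different size, so never a match
]
unique, remap = dedupe_charts(charts)
print(len(unique), remap)
```

With thousands of charts you'd want to bucket by size (or hash a quantized copy) rather than scan everything, but the idea is the same.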

You might even find that this doesn't just work for little 2x2 blocks. If there's any instanced geometry in the level that's facing the same direction, then they'll often have identical lightmaps too. One example would be the side of an apartment block, with many balconies. Because the sun is a directional light, each balcony will have the same shadow cast onto it. So you can get chart re-use in a lot more places than you'd think.

Don't ruin your block compression

Don't let empty pixels ruin your compression for you.

You can get a big benefit by using a block compression scheme for your texture (DXT/BC/etc). But don't just compress your texture without thinking first! DXT stores two colors for each 4x4 block. The pixels you didn't write to on the lightmap will be an empty black. Do you really want to waste one of those colors on storing black? Of course you don't.

For each 4x4 block, fill in the unused pixels with one of the other pixels from the same block (doesn't really matter which).
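Something like this, assuming you kept a parallel "was this pixel written" mask from the bake (the mask and the function name are assumptions for the sketch):

```python
BLOCK = 4  # DXT/BC block size


def fill_unused(pixels, used):
    """Copy a written pixel into every unwritten pixel of the same 4x4 block."""
    height, width = len(pixels), len(pixels[0])
    out = [row[:] for row in pixels]
    for by in range(0, height, BLOCK):
        for bx in range(0, width, BLOCK):
            coords = [
                (y, x)
                for y in range(by, min(by + BLOCK, height))
                for x in range(bx, min(bx + BLOCK, width))
            ]
            written = [(y, x) for y, x in coords if used[y][x]]
            if not written:
                continue  # wholly empty block: nothing to copy from
            fy, fx = written[0]  # doesn't really matter which
            for y, x in coords:
                if not used[y][x]:
                    out[y][x] = pixels[fy][fx]
    return out


# 4x8 lightmap: one written pixel in the left block, right block untouched.
pixels = [[0] * 8 for _ in range(4)]
used = [[False] * 8 for _ in range(4)]
pixels[1][2] = 77
used[1][2] = True

filled = fill_unused(pixels, used)
print(filled[0][:4], filled[0][4:])
```

Picking the average of the written pixels instead of the first one would probably be slightly kinder to the compressor, but either way the block no longer has to spend an endpoint on black.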

Visual debugging

One of the really cool things I did was to write a little visual debugger -- a small command-line parameter that would pop open an OpenGL window showing the results of the bake. You could fly around and inspect the results, and if you found a strangely black triangle somewhere, you could click on it, and the program would re-run the lighting for that triangle and automatically break into the debugger at the right location. I highly recommend this.

The scheme I used

I tried a lot of texture compression schemes out, but here's the one I finally settled on. Bear in mind we were super-tight on memory, so I didn't want to use anything more than 4-5 bits per pixel really.

Like its parent 360/PS3 versions, the Wii port of COD:MW uses a non-HDR lighting engine where the sun's shadow is stored off as a separate lightmap channel. This allows partial time-of-day changes, special effects like lightning flashes, and also allows the total lighting brightness to exceed 1.0, allowing some overbrightening. I really wanted to keep that scheme for the Wii port, but I didn't want to use any more memory.

The shadow term is stored at full resolution, in the red channel of a DXT1 texture. This consists of the shadow visibility multiplied by the N.L term. This is then blended at runtime with the actual sun color.

The remaining non-sun light is stored a little differently. I split the secondary light into its separate components -- luminance and color. A separate RGB(565) texture stores the color, at quarter resolution. The luminance is stored at full resolution into the green channel of the above full-resolution texture. At runtime, we simply read both textures and multiply them together.
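As a sketch of the split-and-recombine idea: here I'm assuming "luminance" is simply the largest of the three channels, so the color part always stays within 0..1. That definition is my assumption for the example, and the downsampling of the color texture to quarter resolution is left out entirely.

```python
def split_light(rgb):
    """Split a light value into (luminance, color), with color normalized to 0..1."""
    r, g, b = rgb
    lum = max(r, g, b)
    if lum == 0.0:
        return 0.0, (1.0, 1.0, 1.0)  # black: the color is arbitrary
    return lum, (r / lum, g / lum, b / lum)


def combine_light(lum, color):
    """What the runtime does: read both textures and multiply."""
    return tuple(lum * c for c in color)


lum, color = split_light((0.8, 0.4, 0.2))
print(lum, color)
print(combine_light(lum, color))
```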

Now, you have to be careful about this. DXT compression relies on correlation between the channels -- you can't just throw any old data into the separate RGB channels and expect it to compress well.

But it turns out that for our purposes, this works great. It tends to be that each area of the world only has one strong light affecting each point (either the primary sun or one of the secondary lights), so the DXT compressor is free to EITHER:

a) If the shadow term is mostly constant, focus its efforts on the secondary luminance. Or,
b) If the shadow term varies, focus on that and let the luminance suffer.

It tends to work out either way though. The human eye is drawn to brightness, so if the luminance gets bad compression then you're probably looking at an area that has strong, varying shadows, and that'll be the thing that stands out anyway. And of course, when you throw the diffuse texture on top it hides a lot of any remaining errors.

Because the quarter-res 16bpp texture has 1/16th the pixels of the high-res 4bpp texture, the total storage space is effectively 5 bits-per-pixel.
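Spelled out, taking "quarter resolution" to mean a quarter of the size along each axis:

```python
full_res_bpp = 4             # DXT1: 64 bits for each 4x4 block
quarter_res_bpp = 16         # RGB565
pixel_ratio = (1 / 4) ** 2   # quarter size on both axes = 1/16th of the pixels

total_bpp = full_res_bpp + quarter_res_bpp * pixel_ratio
print(total_bpp)  # 5.0
```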

Summary

Well there you go. Nothing that special, but it's nice to write these things down sometimes so they don't get lost. Of course, it's all voxel GI these days so I don't expect this to be of much use, but there it is.

COD:MW Reflex -- Flashes of lightning were achieved by messing with the sun brightness at runtime.

Why Command And Vector Processors Rock

Thu, 07 Sep 2017 07:00:00 -0000

I had a Commodore Amiga as a kid. I'm told they were never especially popular in America, but in Europe they were everywhere. Well, sucks to be them I guess.

Image (c) Bill Bertram 2006, CC-BY-2.5

The Amiga was and still is, for its time, the best home computer ever made. It had a clean, powerful CPU architecture. It had an operating system that blended the best parts of CP/M and Unix with none of the unfriendly parts. It had 4-channel digital stereo sound playback in an era when a typical PC had a sound system that was either 0 or 1. It had proper multitasking, which took another ten years to finally arrive on Windows. It was also a completely open platform, unlike today's mobile, console, and store ecosystems.

But most importantly, it had Agnus.

Agnus is the name for the main chip inside the Amiga. Its primary role is for graphics, but not as you'd think. The Amiga was unusual compared to most other home computers of the time. A typical early 80s computer had a CPU (usually Z80, 6502, or later the 68000). It would also have a video chip, which would read from the framebuffer RAM and either output pixels directly, or look up the bytes in a tilemap and output that instead. And that was generally as far as it went.

The Amiga, however, wasn't satisfied with that. It had a CPU, sure. And it had a video chip ("Denise"), which read in bytes and spat out a video signal. But it didn't stop there. It had a custom-designed ASIC for each part of the machine. The entire hardware was built around these "custom chips" and the means to let them communicate.

Agnus is a kind of "ringmaster" chip. Its main component is the DMA controller (Direct Memory Access). This lets bytes be read from main memory and sent around to the various custom chips as needed. You can think of it as an asynchronous memcpy -- you give it an address and it'll either read or write bytes one at a time to/from the appropriate chip. It supported 25 different DMA channels at once for all the different parts of the machine that needed RAM access.

So what would you want to do with all this DMA? Let's look at one of the biggest examples -- the blitter.

The Blitter

The blitter was another part of the Agnus chip. Its operation was very simple. You'd give it three source pointers, one destination pointer, and a function ID. It would then read individual bits from memory (processing them 16 at a time), perform an arbitrary bitwise operation on them, and store the result out. You can think of it as a general-purpose bitwise arithmetic chip.

Given three bits (let's call them A,B, and C), there's exactly 8 different combinations that can result from these. So in order to specify your arithmetic function, you just need a lookup table of 8 result bits. This handily fits into a single byte.

Having this kind of bitwise arithmetic was important because, like many machines of the time, the Amiga used bitplanes as a format to store its graphics in.

Let's say you're using 32-color paletted mode. That's 5 bits per pixel you need to store. How do you store that? Well, you could use a byte per pixel, use 5 of the 8 bits to store your data and leave the other 3 empty. But that's a hell of a waste. Instead, you store it as 5 individual bitplanes, each plane using one bit per pixel. (i.e. each byte contains one bit from eight different pixels)
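Here's a small Python sketch of that packing for a run of eight pixels. The function names are mine, and I've put the leftmost pixel in the most significant bit of each byte.

```python
def chunky_to_planar(pixels, depth=5):
    """Turn 8 palette indices into `depth` bytes, one byte per bitplane."""
    assert len(pixels) == 8
    planes = []
    for plane in range(depth):
        byte = 0
        for pixel in pixels:  # leftmost pixel ends up in the top bit
            byte = (byte << 1) | ((pixel >> plane) & 1)
        planes.append(byte)
    return planes


def planar_to_chunky(planes):
    """Inverse: gather one bit from each plane to rebuild every pixel."""
    pixels = []
    for x in range(8):
        value = 0
        for plane, byte in enumerate(planes):
            value |= ((byte >> (7 - x)) & 1) << plane
        pixels.append(value)
    return pixels


pixels = [31, 0, 1, 2, 4, 8, 16, 21]
planes = chunky_to_planar(pixels)
print([f"{p:08b}" for p in planes])
print(planar_to_chunky(planes) == pixels)
```

Eight 32-color pixels fit in 5 bytes this way, rather than the 8 bytes you'd spend at a byte per pixel.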

Now let's apply our blitter to this. Imagine we want to draw a sprite on-screen. We've got it stored as 5 individual bitplanes, plus a sixth 'mask' bitplane to store the transparency. To get the blitter to draw this for us, we set up our three inputs:

A - The position on the framebuffer we want it at
B - Our sprite source data (1st bitplane)
C - Our sprite transparency mask

We'll need to repeat the whole thing a total of 5 times, one for each sprite bitplane, but that's fine (it's real quick!). The final piece of the puzzle is how we specify how to mix these three inputs. We can build a Boolean truth table to handle it.

Our goal is to use the transparency bit (C) to select EITHER (A) the background data (if C=0), or (B) the sprite data (if C=1). i.e. D = C ? B : A. To figure out our function ID, we just list out all eight cases:

A	B	C	D (output)
0	0	0	0
0	0	1	0
0	1	0	0
0	1	1	1
1	0	0	1
1	0	1	0
1	1	0	1
1	1	1	1

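If we concatenate all the bits of D together, reading from the bottom row up so that the A=B=C=1 row becomes the most significant bit, we get the value 0xD8 (binary 11011000). This is called a minterm, and it represents our bitwise operation in its entirety.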
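You don't have to build these tables by hand. A little Python helper (my own, purely for illustration) can generate the byte from any three-input function, using the row number A*4 + B*2 + C as the bit position:

```python
def minterm(func):
    """Build the 8-bit lookup table for a 3-input bitwise function."""
    value = 0
    for a in (0, 1):
        for b in (0, 1):
            for c in (0, 1):
                row = (a << 2) | (b << 1) | c   # row 0 is A=B=C=0
                value |= (func(a, b, c) & 1) << row
    return value


select = minterm(lambda a, b, c: b if c else a)   # D = C ? B : A
xor_ab = minterm(lambda a, b, c: a ^ b)
clear = minterm(lambda a, b, c: 0)

print(hex(select), hex(xor_ab), hex(clear))
```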

This "minterm" idea is a pretty powerful one. You can combine elements together to get any bitwise function you like. Want to XOR images? Sure. Want to just clear memory? That works too, just set the ID to 0x00 and ignore all the inputs. You'll still occasionally see systems that use this. Windows, for example, still uses it for its BitBlt function, although you'd never know that from reading the BitBlt documentation.

To actually program the blitter to do this, we simply write the three source addresses into three of its registers, write the function ID to another register, and then signal it to start. It'll run in the background while our CPU gets on with other things, and we can either check a flag to see if it's finished, or get it to wake the CPU with an interrupt.
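To get a feel for what the hardware is doing with that function ID, here's a software imitation of just the logic part, in Python. It's a sketch only: the real blitter also deals with registers, shifts and modulos, none of which are modelled, and the names are mine.

```python
def blit_word(minterm, a, b, c):
    """Apply an 8-bit minterm table to three 16-bit words, one bit at a time."""
    d = 0
    for bit in range(16):
        row = (((a >> bit) & 1) << 2) | (((b >> bit) & 1) << 1) | ((c >> bit) & 1)
        d |= ((minterm >> row) & 1) << bit
    return d


def blit(minterm, src_a, src_b, src_c):
    """Run the same function over whole arrays of words."""
    return [blit_word(minterm, a, b, c) for a, b, c in zip(src_a, src_b, src_c)]


background = [0xFFFF, 0x0000]   # A
sprite = [0x0F0F, 0x0F0F]       # B
mask = [0x00FF, 0xFF00]         # C: 1 = take the sprite

result = blit(0xD8, background, sprite, mask)
print([f"{w:04X}" for w in result])
```

Wherever the mask is set you get sprite bits, and everywhere else the background shows through.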

The Copper

Using the COPR to move the screen on each line

So far we've seen how Agnus contains the blitter functionality, and the DMA controller. But there's one more little secret hidden inside this chip, and that's the co-processor (aka "COPR", or simply "copper" to its friends).

The copper was a completely independent CPU that ran in parallel with the main one. It wasn't a Turing-complete, general-purpose CPU like the 68000. It only had three instructions. It didn't have its own memory or registers, but instead shared main memory (like everything else on the Amiga), and it could directly access many of the registers inside the custom chips.

The copper read its instructions via DMA. This meant you allocated some memory and filled in a program, called a "copper list", by writing 16-bit instructions into it. You then pointed the DMA at that address and started the program. The DMA would fetch each instruction and feed them into the copper.

So what could you do with a CPU that only has three instructions? Let's see what the instructions were:

MOVE reg, value
WAIT X, Y
SKIP X, Y

That's a pretty simple machine. We can load a value into a register, wait for the raster beam to hit a specific X/Y position, or skip the next instruction if the raster beam is past a specific X/Y position. Doesn't sound like much at first. If I wanted to write registers I could just do it on the main CPU, right? Why would I want to wait to write registers at a specific time? But there's some surprising effects you can get out of this simple mechanism.
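Here's a toy simulation in Python to show the shape of it. To be clear about the assumptions: instructions are plain tuples rather than the real encoding, every instruction executes instantly, the "screen" is tiny, and the register name is just a dictionary key.

```python
def run_copper(copper_list, width=4, height=6):
    """Simulate one frame; returns the register state at the end of each line."""
    regs = {}
    pc = 0
    lines = []
    for y in range(height):
        for x in range(width):
            # Run every instruction that is ready at this beam position.
            while pc < len(copper_list):
                op, arg1, arg2 = copper_list[pc]
                if op == "MOVE":                      # MOVE reg, value
                    regs[arg1] = arg2
                    pc += 1
                elif op == "WAIT":                    # WAIT x, y
                    if (y, x) >= (arg2, arg1):
                        pc += 1
                    else:
                        break                         # beam not there yet
                elif op == "SKIP":                    # SKIP x, y
                    pc += 2 if (y, x) >= (arg2, arg1) else 1
                else:
                    raise ValueError(f"unknown instruction {op}")
        lines.append(dict(regs))
    return lines


# Change the background color every two lines.
copper_list = [
    ("MOVE", "COLOR00", 0x000),
    ("WAIT", 0, 2),
    ("MOVE", "COLOR00", 0x00F),
    ("WAIT", 0, 4),
    ("MOVE", "COLOR00", 0x0FF),
]

frame = run_copper(copper_list)
print([f"{line['COLOR00']:03X}" for line in frame])
```

The CPU did nothing at all during that frame. The list ran by itself, in step with the beam.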

What the copper let you do is to apply different properties to different parts of the screen. You can change the address of the framebuffer on a line-by-line basis, for example, to create parallax scrolling in Shadow Of The Beast (above).

Or you can change the address of the framebuffer at the halfway point on each line, to create a 2-player scrolling split-screen, like in Lemmings: (no other port of Lemmings could do this!)

Lemmings in two-player split mode

Or you could wait till a certain line and change both the color palette AND the framebuffer address, to create a wobbly water effect, like in Ugh:

Real-time water refraction in Ugh

And this wasn't just a trick for games; even the operating system made use of this. The Amiga allowed programs, if they chose, to have their own private screen with a different resolution. Windows suffered for years with this problem -- ever switch back from a DirectX program and see all your icons have moved around on the desktop? Not in Amiga-land. Here, two different programs on different screens can co-exist just by dragging the menu bar down:

Two screen resolutions on the same television.

These aren't just different windows you're seeing here. These are different screens, each running at a different resolution! The lower screen is 640x256 at 4 colors and the upper screen is 320x256 at 32 colors. Try that on a PC.

All these effects, and more, are achieved via the simple ability to change settings when you want to, rather than having them fixed at the start of the frame. It didn't require more power to be added to the system, just the flexibility to use the existing system in unusual ways.

If you want to see more creative uses like this, try the excellent codetapper.com which takes apart many Amiga games to see how they do things.

Hardware as a tool

The reason I'm writing all this isn't just to show off how cool the Amiga was. I want to show how its design principles allow new avenues to be opened up.

The Amiga hardware never said "this tool is for this purpose". It gave you a toolbox but let you decide what these things were to be used for. And it allowed each tool to interoperate with the others using common registers and common data formats.

I've presented the blitter here as a thing for processing graphics bitplanes, but it was really just a vector coprocessor for operating on boolean/bitwise data. It could be used for other tasks, and it was. The Amiga's floppy disks were formatted using MFM encoding, which is a kind of edge-based binary encoding. To decode it, you had to process the bit array from disk and look for 0-1 transitions. The blitter provides an ideal tool for doing this with, and the OS made use of it for exactly that. The same kind of tasks we might use a compute kernel for today, perhaps.
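For the curious, here's generic MFM as a Python sketch. This is the textbook scheme (a clock bit before every data bit, set only between two zeros), not the Amiga's exact on-disk layout, and I'm assuming the bit before the stream starts is a 0.

```python
def mfm_encode(data_bits):
    """Each data bit becomes (clock, data); clock is 1 only between two 0 data bits."""
    out = []
    prev = 0  # assumption: the bit preceding the stream is 0
    for d in data_bits:
        clock = 1 if (prev == 0 and d == 0) else 0
        out += [clock, d]
        prev = d
    return out


def mfm_decode(mfm_bits):
    """Throw away the clock bits and keep every second bit."""
    return mfm_bits[1::2]


data = [1, 0, 0, 1, 1, 0]
encoded = mfm_encode(data)
print(encoded)
print(mfm_decode(encoded) == data)
```

Decoding boils down to selecting a fixed pattern of bits out of every word, which is just a bitwise mask, and that's exactly the kind of job the blitter is built for.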

The copper, while seeming to be a very simple processor, effectively acted as an amplifier for the power contained within the other chips. It could be viewed perhaps as a metaprocessor -- not doing the work itself but controlling the work of others.

This combination of a vector processor and a control chip is a powerful one. It's so powerful in fact, that the machine you're reading this on now has the same architecture. A modern GPU consists of three parts:

Part a) is a thing that can draw triangles. There's usually special-purpose hardware for doing this. There was a time a few years ago when this was what we thought of as the GPU, but we're seeing less and less of that every year. Games now are doing voxel ray-tracing, and people are using GPUs for lots of things other than just rendering.
Part b) is the vector processor, a unit that reads data and runs functions using it. Ours are much more powerful than the old blitter though. We can do full floating-point operations on ours, not just bitwise ops. But it's a more advanced version of the same principle -- a program that operates on many values at once rather than just one.
Part c) is the command processor. A modern GPU has a chip that reads instructions from the host CPU, decodes the various draw calls, state changes, etc, and then issues work to the vector processor (for compute kernels). Or, when using rendering APIs, it sends work to the triangle drawer which in turn sends work to the vector processor (either to shade vertices or pixels).

Right now we're a little stuck, however. A modern GPU lets you use its triangle drawer (via OpenGL perhaps), and it lets you use its vector processor (via CUDA perhaps). But the one thing it does not do, on almost any platform (even most consoles), is to let you use the command processor. About the only one I've ever seen that did give you that kind of access was the PlayStation 2, something I'll no doubt write about in a future article.

Block diagram of a typical GPU. Spot the tiny, almost unnoticed 'host interface', where the whole thing is overseen.

You see, the Amiga documented its command processor. The designers wanted you to write programs that ran on it. They wanted you to use it for doing all sorts of clever things. They recognized that the power to operate the underlying horsepower directly was something that could amplify the capabilities of a system way past the limits of its original design.

But on Direct3D, or OpenGL, all you can do is call DrawIndexedPrimitive etc. and let it do things on your behalf. You can't build your own copper lists like you used to on the Amiga. Some APIs let you make a command buffer, but they're usually just recording API calls into it. You can't program it with your own logic, or your own algorithms. The 3D driver has the power to do this, but you don't.

The Amiga was a good machine not because of what it was designed to do but because the designers intentionally gave you the flexibility to do things they'd never designed it to do.

The old COPR chip only had three instructions and couldn't do much by itself, but you could use it to make the rest of the system sing. I'm sure the command processors in modern desktops have a much more advanced processor -- I'd love to see what we could do with them given the chance.

The Danger Of Opinions

Sun, 03 Sep 2017 07:00:00 -0000

Disclaimer

Warning: this post may contain opinions. If you are allergic to opinions, please try the associated reddit thread instead where you will be safe from them.

There's two schools of thought about how we should treat computing. One thinks programming should be about writing things to best reflect the truth of how they will be executed (C, C++, Pascal, Go, Rust, etc). The other thinks we should be writing against some universal ideal, and the computer should just deal with that for us (Python, Ruby, Lisp, JavaScript, etc).

For years, MIT taught their SICP course using Scheme. And you know the weird thing about that? No computers involved at all. It was all just done on a whiteboard, using symbols and parentheses. No registers, no instructions, no memory. It showed you what computing really was -- an abstract concept that isn't tied to any implementation. The idea that computing doesn't actually require a computer is somewhat alien to many native C++ programmers.

Then you've also got the engineering crowd. People like me whose first exposure to computers was that 8-bit home computer your Dad brought home one day in the 80s. I didn't grow up in a world of evaluation, expressions, and functions. The computer I had knew only about bytes, and how to move them about in memory. It was always about how things get done. What use were abstract concepts in a world where you needed to do specific things in order to see the results?

These two groups often fall under the banners of "static" and "dynamic" typing, and it's perhaps no coincidence. Static typing tries to tell the computer exactly what needs to be done, at the expense of moving the program further away from the abstract description. Dynamic typing expects the computer to figure things out, so that the human can just write things in a nice clean manner.

Which leads on to the ultimate question of programming: Should programs be written for the benefit of humans or for computers? Exactly whom are we trying to make it easier for?

It's a simple point but you see the repercussions of this appearing everywhere, hundreds of little design decisions that push programming further into two camps. UNIX, for example, demands a case-sensitive file system, on the grounds that the file system can be done more efficiently if it's only concerned with matching bytestrings. Windows says that being able to create two files with the same name but different cases isn't useful to humans, and is only confusing. Why should humans have to keep track of where the capitals were placed, and why should auto-complete suddenly stop working because I forgot to hold SHIFT?

So which is right? Which is better? Should computers adapt to us or should we adapt to computers?

There's a book I love by Robert M. Pirsig called "Zen and the Art of Motorcycle Maintenance", which contains over 500 pages of pseudo-philosophical bullshit (oh who am I kidding, I still love it) centered around the idea of "quality". It's got this lovely disclaimer at the front where the author notes that the book doesn't really have anything to do with Zen Buddhism, and "it's not very factual on motorcycles either."

The central pillar of the book is what he calls the "classical" vs. "romantic" ideals. The classical, he says, is concerned with what something is and how it works. The classical viewpoint wants to know how their motorcycle works, how to recognize where that weird knocking noise is coming from, and wants to tune their engine to keep it running well.

The romantic viewpoint is instead concerned with how we see something. It's not important how something works, but how we see and use it. The romantic person wants to use their motorcycle to drive along beautiful mountain roads, and use it to get to far-away places.

The classical person sees a rainbow and wonders how it formed, and how the rain might reflect the sun like that. The romantic person sees a rainbow and wants to show others, and paint a picture of it.

This two-sided philosophy is found throughout the whole of human life, and especially in computers. One of the things I love about computer programming is that it's one of those areas where we actually get to use both at the same time, even within the same program. It's what makes a game developer want to be an artist or a programmer. And yet the game needs both to work.

So which is better? Well unsurprisingly, neither. You need both viewpoints, sometimes at the same time. And that's the weird part. How can two opposing ideas both be correct?

But they can.

I remember once talking to an artist friend of mine. We were talking about computer animation, and the subject of IK (inverse kinematics) came up. What puzzled me is that he wasn't a big fan of it. Now to me as a programmer, it seemed an obvious choice. Of course IK is a better way to do things. You just tell the arm or leg or whatever where you want it to go, and it automatically moves the elbows and knees and such for you. So surely that's less work, and therefore better?

But he explained things to me. You remember when as a kid, you drew stick men? Well in my mind that's how the human skeleton looked. But he explained about "clavicles", something I barely even knew existed but in fact drive the whole upper armature. And he explained how the best algorithm in the world isn't going to give you the results you want if there's more than one solution available. What had seemed a simple "pointing a finger" problem was unfolding into a world where you had to try and teach the computer how to be an artist. It slowly dawned on me that I didn't have the full experience of the problems he was describing, and couldn't make a case to argue back with.

It's weird when you suddenly realize that there's a separate world out there that you're not an expert in. It certainly changed my outlook on things. I think there's a lot of programmers who still haven't had that moment, and still live in a world where they believe they know everything.

Did that make me wrong about IK? Well, no. It's still useful. Did that make him right? Maybe, maybe not. But what it shows is that you can't have a discussion one way or the other unless you actually know a little about the other viewpoint.

Issues aren't black and white. And sometimes you can have two opposing viewpoints that are both valid. Programmers hate this. It's very un-pythonic.

Did you ever have an experience where someone you'd greatly respected suddenly said something you strongly disagreed with? Does that invalidate all the things they said that you did like? Do you stop talking to your best friend because you found out he voted Republican?

Complex issues can't just be simplified down into tribal arguments of us-vs-them, or solved by just shouting at the other person until they go away. We need to get over this cultural idea we have where anyone who disagrees with us is literally Hitler. It's OK to disagree with someone. And just because we disagree with someone doesn't make them wrong. It's possible for two people to disagree and yet they both still are correct. It's so common, especially in the media, for someone who changes their mind later on to be labeled a hypocrite. "But last year," they cry, "you said this thing. Now you're saying the other!". Yet the ability to change our mind is the most important thing we have. An opinion that is rock-solidly fixed in place is just tribal politics. Opinions should be swayable via convincing arguments.

On the one hand it's easy to look at things like the direction the C++ committee is taking and laugh; C++ has become an insane language that no one person has a hope of understanding. But they definitely seem to be heading towards a destination, perhaps if only by accident. What are they actually trying to achieve? A language where you can do anything, but only at compile time? Perhaps Python is a cleaner approach, by pushing all problems to runtime, but even they're now starting to realize that maybe type annotations are a useful feature.

But on the other hand, you've got languages like JavaScript, where they've just this year invented co-routines and are now treating it like the second coming. And compiling source code into a binary form is still considered a great unsolved problem in computer science. And yet... there's still this "and yet" that hangs around with it. I mean can you imagine trying to write web-apps in C? At least JavaScript has a string type.

But we need both. The idea of the "one true" anything is bullshit. There will always be different sides, with different ideals. And that's fine. We need that. But what we don't need is us-vs-them. People need more exposure to different ideas. Programmers need to try out different languages. Webdevs could learn a hell of a lot from trying to write a Z80 program. And a lot of GPU shader guys could learn a thing or two from watching how Bob Ross can manage to paint a tree without knowing how sub-surface scattering works. Because let me tell you, however you've been doing things so far, there's a whole different approach that other people have been successfully using that you have no idea about.

I dunno where I'm going with all this. I just figured I'd write some of these rambling thoughts down, although putting your thoughts into words can get you fired these days. Probably best just to stay absolutely quiet and avoid doing anything that may or may not cause two opinions to form. We do have to be careful, you know. Sometimes we can create a difference of opinion so vast that the universe has no option but to bifurcate in order to accept both.

Disassembling Jak & Daxter

Thu, 12 Jan 2017 08:00:00 -0000

Disclaimer

I don't work at Naughty Dog, and I don't have any secret knowledge of Jak & Daxter, except what I figured out myself from the disc. So a lot of this may well be wrong. Take a pinch of salt with it.

The massive world of Jak & Daxter

I've always had a fascination with Jak & Daxter ever since it came out, way back in 2001 on the PlayStation 2. It was one of the first 3D games I'd actually seen run at 60Hz. I was used to PC games like Quake which were software rendered, or games like Mario 64 which I played (and completed!) on UltraHLE at a woefully poor framerate. But Jak & Daxter, hereafter referred to as J&D, blew me away.

Here was a game running at a silky-smooth 60Hz, never missing a beat, and somehow managing to draw more polygons than any other game out there. A game where you could see and explore a vast world with no boundaries. No load screens, no pauses, no LOD pops. It all just fit together perfectly. I had to know more.

The J&D engine may possibly be one of the greatest engines ever made. Of course, because it's not open source, it doesn't get the recognition that more famous engines like Quake get. There's not much information out there about it, but there's a few snippets here and there. The most important for you to read would be Stephen White's 2003 GDC talk, "The Technology Of Jak & Daxter", which I can't find a link to any more but I've mirrored here:

Link

The Technology Of Jak & Daxter, Stephen White (GDC 2003)

The audio contains some additional Q&A not in the slides.

There's a lot to talk about, but today I'm going to ignore all the cool graphics tech and such, because I want to focus on the most amazing element: GOAL

GOAL, standing for "Game Oriented Assembly Lisp", is the programming language J&D is written in. All of it. I want to really nail this point home here -- some people think it's some kind of bytecoded scripting language. It isn't. GOAL is a fully natively-compiled general purpose programming language, and the whole game is written in it.

It's a hell of a thing, when you think about it. To create your own programming language, from scratch, and then write the entire game engine in it. I don't know of any other games out there that do that.

The only part of the game not written in GOAL is the loader/linker, which is written in C++. This is the equivalent of their DLL loader; it's just a simple stub program to load the rest. And this is what this article will be picking apart, as it forms the gateway to everything else.

Because GOAL has its origins in Lisp, one of its hallmarks is that symbols are considered a first-class datatype. There is no offline linker, everything is dynamically linked. Therefore the symbol table must be present in the runtime data. What this means is that if we can understand the data format, we can get a complete symbolic disassembly of the whole game.

If you want to follow along at home, you'll need a copy of the PS2 J&D disc or ISO, and the source code I've posted at the end here.

The loader

Let's start at the loader. This is the first thing that loads, and is just a regular PS2 ELF compiled with gcc. For whatever reason (lack of attention possibly), Naughty Dog forgot to strip the ELF symbols from this file, which means we can open it up in any MIPS disassembler and follow its logic through.

The function of the loader is simple. It sits waiting, and when new data arrives from wherever, it allocates space on the heap and copies the data there. It then relocates the pointers to the new address, and patches up any references to the global symbol table. Once this is done the data is ready to go. That's basically it; there's no VM or interpreter here.
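To make those steps concrete, here's a rough C++ sketch of the same sequence: copy the blob onto the end of a heap, relocate its internal pointers, then patch its symbol references. Everything in it (the Blob layout, the fix-up tables, the names Loader, slotFor and load) is made up for illustration; it is not the real .go.bin format or the real loader, and it uses 32-bit offsets into a byte vector instead of real addresses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical fix-up record: "the word at this offset refers to this symbol".
struct SymbolFixup { std::size_t offset; std::string name; };

// Hypothetical blob: raw bytes whose internal pointers are blob-relative.
struct Blob {
    std::vector<std::uint8_t> bytes;
    std::vector<std::size_t> pointerFixups;   // offsets of words holding pointers
    std::vector<SymbolFixup> symbolFixups;    // offsets of words naming a symbol
};

struct Loader {
    std::vector<std::uint8_t> heap;                            // simulated heap
    std::unordered_map<std::string, std::uint32_t> symbolSlots; // name -> slot

    static std::uint32_t readWord(const std::vector<std::uint8_t>& mem, std::size_t at) {
        assert(at + 4 <= mem.size());
        std::uint32_t w;
        std::memcpy(&w, mem.data() + at, 4);
        return w;
    }
    static void writeWord(std::vector<std::uint8_t>& mem, std::size_t at, std::uint32_t w) {
        assert(at + 4 <= mem.size());
        std::memcpy(mem.data() + at, &w, 4);
    }

    // Find the slot for a symbol, creating it the first time the name is seen.
    std::uint32_t slotFor(const std::string& name) {
        auto it = symbolSlots.find(name);
        if (it != symbolSlots.end()) return it->second;
        std::uint32_t slot = static_cast<std::uint32_t>(symbolSlots.size());
        symbolSlots.emplace(name, slot);
        return slot;
    }

    // Copy, relocate, patch. Returns the heap offset the blob landed at.
    std::uint32_t load(const Blob& blob) {
        std::uint32_t base = static_cast<std::uint32_t>(heap.size());
        heap.insert(heap.end(), blob.bytes.begin(), blob.bytes.end());
        for (std::size_t off : blob.pointerFixups)      // blob-relative -> heap-relative
            writeWord(heap, base + off, readWord(heap, base + off) + base);
        for (const SymbolFixup& fix : blob.symbolFixups) // bind names to table slots
            writeWord(heap, base + fix.offset, slotFor(fix.name));
        return base;
    }
};
```

Note that load never looks at what the bytes mean. It only touches the words the fix-up tables point it at, which is the whole reason a loader like this can stay so small.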

The loader generally tracks 3 different memory heaps -- the common heap, and two level heaps. The game has two levels loaded at any one time, each getting their own self-contained heap (this is what allows you to seamlessly walk from one to another). The common heap contains the core engine code and data. The loader just uses a simple bump allocator to throw new data on the end. Once the level is finished with, it just throws out the entire heap and starts again.
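If you've not met a bump allocator before, it really is as simple as it sounds. Here's a minimal C++ sketch; the BumpHeap name, the 16-byte default alignment and the fixed capacity are my own assumptions for illustration, not details taken from the J&D loader.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A heap that only ever grows at the top, and is freed all at once.
class BumpHeap {
public:
    explicit BumpHeap(std::size_t capacity) : storage_(capacity), top_(0) {}

    // Round the current top up to 'align' (a power of two), then claim 'size'
    // bytes. The alignment applies to the offset within this heap.
    // Returns nullptr if the heap is full.
    void* alloc(std::size_t size, std::size_t align = 16) {
        assert(align != 0 && (align & (align - 1)) == 0);
        std::size_t start = (top_ + align - 1) & ~(align - 1);
        if (start > storage_.size() || size > storage_.size() - start)
            return nullptr;
        top_ = start + size;
        return storage_.data() + start;
    }

    // "Throw out the entire heap and start again": no per-object frees.
    void reset() { top_ = 0; }

    std::size_t used() const { return top_; }

private:
    std::vector<std::uint8_t> storage_;
    std::size_t top_;
};
```

In this picture the game would own three of these: one for the common heap that never resets, and two level heaps that each get reset when their level is unloaded.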

So where does the loader get its data from? First we need to examine the file formats used here. If you look on the disc, the main files used here are called "CGO" and "DGO". There's no actual difference between them, I presume the names just reflect the usage (code/data?). It tends to be that CGO files are loaded into the common heap, and DGO files are loaded into the level heaps. However, don't be confused by the names -- just because they imply code and data separately, the contents are in fact mixed and can contain both code and data interchangeably. In fact, as we'll see later, GOAL doesn't really distinguish the two concepts.

The DGO format is real simple. There's a simple header, and then it's just a bunch of binary files concatenated together. It's closer to a ZIP archive than it is to something like ELF. The DGO is just a way of delivering binary blobs one at a time.

In the retail version, the IOP (the PS2's input/output processor) delivers the DGO file as-is into the loader when requested. In the development builds however, they also listen on a network port for incoming binaries. This is how their development process works -- they can compile new source files on their development machine and stream them straight into the PS2 while it's running. The target PS2 doesn't even know where the files came from, it just loads them in like any other.

It's not really fair to call this "hotloading", it's really just "loading", as updating the game with new code/data is no different than loading it in the first place. This is made possible by the dynamic symbol table in the linker, which we'll look at later when we get to the top-level segment.

So, we can easily write a little program to pop open a DGO and extract the files within. They don't have an official extension but I'll refer to them as ".go.bin", assuming they were compiled from a ".go" originally. But what do these binaries exactly contain?

A .go.bin is basically the equivalent of a .OBJ file from a normal C++ compiler. It contains 3 segments, as well as relocation tables to patch them with:

the top-level segment
the main segment
the debug segment

These are loaded in-place by the loader, which makes loading very efficient. The loader does not have to actually parse the data or even know what it is. As far as it's concerned, we're just loading 3 big chunks of data in and then patching them up afterwards.

The top-level segment

The top-level segment contains everything at global scope from the source file. Because this only executes once, the loader loads it, runs it, and then throws it away. Its main purpose is to register functions into the global symbol table. Because functions are first-class objects in GOAL (like most other things), they're just function pointers. So registering a function is simply a matter of storing a pointer to it in the symbol table.

This means that every single function call in J&D goes through a pointer (i.e. is a virtual function). People always used to tell me this was bad on the PS2 as, oh I don't know, DCACHE misses or something. Well seeing as J&D is one of the best performing games on the system (or any system), it doesn't seem to have done them any harm.

Here's an example excerpt of a top-level segment:

; eye

;==================================================================
    .segment top-level-segment
;==================================================================


;------------------------------------------------------------------
    .type function
goal-entry:
    lui v1, L35
    or v1, v1, L35
    sw v1, *eye-work*(s7)
    lui v1, L26
    or v1, v1, L26
    sw v1, render-eyes(s7)
    lui v1, L12
    or v1, v1, L12
    sw v1, update-eyes(s7)
    lui v1, L11
    or v1, v1, L11
    sw v1, get-eye-block(s7)
    lui v1, L10
    or v1, v1, L10
    sw v1, convert-eye-data(s7)
    lui v1, L1
    or v0, v1, L1
    sw v0, merc-eye-anim(s7)
    jr ra
    daddu sp, sp, r0

In this example, what we're doing is loading the address of a function (e.g. render-eyes) and storing it at an offset into the global symbol table (register s7). Note that GOAL symbol names can happily contain dashes, asterisks, and other Lispisms. GOAL keeps a pointer to the symbol table in register S7 at all times, and all global variables/functions are always referenced via this.

So when this file is loaded into the system, it simply registers the existence of 6 named global objects, and then returns.

I'll give you a brief example of how something like this might be generated. Imagine the following C function:

int my_function(int x, int y) { return x+y; }

In GOAL, you might write this in a more Lispy syntax: (I guessed at the syntax from snippets I've seen online, don't peek too closely)

(defun my_function ((x int) (y int))
    (return (+ x y)))

Looks similar on first inspection, but the key difference is defun (short in Lisp for "define function"). This is a macro. We're not actually defining the function ourselves here, what we're doing is passing our code into the "define function" macro. The macro will run at compile time and rewrite our code, and we'll end up with something like this:

// top-level segment:
symbols["my_function"] = &L1; // assign fn ptr to symbol
// main segment:
int L1(int x, int y) { return x+y; }

So effectively, the top-level segment is just doing at runtime the work that most compilers do at compile time. Instead of writing symbol tables into an ELF file, we set it up ourselves upon loading.
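The pseudocode above can be turned into something you can actually compile. This is only a sketch of the idea in C++: the string-keyed map stands in for the symbol table (the real thing uses fixed offsets from s7, not a name lookup on every call), and topLevelV1, topLevelV2 and callByName are names I've invented for the example.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

using IntFn = int (*)(int, int);

// Stand-in for the global symbol table: symbol name -> function pointer.
std::unordered_map<std::string, IntFn> symbols;

// "Main segment" of the first build of the file.
static int L1(int x, int y) { return x + y; }

// "Main segment" of a later build, after someone edited the source.
static int L1_v2(int x, int y) { return x * y; }

// "Top-level segments": run once at load time, registering the function.
void topLevelV1() { symbols["my_function"] = &L1; }
void topLevelV2() { symbols["my_function"] = &L1_v2; }

// Every call goes through the table, so callers never hold a stale address.
int callByName(const std::string& name, int x, int y) {
    auto it = symbols.find(name);
    assert(it != symbols.end());
    return it->second(x, y);
}
```

Run topLevelV1 and a call to my_function adds; run topLevelV2 afterwards and the very same call site multiplies. Nothing was "hot-patched", the second file was simply loaded and its top-level code overwrote the pointer.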

The main segment

The main segment contains the real work. Because this is just a binary chunk, it could contain anything. Typically it tends to be either a set of function bodies, or some data structures containing art data. Either way, the top-level segment will refer to these objects and patch them into the running system.

For me, this is one of the most beautiful aspects of the GOAL system; the mixing of code and data. The system just doesn't care. There's no code in GOAL to, for example, parse an XML file into a structure. It's just not needed. If you want to fill in data, you just embed that data right into the binary. I wrote about this way back in my old article, The Joy Of INCBIN. GOAL takes this idea to its extreme. Textures, models, etc, are all processed by offline tools which write out .go files, which the compiler then just packages up like any other data set.

As far as GOAL is concerned, a function body is just an object like any other, no different from an array or a struct. Each allocated GOAL object has a 4-byte header in front of it which contains a pointer to its type, used for both debugging and runtime polymorphism. Note that I didn't say "type debugging information", I said "type". This is because in GOAL, a type is itself an object as well. You can even create new types at runtime if you like.
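Here's a small C++ sketch of what "a type is itself an object" can look like. It's an illustration of the concept only: the field layout, the single-parent chain, the isA helper and the type names used below are my assumptions, not GOAL's actual runtime structures.

```cpp
#include <cassert>
#include <string>
#include <utility>

struct Type;

// Every object starts with a header pointing at its type.
struct Object {
    const Type* type;
    explicit Object(const Type* t) : type(t) {}
};

// A type is just another object, so it carries the same header.
struct Type : Object {
    std::string name;
    const Type* parent;   // nullptr for a root type
    Type(const Type* meta, std::string n, const Type* p)
        : Object(meta), name(std::move(n)), parent(p) {}
};

// Runtime polymorphism: walk the parent chain starting from the header.
bool isA(const Object& obj, const Type& wanted) {
    for (const Type* t = obj.type; t != nullptr; t = t->parent)
        if (t == &wanted) return true;
    return false;
}

// Debugging: any object can report what it is, with no side tables needed.
const std::string& typeName(const Object& obj) {
    assert(obj.type != nullptr);
    return obj.type->name;
}
```

Because Type has an ordinary constructor, making a new type at runtime is no different from making any other object: build one, point its header at the type-of-types, and start stamping it onto new objects.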

The debug segment

Lastly we have the debug segment. The game doesn't normally load this in a retail build, it just skips over it. During development however, it contains assorted extra information used for debugging. In a regular ELF binary, you might expect to find encoded DWARF data or some other format that described all the symbols and types within. Well that's not how GOAL works.

GOAL can't list the symbols within the file because there aren't any. Remember, we patched the symbols into the system ourselves in the top-level segment? The code itself decides what to assign to what symbol at runtime, not compile time.

It also can't list the type information, because types are a purely runtime construct too. That's not to say that this is a dynamically-typed language like Lisp or Python, because it isn't. The compiler emits static native code assuming the layout of types, just like a C++ compiler would. The key to understanding GOAL is to realize that types are just something the runtime lets you create at will, and the compiler is just making use of that facility in advance.

So if there's no type info here and no symbol table here, what is here? Well, it tends to just contain a bunch of functions (probably compiler-generated) that know how to print out and inspect the contents of the types the compiler made.

GOAL never had a source-line debugger, as far as I know, although there's no reason it couldn't given some effort to write one. (edit: wrong, see here) So it seems like the main mechanism Naughty Dog used to debug with was to pull up a console and enter commands to print and inspect various data themselves. (Remember, the system can compile a new single-line function and download it straight into the loader at any time! This gives you a free debug console.)

That's not actually as bad as it sounds; bear in mind this is a fully live-editable system, so if there's a function you want to investigate you can easily just patch in a printf while the game's running and mess around with it.

Getting a disassembly

So that's pretty much all there is to loading compiled GO files. You can disassemble the ELF loader to figure out the exact file format, and what with having the ELF symbols for the loader available, it's not that hard to replicate the function of the loader ourselves.

So I did.

Below is a small program that loads DGOs, disassembles them back into MIPS assembly, and writes it all out. The assembly format isn't quite standard, as there's a few things the .go.bin format needs that a regular assembler wouldn't provide. In particular it would need to understand how to write the symbol names used into the relocation table, and how to prefix each stored object with a pointer to its type.

I suspect most people won't be that interested to actually pick through it all themselves, but still, here it is. Let me know if you find any of it interesting!

https://github.com/rmitton/goaldis - My command-line disassembler

Converging Towards Disneyland

Mon, 19 Dec 2016 08:00:00 -0000

Warning

Contains (very minor) spoilers for The Witness.

Jonathan Blow's 2016 game The Witness received high praise from most reviewers. Personally, I loved the island and the environmental puzzles, but I hated all the stupid little NP-hard slidey-line puzzles. I'm not here to moan about that today though. Let's talk about layout.

The island itself is an absolute joy to explore. No corner of it is wasted, nothing is filler. Everywhere you look you'll find some new little piece of entertainment. It's designed for you to be able to wander at will and always find new things.

Yet as I uncovered the whole island, something about it struck me as familiar. A feeling I'd seen something like this before. It wasn't until I found the in-game map that I realized what it was.

Imagine you start from the cove at the bottom. You disembark the boat, and venture forth to the buildings you see. The island has a town in the middle which forms the main hub of the island. North from there leads to the castle centerpiece, or you can wander out to any of the puzzle areas that fill up the rest of the island. Surrounding the island is a transport system, the boat, to let you move around quickly. The game culminates within the giant mountain to the south-east.

Now the game itself doesn't actually start you via that route -- instead you start in the bottom-left, where you're encouraged (but not strictly required) to explore some of the puzzle regions there first. This is because it wants to teach you about the game world one step at a time. However, once you advance past a certain point it becomes clear that the town area is the part you'll keep coming back to.

Here's an (official) labeled map of the different areas in the island. Note that the town is labeled as "hub":

Seeing it yet? OK I'll admit perhaps I'm stretching a little here, but... it's Disneyland, right? Not literally of course, but I can't help notice the design similarities. Compare against this historical map of the park:

It's the same design. Disneyland has a monorail around the perimeter; the Witness has a boat. Disneyland has a town hub and castle in the center, so does the Witness. Space Mountain became the mountain, and sits in the same corner of the island as its real-life counterpart. And like the real-life one, it's hollow on the inside. A visitor wanders around the park's attractions, but always keeps coming back to Main Street where the shops are.

In The Witness, if you take the boat all the way around the island, you'll eventually be taken on a little ride through these sunken shipwreck ruins. Now admittedly there's no little people singing "It's A Small World" (thank god), but I definitely got the same vibes of going on a little boat ride through an indoor exhibit. It's even in the same top-right area of the map.

The island of The Witness definitely isn't Disneyland, and only slim elements match up, but in terms of design and layout there's this nagging feeling of subconscious echoes.

Zelda, Philips CD-i, 1993

OK, maybe it's just my imagination, I dunno. But I'm not the only one to notice common themes between The Witness's island and other designed experiences. @tydaspy pointed out the similarities in style with the world map from the 1993 Zelda game "The Wand Of Gamelon" released on the Philips CD-i.

I've been trying to put my finger on exactly why we see these similarities. I don't think it's intentional. I'm not suggesting that Jon Blow started from Disneyland as his base to work from. Instead, I believe we are seeing an example of convergent evolution.

These games (and Disneyland) have some of the same design constraints. They want a world with as many different things in as possible. They want the smallest space, sometimes because of limited real-estate or limited construction resources, but also to minimize the amount of walking the guest has to do.

More similarities come up as a result of these constraints. You have a small area, and you need to devote it all to guest entertainment. But you still need places for maintenance and behind-the-scenes work. During the construction of Disney World in Florida, they opted to hide all of their maintenance underneath the park. The entire park is built on top of a subterranean layer of tunnels, secretly accessible at the surface via what's known as "Utilidors". You don't have to play The Witness for very long to find that there's some hidden underground secrets there too; little glimpses through holes in the ground, or down a well, provide a peek into the underground system that sits beneath the puzzles.

This leads me to conclude that there must exist some force that drives some video games towards these same design choices:

Theorem

Any world that tries to pack the most content into the smallest space will eventually become Disneyland.

It seems like the act of trying to fit as many varied elements into a small space can unconsciously push designs towards this same layout, with an entranceway, hub, dominating features (castle/mountain), and a ring around the edge.

Video games, like Disneyland, aren't built to be real. The castles aren't real castles, the shipwreck isn't a real shipwreck. They're follies; little pieces of architecture to paint a picture without having a particular purpose. This is why we get the same vibes from games like The Witness and experiences like Disneyland. The feeling that the whole thing is designed around us instead of being designed for its own purposes.

I'll also mention that the Witness has a hotel for visitors to the island to relax in after their hard day's adventuring. A hotel that, like the real Disneyland, sits outside the park. It's a meta-element, a framing device -- not one that forms part of the main attraction, but one that's necessary for the attraction to function.

Jon Blow has mentioned on several occasions his goal of "no-filler" for his games -- the desire to pack the maximum amount of content into a small package. Does the pursuit of maximizing value in a small area lead to subconscious parallel design decisions for small environments?

I guess I just find it interesting that people tackling similar problems arrive at similar solutions without realizing. Maybe it really is a small world after all.

Learning Via Bullshit

Tue, 06 Dec 2016 08:00:00 -0000

There's two ways to learn about something. One is to go in through the front door; you read the tutorial, you follow the instructions, and you progress forward through it.

But the trouble with that approach is that you'll only learn what they want you to learn. Everyone has an agenda, and what they directly tell you about their product only reflects that agenda. If you want to really understand the strengths and tradeoffs of a system, you need to push past that and approach from a different point of view.

I saw a fantastic talk by Vyacheslav Egorov about LuaJIT and dynamic languages. I'll quote a big old chunk of it here because I think it deserves quoting.

"What I learned from LuaJIT", Vyacheslav Egorov

Some people, after seeing this kind of compiled code, they will ask, "how does it do it?", and they will try to go and learn this by reading the source.

And I will tell you that I'm not the kind of person who can learn things by reading the source from the beginning to the end, mostly because it's very hard to find a beginning and end in a pile of C code.

So instead, I do strange things to the source, like here for example I add a key to the table which... I say p[1]=1 and I just thread it through the whole loop iteration. Then I look at what kind of code the LuaJIT generates, and discover that suddenly there is a whole pile of assembly coming out. There is a table allocation here and some GC steps, and whatnot.

So I like to ask "why does it not do something?", and learning by fixing bugs, or at least trying to understand why something doesn't work. And I think this is the much better learning technique.

Learning why things don't work is often the most valuable way to really learn the tradeoffs involved in a system. The trouble with many modern systems though, is that they pretend there are no tradeoffs, and that there are no flaws. Every system today pretends to be perfect.

The late, great Douglas Adams had this very pertinent quote on the matter:

Douglas Adams (from Mostly Harmless)

The major difference between a thing that might go wrong and a thing that cannot possibly go wrong is that when a thing that cannot possibly go wrong goes wrong it usually turns out to be impossible to get at and repair.

That's the trouble with a lot of systems. They pretend they're perfect, when they're not. Any engineering project contains both intentional tradeoffs, and unintentional fuckups. I've found over the years that by understanding the dark corners of things, you sometimes gain a greater understanding of the whole thing.

Don't look at what they are telling you, look at what they're not. If someone's trying to promote, oh I don't know, let's say JavaScript, and they say "look here at our benchmark, look how it can rival C in this test" -- don't listen to that. Don't listen to what they do say, listen to what they don't. Don't look at the best benchmarks, look at the worst. Find when something performs the most poorly, and use that to understand it.

Every large software system contains in equal proportions hubris, propaganda, and bullshit. And this is what we really need to study to understand the system. We need to get our hands dirty and poke around in the bullshit, as that's the only place where the truth isn't being hidden.

In Vyacheslav's case above of dynamic language performance, the worst of LuaJIT still holds up well against the best of most other JITs. It can even almost compete with C in some tight loops, but only in those tight loops.

Vyacheslav's example, in case you didn't watch it, shows how LuaJIT can get amazing code generation in one case, but then you add just one extra variable, and suddenly the whole JIT falls apart.

This case is a good example of unexpected consequences - do you really want the performance of your code to change by such a drastic amount just because of adding one extra field? C tends not to have that performance cliff-drop; the performance is directly related to your understanding of the system, which is why it's remained such a popular language for all these years.

Many modern systems don't have that property. Instead, the performance of your typical dynamic language is either a) slow always, or worse: b) slow sometimes. And the sometimes is the killer. The performance of a typical dynamic language is tied directly to Foo. Wait, what? What the hell is Foo? That's just it. It's an unknown. It's something you, as the user, can't predict. You can't design your software around it, or rely on it for critical operation, because it's an unknown that the JIT implementors try to pretend doesn't exist.

If you want to learn C, you might learn more by writing to element 201 of a 200-element array than by reading a thousand tutorials. Many C programmers remember that first time when they didn't initialize a variable and their program failed as a result. But they learned a lot about the language, and the machine, as a consequence of that.

The bugs reveal the truth of what the software is. One of the good things to be said about C++ is that many (not all!) of the bugs happen once when you try to compile something, not later when you run the code. If you're going to have unpredictability, it's better that the unpredictable parts happen to you while you're designing the software, not to some other guy at home using it.

It's fine for systems to have weaknesses, as long as people are honest and up-front about it, because that's how you can understand what a system is really doing. C doesn't pretend to be perfect, it's shitty and makes you take care about fiddly details a lot of the time. But it never hides these things from you. I see so many systems being developed that unfortunately take the approach Douglas Adams lamented -- pretend your system has no faults, and prevent the user from ever being able to understand those faults.

Beating The Compiler

Mon, 28 Nov 2016 08:00:00 -0000

An oft-repeated fact on programming forums these days is that a decent optimizing compiler will always beat a puny human's attempt at hand-written assembler. There are rare cases, like MPEG decoders, where making good use of the SIMD instructions can allow assembly to massively beat the compiler. But generally you'll hear that the compiler will always do better.

The reason given for this is usually that a modern CPU has so many pipelines and instruction hazards to deal with, and a naive assembly routine won't do as good a job dealing with them.

But is it true? Let's not just take some guy's word on the Internet as gospel, let's do a little experiment and find out.

We'll just take a simple piece of code to look at here. I'm not going to pick an example that would benefit heavily from exotic intrinsics. Instead I'll just use an old standard, quicksort.

Here's the naive C++ quicksort we'll be testing against:

struct Item { int key; int value; };

extern "C" 
void sortRoutine(Item *items, int count)
{
    if (count <= 1)
        return; // already sorted if only zero/one element

    // Pick the pivot.
    Item pivot = items[count-1];
    int low = 0;
    for (int pos=0;pos<count-1;pos++)
    {
        if (items[pos].key <= pivot.key) {
            // swap elements
            Item tmp = items[pos];
            items[pos] = items[low];
            items[low] = tmp;

            low++;
        }
    }

    items[count-1] = items[low];
    items[low] = pivot;

    sortRoutine(items, low);
    sortRoutine(items+low+1, count-1-low);
}

It's nothing fancy. It's not the best sorting algorithm in the world, and this isn't even the best implementation of it, but it's simple and it'll do just fine.

Now let's try a direct naive assembly implementation of it:

sortRoutine:
    ; rcx = items
    ; rdx = count
    sub rdx, 1
    jbe done

    ; Pick the pivot.
    mov r8, [rcx+rdx*8] ; r8 = pivot data
    xor r9, r9          ; r9 = low
    xor r10, r10        ; r10 = pos
partition:
    cmp [rcx+r10*8], r8d
    jg noswap

    ; swap elements
    mov rax, [rcx+r10*8]
    mov r11, [rcx+r9*8]
    mov [rcx+r9*8], rax
    mov [rcx+r10*8], r11
    inc r9

noswap:
    inc r10
    cmp r10, rdx
    jb partition

    ; move pivot into place
    mov rax, [rcx+r9*8]
    mov [rcx+rdx*8], rax
    mov [rcx+r9*8], r8

    ; recurse
    sub rdx, r9
    lea rax, [rcx+r9*8+8]
    push rax
    push rdx
    mov rdx, r9
    call sortRoutine
    pop rdx
    pop rcx
    test rdx, rdx
    jnz sortRoutine

done:
    ret

It was quite easy to write this, largely due to Intel's flexible memory addressing operators. What's interesting is that I made no real attempt to pay attention to scheduling, pipelining, etc. I just wrote it out as simple assembly.

Now let's time it. I wrote a simple test harness that sorts a 1000000-item array. I run the test 100 times and take the best-case across the whole set. I'll compile the C++ version with gcc 4.8.1, clang 3.8.0, and MSVC 2013.
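For reference, a best-of-N harness along those lines might look something like the sketch below. This is not the harness from the zipfile: the xorshift input generator, the helper names and the use of std::sort as a stand-in for sortRoutine are all assumptions made so the example compiles on its own.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Item { int key; int value; };
using SortFn = void (*)(Item*, int);

// Stand-in for sortRoutine, so this file builds without the article's code.
void stdSortRoutine(Item* items, int count) {
    std::sort(items, items + count,
              [](const Item& a, const Item& b) { return a.key < b.key; });
}

// Deterministic pseudo-random input (xorshift32): every run sorts the same data.
std::vector<Item> makeInput(int count, std::uint32_t seed = 12345u) {
    assert(count >= 0 && seed != 0);
    std::vector<Item> items(static_cast<std::size_t>(count));
    std::uint32_t s = seed;
    for (int i = 0; i < count; i++) {
        s ^= s << 13; s ^= s >> 17; s ^= s << 5;
        items[static_cast<std::size_t>(i)] = Item{ static_cast<int>(s >> 1), i };
    }
    return items;
}

bool isSorted(const std::vector<Item>& items) {
    for (std::size_t i = 1; i < items.size(); i++)
        if (items[i - 1].key > items[i].key) return false;
    return true;
}

// Time 'runs' sorts of a fresh copy of the input and keep the fastest.
double bestCaseMs(SortFn sortFn, int count, int runs) {
    assert(runs > 0);
    const std::vector<Item> input = makeInput(count);
    double best = -1.0;
    for (int run = 0; run < runs; run++) {
        std::vector<Item> work = input;            // copy made outside the timed region
        auto t0 = std::chrono::steady_clock::now();
        sortFn(work.data(), count);
        auto t1 = std::chrono::steady_clock::now();
        assert(isSorted(work));                    // a fast wrong answer doesn't count
        double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
        if (best < 0.0 || ms < best) best = ms;
    }
    return best;
}
```

Calling bestCaseMs(sortRoutine, 1000000, 100) would mirror the setup described above. Taking the minimum rather than the average is a deliberate choice: it filters out noise from whatever else the machine happened to be doing during a run.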

Results

sort_cpp_recurse_gcc.exe      : Took 99ms best-case across 100 runs
sort_cpp_recurse_clang.exe    : Took 99ms best-case across 100 runs
sort_cpp_recurse_ms.exe       : Took 98ms best-case across 100 runs
sort_asm_recurse.exe          : Took 92ms best-case across 100 runs

Well that's interesting. The different compilers did mostly around the same, with MSVC having a slight edge. But the assembly version ran fastest, being a good 7% faster in this case.

Now the thing about C++ is that it's not always a good representation of the underlying machine. It's fine when it comes to variables, but its representation of the stack is very limited. C++ pretends we can only use the stack as a call stack, whereas in reality one of the things we can do is to use it as a data stack instead.

So let's try that and see what happens. We'll remove the recursive calls to sortRoutine, and instead push/pop our data ranges directly from the stack. This allows us to run in a single loop without having to actually recurse. This can often have a strong benefit because it removes the overhead of entering/exiting the function each time.

I won't post the code for it here, but it's in the zipfile below if you want to see it.
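The general shape is easy enough to sketch in C++, though. What follows is an illustrative version only, using a std::vector as the data stack; it is not the code from the zipfile, it is not what was timed, and the name sortRoutineIter is made up for the example.

```cpp
#include <cassert>
#include <utility>
#include <vector>

struct Item { int key; int value; };

// Same partition step as the recursive version, but the sub-ranges still to
// be sorted live on an explicit stack instead of in recursive call frames.
void sortRoutineIter(Item* items, int count) {
    assert(count >= 0);
    std::vector<std::pair<int, int>> pending;   // (start index, length)
    pending.push_back(std::make_pair(0, count));

    while (!pending.empty()) {
        const int start = pending.back().first;
        const int len = pending.back().second;
        pending.pop_back();
        if (len <= 1)
            continue;   // already sorted if only zero/one element

        Item* range = items + start;
        Item pivot = range[len - 1];
        int low = 0;
        for (int pos = 0; pos < len - 1; pos++) {
            if (range[pos].key <= pivot.key) {
                std::swap(range[pos], range[low]);
                low++;
            }
        }
        range[len - 1] = range[low];
        range[low] = pivot;

        // Instead of recursing, queue up both halves for later.
        pending.push_back(std::make_pair(start, low));
        pending.push_back(std::make_pair(start + low + 1, len - 1 - low));
    }
}
```

A heap-allocated vector is of course not the same thing as pushing and popping the hardware stack directly, which is what the assembly version gets to do, so don't read too much into how a sketch like this would perform.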

sort_cpp_recurse_gcc.exe      : Took 99ms best-case across 100 runs
sort_cpp_recurse_clang.exe    : Took 99ms best-case across 100 runs
sort_cpp_recurse_ms.exe       : Took 98ms best-case across 100 runs
sort_asm_recurse.exe          : Took 92ms best-case across 100 runs
sort_cpp_iter_gcc.exe         : Took 106ms best-case across 100 runs
sort_cpp_iter_clang.exe       : Took 97ms best-case across 100 runs
sort_cpp_iter_ms.exe          : Took 95ms best-case across 100 runs
sort_asm_iter.exe             : Took 92ms best-case across 100 runs

Hmm. The assembly version comes out almost the same. I suspect this is because while an iterative approach removes the function-setup overhead, our hand-written x64 version doesn't actually have much function-setup overhead, so doesn't really benefit.

But for the C++ versions, it's a different story. Most of them got a slight speedup, but gcc is actually slower! As far as I can tell from looking at the disassembly, it seems like it's managed to out-clever itself. The increased control paths and things for it to consider have given it too many balls to keep in the air at once.

I compiled these tests on x64, where the default calling convention is fastcall. I suspect that if compiled for 32-bit instead, using the cdecl stack-based convention, the non-recursive version would have done better relatively. I didn't try it though, I'll leave that as an exercise to the reader.

Conclusion

So it seems like the old "modern compilers are always faster than you" spiel is not necessarily true. What is true however, is that the compiler did a good enough job, and the code is easier to work with. So while you might be able to squeeze a little more speed out, it's probably not worth the maintenance hassle.

The assembly version was faster though. I suppose if there's anything to be learned here, it's that people on the Internet may sometimes be full of shit.

Download

sorttest.zip - All the code used for this article.

The Illusion Of Controls

Sat, 15 Oct 2016 07:00:00 -0000

Some things are just plain hard to use. There's a lot of things you have to learn before you can use them effectively, there's big manuals you have to read, there's gotchas you need to be aware of first.

I can't use this thing, people will say. I just want to do X, why do I have to mess around with all these details which are getting in the way? Why can't it be simple?

It's a common enough complaint, one you'll hear directed against anything from the Linux command-line through to the user interface of Dwarf Fortress.

But for every complicated interface, there's always someone who'll reply with the common phrase:

"X needs to be that complicated. Y is fine for noobs, but with X you get so much more control."

And herein lies the problem. The illusion of controls. It's not often that one letter can make such a difference, but what we're talking about here is control vs controls.

Whenever a problem is presented, all good programmers know to head directly for a good car analogy. It's a tried-and-tested method we've used for years which I'm almost certainly sure won't backfire on me terribly. Nope. Here we go!

I'll quote from these instructions on how to drive a Model-T Ford, an experience once described by Top Gear's Jeremy Clarkson as "the hardest thing in the world", like trying to pat your head and rub your belly at the same time.

How To Drive a Model-T Ford

There are three pedals on the floor marked from left to right when sitting in the driver's seat: C (clutch), R (reverse) and B (brake). There are two levers on the steering column, spark advance and throttle, and one floor lever to the left of the driver. The floor lever is neutral while in the upright position, second gear when in the forward position while the leftmost pedal (C) is not depressed, and emergency brake when all the way back.

All speeds are controlled by a foot pedal enabling the driver to stop, start, change speeds, or reverse the car without removing the hands from the steering wheel. The foot pedal at the right, operates the brake on the transmission. The pedal in the center, operates the reverse. The left foot pedal, is the control lever acting on the clutch.

The hand lever when thrown forward engages high speed; when pulled back, operates the emergency brake. The lever is in neutral when almost vertical and clutch is in a released condition. With the hand lever thrown forward in high speed, a light pressure on pedal 'C' releases the clutch while a full pressure on the pedal throws into slow speed; by gradually releasing the pedal, it will come back through neutral into high speed.

There's more -- you've also got the "spark advance" lever (no idea), a carburetor adjustment, and an additional throttle lever.

On the other end of the spectrum is something like a bicycle. On a bicycle you have pedals for your speed, and handlebars to steer. That's basically it, there's a brake lever and maybe a gear selector. It's an incredibly simple machine though, one which you can learn without instruction very quickly.

Part of the simplicity of the bicycle is that you can see how it works. It's immediately obvious how your action on the pedals affects the ground speed. You can directly see the relationship between the brake line and the wheel.

The difference is a bicycle has control. The Model-T has controls. With a bicycle every part of your body is in direct connection with the movement. You can lean into corners, shift your weight around. You can feel the brakes and whether you're applying the right pressure.

The Model-T takes the approach of having controls. A complex beast of a machine that requires memorization, and the knowledge of what each control does and how it operates the beast. Modern cars try to simplify that as much as possible, and cars like the Tesla go to an extreme, having not even a clutch or gearbox. Perhaps it's not the best parallel, and I'm sure some reader will call me out on my terrible automotive knowledge. But the point remains valid, and programmers would do well to heed it.

An Etch-a-Sketch has more controls than a paintbrush, but has no control. Give an artist a paintbrush and it becomes an extension of him, a tool he can use to apply different pressure, to feel the paint as he moves it around a canvas. An operating system like Linux gives you a lot of controls -- a hundred different config files to adjust, fontconfigs to manage, but it gives you no control. We spend so long caught in a Byzantine complex of configurations and dotfiles that we fool ourselves into thinking that the controls were the reason we originally came here.

And for the record, I don't feel the solution is to just hide the complexity like so many programs do (I'm looking at you, Apple). That's like putting the controls inside a sealed box and hoping you won't try and adjust them. We need more systems like the Tesla and its all-electric drivetrain, where the need for controls dissolves away.

The Challenge Of Making Things

Tue, 04 Oct 2016 07:00:00 -0000

What would you do if you could do anything? If you were all-powerful, and could create anything just by waving your hands around? The answer is, nothing.

You could create anything you want right now. Just grab a pen and paper and draw a new universe. You could have drawn 5 different scenes today, but you didn't. It takes a special kind of person to push past the barrier of the empty page, a talent I and a large majority have trouble with.

An unrestricted blank canvas is the worst possible killer of creativity. People build models of Game Of Thrones worlds in Minecraft. Why? They could build it in Autodesk Maya, or Sketchup, and it'd look better (and might even be easier for a large scene). The restrictions breed creativity. It's not enough to just make something, you need a frame to place it within. Artists need a frame to define the medium they work within.

Why do people build a working 6502 using Minecraft redstone? I mean if building a 6502 is the goal, why not use a better medium? Perhaps this shows us that the subject itself is not as important as the medium it's made in.

Retro games were originally developed in a constrained medium, and required great effort on the part of the designers to fit the game into that. Now we have gigahertz CPUs and GPUs, but people still make pixel-art games and text adventures. In a world where Unity provides instant access to a professional quality 3D scene editor, why would someone make games in PuzzleScript?

We need mediums. An artist with no restrictions will never make anything. Some people think retro games are a fad, but I think as technology gets broader we're going to look to the narrow canvas ever more so.

I worry about things like Dreams, which promises an all-powerful blank canvas. I think a large part of Little Big Planet's success is that you couldn't make everything. It limited you to a simplified 2D canvas of mechanisms, but people had fun trying to push those limits. If there are no limits, what can you push?

Untonemapping, and other stupid tricks

Sun, 02 Oct 2016 07:00:00 -0000

I've been meaning to write something about this for years but never got around to it. I don't claim there's any great use for this stuff; it's just one of those little oddities us graphics programmers like to collect.

You all know what tonemapping is - converting a HDR (High Dynamic Range) image into an LDR (Low Dynamic Range) image for display. What might not be immediately obvious is that it's reversible.

We can formalize this relationship using the following notation:

L(x) = 1 - exp2(-k*x)

That's the standard formula for an exponential tone mapper. There's a lot of other functions you can use (Reinhard etc) but for the purposes of today's article it doesn't matter, so let's just pick the simplest one to work with. For final display of course it may well matter, but we're not talking about final display here. The reason it doesn't matter is that it cancels out.

Note that I'll always be using exp2, not exp/log/ln etc, and you should too. GPUs often only support exp2, with the others requiring an extra multiply to convert bases. So if we work in base 2 ourselves we can save that multiply.

So if the formula above is tonemapping, what's untonemapping? Well it's just a simple inverse:

H(x) = log2(1 - x)/-k
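If you want to convince yourself that the two formulas really are inverses, here's a minimal sketch in Python (chosen only because it's easy to run; the names tonemap, untonemap and the value of K are my own assumptions, not anything from a real engine):

```python
import math

K = 2.0  # exposure constant; an arbitrary choice for this sketch

def tonemap(x, k=K):
    """L(x) = 1 - exp2(-k*x): HDR value (>= 0) to regular LDR in [0, 1)."""
    return 1.0 - 2.0 ** (-k * x)

def untonemap(x, k=K):
    """H(x) = log2(1 - x) / -k: regular LDR in [0, 1) back to HDR."""
    return math.log2(1.0 - x) / -k

# Round-trip a few HDR values through LDR and back again.
for hdr in (0.0, 0.25, 1.0, 4.0):
    ldr = tonemap(hdr)
    print(hdr, ldr, untonemap(ldr))
```

Each line should print the same HDR value at both ends, give or take floating-point rounding.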

Ok so far. But what use is it? Let's try an example.

Let's say you want to render a nice image of a wooden teapot (hey why not). So you make a beautiful Photoshop image, like so:

Our source texture.

Then you slap it on a basic lit mesh. First let's compare how it looks when you render it using an old LDR engine, then using a standard HDR engine with tonemapping:


Old-school LDR engine.	Modern HDR engine.

// LDR:
float3 diffuse = tex2D(diffuseTex, uv).rgb;
float3 color = diffuse * lighting;
return color; // no tonemap

// HDR:
float3 diffuse = tex2D(diffuseTex, uv).rgb;
float3 color = diffuse * lighting;
float k = 2.0f;
return 1 - exp2(color * -k); // exponential tonemap

Eurgh. Both of these images kinda suck. Our texture looked so nice in Photoshop, but now it's been distorted in both renderings. The LDR one preserves the vibrant orange colors well, but because it's an LDR engine it can't light the thing properly, and clips the colors badly.

However, the HDR engine has captured the full lighting range, but at the expense of draining the contrast and saturation from the texture. This does depend on which tonemapping curve you use, some fare better than others. But they all tend to exhibit this problem. Why is this?

The problem is that we're using this photo as a diffuse map, but it isn't a diffuse map. What the photo really is, is the output of another renderer. (In this case, the renderer was the real world and a camera)

This means the source photo is already tonemapped. We need to reverse the process to recover the original diffuse map. We can do this by assuming the photo was taken under some standardized lighting conditions, and simply running it through the untonemapping operator.

But, you ask, how can I untonemap it if I don't know the value of k to use? That's the cool part: it doesn't matter. Just pick one (1.0 works well). That'll be our reference exposure value. The exposure values we then use to render our scene will then be defined relative to our base.

// HDR using untonemapping to correct the diffuse texture
float3 diffuse = tex2D(diffuseTex, uv).rgb;
float k = 2.0f; // exposure used to render the scene
diffuse = -log2(1 - diffuse); // untonemap at the reference exposure (k = 1.0)
float3 color = diffuse * lighting;
return 1 - exp2(color * -k); // tonemap

So how does that look now?

HDR engine using untonemapping.

Well that's a lot better. It now matches the source map exactly - the output of our renderer is identical to the artist's image, and we now have a good mathematical framework for taking our output results and working on them.

Before we go any further I'm going to make one small but important tweak. There's a lot of "1-" going on here, and it's kinda annoying. Let's get rid of it. We don't need it.

L(x) = exp2(-k*x)
H(x) = log2(x)/-k

This means all our LDR images will be inverted, but we can just flip it back before display. I'll call this space inverted-LDR, and that's what I'll be using for the rest of the article.

This now means that in inverted-LDR space, black represents infinitely bright. This turns out to be surprisingly useful. In fact, it makes me wonder if that isn't in fact the natural image representation we should all use by default.

And now for my next trick

So what are the consequences of this? Well, now that we have a more rigorous definition of how to convert to/from LDR space, we can convert some common HDR operations so that they work in LDR space directly.

So, for instance. In HDR space, if you want to add two colors together, you just add them. Let's write that out:

Ah(x, y) = x+y

We can get the LDR equivalent by tone mapping it:

Al(x, y) = L(Ah(x, y))

Expanding that out:

Al(x, y) = L(x+y)
Al(x, y) = exp2(-k*(x+y))
= exp2(-k*x + -k*y)

Now here's the trick. We can use the laws of exponents to split that apart:

= exp2(-k*x) * exp2(-k*y)

Do you see what's happened here? It's equivalent to tone mapping the two colors individually, then multiplying them. Just to spell that out for you:

Given two inverted-LDR images, you add them together by just multiplying them.

ADDinv(x, y) = x * y

What would happen if we were using regular-LDR instead of inverted-LDR? Let's write it out with the 1-x's in:

ADDreg(x, y) = 1-((1-x)*(1-y))

Oh look, that's the Photoshop 'screen' blend mode. I don't know if that's something the Photoshop designers intentionally thought of; if not, it's certainly an interesting coincidence.
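The add rule is easy to check numerically. This is only a sketch: the function names, the sample values and K = 1.0 are assumptions made for the illustration.

```python
K = 1.0  # reference exposure; any positive value works for this check

def to_inv_ldr(x, k=K):
    """HDR -> inverted-LDR: exp2(-k*x)."""
    return 2.0 ** (-k * x)

def add_inv(a, b):
    """Add two inverted-LDR values: just multiply them."""
    return a * b

def add_reg(a, b):
    """Add two regular-LDR values: the 'screen' blend."""
    return 1.0 - (1.0 - a) * (1.0 - b)

x, y = 0.75, 2.5                     # two HDR values (made up)
direct = to_inv_ldr(x + y)           # add in HDR, then tonemap
via_rule = add_inv(to_inv_ldr(x), to_inv_ldr(y))
screen = add_reg(1.0 - to_inv_ldr(x), 1.0 - to_inv_ldr(y))

print(direct, via_rule)              # should agree
print(1.0 - direct, screen)          # regular-LDR versions should agree too
```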

But wait! There's more!

That's addition taken care of. What about multiply?

In HDR-space, a multiply looks like this:

Mh(x, y) = x * y

We can do the same tricks as before. First let's tonemap it to get it into LDR.

Ml(x, y) = L(Mh(x, y))
Ml(x, y) = L(x*y)
Ml(x, y) = exp2(-k*x*y)

And then apply the laws of exponents to split it apart again:

Ml(x, y) = exp2(-k*x)^y

What does this mean? It means that if you have an inverted-LDR image, and a HDR image, you can multiply those together by raising the LDR image to the power of the HDR one.

e.g.

MULinv(xl, yh) = xl^yh
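Same again for the multiply rule, as a quick numeric sanity check. The names and numbers here are made up for the sketch.

```python
K = 1.0  # reference exposure, assumed for this sketch

def to_inv_ldr(x, k=K):
    """HDR -> inverted-LDR: exp2(-k*x)."""
    return 2.0 ** (-k * x)

def mul_inv(x_ldr, y_hdr):
    """Multiply an inverted-LDR value by an HDR value: raise to the power."""
    return x_ldr ** y_hdr

x, y = 0.6, 3.0                      # x is an HDR colour, y an HDR light level
direct = to_inv_ldr(x * y)           # multiply in HDR, then tonemap
via_rule = mul_inv(to_inv_ldr(x), y)
print(direct, via_rule)              # should agree
```

Note that only one side gets tonemapped. The light level y stays in HDR the whole time.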

An Example

Here's an example of how you might throw all this together. Let's imagine you're starting with an inverted-LDR diffuse texture, and you want to do some HDR lighting with it. We can use the "multiply" rule to do the diffuse lighting, then the "add" rule to add on the specular lighting. Note that the diffuse texture remains in inverted-LDR space throughout, and the final result needs no tonemapping, because it is already in LDR space.

float3 diffuse = tex2D(diffuseTex, uv).rgb;
float3 diff_lighting = calculateHdrDiffuseLighting();
float3 spec_lighting = calculateHdrSpecularLighting();
float3 ldr_output = pow(diffuse, diff_lighting) * tonemap(spec_lighting);
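To check that the shortcut really does match the long way round, here's a small Python sketch that shades one RGB texel both ways. Everything in it (the function names, the texel and the lighting values, K = 1.0) is a hypothetical stand-in for the shader above, not code from any engine.

```python
import math

K = 1.0  # reference exposure, assumed for this sketch

def tonemap_inv(x, k=K):
    """HDR -> inverted-LDR: exp2(-k*x)."""
    return 2.0 ** (-k * x)

def untonemap_inv(x, k=K):
    """Inverted-LDR (0 < x <= 1) -> HDR: log2(x) / -k."""
    return math.log2(x) / -k

def shade_ldr(diffuse, diff_light, spec_light):
    """The shortcut: stay in inverted-LDR space using the mul and add rules."""
    return [(d ** dl) * tonemap_inv(s)
            for d, dl, s in zip(diffuse, diff_light, spec_light)]

def shade_hdr(diffuse, diff_light, spec_light):
    """The long way: untonemap, light in HDR, tonemap at the end."""
    return [tonemap_inv(untonemap_inv(d) * dl + s)
            for d, dl, s in zip(diffuse, diff_light, spec_light)]

texel = [0.8, 0.5, 0.25]          # inverted-LDR diffuse colour (made up)
diff = [1.5, 1.5, 1.5]            # HDR diffuse lighting (made up)
spec = [0.2, 0.2, 0.2]            # HDR specular lighting (made up)

short = shade_ldr(texel, diff, spec)
full = shade_hdr(texel, diff, spec)
display = [1.0 - c for c in short]  # flip back to regular LDR for display
print(short, full, display)
```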

I'll summarize the inverted-LDR-space rules in a table here:

Rule	Formula
HDR to LDR	exp2(-k*x)
LDR to HDR	log2(x)/-k
HDR + HDR	x*y
LDR * HDR	x^y

So there it is. As I said, I don't know if this is going to be especially useful to anyone, but I thought it was interesting how you can do mathematics in LDR-space and yet get the correct results of HDR lighting.

Debunking Euclideon's Unlimited Detail Tech

Tue, 13 Sep 2016 07:00:00 -0000

Uh oh. They're at it again. Yes folks, Euclideon are back with more of their smarmy-voice-over-without-any-detail brand of hype. They call it "Unlimited Detail", but what they don't do is explain how any of it actually works.

If only there were some way we could find out how their idea works. If only... wait! There is!

One of the great things about ideas is that you have two choices; you can either keep it a secret, but then you risk someone else coming up with it too. Or, you can patent it, which grants ownership of the idea. Of course, in order to be granted a patent, you need to actually explain what your idea is and how it works.

With that in mind, it's easy to actually find out how the Euclideon tech works. Off we go to the Australia Patent Office! A quick search for Euclideon reveals a number of documents, but there's one, 2012390266, "A computer graphics method for rendering three dimensional scenes", that seems to be the one we need.

It's not an especially exciting read, most patents aren't. I'll summarize the description here:

The scene is stored as a number of objects.
Most objects are rendered using the fast orthographic method.
Objects up close are rendered using the slow perspective method.

Oh look, it's just voxels in an octree.

But what is the orthographic method, you ask? Well it turns out not to be that complex. Here it is folks. Prepare yourself for the wonder of the Unlimited Detail Engine:

You store colors in octree cells.
You walk recursively over this octree and splat each point on screen.

Wait, is that it? Yes my friends, this is the same algorithm described in the 1985 paper "Back-to-Front Display of Voxel-Based Objects", by Frieder et al. I think Euclideon chose to go front-to-back instead, and use a mask to avoid overdraw, but it's the same thing. They're taking 30-year-old technology and passing it off as being next-gen.
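To give a flavour of how little is going on, here's a toy front-to-back octree splatter in Python. To be clear, this is my own simplified sketch and not the code from the patent or the paper: the node layout (None for empty, a colour for a leaf, a list of 8 children otherwise), the child ordering and the view direction straight down the z axis are all assumptions.

```python
def splat(node, x, y, size, image):
    """Splat `node`, covering a size x size screen square at (x, y), into `image`."""
    if node is None:                       # empty cell
        return
    if not isinstance(node, list):         # leaf: the node is a colour
        for py in range(y, y + size):
            for px in range(x, x + size):
                if image[py][px] is None:  # mask: a nearer cell already won
                    image[py][px] = node
        return
    half = size // 2
    # Child index bits: 1 = +x, 2 = +y, 4 = +z (further from the viewer).
    # Counting 0..7 visits the four near children before the four far ones.
    for i, child in enumerate(node):
        splat(child, x + (i & 1) * half, y + ((i >> 1) & 1) * half, half, image)

def render(root, size):
    """Orthographic render of an octree onto a size x size image."""
    image = [[None] * size for _ in range(size)]
    splat(root, 0, 0, size, image)
    return image

# One level deep: a red cell in front of a blue one, and a green cell behind
# an empty one.
root = ["red", None, None, None, "blue", "green", None, None]
image = render(root, 2)
print(image)
```

The blue cell never reaches the screen because the red one got there first. That's the whole overdraw mask.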

What this means is that their data is stored in a pre-built octree. Despite their recent claims, there's no way this can animate like modern games need. The only way you can animate it is if you use stop motion - i.e. have several pre-built octrees and switch between them. And looking at their recent footage, I think that's what they're doing. It all looks kinda... well... jerky?

We can do a little math and run the numbers here. Let's imagine you've got a 3GHz CPU, and you want to render 1000x1000 at 60FPS. That's a million pixels you need to fill in. 3GHz/60=50,000,000 cycles available per frame. Therefore you need to render one pixel in 50 cycles. That's pretty tight. It might be do-able, but then you've just used all of your CPU budget doing it. What about the rest? Do you want anti-aliasing? Lighting? Shadows? Bloom? Depth of field? Well tough, because you've already pegged your CPU out at 100% just filling in the color buffer.
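If you want to plug in your own numbers, the budget is a one-liner. A small sketch follows; the function name is made up, and the 1080p case is just an extra illustration that isn't taken from any Euclideon material.

```python
def cycles_per_pixel(clock_hz, width, height, fps):
    """CPU cycles available per pixel if one core fills the whole frame."""
    cycles_per_frame = clock_hz / fps
    return cycles_per_frame / (width * height)

budget = cycles_per_pixel(3e9, 1000, 1000, 60)
budget_1080p = cycles_per_pixel(3e9, 1920, 1080, 60)
print(budget)        # 50.0
print(budget_1080p)  # roughly 24
```

Bump the resolution to 1080p and the budget roughly halves.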

Extreme draw distance in The Vanishing Of Ethan Carter

I'm not saying voxel-based games can't work, I think there's definitely a place for less polygony techniques in future. But this isn't it. The trouble with Euclideon is that they spend such a large amount of their time trying to explain that their tech is better than current existing games, when the simple fact is that it isn't. In their latest video they moan about LOD pops in games. I suggest they go take a look at some actual games. I just finished playing through the delightful "Vanishing Of Ethan Carter", and guess what? No LOD pops anywhere in the game. It draws trees off to the horizon and they all just magically morph to lower-detail versions without you ever noticing.

You might be impressed by their up-close dirt rendering, but it's no match for the current round of games and GPUs. Take a look at "Star Wars: Battlefront" here - that's what we're doing right now using just regular GPUs. It's already light years ahead of their tech. Advances like geometry tessellation have taken polygon rendering to new extremes.

Close-up detail in Star Wars: Battlefront

Visible voxel artifacts in a so-called Unlimited Detail engine.

This tech, at least the way they're doing it, is dead. They have no real lighting, none. Just look at their images - it's just N-dot-L, which has been prebaked. I spotted a shadow underneath one of the fences, but oh look, it casts directly downwards. Do you know why? It's because if it cast at an angle, it would spill over onto adjacent objects and prevent re-use of instances. You could argue that they could prebake very nice GI lighting, but they can't; the only way they can get their "unlimited detail" is by instancing the same objects several times.

If you want to see some real exciting advances in point-based technology, go look at the upcoming game Dreams by Media Molecule. Those guys are way ahead of Euclideon, and guess what? Their stuff doesn't rely on pre-baked hierarchies, it's all genuinely real-time.

tl;dr -- GPUs get better every year. If you want unlimited detail, just go buy a PS4 today. But please, don't give these hacks any money.

Learning To Wrangle Half-Floats

Sat, 10 Sep 2016 07:00:00 -0000

You all know what floating-point arithmetic is, so I won't bore you by covering that. The IEEE standard originally defined two variants, the 32-bit single-precision format and the 64-bit double-precision format.

But that's not all you can do. If you understand the principles behind it, you can make your own floating-point format at any precision. The most popular small-float format is the 16-bit half-precision format. Popularized by Nvidia and ILM, this is supported in hardware by most GPUs.

The half-float format is great because it's good enough for many cases, while only being half the space of the standard 32-bit format. It's not just the space either -- the PS3 GPU, for example, would often run twice as fast when using halfs. (Interestingly enough, this usually wasn't due to the precision difference, but to restrictions on register file access. The smaller data access allowed the compiler to better schedule the instructions.)

There's a downside to this flexibility though. With regular FP, you can usually just throw it in there and not have to worry about precision. That's no longer true for half-floats. Every time you use them you now have to worry about whether it's suitable for the current case. And, as you may discover here, the results can be surprising.

The format is very simple; it's basically the same as the 32-bit version but with less bits:

Sign	Exponent	Mantissa
1 bit	5 bits	10 bits

The Wikipedia page has a good detailed explanation of it, but the trouble with just running the numbers on it is that you don't really get a good feel for understanding it. We need a better way of grasping the fundamentals.

To get a good visualization of half-float, the most useful property we can use is the fact that they're only 16-bit. This means that there's only 65536 of them. You know, that's not actually that many. So, why not just list them all out? I did just that. That's the great thing about computers today, data doesn't seem as big as it used to be. Once you have all the data in front of you at once, it's much easier to get a grip on it.
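If you'd rather generate such a list yourself, Python's struct module can decode half-floats directly via its 'e' format character. This is just a sketch of one way to do it, not the tool I used, and half_from_bits is a name I made up.

```python
import math
import struct

def half_from_bits(bits):
    """Reinterpret a 16-bit integer as an IEEE half-float."""
    return struct.unpack('<e', struct.pack('<H', bits))[0]

values = [half_from_bits(b) for b in range(0x10000)]
finite = [v for v in values if math.isfinite(v)]

print(len(values))        # every 16-bit pattern
print(len(finite))        # everything except the infinities and NaNs
print(max(finite))        # the largest finite half-float
print(half_from_bits(1))  # the smallest positive (subnormal) half-float
```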

Download

halfs.zip (406KB) - A list of every single half-float.

This text file has come in very useful for me on several occasions, and I'd recommend keeping a copy of it around for any time you're doing graphics work. Let's see what we can discover from this. We'll start by making a simple table, showing off the ranges at which the precision changes:

Exponent	Starts at	Step between each number
0	0	1/16777216
1	1/16384	1/16777216
2	1/8192	1/8388608
3	1/4096	1/4194304
4	1/2048	1/2097152
5	1/1024	1/1048576
6	1/512	1/524288
7	1/256	1/262144
8	1/128	1/131072
9	1/64	1/65536
10	1/32	1/32768
11	1/16	1/16384
12	1/8	1/8192
13	1/4	1/4096
14	1/2	1/2048
15	1	1/1024
16	2	1/512
17	4	1/256
18	8	1/128
19	16	1/64
20	32	1/32
21	64	1/16
22	128	1/8
23	256	1/4
24	512	1/2
25	1024	1
26	2048	2
27	4096	4
28	8192	8
29	16384	16
30	32768	32
31	infinity/nans
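The table follows a simple pattern: each exponent starts at 2^(exponent-15) and splits its range into 1024 steps, with exponent 0 (the subnormals) reusing the step size of exponent 1. Here's a sketch that rebuilds the table from that rule and cross-checks it against real half-float bit patterns. The helper names are hypothetical.

```python
import struct
from fractions import Fraction

def half_from_bits(bits):
    """Reinterpret a 16-bit integer as an IEEE half-float."""
    return struct.unpack('<e', struct.pack('<H', bits))[0]

def half_range(exponent):
    """Return (start, step) for a 5-bit exponent field value of 0..30."""
    if exponent == 0:                        # subnormals
        return Fraction(0), Fraction(1, 2 ** 24)
    start = Fraction(2) ** (exponent - 15)
    return start, start / 1024               # 10 mantissa bits = 1024 steps

for e in range(31):
    start, step = half_range(e)
    # Cross-check against the first two bit patterns with this exponent.
    first = Fraction(half_from_bits(e << 10))
    second = Fraction(half_from_bits((e << 10) | 1))
    assert first == start and second - first == step
    print(e, start, step)
```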

There's quite a few surprises nestled away in here. Perhaps the most shocking is the extreme precision loss at the high end. After 32768.0, you're stepping over 32 integers at a time! Even as low as 1024.0, you're still stepping 1.0 each time. Just to ram that point home, numbers higher than 1024 lose all fractional precision.

The maximum half-float possible is only 65504. That's not very big for many applications. And even at that range, you're only accurate to +/- 32.

Thinking of storing UV co-ordinates at half precision? Think again. From 1.0 upwards our halfs are only accurate to 1/1024 (and only to 1/2048 just below 1.0). For a 4096x4096 texture that means they're only accurate to every 4 pixels once you pass 1.0, and every 2 pixels just below it.

Trying to store a displacement map at half-precision? If it's in the 0-1 range, you're effectively only getting the same accuracy as a 10-bit format. That might be OK for a simple effect, but don't try it for a heightfield.

To summarize, while half-floats are great and you should use them whenever possible, you have to check your range first. How much precision do you require? It's easy to assume that a floating-point format will just magically give you everything you need, but it's not always so. Once you get outside the 0-1 range, half-floats lose their appeal for many cases.

The Metaprogrammer

Tue, 06 Sep 2016 07:00:00 -0000

There are some topics which, if posted onto a forum or news site, cause programmers to spew out more blather than all the rest put together. Such topics include:

What kind of office chair you should have.
The benefits of a closeable office door over an open-plan office.
The brand of keyboard you use to type with.
The configuration or quantity of your monitor(s).
Standing desks.
How long it takes you to resume work after an interruption.

You post an article about a new programming language, you'll get 10 replies. But start a discussion on what headphones you wear while you work, and ten-thousand people will rise up from nowhere, pushing the thread ever-skywards on a tower of upvotes. These things are not programming, but you can bet your bottom dollar that they'll get the most programmer attention every time they come up. Therefore we can suppose that this must be metaprogramming. You might have thought that metaprogramming meant macros, type introspection, that kind of thing. Nope, I'm hijacking the word for today:

metaprogramming: (verb.) The act of talking about programming, rather than doing any actual programming.

Perhaps like the screenwriter who can only write if they are seen to be writing whilst in Starbucks, the metaprogrammer is concerned more with the appearance of work than the work itself. It's a kind of Schrödinger's Programmer - only by observing the programmer can we collapse the wave function. If the programmer is not seen, does he really even exist? Does an unwatched programmer begin to fade away like the Cheshire Cat, until all that remains is the fedora?

A coder I know (whom I won't name) once said that he would only ever consider someone to be a good programmer if they had a Twitch livestream about programming. As if a programmer who isn't visibly metaprogramming on live TV can't even be considered to be competent. They sure have some exciting ideas for streaming entertainment nowadays: "Come see the wondrous optimizing! Watch live as he waits for Visual Studio to load!"

The metaprogrammer needs to be pampered. If they don't have the shiniest Apple MacBook then they can't work. Never mind that their job involves typing letters into a text file, something you could have done on CP/M back in '78. Gotta have that 40" monitor, it's essential, can't work without it. To insult a new hire by providing only a single 4:3 monitor? I can't work like this. This keyboard doesn't even have any OLED keycaps on it. This is an outrage.

Maybe it's just an insecurity, and they need these things to feel better about the work. Much like a comfort blanket perhaps, or a little desk toy they keep on their workstation. Can't program without it, it helps me think. There's another one -- "workstation". No, I couldn't possibly use something as pedestrian as a "computer", I need a workstation dammit. With chrome plating and fuel injection.

If you want to run a startup today, you gotta have a cafe. That's the perk people want above all else. Ask people why they want to work at Google, they won't talk of their desires to work on world-changing projects, or the opportunity to apply cutting-edge tech. No, they'll say "because of the free food!". The perks, my friend. The perks are everything. No-one cares about what you do there, but whether you get a free massage, or bagels supplied every morning like manna from heaven.

I've seen people almost come to blows over who gets the Aeron chair. I've seen artists who demand the luxurious corner office, with the luxurious view, and then put paper over the windows to stop the light coming in. I've seen companies buy MacBook laptops for every employee, even though they're never taken away from the desks.

Some of the best programmers I've ever worked with don't have Twitter accounts. It's almost unthinkable, isn't it? How can they possibly be one of the top programmers in the world, building the most successful projects out there, without being seen to be doing it? You can write beautiful code, the best code in the world, but it doesn't mean a damn today if you didn't blog about your new standing desk.

When did the messenger become more important than the message?

The Multi-Project Programmer

Fri, 26 Aug 2016 07:00:00 -0000

Every now and again I see this odd little pattern pop up. You'll be using some software, some big-name famous thing perhaps, and you'll happen across an article on-line where you discover it was written by just one guy. That's not so unexpected perhaps; every project probably started as one man's idea. The odd part is when you delve a little deeper into the history of this guy, only to find out he also wrote this other piece of software you use.

I see it happen time and time again. Look at Ludvig Strigeus. Maybe you'll have heard of him as the guy who wrote ScummVM along with Vincent Hamm back in 2001. But did you also know he then went on to make OpenTTD (a clone of Transport Tycoon) a couple of years later? That would have been enough for some people, but no, he still had a pressing need inside him to write a program called uTorrent. To finish off he went away and started work on a little music thingy called Spotify.

Or Fabrice Bellard for example. The man's a menace, he should be locked up. He's managed to litter a trail of big-name projects behind him -- FFMPEG, which powers probably half of the video players in the world. QEMU, one of the most famous machine emulators out there. Or tcc, the tiny C compiler which can boot Linux from the source code in 15 seconds. Oh and he wrote an entire JavaScript PC/Linux emulator which runs in a browser. We need to stop him before he creates SkyNet or something.

There's Steve Streeting, the creator of the Ogre3D scene-graph library, but also the creator of SourceTree (the hg/git client). Or Justin Frankel, not just the main author of Winamp but also the Gnutella file-sharing program, who then also went on to help create the REAPER audio workstation. Many older readers might remember Dan Silva, the guy who wrote Deluxe Paint. But did you also know he then went on to put together some seriously big parts of Autodesk's 3D Studio?

I happened to be browsing the list of Hugo sci-fi award nominees when one name stuck out at me - the 1975 'best short story' nomination for a guy called P.J. Plauger. Wait, that P.J. Plauger? Yep, turns out that when he's not writing C++ runtime libraries he spends his time winning the John W. Campbell Award for Best New Writer.

I'm not even going to mention Elon Musk.

You should take some of this with a pinch of salt. I don't want to imply these guys did all the work on their projects -- FFMPEG has loads of contributors. ScummVM is the work of many people. But each of them started from a little seed, a seed which was planted by one person.

So what is it that these guys have that apparently no-one else does? Maybe they're just geniuses, but I don't think that's it. I think maybe they just have a good eye for quality. They have strong ideas about the kind of things they want to exist and they're not afraid to get on with it.

There's a saying which I'm too lazy to look up but goes something like "don't invest in companies, invest in people." Software is one of the few areas where one guy on his own can have an idea and see it through from start to finish. The days of the lone inventor in his garden shed are long gone, but the spirit remains; it's just changed mediums.

I don't really know what I'm getting at with all this. I just find it interesting that there are people out there who aren't limited to one thing. There's supposed to be 7 billion people in the world, yet I find the same names cropping up again and again when you least expect them to. Maybe there are only 500 real people in the world and all the rest are NPCs, who knows.

I suspect what I'm experiencing is cargo cult development - that there's some underlying process I'm not understanding, and I'm just the guy studying the symptoms with the hope of figuring out the cause. Perhaps I'll never know. Still, there are people out there managing a string of unexpected hits, with no signs of stopping. Best of luck to 'em, I say.

The Elegance of Deflate

Sun, 21 Aug 2016 07:00:00 -0000

A little while back I found need of a PNG loader for a small project of mine. Being a complete tool I of course decided to write my own -- after all, why save yourself effort when there are still wheels waiting to be reinvented? You can check the source to the inflater code here if you so like - it's quite cleanly written. In fact I wrote it specifically to be easy to read, rather than to be the fastest implementation.

I didn't know much about Deflate at the time. I knew it was based on the LZ family of algorithms (LZ77/LZSS etc). I'd heard anecdotally that it was "just" LZSS except they applied Huffman coding on the match vectors. Well, it turns out that's kinda true. Kinda. But as I read deeper into the specification, it struck me that something really quite clever was going on here, something I hadn't seen anyone explicitly call out before. So I figured why not try and explain the essence of Deflate here. I'm not going to cover the whole workings in exacting detail; if you want that go read the spec.

Deflate was invented by Phil Katz back in 1993, and forms the basis of the ZIP file format, the zlib library, gzip, PNGs, and probably a whole bunch of other stuff. At the time it was pretty cutting-edge. The main competition back then was usually LZW, or maybe LZSS. In many people's eyes compression was still synonymous with run-length encoding, so when Deflate came along it definitely turned some heads. Now this was over 20 years ago, and so later codecs like LZMA (which I may one day write about) can regularly beat it by a healthy margin. Still, it ain't dead yet.

Errata

So the basic principle described here was also used by the LHA archiver, which predates PKZIP 2.0 by a little. Perhaps I should arguably be titling this article "The Elegance of LHA"? Well I'm not going to, so there. History is written by the victors, and the guys with the better acronyms.

Compression

There's two basic approaches to compression. OK no, that's not true, but let's go with it anyway and press onwards. To be honest if you know anything about compression already you may well be glossing over much of this, but here it is regardless.

Entropy based compression
Dictionary based compression

Entropy based compression is a very old idea. Computers usually store letters as 8-bit; one byte for each letter. "What a waste," someone thought. Some letters like E and T are very common, while letters like Q and Z rarely come up at all. It makes sense that you'd want to use less bits for the letters that are more common.

Frequency counts of the Declaration Of Independence. Lower-case dominates, with capitals almost absent.

This idea even predates computers. Go look at Morse code -- the two most common letters in English (E and T) are just a single dot and dash.

Huffman coding is a very old and simple way to apply this. You start with an "alphabet" - the set of all the symbols you want to encode. In English at school we're taught that our alphabet is 26 letters. In computers however we need a little more. We need to store upper-case letters, lower-case letters, numbers, punctuation, we need a symbol for a 'space', and a few other little things. Once you have your full alphabet you need to assign probabilities. We know some letters (e/t/a/i/etc) are more common than others (q/x/y/z/etc), but we also have to take the rest of this expanded alphabet into account. Numbers are rare in English prose, and so are capitals. This paragraph has five-hundred and sixty-one lower-case letters in, but only ten capitals.

Once you have your probabilities all sorted out, you build a Huffman tree. This assigns a unique code to each letter, and does so in a way that's "self-terminating", i.e. no code is the start of any other code, so you don't need to store any additional information to say where each letter starts and stops. Morse code doesn't work like that. While Morse has dots and dashes, it also has gaps, and the gaps are needed to split the letters up. Huffman coding needs no gaps.
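Here's a minimal sketch of that idea in Python. The names (huffman_codes, encode, decode) are made up for illustration, and this is the plain textbook construction; Deflate itself uses canonical, length-limited codes, which this doesn't attempt.

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Return {symbol: bitstring} built from the symbol frequencies in text."""
    freq = Counter(text)
    # Heap entries are (weight, tie_breaker, {symbol: code_so_far}).
    # The integer tie_breaker means the dicts never get compared.
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:
        # Degenerate case: one distinct symbol still needs a 1-bit code.
        return {sym: "0" for sym in heap[0][2]}
    tie_breaker = len(heap)
    while len(heap) > 1:
        # Merge the two least likely subtrees, prefixing 0 and 1.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tie_breaker, merged))
        tie_breaker += 1
    return heap[0][2]

def encode(text, codes):
    return "".join(codes[ch] for ch in text)

def decode(bits, codes):
    """No gaps needed: the first code that matches is always the right one."""
    reverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:
            out.append(reverse[current])
            current = ""
    return "".join(out)

text = "sense and sensibility"
codes = huffman_codes(text)
bits = encode(text, codes)
print(len(bits), "bits instead of", 8 * len(text))
```

The decoder never needs a separator. Because no code is a prefix of another, it can just read bits until it recognises a symbol and then start over.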

So that's entropy encoding -- you decide upon an alphabet, you use less bits to store the common things and more bits for the rare things. You can compress things like English language with it, and while it works fine it leaves a lot of room for improvement. There are better ways than Huffman coding of course, like arithmetic coding, but Huffman's the easiest to get your head around. It's also what Deflate uses, so we'll leave it there.

LZ77

LZ77 forms the basis of the dictionary-based algorithms. The idea of dictionary compression is simply that certain things come up more than once in a document. For instance, if you search this page for the word "compression" it'll appear several times. So rather than store it again each time, let's try and refer to the previous usage.

LZ77 worked very simply. You store the match vector (a pair consisting of length and distance), and following the match you store the letter (the "literal") which ended the match.

So for example, if I want to store the phrase sense and sensibility, we see that the sens part is used twice. So we might store the second usage as go back 10, copy 4, letter i, because we need to copy the 4 letters of "sens" and then stick a new i on the end.

Seems fine, right? Nope, LZ77 sucked. Although it could save space by referring to previous phrases, that was all it could do. The pattern was fixed - it stored a match, followed by a letter. But what if there was no match? To use the same example, here's how LZ77 would actually store that whole quote:

Input: sense and sensibility
Output: [back 0, copy 0]s[back 0, copy 0]e[back 0, copy 0]n[back 0, copy 0]s[back 0, copy 0]e[back 0, copy 0] [back 0, copy 0]a[back 0, copy 0]n[back 0, copy 0]d ...

What a mess. You get the idea though. It has to store a match, even if there isn't one. And storing that empty match information takes space. A typical vector might be 16 bits (12 bits for the distance back, 4 bits for the length). So something that was previously 8 bits per letter could now end up as 24 bits per letter! This is why no-one uses raw LZ77.
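To put numbers on that, here's a small sketch of the fixed match-plus-literal scheme as described above. The triples are written out by hand rather than found by a real matcher, and the bit widths are just the "typical" 16-bit vector from the text, not any particular file format.

```python
VECTOR_BITS = 16   # 12 bits of distance + 4 bits of length, as in the text
LITERAL_BITS = 8

def lz77_decode(triples):
    """Expand (distance, length, literal) triples back into text."""
    out = []
    for distance, length, literal in triples:
        start = len(out) - distance
        for i in range(length):
            out.append(out[start + i])   # one at a time, so overlaps work
        out.append(literal)
    return "".join(out)

def lz77_bits(triples):
    # Every triple pays for a vector AND a literal, match or no match.
    return len(triples) * (VECTOR_BITS + LITERAL_BITS)

phrase = "sense and sensibility"
triples = (
    [(0, 0, ch) for ch in "sense and "]   # ten empty matches
    + [(10, 4, "i")]                      # go back 10, copy 4, letter i
    + [(0, 0, ch) for ch in "bility"]     # six more empty matches
)

print(lz77_decode(triples))       # sense and sensibility
print(lz77_bits(triples))         # 408
print(LITERAL_BITS * len(phrase)) # 168
```

Seventeen triples at 24 bits each is 408 bits, against 168 bits for the plain text. The one real match doesn't come close to paying for all the empty ones.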

So they invented LZSS, which is what many of the early archivers used. LZSS stores a single bit to say whether the next thing coming up is a match vector, or a literal symbol. It's such a painfully obvious idea I'm at a loss as to why it wasn't thought of sooner. So now our extra marker bit allows us to flag whether a match vector is empty, and simply skip all that wasted overhead.

Anyway, that's dictionary compression. It works very well, when it's able to get lots of good matches going. It's that extra bit, though. That pesky "literal" bit we use to indicate whether we got a match. What happens if there aren't any matches? In that case, we've gone from originally storing 8 bits per letter to storing 9 bits! If no matches occur, LZSS will store 1 bit to indicate a literal letter, then 8 more bits to store that letter!

LZ77, LZSS and others in the family all suffer from the same problem, the overhead of having to switch modes -- when they find something that doesn't fit into the class of dictionary-based matches, they need to write out some kind of marker to switch over to the literal mode. This all takes up valuable space in the stream. (If you're not careful, sometimes you'll waste so much overhead doing this kind of switching that you won't actually compress the file at all, you'll enlarge it!)
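The same phrase under LZSS makes the point. This is only a sketch: the token list is hand-written, literals are single characters, matches are (distance, length) tuples, and the 1 + 8 and 1 + 16 bit costs are the ones assumed in the text.

```python
FLAG_BITS = 1      # the "match or literal?" marker
LITERAL_BITS = 8
VECTOR_BITS = 16

def lzss_decode(tokens):
    out = []
    for token in tokens:
        if isinstance(token, tuple):          # a match vector
            distance, length = token
            start = len(out) - distance
            for i in range(length):
                out.append(out[start + i])
        else:                                 # a literal character
            out.append(token)
    return "".join(out)

def lzss_bits(tokens):
    total = 0
    for token in tokens:
        payload = VECTOR_BITS if isinstance(token, tuple) else LITERAL_BITS
        total += FLAG_BITS + payload
    return total

phrase = "sense and sensibility"
tokens = list("sense and ") + [(10, 4)] + list("ibility")

print(lzss_decode(tokens))   # sense and sensibility
print(lzss_bits(tokens))     # 170
print(8 * len(phrase))       # 168
```

Seventeen literals at 9 bits plus one match at 17 bits comes to 170 bits, two more than the 168 we started with. A single four-letter match isn't enough to pay back all those marker bits.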

So we have two compression algorithms. LZSS is reliant on finding previous data to match against, and Huffman coding is reliant on some letters being more common than others. Can we do better than picking one of those two? Can we weave them together?

Yes we can.

Deflate

If we want to munge these two algorithms together, it's not a great leap to imagine how we might do it. We start from LZSS right? The matching part seems OK, it's just the literals that are bogging us down as they're completely uncompressed. So how about this -- we could just Huffman encode them? We store a bit to say if it's a match or a literal, and if it's a match then we write a 16-bit match vector, and if it's a literal then we write a variable-length Huffman-encoded symbol.

That'd work just fine I suppose, but it's not how Deflate does it. Deflate goes further than that. You're still paying the cost of that marker bit to switch modes. In order to understand Deflate we need to expand our idea of what an alphabet is.

You see, when we hear the word "alphabet" we think back to what we learned as children -- A,B,C, etc. But the moment we start getting involved in compression we know that isn't the case. We have already had to expand our alphabet to include upper-case, and numbers and such. Even spaces, which aren't even visible, we have to encode those too. Deflate just takes this concept one step further.

Deflate is based around the idea of the unified alphabet. If an alphabet is just a set of choices, why not bring all the choices together under one umbrella? Deflate's alphabet consists of 286 symbols. The first 256 are the possible values of an input byte, which covers every ASCII letter along with the control codes and other such. The remaining 30 symbols are an end-of-block marker, plus 29 symbols used to represent lengths. That's right, we're storing the actual match length here. Here's the actual table used:


0-255	Regular input byte
256	End of data block
257	Match of length 3 (distance follows after)
258	Match of length 4 (distance follows after)
259	Match of length 5 (distance follows after)
260	Match of length 6 (distance follows after)
261	Match of length 7 (distance follows after)
262	Match of length 8 (distance follows after)
263	Match of length 9 (distance follows after)
...	etc

There's a bit more to it than just that, which I've omitted here for clarity, but hopefully this should explain the principle. We have 286 symbols now, and each one will be assigned its own probability, and therefore its own Huffman code. We know that some letters are still going to be more common (E, T, etc). But what about the matches? How common are they?
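For the curious, most of the omitted part is about longer matches. The sketch below follows the length table in RFC 1951 (section 3.2.5) as I read it: lengths 3 to 10 get a symbol each, and longer lengths share symbols and are told apart by a few extra raw bits. The function and variable names are made up for illustration.

```python
END_OF_BLOCK = 256

def build_length_table():
    """Return [(symbol, base_length, extra_bits)] for symbols 257..285."""
    table = []
    symbol, base = 257, 3
    # 8 symbols with no extra bits, then 4 symbols each with 1..5 extra bits.
    for extra_bits, count in [(0, 8), (1, 4), (2, 4), (3, 4), (4, 4), (5, 4)]:
        for _ in range(count):
            table.append((symbol, base, extra_bits))
            symbol += 1
            base += 1 << extra_bits
    table.append((285, 258, 0))   # the maximum length gets its own symbol
    return table

LENGTH_TABLE = build_length_table()

def length_symbol(length):
    """Map a match length to (symbol, extra_bits, extra_value)."""
    if not 3 <= length <= 258:
        raise ValueError("Deflate match lengths run from 3 to 258")
    if length == 258:
        return 285, 0, 0
    # Otherwise pick the last entry whose base length is not too big.
    for symbol, base, extra_bits in reversed(LENGTH_TABLE[:-1]):
        if length >= base:
            return symbol, extra_bits, length - base

print(length_symbol(3))    # (257, 0, 0)
print(length_symbol(12))   # (265, 1, 1)
print(256 + 1 + len(LENGTH_TABLE))   # 286
```

The final line is the headcount: 256 literals, one end-of-block marker and 29 length symbols.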

Remember how we had the frequency diagram of English text earlier? Well here's how it looks with our expanded alphabet:

Frequency counts of the Declaration Of Independence, when Deflate compressed.

You can see that the lower-case 'e' is popular, as are some of the other lower-case ones like 'a/i/t/etc'. Capitals are almost non-existent, and there are no number digits at all in this case. Yet something sticks out like a sore thumb -- the dictionary matches which dominate the probability domain. It's also interesting that the probabilities of the lower-case letters have changed, due to the dictionary matches evening it out.

As we compress the data one symbol at a time, we face a choice at each stage of compression. If we can find a previous match then we can just write that match vector out, and if we can't then we write the literal symbol. But the special beauty of Deflate, the thing that sets it apart from its predecessors, is that whichever option we pick, we're only ever writing out a single Huffman-coded symbol (with the distance following after, in the case of a match).
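Here's a rough sketch of what that symbol stream looks like. It's a brute-force greedy matcher, nothing like how zlib actually searches. Match lengths are capped at 10 so that every match lands on one of the simple symbols 257 to 264 from the table, and distances are dropped entirely since we only care about the literal/length alphabet.

```python
from collections import Counter

MIN_MATCH, MAX_MATCH = 3, 10   # capped so lengths map straight to 257..264
END_OF_BLOCK = 256

def tokenize(data):
    """Turn bytes into a list of unified-alphabet symbols (no distances)."""
    symbols = []
    pos = 0
    while pos < len(data):
        best_len = 0
        for start in range(pos):              # try every earlier position
            length = 0
            while (length < MAX_MATCH
                   and pos + length < len(data)
                   and data[start + length] == data[pos + length]):
                length += 1
            best_len = max(best_len, length)
        if best_len >= MIN_MATCH:
            symbols.append(257 + best_len - MIN_MATCH)   # a length symbol
            pos += best_len
        else:
            symbols.append(data[pos])                    # a literal byte
            pos += 1
    symbols.append(END_OF_BLOCK)
    return symbols

symbols = tokenize(b"sense and sensibility")
print(symbols)
print(Counter(symbols).most_common(3))
```

Literals and matches come out as plain integers in one list, with nothing in between to say which is which. Feed the counts into a Huffman builder and both kinds of symbol compete for short codes on equal terms.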

If we want to store two matches in a row, or two literals in a row, we can do that. One alphabet to store two different ideas. No bit markers needed to designate a switch between modes. We're storing literal letters compressed according to their frequency, we're also storing match lengths according to their frequency, but more than that we're doing both of those things under one unified scheme.

It's not one algorithm but two, that dance and weave together in harmony. You can pick dictionary matching, or entropy coding, and you can select between them on a per-byte basis with no overhead. That's what I find so clever about it - not that it can do one thing or the other, but that it can choose either, and yet represent them using the same language. No extra markers, nothing to say "oh, now we're switching to entropy mode", nothing to get in the way.

It's an RLE encoder, it's a dictionary matcher, it's a frequency table, and it's all of these things with no barriers in-between, no modes to switch nor separate passes to run.

Deflate remains an excellent example of how two algorithms can come together and play off one another, rather than fighting against themselves.

I gotta admit, I'm impressed.
