The Multi-Project Programmer August 26th, 2016
Every now and again I see this odd little pattern pop up. You'll be using some software, some big-name famous thing perhaps, and you'll happen across an article on-line where you discover it was written by just one guy. That's not so unexpected perhaps; every project probably started as one man's idea. The odd part is when you delve a little deeper into the history of this guy, only to find out he also wrote this other piece of software you use.
I see it happen time and time again. Look at Ludvig Strigeus. Maybe you'll have heard of him as the guy who wrote ScummVM along with Vincent Hamm back in 2001. But did you also know he then went on to make OpenTTD (a clone of Transport Tycoon) a couple of years later? That would have been enough for some people, but no, he still had a pressing need inside him to write a program called uTorrent. To finish off he went away and started work on a little music thingy called Spotify.
There's Steve Streeting, the creator of the Ogre3D scene-graph library, but also the creator of SourceTree (the hg/git client). Or Justin Frankel, not just the main author of Winamp but also the Gnutella file-sharing program, who then also went on to help create the REAPER audio workstation. Many older readers might remember Dan Silva, the guy who wrote Deluxe Paint. But did you also know he then went on to put together some seriously big parts of Autodesk's 3D Studio?
I happened to be browsing the list of Hugo sci-fi award nominees when one name stuck out at me - the 1975 'best short story' nomination for a guy called P.J. Plauger. Wait, that P.J. Plauger? Yep, turns out that when he's not writing C++ runtime libraries he spends his time winning the John W. Campbell Award for Best New Writer.
I'm not even going to mention Elon Musk.
You should take some of this with a pinch of salt. I don't want to imply these guys did all the work on their projects -- big projects like FFMPEG have loads of contributors, and ScummVM is the work of many people. But each of them started from a little seed, a seed which was planted by one person.
So what is it that these guys have that apparently no-one else does? Maybe they're just geniuses, but I don't think that's it. I think maybe they just have a good eye for quality. They have strong ideas about the kind of things they want to exist and they're not afraid to get on with it.
There's a saying which I'm too lazy to look up but goes something like "don't invest in companies, invest in people." Software is one of the few areas where one guy on his own can have an idea and see it through from start to finish. The days of the lone inventor in his garden shed are long gone, but the spirit remains; it's just changed mediums.
I don't really know what I'm getting at with all this. I just find it interesting that there are people out there who aren't limited to one thing. There's supposed to be 7 billion people in the world, yet I find the same names cropping up again and again when you least expect them to. Maybe there are only 500 real people in the world and all the rest are NPCs, who knows.
I suspect what I'm experiencing is cargo cult development - that there's some underlying process I'm not understanding, and I'm just the guy studying the symptoms with the hope of figuring out the cause. Perhaps I'll never know. Still, there are people out there managing a string of unexpected hits, with no signs of stopping. Best of luck to 'em, I say.
The Elegance of Deflate August 21st, 2016
A little while back I found need of a PNG loader for a small project of mine. Being a complete tool I of course decided to write my own -- after all, why save yourself effort when there are still wheels waiting to be reinvented? You can check the source to the inflater code here if you so like - it's quite cleanly written. In fact I wrote it specifically to be easy to read, rather than to be the fastest implementation.
I didn't know much about Deflate at the time. I knew it was based on the LZ family of algorithms (LZ77/LZSS etc). I'd heard anecdotally that it was "just" LZSS except they applied Huffman coding on the match vectors. Well, it turns out that's kinda true. Kinda. But as I read deeper into the specification, it struck me that something really quite clever was going on here, something I hadn't seen anyone explicitly call out before. So I figured why not try and explain the essence of Deflate here. I'm not going to cover the whole workings in exacting detail; if you want that go read the spec.
Deflate was invented by Phil Katz back in 1993, and forms the basis of the ZIP file format, the zlib library, gzip, PNGs, and probably a whole bunch of other stuff. At the time it was pretty cutting-edge. The main competition back then was usually LZW, or maybe LZSS. In many people's eyes compression was still synonymous with run-length encoding, so when Deflate came along it definitely turned some heads. Now this was over 20 years ago, and so later codecs like LZMA (which I may one day write about) can regularly beat it by a healthy margin. Still, it ain't dead yet.
The basic principle described here was also used by the LHA archiver, which predates PKZIP 2.0 by a little. Perhaps I should be titling this article "The Elegance of LHA"? Well I'm not going to, so there. History is written by the victors, and the guys with the better acronyms.
There are two basic approaches to compression. OK no, that's not true, but let's go with it anyway and press onwards. To be honest if you know anything about compression already you may well be glossing over much of this, but here it is regardless.
- Entropy based compression
- Dictionary based compression
Entropy based compression is a very old idea. Computers usually store letters as 8 bits: one byte for each letter. "What a waste," someone thought. Some letters like E and T are very common, while letters like Q and Z rarely come up at all. It makes sense that you'd want to use fewer bits for the letters that are more common.
This idea even predates computers. Go look at Morse code -- the two most common letters in English, E and T, are just a single dot and a single dash respectively.
Huffman coding is a very old and simple way to apply this. You start with an "alphabet" - the set of all the symbols you want to encode. In English at school we're taught that our alphabet is 26 letters. In computers however we need a little more. We need to store upper-case letters, lower-case letters, numbers, punctuation, we need a symbol for a 'space', and a few other little things. Once you have your full alphabet you need to assign probabilities. We know some letters (e/t/a/i/etc) are more common than others (q/x/y/z/etc), but we also have to take the rest of this expanded alphabet into account. Numbers are rare in English prose, and so are capitals. This paragraph has five-hundred and sixty-one lower-case letters in it, but only ten capitals.
Once you have your probabilities all sorted out, you build a Huffman tree. This assigns a unique code to each letter, except in a way that's "self-terminating". i.e. you don't need to store any additional information to say where each letter starts and stops. If you look at the example of Morse code, it doesn't work like that. While Morse has dots and dashes, it also has gaps, and the gaps are needed to split the letters up. Huffman coding needs no gaps.
So that's entropy encoding -- you decide upon an alphabet, and you use fewer bits to store the common things and more bits for the rare things. You can compress things like English language with it, and while it works fine it leaves a lot of room for improvement. There are better ways than Huffman coding of course, like arithmetic coding, but Huffman's the easiest to get your head around. It's also what Deflate uses, so we'll leave it there.
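To make the idea a bit more tangible, here's a toy Huffman coder in Python. This isn't anyone's production code, and the four-letter alphabet and its frequencies are made up for the example -- it just shows the two properties described above: common symbols get shorter codes, and the codes are "self-terminating" (no code is a prefix of another, so no gaps are needed).

```python
import heapq

def huffman_codes(freqs):
    """Build a Huffman tree from {symbol: frequency} and return {symbol: bitstring}."""
    # Heap entries are (frequency, tiebreak, tree); a tree is either a lone
    # symbol or a (left, right) pair. The tiebreak keeps comparisons stable.
    heap = [(f, i, sym) for i, (sym, f) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        # Repeatedly merge the two least-frequent subtrees into one node.
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, count, (t1, t2)))
        count += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix or "0"
    walk(heap[0][2], "")
    return codes

# Made-up frequencies: 'e' is common, 'q' is rare.
codes = huffman_codes({"e": 40, "t": 30, "a": 20, "q": 10})
```

Running this, 'e' ends up with a one-bit code while 'q' gets three bits, and you can concatenate codes back-to-back and still split them unambiguously.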
LZ77 forms the basis of the dictionary-based algorithms. The idea of dictionary compression is simply that certain things come up more than once in a document. For instance, if you search this page for the word "compression" it'll appear several times. So rather than store it again each time, let's try and refer to the previous usage.
LZ77 worked very simply. You store the match vector (a pair consisting of length and distance), and following the match you store the letter (the "literal") which ended the match.
So for example, if I want to store the phrase "sense and sensibility", we see that the "sens" part is used twice. So we might store the second usage as [go back 10, copy 4, letter i], because we need to copy the 4 letters of "sens" and then stick a new i on the end.
Seems fine, right? Nope, LZ77 sucked. Although it could save space by referring to previous phrases, that was all it could do. The pattern was fixed - it stored a match, followed by a letter. But what if there was no match? To use the same example, here's how LZ77 would actually store that whole quote:
Input: sense and sensibility
Output: [back 0, copy 0]s[back 0, copy 0]e[back 0, copy 0]n[back 0, copy 0]s[back 0, copy 0]e[back 0, copy 0] [back 0, copy 0]a[back 0, copy 0]n[back 0, copy 0]d ...
What a mess. You get the idea though. It has to store a match, even if there isn't one. And storing that empty match information takes space. A typical vector might be 16 bits (12 bits for the distance back, 4 bits for the length). So something that was previously 8 bits per letter could now end up as 24 bits per letter! This is why no-one uses raw LZ77.
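The decode side of raw LZ77 can be sketched in a few lines. This is just an illustrative toy, assuming each token is a (distance, length, literal) triple as described above -- and you can see the overhead plainly: even the plain letters each drag an empty match along with them.

```python
def lz77_decode(tokens):
    """Decode a raw LZ77 stream where EVERY token is (distance_back, length, literal)."""
    out = []
    for distance, length, literal in tokens:
        for _ in range(length):
            out.append(out[-distance])  # copy one char from 'distance' back
        out.append(literal)             # then the literal that ended the match
    return "".join(out)

# "sense and sensibility": every plain letter needs an empty (0, 0) match in
# front of it, until we finally hit the [back 10, copy 4, letter i] token.
tokens = [(0, 0, c) for c in "sense and "] + [(10, 4, "i")] + \
         [(0, 0, c) for c in "bility"]
```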
So they invented LZSS, which is what many of the early archivers used. LZSS stores a single bit to say whether the next thing coming up is a match vector, or a literal symbol. It's such a painfully obvious idea I'm at a loss as to why it wasn't thought of sooner. So now our extra marker bit allows us to flag whether a match vector is empty, and simply skip all that wasted overhead.
Anyway, that's dictionary compression. It works very well, when it's able to get lots of good matches going. It's that extra bit, though. That pesky "literal" bit we use to indicate whether we got a match. What happens if there aren't any matches? In that case, we've gone from originally storing 8 bits per letter to storing 9 bits! If no matches occur, LZSS will store 1 bit to indicate a literal letter, then 8 more bits to store that letter!
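Here's the same toy decoder reworked LZSS-style -- again just a sketch, with the flag bit modelled as the first element of each token. Notice the empty matches are gone: a literal now costs one flag bit plus the letter, and a match costs one flag bit plus the vector.

```python
def lzss_decode(tokens):
    """Decode an LZSS-style stream: each token starts with a one-bit flag
    saying whether a literal or a (distance, length) match follows."""
    out = []
    for token in tokens:
        if token[0] == 0:               # flag 0: a literal letter follows
            out.append(token[1])
        else:                           # flag 1: a match vector follows
            _, distance, length = token
            for _ in range(length):
                out.append(out[-distance])
    return "".join(out)

# Same phrase as before, but now only the one real match carries a vector.
tokens = [(0, c) for c in "sense and "] + [(1, 10, 4)] + \
         [(0, c) for c in "ibility"]
```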
LZ77, LZSS and others in the family all suffer from the same problem: the overhead of having to switch modes -- when they find something that doesn't fit into the class of dictionary-based matches, they need to write out some kind of marker to switch over to the literal mode. This all takes up valuable space in the stream. (If you're not careful, sometimes you'll waste so much overhead doing this kind of switching that you won't actually compress the file at all, you'll enlarge it!)
So we have two compression algorithms. LZSS is reliant on finding previous data to match against, and Huffman coding is reliant on some letters being more common than others. Can we do better than picking one of those two? Can we weave them together?
Yes we can.
If we want to munge these two algorithms together, it's not a great leap to imagine how we might do it. We start from LZSS right? The matching part seems OK, it's just the literals that are bogging us down as they're completely uncompressed. So how about this -- we could just Huffman encode them? We store a bit to say if it's a match or a literal, and if it's a match then we write a 16-bit match vector, and if it's a literal then we write a variable-length Huffman-encoded symbol.
That'd work just fine I suppose, but it's not how Deflate does it. Deflate goes further than that. You're still paying the cost of that marker bit to switch modes. In order to understand Deflate we need to expand our idea of what an alphabet is.
You see, when we hear the word "alphabet" we think back to what we learned as children -- A,B,C, etc. But the moment we start getting involved in compression we know that isn't the case. We have already had to expand our alphabet to include upper-case, and numbers and such. Even spaces, which aren't even visible, we have to encode those too. Deflate just takes this concept one step further.
Deflate is based around the idea of the unified alphabet. If an alphabet is just a set of choices, why not bring all the choices together under one umbrella? Deflate's alphabet consists of 286 symbols. The first 256 are the possible input bytes -- every letter, plus all the control codes and other such. One symbol marks the end of a data block, and the remaining 29 are used to represent match lengths. That's right, we're storing the actual match length here. Here's the actual table used:
0-255   Regular input byte
256     End of data block
257     Match of length 3 (distance follows after)
258     Match of length 4 (distance follows after)
259     Match of length 5 (distance follows after)
260     Match of length 6 (distance follows after)
261     Match of length 7 (distance follows after)
262     Match of length 8 (distance follows after)
263     Match of length 9 (distance follows after)
...     etc
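Interpreting one decoded symbol from that table is almost a one-liner. This sketch follows the simplified table above, not the full spec (the real format adds extra bits for longer lengths, which is part of what I'm glossing over):

```python
def classify(symbol):
    """Interpret a symbol from the simplified 286-symbol Deflate alphabet.
    (The real spec adds extra bits for longer lengths; omitted, as in the table.)"""
    if 0 <= symbol <= 255:
        return ("literal", symbol)      # a regular input byte
    if symbol == 256:
        return ("end_of_block", None)
    if 257 <= symbol <= 285:
        return ("match", symbol - 254)  # 257 -> length 3, 258 -> length 4, ...
    raise ValueError("symbol out of range")
```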
There's a bit more to it than just that, which I've omitted here for clarity, but hopefully this should explain the principle. We have 286 symbols now, and each one will be assigned its own probability, and therefore its own Huffman code. We know that some letters are still going to be more common (E, T, etc). But what about the matches? How common are they?
Remember how we had the frequency diagram of English text earlier? Well here's how it looks with our expanded alphabet:
You can see that the lower-case 'e' is popular, as are some of the other lower-case ones like 'a/i/t/etc'. Capitals are almost non-existent, and there are no number digits at all in this case. Yet something sticks out like a sore thumb -- the dictionary matches which dominate the probability domain. It's also interesting that the probabilities of the lower-case letters have changed, due to the dictionary matches evening it out.
As we compress the data one symbol at a time, we face a choice at each stage of compression. If we can find a previous match then we can just write that match vector out, and if we don't then we write the literal symbol. But the special beauty of Deflate that sets it apart from its predecessors, is that whatever option we pick, we're only ever writing out a single Huffman-coded symbol.
If we want to store two matches in a row, or two literals in a row, we can do that. One alphabet to store two different ideas. No bit markers needed to designate a switch between modes. We're storing literal letters compressed according to their frequency, we're also storing match lengths according to their frequency, but more than that we're doing both of those things under one unified scheme.
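A toy decode loop makes the point concrete. This assumes the simplified length table above, and it takes the symbols as a plain list rather than Huffman-decoding real bits (in actual Deflate the distances are Huffman-coded too, from a second alphabet) -- but notice there is no flag bit anywhere; literals and matches fall out of the same symbol stream.

```python
def inflate_symbols(symbols, distances):
    """Decode a stream of unified-alphabet symbols (simplified scheme above).
    'distances' supplies the distance following each match symbol, in order."""
    out = []
    dist_iter = iter(distances)
    for sym in symbols:
        if sym <= 255:                  # literal byte: just emit it
            out.append(chr(sym))
        elif sym == 256:                # end of data block
            break
        else:                           # match: 257 means length 3, etc.
            length = sym - 254
            distance = next(dist_iter)
            for _ in range(length):
                out.append(out[-distance])
    return "".join(out)

# "sense and sensibility": ten literals, one length-4 match (symbol 258,
# distance 10), seven more literals, then the end-of-block symbol.
syms = [ord(c) for c in "sense and "] + [258] + \
       [ord(c) for c in "ibility"] + [256]
distances = [10]
```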
It's not one algorithm but two, that dance and weave together in harmony. You can pick dictionary matching, or entropy coding, and you can select between them on a per-byte basis with no overhead. That's what I find so clever about it - not that it can do one thing or the other, but that it can choose either, and yet represent them using the same language. No extra markers, nothing to say "oh, now we're switching to entropy mode", nothing to get in the way.
It's an RLE encoder, it's a dictionary matcher, it's a frequency table, and it's all of these things with no barriers in-between, no modes to switch nor separate passes to run.
Deflate remains an excellent example of how two algorithms can come together and play off one another, rather than fighting against themselves.
I gotta admit, I'm impressed.
The Secret Button August 17th, 2016
There's a secret button hidden in Windows 7. At least, it's a secret to some people. Many modern users won't know about it, whereas some like me have used it every day for over 25 years.
I use Windows 7 on my machines at home. Perhaps I should finally relent and upgrade to Windows 10, I don't know. I've heard nothing much good about it, but you can't live in the past forever. I guess one day they'll release Windows 11 or some such and then I'll take the plunge. I've managed to avoid the whole issue so far, which wasn't easy let me tell you. It can be very insistent.
Perhaps I'm just scared of change. Whenever I see a screenshot I've never been impressed. Windows 7 was one of the first editions of Windows that actually looked nice in a long time. The attractive Aero-glass frames actually sit well on the desktop. The controls are pleasantly smoothed and rounded. It hasn't looked this nice since the old CTL3Dv2 days (heh, there's one for the old timers). Part of me definitely wants to stay here for the view. Yet perhaps there's another part that's simply worried they might have finally removed my secret button.
Here's a picture of the secret button. You can't see it, it's invisible. But it's there. If you're running Windows 7 now, try clicking it yourself. Maybe it's still there in Windows 10, I don't know. I'm sure they've found yet another way to shuffle everything around just for the sake of having something to show for it. I just took a look at some Windows 10 screenshots while writing this and it looks like maybe it's come back again, but it seems to have brought some friends with it.
A single click on the button brings up the system menu. I never bothered about that much, because this button isn't really meant to be single-clicked. The real use is when you double-click it, because that closes the window. That's the way I've closed windows for two-thirds of my life. Why don't I just press the "Close" button instead? You know, that red X in the top-right? Because when I first started using Windows, that button just wasn't there.
Take a look at Windows 2.0, the first version I ever used way back in 1990. The now-familiar red "X" is nowhere to be found. Back then if you wanted to close a window, you double-clicked the top-left button. It all changed once Windows 95 came along. The familiar up/down arrows got replaced by "line" and "box" buttons, and the new close button appeared at the top-right. "Up and down" made sense. "Line and box"? Windows 95 brought a lot of advantages, but we lost so many of the little touches along the way. At the time it seemed new and exciting, in hindsight I kinda miss 3.1. Things made sense back then. 3.1 had this simplicity to it, something you could actually explain to people when they asked you how to operate it. Do this to open, do this to close.
Despite the new changes, the old close button never left. It's still there, even on today's Explorer, it's just invisible. So now we have two close buttons, one in the top-left you can double-click, and one in the top-right you can single-click.
Maybe that's progress. One click is better than two, right? Or maybe it isn't -- destructive actions should require a little more confirmation, no? It used to be you had to double-click almost everything if there was a chance of a strong side-effect; double-click to open programs, double-click to close them. It formed a kind of parity, a set of rhythmic actions you could subconsciously learn. Now we single-click Start Menu items, or URLs, or pinned icons, yet double-clicking still sneaks in on occasion. I don't know any more.
I'm going to keep double-clicking the secret button. Perhaps one day it'll finally vanish entirely, and I guess then I'll switch to the other one. 25 years is a long time to have been doing things one way. My secret button's probably been there longer than some Microsoft employees have been alive. I don't think I'll give up on it just yet.
Written by Richard Mitton,
software engineer and travelling wizard.
Follow me on twitter: http://twitter.com/grumpygiant