Little Lightmap Tricks October 10th, 2017
Just a quick post today to write down some lightmap lessons I've learned over the years, inspired by Ignacio Castaño's post on his iOS optimizations for The Witness. Many years ago I helped a little on the Wii port of "Call Of Duty: Modern Warfare", and it reminded me of the "fun" I had rewriting the lightmap system to fit into that. So I thought I'd write up some of the little tips and tricks I picked up along the way. Nothing special here, just some common mistakes to avoid.
Don't put gaps in!
It's surprisingly common to see people generating lightmaps with empty pixels in-between each chart. You don't need to do this! Put them right up against each other without any gaps. If you're worried that maybe you needed the black pixel to stop the charts bleeding into each other, then you're calculating your UVs wrong.
Squish blocks down
If all the pixels in the chart are exactly the same color (or near-enough), you don't need to waste space storing all of them. Just shrink the entire chart down to a single pixel. And furthermore (see below), use the same single pixel for all of the shrunken charts.
Share identical charts
You might imagine that the majority of a lightmap's space is taken up by nice big chunky pieces, like open terrain areas, or the sides of buildings. But in fact, you'll find that probably the majority of the lightmap space is occupied by a thousand little tiny shards of rubbish. Many of these only occupy a single 2x2 or 3x3 block on the lightmap. Now if you think about it, there's only so many 2x2 blocks that can exist in the world. So:
For each chart, search for all previous charts that are the same size and have the same contents (within a given pixel error). If you find any, throw the new chart away and simply re-use the UVs of the old one.
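A sketch of that search in C++, with hypothetical names (`Chart`, `chartsMatch`, `dedupeCharts` are mine) and grayscale pixels for brevity -- a real bake would compare RGB:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// A chart: a small rectangle of lightmap pixels (grayscale here for brevity).
struct Chart {
    int w, h;
    std::vector<uint8_t> pixels; // w*h values
};

// True if two charts are the same size and match within `tolerance` per pixel.
bool chartsMatch(const Chart& a, const Chart& b, int tolerance) {
    if (a.w != b.w || a.h != b.h) return false;
    for (size_t i = 0; i < a.pixels.size(); ++i)
        if (std::abs(int(a.pixels[i]) - int(b.pixels[i])) > tolerance)
            return false;
    return true;
}

// For each chart, reuse the first earlier surviving chart that matches.
// Returns, for every chart, the index of the chart whose UVs it should use.
std::vector<int> dedupeCharts(const std::vector<Chart>& charts, int tolerance) {
    std::vector<int> remap(charts.size());
    for (size_t i = 0; i < charts.size(); ++i) {
        remap[i] = int(i); // default: keep our own chart
        for (size_t j = 0; j < i; ++j) {
            // only compare against charts that weren't themselves discarded
            if (remap[j] == int(j) && chartsMatch(charts[i], charts[j], tolerance)) {
                remap[i] = int(j); // throw chart i away, reuse j's UVs
                break;
            }
        }
    }
    return remap;
}
```

The tolerance lets near-identical charts collapse together; too large a value and you'll see visible seams, so it's worth tuning per-project.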
You might even find that this doesn't just work for little 2x2 blocks. If there's any instanced geometry in the level that's facing the same direction, then they'll often have identical lightmaps too. One example would be the side of an apartment block, with many balconies. Because the sun is a directional light, each balcony will have the same shadow cast onto it. So you can get chart re-use in a lot more places than you'd think.
Don't ruin your block compression
You can get a big benefit by using a block compression scheme for your texture (DXT/BC/etc). But don't just compress your texture without thinking first! DXT stores two colors for each 4x4 block. The pixels you didn't write to on the lightmap will be an empty black. Do you really want to waste one of those colors on storing black? Of course you don't.
For each 4x4 block, fill in the unused pixels with one of the other pixels from the same block (doesn't really matter which).
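A minimal sketch of that fill pass (names are mine; a real pipeline would track "used" coverage from the chart packer):

```cpp
#include <cassert>
#include <cstdint>

// A 4x4 block of lightmap texels before DXT compression. Texels never touched
// by any chart are marked `used = false`; we overwrite them with the first
// used texel in the block so they don't drag one of the two DXT endpoint
// colors toward black.
struct Texel { uint8_t r, g, b; bool used; };

void fillUnusedTexels(Texel block[16]) {
    // Find any used texel to act as the filler.
    const Texel* filler = nullptr;
    for (int i = 0; i < 16; ++i)
        if (block[i].used) { filler = &block[i]; break; }
    if (!filler) return; // wholly unused block: nothing to protect

    for (int i = 0; i < 16; ++i)
        if (!block[i].used) {
            block[i].r = filler->r;
            block[i].g = filler->g;
            block[i].b = filler->b;
        }
}
```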
One of the really cool things I did was to write a little visual debugger -- a small command-line parameter that would pop open an OpenGL window showing the results of the bake. You could fly around and inspect the results, and if you found a strangely black triangle somewhere, you could click on it, and the program would re-run the lighting for that triangle and automatically break into the debugger at the right location. I highly recommend this.
The scheme I used
I tried a lot of texture compression schemes out, but here's the one I finally settled on. Bear in mind we were super-tight on memory, so I didn't want to use anything more than 4-5 bits per pixel really.
Like its parent 360/PS3 versions, the Wii port of COD:MW uses a non-HDR lighting engine where the sun's shadow is stored off as a separate lightmap channel. This enables partial time-of-day changes and special effects like lightning flashes, and lets the total lighting brightness exceed 1.0 for some overbrightening. I really wanted to keep that scheme for the Wii port, but I didn't want to use any more memory.

The shadow term is stored at full resolution, in the red channel of a DXT1 texture. This consists of the shadow visibility multiplied by the N.L term. This is then blended at runtime with the actual sun color.
The remaining non-sun light is stored a little differently. I split the secondary light into its separate components -- luminance and color. A separate RGB(565) texture stores the color, at quarter resolution. The luminance is stored at full resolution into the green channel of the above full-resolution texture. At runtime, we simply read both textures and multiply them together.
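Put together, the per-texel runtime combine is just a multiply-add. A sketch, assuming the sun and secondary contributions add (which is what the description implies; all names here are mine):

```cpp
#include <cassert>

struct RGB { float r, g, b; };

// shadow_NdotL: red channel of the full-res DXT1 texture (shadow * N.L)
// luminance:    green channel of the same full-res texture
// sunColor:     runtime constant -- can exceed 1.0 for overbrightening
// secondary:    sample from the quarter-res RGB565 color texture
RGB shadeTexel(float shadow_NdotL, float luminance, RGB sunColor, RGB secondary) {
    RGB out;
    out.r = shadow_NdotL * sunColor.r + luminance * secondary.r;
    out.g = shadow_NdotL * sunColor.g + luminance * secondary.g;
    out.b = shadow_NdotL * sunColor.b + luminance * secondary.b;
    return out;
}
```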
Now, you have to be careful about this. DXT compression relies on correlation between the channels -- you can't just throw any old data into the separate RGB channels and expect it to compress well.
But it turns out that for our purposes, this works great. It tends to be that each area of the world only has one strong light affecting each point (either the primary sun or one of the secondary lights), so the DXT compressor is free to EITHER:
a) If the shadow term is mostly constant, focus its efforts on the secondary luminance. Or,
b) If the shadow term varies, focus on that and let the luminance suffer.
It tends to work out either way though. The human eye is drawn to brightness, so if the luminance gets bad compression then you're probably looking at an area that has strong, varying shadows, and that'll be the thing that stands out anyway. And of course, when you throw the diffuse texture on top it hides a lot of any remaining errors.
Because the quarter-res 16bpp texture has 1/16th the pixels of the high-res 4bpp texture, the total storage space is effectively 5 bits-per-pixel.
Well there you go. Nothing that special, but it's nice to write these things down sometimes so they don't get lost. Of course, it's all voxel GI these days so I don't expect this to be of much use, but there it is.
Why Command And Vector Processors Rock September 7th, 2017
I had a Commodore Amiga as a kid. I'm told they were never especially popular in America, but in Europe they were everywhere. Well, sucks to be them I guess.
The Amiga was and still is, for its time, the best home computer ever made. It had a clean, powerful CPU architecture. It had an operating system that blended the best parts of CP/M and Unix with none of the unfriendly parts. It had 4-channel digital stereo sound playback in an era when a typical PC had a sound system that was either 0 or 1. It had proper multitasking, which took another ten years to finally arrive on Windows. It was also a completely open platform, unlike today's mobile, console, and store ecosystems.
But most importantly, it had Agnus.
Agnus is the name of the main chip inside the Amiga. Its primary role is graphics, but not as you'd think. The Amiga was unusual compared to most other home computers of the time. A typical early-80s computer had a CPU (usually a Z80, a 6502, or later the 68000). It would also have a video chip, which would read from the framebuffer RAM and either output pixels directly, or look up the bytes in a tilemap and output that instead. And that was generally as far as it went.
The Amiga, however, wasn't satisfied with that. It had a CPU, sure. And it had a video chip ("Denise"), which read in bytes and spat out a video signal. But it didn't stop there. It had a custom-designed ASIC for each part of the machine. The entire hardware was built around these "custom chips" and the means to let them communicate.
Agnus is a kind of "ringmaster" chip. Its main component is the DMA controller (Direct Memory Access). This lets bytes be read from main memory and sent around to the various custom chips as needed. You can think of it as an asynchronous memcpy -- you give it an address and it'll either read or write bytes one at a time to/from the appropriate chip. It supported 25 different DMA channels at once for all the different parts of the machine that needed RAM access.
So what would you want to do with all this DMA? Let's look at one of the biggest examples -- the blitter.
The blitter was another part of the Agnus chip. Its operation was very simple. You'd give it three source pointers, one destination pointer, and a function ID. It would then read individual bits from memory (processing them 16 at a time), perform an arbitrary bitwise operation on them, and store the result out. You can think of it as a general-purpose bitwise arithmetic chip.
Given three bits (let's call them A, B, and C), there are exactly 8 different combinations they can take. So in order to specify your arithmetic function, you just need a lookup table of 8 result bits. This handily fits into a single byte.
Having this kind of bitwise arithmetic was important because, like many machines of the time, the Amiga used bitplanes as a format to store its graphics in.
Let's say you're using 32-color paletted mode. That's 5 bits per pixel you need to store. How do you store that? Well, you could use a byte per pixel, use 5 of the 8 bits to store your data and leave the other 3 empty. But that's a hell of a waste. Instead, you store it as 5 individual bitplanes, each plane using one bit per pixel. (i.e. each byte contains one bit from eight different pixels)
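For concreteness, packing eight 5-bit "chunky" pixels into five bitplane bytes might look like this (a sketch; the bit ordering assumes the Amiga's leftmost-pixel-in-the-high-bit convention):

```cpp
#include <cassert>
#include <cstdint>

// Convert 8 "chunky" 5-bit pixels into 5 bitplane bytes: byte p holds bit p
// of every pixel, with the leftmost pixel in the most significant bit.
void chunkyToPlanar8(const uint8_t pixels[8], uint8_t planes[5]) {
    for (int p = 0; p < 5; ++p) {
        uint8_t byte = 0;
        for (int x = 0; x < 8; ++x)
            byte |= ((pixels[x] >> p) & 1) << (7 - x);
        planes[p] = byte;
    }
}
```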
Now let's apply our blitter to this. Imagine we want to draw a sprite on-screen. We've got it stored as 5 individual bitplanes, plus a sixth 'mask' bitplane to store the transparency. To get the blitter to draw this for us, we set up our three inputs:
A - The position on the framebuffer we want it at
B - Our sprite source data (1st bitplane)
C - Our sprite transparency mask
We'll need to repeat the whole thing a total of 5 times, once for each sprite bitplane, but that's fine (it's real quick!). The final piece of the puzzle is how we specify how to mix these three inputs. We can build a Boolean truth table to handle it.
Our goal is to use the transparency bit (C) to select EITHER (A) the background data (if C=0), or (B) the sprite data (if C=1). i.e.
D = C ? B : A. To figure out our function ID, we just list out all eight cases:
A B C   D (output)
0 0 0   0
0 0 1   0
0 1 0   0
0 1 1   1
1 0 0   1
1 0 1   0
1 1 0   1
1 1 1   1
If we concatenate all the bits of D together (taking the bottom row, 111, as the most significant bit), we get the value 0xD8. This is called a minterm, and it represents our bitwise operation in its entirety.
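This derivation is easy to mechanize: evaluate the function for all eight input combinations and pack the results into a byte, with bit index A<<2 | B<<1 | C. A sketch:

```cpp
#include <cassert>
#include <cstdint>

// Build a blitter minterm by evaluating a boolean function for all eight
// (A,B,C) combinations. Bit (A<<2 | B<<1 | C) of the result holds the
// function's output for that input combination.
template <typename F>
uint8_t makeMinterm(F f) {
    uint8_t minterm = 0;
    for (int i = 0; i < 8; ++i) {
        int a = (i >> 2) & 1, b = (i >> 1) & 1, c = i & 1;
        if (f(a, b, c)) minterm |= uint8_t(1u << i);
    }
    return minterm;
}
```

Feeding in D = C ? B : A reproduces the 0xD8 from the truth table above, and a function that ignores its inputs and returns 0 gives the memory-clearing minterm 0x00.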
This "minterm" idea is a pretty powerful one. You can combine elements together to get any bitwise function you like. Want to XOR images? Sure. Want to just clear memory? That works too, just set the ID to 0x00 and ignore all the inputs. You'll still occasionally see systems that use this. Windows, for example, still uses it for its BitBlt function, although you'd never know that from reading the BitBlt documentation.
To actually program the blitter to do this, we simply write the three source addresses into three of its registers, write the function ID to another register, and then signal it to start. It'll run in the background while our CPU gets on with other things, and we can either check a flag to see if it's finished, or get it to wake the CPU with an interrupt.
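A software model of what the blitter computes for one 16-bit word -- look up each bit triplet in the minterm table -- might look like this (illustrative only, not cycle-accurate, and ignoring the real chip's shifters and masking):

```cpp
#include <cassert>
#include <cstdint>

// Apply an 8-entry minterm lookup to each of the 16 bit positions of the
// A, B, and C input words.
uint16_t blitWord(uint16_t a, uint16_t b, uint16_t c, uint8_t minterm) {
    uint16_t out = 0;
    for (int bit = 0; bit < 16; ++bit) {
        int idx = (((a >> bit) & 1) << 2)
                | (((b >> bit) & 1) << 1)
                |  ((c >> bit) & 1);
        out |= uint16_t(((minterm >> idx) & 1) << bit);
    }
    return out;
}
```

With minterm 0xD8 this performs the masked sprite draw described above: wherever a mask (C) bit is set, the sprite (B) bit comes through; elsewhere the background (A) bit survives.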
So far we've seen how Agnus contains the blitter functionality, and the DMA controller. But there's one more little secret hidden inside this chip, and that's the co-processor (aka "COPR", or simply "copper" to its friends).
The copper was a completely independent CPU that ran in parallel with the main one. It wasn't a Turing-complete, general-purpose CPU like the 68000. It only had three instructions. It didn't have its own memory or registers, but instead shared main memory (like everything else on the Amiga), and it could directly access many of the registers inside the custom chips.
The copper read its instructions via DMA. This meant you allocated some memory and filled in a program, called a "copper list", by writing 16-bit instructions into it. You then pointed the DMA at that address and started the program. The DMA would fetch each instruction and feed them into the copper.
So what could you do with a CPU that only has three instructions? Let's see what the instructions were:
MOVE reg, value
WAIT X, Y
SKIP X, Y
That's a pretty simple machine. We can load a value into a register, wait for the raster beam to hit a specific X/Y position, or skip the next instruction if the raster beam is past a specific X/Y position. Doesn't sound like much at first. If I wanted to write registers I could just do it on the main CPU, right? Why would I want to wait to write registers at a specific time? But there's some surprising effects you can get out of this simple mechanism.
What the copper let you do is apply different properties to different parts of the screen. You can change the address of the framebuffer on a line-by-line basis, for example, to create parallax scrolling, as in Shadow Of The Beast.
Or you can change the address of the framebuffer at the halfway point on each line, to create a 2-player scrolling split-screen, like in Lemmings (no other port of Lemmings could do this!).
Or you could wait till a certain line and change both the color palette AND the framebuffer address, to create a wobbly water effect, like in Ugh.
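Building a copper list is just writing 16-bit words into memory. Here's a sketch of the classic color-bar effect -- change the background color register on each scanline. The encoding follows my recollection of the hardware manual (MOVE: register offset with bit 0 clear, then the value; WAIT: vertical/horizontal position with bit 0 set, then a compare mask) and the function name is mine; treat it as a model, not a cycle-exact reference:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Build a simplified copper list that writes the background color register
// (COLOR00, custom-chip offset 0x180) on each of `numLines` scanlines,
// producing horizontal color bars.
std::vector<uint16_t> makeColorBarList(int firstLine, int numLines) {
    std::vector<uint16_t> list;
    for (int i = 0; i < numLines; ++i) {
        int vp = firstLine + i;
        list.push_back(uint16_t((vp << 8) | 0x01)); // WAIT for line vp, HP 0
        list.push_back(0xFFFE);                     // compare all position bits
        list.push_back(0x0180);                     // MOVE to COLOR00...
        list.push_back(uint16_t(i * 0x111));        // ...a different color per line
    }
    list.push_back(0xFFFF); // WAIT for an impossible position: end of list
    list.push_back(0xFFFE);
    return list;
}
```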
And this wasn't just a trick for games, even the operating system made use of this. If they chose to, the Amiga allowed programs to have their own private screen with a different resolution. Windows suffered for years with this problem -- ever switch back from a DirectX program and see all your icons have moved around on the desktop? Not in Amiga-land. Here, two different programs on different screens can co-exist just by dragging the menu bar down.
These aren't just different windows you're seeing here. These are different screens, each running at a different resolution! The lower screen is 640x256 at 4 colors and the upper screen is 320x256 at 32 colors. Try that on a PC.
All these effects, and more, are achieved via the simple ability to change settings when you want to, rather than having them fixed at the start of the frame. It didn't require more power to be added to the system, just the flexibility to use the existing system in unusual ways.
If you want to see more creative uses like this, try the excellent codetapper.com which takes apart many Amiga games to see how they do things.
Hardware as a tool
The reason I'm writing all this isn't just to show off how cool the Amiga was. I want to show how its design principles allow new avenues to be opened up.
The Amiga hardware never said "this tool is for this purpose". It gave you a toolbox but let you decide what these things were to be used for. And it allowed each tool to interoperate with the others using common registers and common data formats.
I've presented the blitter here as a thing for processing graphics bitplanes, but it was really just a vector coprocessor for operating on boolean/bitwise data. It could be used for other tasks, and it was. The Amiga's floppy disks were formatted using MFM encoding, which is a kind of edge-based binary encoding. To decode it, you had to process the bit array from disk and look for 0-1 transitions. The blitter provides an ideal tool for doing this with, and the OS made use of it for exactly that. The same kind of tasks we might use a compute kernel for today, perhaps.
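Transition-finding is exactly the kind of job a bitwise vector unit eats: XOR a word against a shifted copy of itself and the set bits mark the edges. A simplified sketch (real Amiga MFM decoding does considerably more, including separating out the interleaved clock bits; the data-bit layout below is the standard clock/data interleave, and the function names are mine):

```cpp
#include <cassert>
#include <cstdint>

// Mark every position where consecutive bits of `raw` differ -- the 0-1
// (and 1-0) transitions an MFM decoder looks for.
uint16_t findTransitions(uint16_t raw) {
    return uint16_t(raw ^ (raw >> 1));
}

// Extract the 8 data bits from a 16-bit MFM word, assuming the standard
// layout of alternating clock and data bits (data in the even positions).
uint8_t mfmDataBits(uint16_t mfm) {
    uint8_t out = 0;
    for (int i = 0; i < 8; ++i)
        out |= uint8_t(((mfm >> (2 * i)) & 1) << i);
    return out;
}
```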
The copper, while seeming to be a very simple processor, effectively acted as an amplifier for the power contained within the other chips. It could be viewed perhaps as a metaprocessor -- not doing the work itself but controlling the work of others.
This combination of a vector processor and a control chip is a powerful one. It's so powerful in fact, that the machine you're reading this on now has the same architecture. A modern GPU consists of three parts:
Part a) is a thing that can draw triangles. There's usually special-purpose hardware for doing this. There was a time a few years ago when this was what we thought of as the GPU, but we're seeing less and less of that every year. Games now are doing voxel ray-tracing, and people are using GPUs for lots of things other than just rendering.
Part b) is the vector processor, a unit that reads data and runs functions using it. Ours are much more powerful than the old blitter though. We can do full floating-point operations on ours, not just bitwise ops. But it's a more advanced version of the same principle -- a program that operates on many values at once rather than just one.
Part c) is the command processor. A modern GPU has a chip that reads instructions from the host CPU, decodes the various draw calls, state changes, etc, and then issues work to the vector processor (for compute kernels). Or, when using rendering APIs, it sends work to the triangle drawer which in turn sends work to the vector processor (either to shade vertices or pixels).
Right now we're a little stuck, however. A modern GPU lets you use its triangle drawer (via OpenGL perhaps), and it lets you use its vector processor (via CUDA perhaps). But the one thing it does not do, on almost any platform (even most consoles), is to let you use the command processor. About the only one I've ever seen that did give you that kind of access was the PlayStation 2, something I'll no doubt write about in a future article.
You see, the Amiga documented its command processor. The designers wanted you to write programs that ran on it. They wanted you to use it for doing all sorts of clever things. They recognized that being able to drive the underlying horsepower directly could amplify the capabilities of a system way past the limits of its original design.
But on Direct3D, or OpenGL, all you can do is call DrawIndexedPrimitive etc. and let it do things on your behalf. You can't build your own copper lists like you used to on the Amiga. Some APIs let you make a command buffer, but they're usually just recording API calls into it. You can't program it with your own logic, or your own algorithms. The 3D driver has the power to do this, but you don't.
The Amiga was a good machine not because of what it was designed to do but because the designers intentionally gave you the flexibility to do things they'd never designed it to do.
The old COPR chip only had three instructions and couldn't do much by itself, but you could use it to make the rest of the system sing. I'm sure the command processors in modern GPUs are far more advanced -- I'd love to see what we could do with them given the chance.
The Danger Of Opinions September 3rd, 2017
Warning: this post may contain opinions. If you are allergic to opinions, please try the associated reddit thread instead where you will be safe from them.
For years, MIT taught their SICP course using Scheme. And you know the weird thing about that? No computers involved at all. It was all just done on a whiteboard, using symbols and parentheses. No registers, no instructions, no memory. It showed you what computing really was -- an abstract concept that isn't tied to any implementation. The idea that computing doesn't actually require a computer is somewhat alien to many native C++ programmers.
Then you've also got the engineering crowd. People like me whose first exposure to computers was that 8-bit home computer your Dad brought home one day in the 80s. I didn't grow up in a world of evaluation, expressions, and functions. The computer I had knew only about bytes, and how to move them about in memory. It was always about how things get done. What use were abstract concepts in a world where you needed to do specific things in order to see the results?
These two groups often fall under the banners of "static" and "dynamic" typing, and it's perhaps no coincidence. Static typing tries to tell the computer exactly what needs to be done, at the expense of moving the program further away from the abstract description. Dynamic typing expects the computer to figure things out, so that the human can just write things in a nice clean manner.
Which leads on to the ultimate question of programming: Should programs be written for the benefit of humans or for computers? Exactly whom are we trying to make it easier for?
It's a simple point but you see the repercussions of this appearing everywhere, hundreds of little design decisions that push programming further into two camps. UNIX, for example, demands a case-sensitive file system, on the grounds that the file system can be done more efficiently if it's only concerned with matching bytestrings. Windows says that being able to create two files with the same name but different cases isn't useful to humans, and is only confusing. Why should humans have to keep track of where the capitals were placed, and why should auto-complete suddenly stop working because I forgot to hold SHIFT?
So which is right? Which is better? Should computers adapt to us or should we adapt to computers?
There's a book I love by Robert M. Pirsig called "Zen and the Art of Motorcycle Maintenance", which contains over 500 pages of pseudo-philosophical bullshit (oh who am I kidding, I still love it) centered around the idea of "quality". It's got this lovely disclaimer at the front where the author notes that the book doesn't really have anything to do with Zen Buddhism, and "it's not very factual on motorcycles either."
The central pillar of the book is what he calls the "classical" vs. "romantic" ideals. The classical, he says, is concerned with what something is and how it works. The classical viewpoint wants to know how their motorcycle works, how to recognize where that weird knocking noise is coming from, and wants to tune their engine to keep it running well.
The romantic viewpoint is instead concerned with how we see something. It's not important how something works, but how we see and use it. The romantic person wants to use their motorcycle to drive along beautiful mountain roads, and use it to get to far-away places.
The classical person sees a rainbow and wonders how it formed, and how the rain might reflect the sun like that. The romantic person sees a rainbow and wants to show others, and paint a picture of it.
This two-sided philosophy is found throughout the whole of human life, and especially in computers. One of the things I love about computer programming is that it's one of those areas where we actually get to use both at the same time, even within the same program. It's what makes a game developer want to be an artist or a programmer. And yet the game needs both to work.
So which is better? Well unsurprisingly, neither. You need both viewpoints, sometimes at the same time. And that's the weird part. How can two opposing ideas both be correct?
But they can.
I remember once talking to an artist friend of mine. We were talking about computer animation, and the subject of IK (inverse kinematics) came up. What puzzled me is that he wasn't a big fan of it. Now to me as a programmer, it seemed an obvious choice. Of course IK is a better way to do things. You just tell the arm or leg or whatever where you want it to go, and it automatically moves the elbows and knees and such for you. So surely that's less work, and therefore better?
But he explained things to me. You remember when as a kid, you drew stick men? Well in my mind that's how the human skeleton looked. But he explained about "clavicles", something I barely even knew existed but in fact drive the whole upper armature. And he explained how the best algorithm in the world isn't going to give you the results you want if there's more than one solution available. What had seemed a simple "pointing a finger" problem was unfolding into a world where you had to try and teach the computer how to be an artist. It slowly dawned on me that I didn't have the full experience of the problems he was describing, and couldn't make a case to argue back with.
It's weird when you suddenly realize that there's a separate world out there that you're not an expert in. It certainly changed my outlook on things. I think there's a lot of programmers who still haven't had that moment, and still live in a world where they believe they know everything.
Did that make me wrong about IK? Well, no. It's still useful. Did that make him right? Maybe, maybe not. But what it shows is that you can't have a discussion one way or the other unless you actually know a little about the other viewpoint.
Issues aren't black and white. And sometimes you can have two opposing viewpoints that are both valid. Programmers hate this. It's very un-pythonic.
Did you ever have an experience where someone you'd greatly respected suddenly said something you strongly disagreed with? Does that invalidate all the things they said that you did like? Do you stop talking to your best friend because you found out he voted Republican?
Complex issues can't just be simplified down into tribal arguments of us-vs-them, or solved by just shouting at the other person until they go away. We need to get over this cultural idea we have where anyone who disagrees with us is literally Hitler. It's OK to disagree with someone. And just because we disagree with someone doesn't make them wrong. It's possible for two people to disagree and yet they both still are correct. It's so common, especially in the media, for someone who changes their mind later on to be labeled a hypocrite. "But last year," they cry, "you said this thing. Now you're saying the other!". Yet the ability to change our mind is the most important thing we have. An opinion that is rock-solidly fixed in place is just tribal politics. Opinions should be swayable via convincing arguments.
On the one hand it's easy to look at things like the direction the C++ committee is taking and laugh; C++ has become an insane language that no one person has a hope of understanding. But they definitely seem to be heading towards a destination, perhaps if only by accident. What are they actually trying to achieve? A language where you can do anything, but only at compile time? Perhaps Python is a cleaner approach, by pushing all problems to runtime, but even they're now starting to realize that maybe type annotations are a useful feature.
But we need both. The idea of the "one true" anything is bullshit. There will always be different sides, with different ideals. And that's fine. We need that. But what we don't need is us-vs-them. People need more exposure to different ideas. Programmers need to try out different languages. Webdevs could learn a hell of a lot from trying to write a Z80 program. And a lot of GPU shader guys could learn a thing or two from watching how Bob Ross can manage to paint a tree without knowing how sub-surface scattering works. Because let me tell you, however you've been doing things so far, there's a whole different approach that other people have been successfully using that you have no idea about.
I dunno where I'm going with all this. I just figured I'd write some of these rambling thoughts down, although putting your thoughts into words can get you fired these days. Probably best just to stay absolutely quiet and avoid doing anything that may or may not cause two opinions to form. We do have to be careful, you know. Sometimes we can create a difference of opinion so vast that the universe has no option but to bifurcate in order to accept both.
Written by Richard Mitton,
software engineer and travelling wizard.
Follow me on twitter: http://twitter.com/grumpygiant