The Illusion Of Controls October 15th, 2016
Some things are just plain hard to use. There are a lot of things you have to learn before you can use them effectively, big manuals you have to read, gotchas you need to be aware of first.
I can't use this thing, people will say. I just want to do X, why do I have to mess around with all these details which are getting in the way? Why can't it be simple?
It's a common enough complaint, one you'll hear directed against anything from the Linux command line through to the user interface of Dwarf Fortress.
But for every complicated interface, there's always someone who'll reply with the common phrase:
"X needs to be that complicated. Y is fine for noobs, but with X you get so much more control."
And herein lies the problem. The illusion of controls. It's not often that one letter can make such a difference, but what we're talking about here is control vs controls.
Whenever a problem is presented, all good programmers know to head directly for a good car analogy. It's a tried-and-tested method we've used for years which I'm almost certainly sure won't backfire on me terribly. Nope. Here we go!
I'll quote from these instructions on how to drive a Model-T Ford, an experience once described by Top Gear's Jeremy Clarkson as "the hardest thing in the world", like trying to pat your head and rub your belly at the same time.
There are three pedals on the floor marked from left to right when sitting in the driver's seat: C (clutch), R (reverse) and B (brake). There are two levers on the steering column, spark advance and throttle, and one floor lever to the left of the driver. The floor lever is neutral while in the upright position, second gear when in the forward position while the leftmost pedal (C) is not depressed, and emergency brake when all the way back.
All speeds are controlled by a foot pedal enabling the driver to stop, start, change speeds, or reverse the car without removing the hands from the steering wheel. The foot pedal at the right operates the brake on the transmission. The pedal in the center operates the reverse. The left foot pedal is the control lever acting on the clutch.
The hand lever when thrown forward engages high speed; when pulled back, operates the emergency brake. The lever is in neutral when almost vertical and clutch is in a released condition. With the hand lever thrown forward in high speed, a light pressure on pedal 'C' releases the clutch while a full pressure on the pedal throws into slow speed; by gradually releasing the pedal, it will come back through neutral into high speed.
There's more -- you've also got the "spark advance" lever (no idea), a carburetor adjustment, and an additional throttle lever.
On the other end of the spectrum is something like a bicycle. On a bicycle you have pedals for your speed and handlebars to steer. That's basically it; there's a brake lever and maybe a gear selector, but it's an incredibly simple machine, one which you can learn without instruction very quickly.
Part of the simplicity of the bicycle is that you can see how it works. It's immediately obvious how your action on the pedals affects the ground speed. You can directly see the relationship between the brake line and the wheel.
The difference is a bicycle has control. The Model-T has controls. With a bicycle every part of your body is in direct connection with the movement. You can lean into corners, shift your weight around. You can feel the brakes and whether you're applying the right pressure.
The Model-T takes the approach of having controls. A complex beast of a machine that requires memorization, and the knowledge of what each control does and how it operates the beast. Modern cars try to simplify that as much as possible, and cars like the Tesla go to an extreme, not even having a clutch or gearbox. Perhaps it's not the best parallel, and I'm sure some reader will call me out on my terrible automotive knowledge. But the point remains valid, and programmers would do well to heed it.
An Etch-a-Sketch has more controls than a paintbrush, but has no control. Give an artist a paintbrush and it becomes an extension of him, a tool he can use to apply different pressure, to feel the paint as he moves it around a canvas. An operating system like Linux gives you a lot of controls -- a hundred different config files to adjust, fontconfigs to manage, but it gives you no control. We spend so long caught in a Byzantine complex of configurations and dotfiles that we fool ourselves into thinking that the controls were the reason we originally came here.
And for the record, I don't feel the solution is to just hide the complexity like so many programs do (I'm looking at you, Apple). That's like putting the controls inside a sealed box and hoping you won't try and adjust them. We need more systems like the Tesla and its all-electric drivetrain, where the need for controls dissolves away.
The Challenge Of Making Things October 4th, 2016
What would you do if you could do anything? If you were all-powerful, and could create anything just by waving your hands around? The answer is, nothing.
You could create anything you want right now. Just grab a pen and paper and draw a new universe. You could have drawn five different scenes today, but you didn't. It takes a special kind of person to push past the barrier of the empty page, a talent that I, like most people, struggle with.
An unrestricted blank canvas is the worst possible killer of creativity. People build models of Game Of Thrones worlds in Minecraft. Why? They could build them in Autodesk Maya or Sketchup, and they'd look better (and might even be easier for a large scene). The restrictions breed creativity. It's not enough to just make something; you need a frame to place it within. Artists need a frame to define the medium they work within.
Why do people build a working 6502 using Minecraft redstone? I mean if building a 6502 is the goal, why not use a better medium? Perhaps this shows us that the subject itself is not as important as the medium it's made in.
Retro games were originally developed in a constrained medium, and it took great effort on the designers' part to fit a game into it. Now we have gigahertz CPUs and GPUs, but people still make pixel-art games and text adventures. In a world where Unity provides instant access to a professional-quality 3D scene editor, why would someone make games in PuzzleScript?
We need mediums. An artist with no restrictions will never make anything. Some people think retro games are a fad, but I think as technology gets broader we're going to look to the narrow canvas ever more.
I worry about things like Dreams, which promises an all-powerful blank canvas. I think a large part of Little Big Planet's success is that you couldn't make everything. It limited you to a simplified 2D canvas of mechanisms, but people had fun trying to push those limits. If there are no limits, what can you push?
Untonemapping, and other stupid tricks October 2nd, 2016
I've been meaning to write something about this for years but never got around to it. I don't claim there's any great use for this stuff; it's just one of those little oddities us graphics programmers like to collect.
You all know what tonemapping is - converting an HDR (High Dynamic Range) image into an LDR (Low Dynamic Range) image for display. What might not be immediately obvious is that it's reversible.
We can formalize this relationship using the following notation:
L(x) = 1 - exp2(-k*x)
That's the standard formula for an exponential tone mapper. There are a lot of other functions you can use (Reinhard, etc.), but for the purposes of today's article the choice doesn't matter, so let's just pick the simplest one to work with. For final display of course it may well matter, but we're not talking about final display here. The reason it doesn't matter is that it cancels out.
Note that I'll always be using exp2, not exp/log/ln etc, and you should too. GPUs often only support exp2, with the others requiring an extra multiply to convert bases. So if we work in base 2 ourselves we can save that multiply.
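To see where that extra multiply comes from, a natural exponential can always be rewritten in base 2:

exp(x) == exp2(x * 1.4427)   // 1.4427 is log2(e)

so if you stay in base 2 throughout, that per-call multiply simply disappears.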
So if the formula above is tonemapping, what's untonemapping? Well it's just a simple inverse:
H(x) = log2(1 - x)/-k
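As a rough sketch in shader terms (the helper names here are just for illustration, nothing standard):

float k = 2.0f;                 // exposure constant; pick to taste

float3 tonemap(float3 hdr)   { return 1 - exp2(hdr * -k); }   // L(x)
float3 untonemap(float3 ldr) { return log2(1 - ldr) / -k; }   // H(x)

Feed one into the other and you should get back the value you started with, give or take floating-point precision.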
Ok so far. But what use is it? Let's try an example.
Let's say you want to render a nice image of a wooden teapot (hey why not). So you make a beautiful Photoshop image, like so:
Then you slap it on a basic lit mesh. First let's compare how it looks when you render it using an old LDR engine, then using a standard HDR engine with tonemapping:
// LDR:
float3 diffuse = tex2D(diffuseTex, uv).rgb;
float3 color = diffuse * lighting;
return color;                    // no tonemap

// HDR:
float3 diffuse = tex2D(diffuseTex, uv).rgb;
float3 color = diffuse * lighting;
float k = 2.0f;
return 1 - exp2(color * -k);     // exponential tonemap
Eurgh. Both of these images kinda suck. Our texture looked so nice in Photoshop, but now it's been distorted in both renderings. The LDR one preserves the vibrant orange colors well, but because it's an LDR engine it can't light the thing properly, and clips the colors badly.
The HDR engine, on the other hand, has captured the full lighting range, but at the expense of draining the contrast and saturation from the texture. This does depend on which tonemapping curve you use; some fare better than others, but they all tend to exhibit this problem. Why is this?
The problem is that we're using this photo as a diffuse map, but it isn't a diffuse map. What the photo really is, is the output of another renderer. (In this case, the renderer was the real world and a camera.)
This means the source photo is already tonemapped. We need to reverse the process to recover the original diffuse map. We can do this by assuming the photo was taken under some standardized lighting conditions, and simply running it through the untonemapping operator.
But, you ask, how can I untonemap it if I don't know the value of k to use? That's the cool part: it doesn't matter. Just pick one (1.0 works well). That'll be our reference exposure value. The exposure values we use to render our scene will then be defined relative to that base.
// HDR using untonemapping to correct the diffuse texture
float3 diffuse = tex2D(diffuseTex, uv).rgb;
float k = 2.0f;
diffuse = -log2(1 - diffuse);    // untonemap
float3 color = diffuse * lighting;
return 1 - exp2(color * -k);     // tonemap
So how does that look now?
Well, that's a lot better. It now matches the source map exactly - the output of our renderer is identical to the artist's image, and we now have a good mathematical framework for taking our output results and working on them.
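A quick sanity check of why it matches, assuming the same setup as the snippet above (untonemap with a reference k of 1.0, tonemap with k, and writing p for a texel of the source photo):

output = 1 - exp2(-k * lighting * -log2(1 - p))
       = 1 - (1 - p)^(k * lighting)

which collapses back to exactly p whenever k * lighting = 1, i.e. whenever the scene is lit at our reference exposure.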
Before we go any further I'm going to make one small but important tweak. There's a lot of "1-" going on here, and it's kinda annoying. Let's get rid of it. We don't need it.
L(x) = exp2(-k*x)
H(x) = log2(x)/-k
This means all our LDR images will be inverted, but we can just flip it back before display. I'll call this space inverted-LDR, and that's what I'll be using for the rest of the article.
This now means that in inverted-LDR space, black represents infinitely bright. This turns out to be surprisingly useful. In fact, it makes me wonder whether this isn't the natural image representation we should all be using by default.
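To put some numbers on that (taking k = 1 for illustration):

exp2(-1 * 0)   = 1.0      // no light at all maps to white
exp2(-1 * 10)  ≈ 0.001    // a bright HDR value of 10 is nearly black
exp2(-1 * inf) = 0.0      // infinitely bright maps to exactly black

so the whole infinite HDR range packs neatly into [0, 1], with black as the limit.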
And now for my next trick
So what are the consequences of this? Well, now that we have a more rigorous definition of how to convert to/from LDR space, we can convert some common HDR operations so that they work in LDR space directly.
So, for instance: in HDR space, if you want to add two colors together, you just add them. Let's write that out:
Ah(x, y) = x+y
We can get the LDR equivalent by tone mapping it:
Al(x, y) = L(Ah(x, y))
Expanding that out:
Al(x, y) = L(x+y)
Al(x, y) = exp2(-k*(x+y))
         = exp2(-k*x + -k*y)
Now here's the trick. We can use the laws of exponents to split that apart:
= exp2(-k*x) * exp2(-k*y)
Do you see what's happened here? It's equivalent to tone mapping the two colors individually, then multiplying them. Just to spell that out for you:
Given two inverted-LDR images, you add them together by just multiplying them.
ADDinv(x, y) = x * y
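A quick numeric check, with k = 1:

L(1 + 2)    = exp2(-3)    = 0.125
L(1) * L(2) = 0.5 * 0.25  = 0.125

Same answer either way.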
What would happen if we were using regular-LDR instead of inverted-LDR? Let's write it out with the 1-x's in:
ADDreg(x, y) = 1-((1-x)*(1-y))
Oh look, that's the Photoshop 'screen' blend mode. I don't know if that's something the Photoshop designers intentionally thought of; if not, it's certainly an interesting coincidence.
But wait! There's more!
That's addition taken care of. What about multiply?
In HDR-space, a multiply looks like this:
Mh(x, y) = x * y
We can do the same tricks as before. First let's tonemap it to get it into LDR.
Ml(x, y) = L(Mh(x, y))
Ml(x, y) = L(x*y)
Ml(x, y) = exp2(-k*x*y)
And then apply the laws of exponents to split it apart again:
Ml(x, y) = exp2(-k*x)^y
What does this mean? It means that if you have an inverted-LDR image and an HDR image, you can multiply them together by raising the LDR image to the power of the HDR one.
MULinv(xl, yh) = xl^yh
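Same kind of numeric check as before, with k = 1:

L(2 * 3) = exp2(-6)           = 1/64
L(2)^3   = 0.25 * 0.25 * 0.25 = 1/64

so raising to an HDR power really does behave like an HDR multiply.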
Here's an example of how you might throw all this together. Let's imagine you're starting with an inverted-LDR diffuse texture, and you want to do some HDR lighting with it. We can use the "multiply" rule to do the diffuse lighting, then the "add" rule to add on the specular lighting. Note that the diffuse texture remains in inverted-LDR space throughout, and the final result needs no tonemapping, because it is already in inverted-LDR space.
float3 diffuse = tex2D(diffuseTex, uv).rgb;
float3 diff_lighting = calculateHdrDiffuseLighting();
float3 spec_lighting = calculateHdrSpecularLighting();
float3 ldr_output = pow(diffuse, diff_lighting) * tonemap(spec_lighting);
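One small footnote to that snippet: ldr_output is still in inverted-LDR space, so before actually displaying it you'd apply the flip from earlier, something along the lines of:

return 1 - ldr_output;   // flip inverted-LDR back to regular LDR for display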
I'll summarize the inverted-LDR-space rules in a table here:
Rule          Formula
HDR to LDR    exp2(-k*x)
LDR to HDR    log2(x)/-k
HDR + HDR     x*y
LDR * HDR     x^y
So there it is. As I said, I don't know if this is going to be especially useful to anyone, but I thought it was interesting how you can do mathematics in LDR-space and yet get the correct results of HDR lighting.
Written by Richard Mitton,
software engineer and travelling wizard.
Follow me on twitter: http://twitter.com/grumpygiant