codersnotes

The joy of INCBIN

October 5th, 2014

-or- why do we have to load?

Why do games have to load data?

At first glance that seems like a stupid question. Of course they have to load data, textures, models, etc. How else would they draw anything?

But games didn't use to load data files. That's right, in the early '80s, calling functions to load your data would have been considered unthinkable.

Many people reading this will be too young to remember the INCBIN command, or one of its many variations. Video games in the early '80s were generally written entirely in assembly language. Most assemblers had a special directive, usually called INCBIN or similar, which would allow you to include any binary file and embed it into your program. I'll provide a brief example:

sprite_standing: .incbin "stand.spr"
sprite_walk1:    .incbin "walk1.spr"
sprite_walk2:    .incbin "walk2.spr"

The C++ equivalent for this would be something like:

unsigned char sprite_standing[] = { 0x12, 0x34, 0x56, /* etc... */ };
unsigned char sprite_walk1[]    = { 0xab, 0xcd, 0xef, /* etc... */ };
unsigned char sprite_walk2[]    = { 0x55, 0x66, 0x77, /* etc... */ };

Basically you're telling the assembler to read in your data, and spit it out directly as an array in your program.

It's pretty rare to see data loaded in that way in a modern C++ program. Instead perhaps you might do something like this:

Sprite *sprite_standing = load_sprite("stand.spr");
Sprite *sprite_walk1 = load_sprite("walk1.spr");
Sprite *sprite_walk2 = load_sprite("walk2.spr");

So what's the difference here? In the load_sprite version, the data files are assumed to lie outside the program, and each one is loaded separately at run time, when it is asked for.
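
As a rough illustration of that run-time path, here is a minimal sketch of what such a loader might do underneath. Note that load_bytes is a hypothetical stand-in, not the load_sprite from the example above; a real sprite loader would go on to interpret the bytes it read.

```cpp
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper (not from the article): reads a whole file into memory.
// Returns an empty vector if the file cannot be opened.
std::vector<unsigned char> load_bytes(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        return {};
    }
    // Copy every byte from the stream into a freshly allocated buffer.
    return std::vector<unsigned char>(std::istreambuf_iterator<char>(in),
                                      std::istreambuf_iterator<char>());
}
```

The point to notice is that the file has to be found, opened and copied while the game is running, and any of those steps can fail.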

Why was this considered unusable at the time?

Let me give you a brief rundown of loading games on a ZX Spectrum.

The Spectrum has 48KB of memory for use exclusively by the game.

From the point of view of a user wanting to run a game, you'd typically issue the "LOAD" command, wait 4 minutes for the tape to load, and then the game would run. Generally speaking (ignoring multiload games for this discussion), you'd then stop the tape and play the game.

Tapes aren't random access. The data feeds into the computer in the order it is on the tape. You can't just ask for a specific file to be read; you have to accept whatever data is next up in the queue.

OK, so maybe you could figure out what order you wanted the data in, arrange for it to be on the tape in that order, and then figure out some way of annotating each chunk so that the loader knew what it was and where it needed to be. You'd need to write a special little tool to do that.

Oh wait, no you don't. You have one already, it's called the assembler.

Using INCBIN, the assembler automatically places everything where it needs to be, keeps track of a symbol name for each piece of data, and everything can just get loaded as one giant binary blob.

More than tapes

Of course tape-based loading went out of fashion fairly quickly, making way for ROM cartridges (the Genesis and SNES era). And yet the INCBIN approach still works well here.

How do you load a file on the SNES? It's stored on a ROM cartridge. So maybe you could allocate some RAM for it, find it on the cartridge, and then copy it from there into RAM.

But wait - you didn't need to do that - it was already right there in ROM. You don't even need to load it. You can just use it directly in-place from the ROM. So all you really need is some kind of file-system; a table that contains a mapping from each filename string to the address in ROM of the file.
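
To make that concrete, a hand-rolled table of that kind might look something like the sketch below. Everything in it is made up for illustration: the rom array stands in for the cartridge, and RomFile, rom_files and find_in_rom are hypothetical names, not anything from a real SNES toolchain.

```cpp
#include <cstddef>
#include <cstring>

// Stand-in for the cartridge ROM. The contents are invented for this sketch.
static const unsigned char rom[] = {
    0x12, 0x34, 0x56,   // stand.spr
    0xab, 0xcd, 0xef,   // walk1.spr
    0x55, 0x66          // walk2.spr
};

// One entry of the hypothetical "file-system": a name and where it lives.
struct RomFile {
    const char* name;
    std::size_t offset;
    std::size_t size;
};

static const RomFile rom_files[] = {
    { "stand.spr", 0, 3 },
    { "walk1.spr", 3, 3 },
    { "walk2.spr", 6, 2 },
};

// Returns a pointer straight into ROM (nothing is copied), or nullptr if
// the name is unknown. The size is written to size_out when it is non-null.
const unsigned char* find_in_rom(const char* name, std::size_t* size_out) {
    for (const RomFile& f : rom_files) {
        if (std::strcmp(f.name, name) == 0) {
            if (size_out != nullptr) {
                *size_out = f.size;
            }
            return rom + f.offset;
        }
    }
    return nullptr;
}
```

Note that someone has to keep the names, offsets and sizes in this table in sync with the data by hand, and the lookup is a string search at run time.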

And again here we realize there's already a tool to do that for us: the assembler. We don't need to invent our own string-to-data mapping code, because there's one already in the assembler. It's called the symbol table. We just INCBIN the files, and let the assembler take care of tracking the name for each one.

Today

So this sounds great! We don't need any external files for our game, we can just INCBIN everything and produce one giant executable with everything right there at our fingertips. We don't need to load anything as we have a loader already in the OS!

And yet you'll never see a modern commercial game today using this technique. Why?

Compiler-writers and OS developers broke it.

For a start, you'll notice most high-level languages didn't think to add an INCBIN directive. C and C++ have an #include directive, but it can only be used to bring in more source code, not binary files. You can get around that though by writing a small utility to convert your binary files to a char[] array, so it's only a mild annoyance.
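
The core of such a utility is only a few lines. Here's a minimal sketch of the conversion step, assuming the file's bytes have already been read into memory; to_c_array is a name invented for this example, and it emits unsigned char so that byte values above 0x7f are valid initializers.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper (not a real tool): renders raw bytes as the text of a
// C/C++ array definition. Assumes data is non-empty, since an array with an
// empty initializer list is not valid C++.
std::string to_c_array(const std::string& name,
                       const std::vector<unsigned char>& data) {
    std::string out = "const unsigned char " + name + "[] = {";
    for (std::size_t i = 0; i < data.size(); ++i) {
        char buf[8];
        // Format each byte as two lowercase hex digits, e.g. 0x0f.
        std::snprintf(buf, sizeof buf, "0x%02x",
                      static_cast<unsigned>(data[i]));
        out += (i == 0 ? " " : ", ");
        out += buf;
    }
    out += " };\n";
    return out;
}
```

Calling to_c_array("sprite_standing", {0x12, 0x34, 0x56}) returns the line `const unsigned char sprite_standing[] = { 0x12, 0x34, 0x56 };`, which you'd write out to a generated source file as part of the build.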

That's not the real problem though. A modern game today might use let's say 1GB of data. Can you imagine what would happen if we tried to make a 1GB executable? It'd be a mess.

Firstly you're talking about pumping 1GB of data through the poor linker. Linkers should absolutely be able to handle that. But I wouldn't like to bet any money on it. You'd get 'weird' errors. Segments too big, relocation offsets too big, who knows.

Secondly, even if you did manage to do it, the OS EXE/DLL loader wouldn't be expecting it. Even on 64-bit Windows, you can't make an EXE bigger than 4GB. I can imagine some virus checker kicking in every time you tried to run the game, waiting for minutes while it scanned this giant EXE. Even though there's no difference between data coming from a file and data in an EXE, the virus checker would still want to have its way first.

Where does this leave us?

You can't ship a large modern game using this method. Not because it wouldn't work on your end (it'd work fine), but because control over loading got taken away from us.

It used to be that your EXE was king, you were given memory space with guaranteed properties, and within that space you controlled the entire system.

Nowadays that memory space isn't really yours any more. Some systems (iOS, Xbox 360, etc) don't allow you to even allocate executable memory areas. Everything has to go through the approval of the OS writers. If you want to do anything differently, you can't. End of story.

We have a compiler. We have a symbol table. This table maps names to addresses! It's everything we need, but we can't use it, purely because loading is outside of our control. We can't even write our own dynamic loader, as they're taking that away too.

The modern method of having to maintain a separate resource manager sucks. It violates the DRY principle, with load management completely separated from the code trying to use it.

I know of few commercial games that manage their data like this. Jak & Daxter on the PS2, thanks to having its own programming language, shipped with its own dynamic code/data loader (somewhat similar to Unix shared libraries). The PS2 was one of the last consoles where you could get away with this, due to having full control over the machine.

So now we're stuck in a world where we have this system for loading things, but I can't use it because it got too specialized for the mainframe use case, and now I'm not allowed to write my own version either.

Written by Richard Mitton,

software engineer and travelling wizard.

Follow me on twitter: http://twitter.com/grumpygiant