
The joy of INCBIN

05-Oct-14

-or- why do we have to load?

 

Why do games have to load data?

At first glance that seems like a stupid question. Of course they have to load data, textures, models, etc. How else would they draw anything?

But games didn’t always load data files. That’s right, in the early 80’s, calling functions to load your data would have been considered unthinkable.

Many people reading this will be too young to remember the INCBIN command, or one of its many variations. Video games in the early 80’s were generally written entirely in assembly language. Most assemblers had a special directive, usually called INCBIN or similar, which would let you take any binary file and embed it directly into your program. I’ll provide a brief example:
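
    ; A sketch -- "player.spr" is a hypothetical file, and the exact
    ; directive name and syntax vary from assembler to assembler.
    player_sprite:
        INCBIN "player.spr"     ; the file's bytes are embedded right here
    player_sprite_end:          ; the assembler can compute the size too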

The C++ equivalent for this would be something like:
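
    // A rough equivalent (the data values are hypothetical -- in practice
    // a tool would generate this array from the binary file):
    const unsigned char player_sprite[] = {
        0x3c, 0x42, 0x81, 0x99, 0x99, 0x81, 0x42, 0x3c,
    };
    const unsigned int player_sprite_size = sizeof(player_sprite);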

Basically you’re telling the assembler to read in your data, and spit it out directly as an array in your program.

It’s pretty rare to see data loaded in that way in a modern C++ program. Instead perhaps you might do something like this:
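
    // A sketch of the familiar modern pattern: the data lives outside the
    // executable and each file is loaded on request at runtime.
    // (Hypothetical file name; error handling omitted for brevity.)
    #include <fstream>
    #include <vector>

    std::vector<char> LoadFile(const char* filename)
    {
        std::ifstream file(filename, std::ios::binary | std::ios::ate);
        std::vector<char> data(static_cast<size_t>(file.tellg()));
        file.seekg(0);
        file.read(data.data(), static_cast<std::streamsize>(data.size()));
        return data;
    }

    std::vector<char> player_sprite = LoadFile("player.spr");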

So what’s the difference here? In the C++ version, the data files are assumed to lie *outside* the program, and each will be loaded separately when asked.

Why was this approach considered unworkable at the time?

Let me give you a brief rundown of loading games on a ZX Spectrum.

ZX Spectrum memory map. The Spectrum has 48KB of memory for use exclusively by the game.

From the point of view of a user wanting to run a game, you’d typically issue the “LOAD” command, wait 4 minutes for the tape to load, and then the game would run. And generally speaking (ignoring multiload games for this discussion), you’d then stop the tape and play the game.

Tapes aren’t random access. The data feeds into the computer in the order it is on the tape. You can’t just ask for a specific file to be read, you have to accept whatever data is next up in the queue.

OK, so maybe you could figure out what order you wanted the data in, arrange for it to be on the tape in that order, and then figure out some way of annotating each chunk so that the loader knew what it was and where it needed to be. You’d need to write a special little tool to do that.

Oh wait, no you don’t. You have one already, it’s called the assembler. Using INCBIN, the assembler automatically places everything where it needs to be, keeps track of a symbol name for each piece of data, and everything can just get loaded as one giant binary blob.

More than tapes

Of course tape-based loading went out of fashion fairly quickly, making way for ROM cartridges (the Genesis and SNES era). And yet the INCBIN approach still works well here.

How do you load a file on the SNES? It’s stored on a ROM cartridge. So maybe you could allocate some RAM for it, find it on the cartridge, and then copy it from there into RAM.

But wait – you don’t need to do that – it’s already right there in ROM. You don’t even need to load it. You can just use it in place, directly from the ROM. So all you really need is some kind of file-system: a table that maps each filename string to the address of the file in ROM.

And again here we realize there’s already a tool to do that for us: the assembler. We don’t need to invent our own string->data mapping code; there’s one already in the assembler. It’s called the symbol table. We just INCBIN the files, and let the assembler take care of tracking the name for each one.
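
A rough sketch of the whole scheme (hypothetical file names, 68000-style assembly in the spirit of a Genesis-era target):

    ; The assembler's symbol table is already a name -> ROM address map.
    title_music:
        INCBIN "title.mus"
    player_sprite:
        INCBIN "player.spr"

    ; Use the data in place, straight out of ROM -- no allocation, no copy:
        lea player_sprite,a0    ; a0 now points at the sprite data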

Today

So this sounds great! We don’t need any external files for our game, we can just INCBIN everything and produce one giant executable with everything right there at our fingertips. We don’t need to load anything as we have a loader already in the OS!

And yet you’ll never see a modern commercial game today using this technique. Why?

Compiler-writers and OS developers broke it.

 

Firstly you’ll notice most high-level languages didn’t think to add an INCBIN directive. C/C++ has an #include directive, but it can only be used to bring in more source code, not binary files. You can get around that, though, by writing a small utility to convert your binary files to a char[] array, so it’s only a mild annoyance.
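
For the curious, a minimal sketch of such a utility (my own hypothetical bin2c, invoked as "bin2c input.bin symbol_name > output.cpp"):

    #include <cstdio>

    // Reads a binary file and prints it out as a C++ array definition.
    int main(int argc, char** argv)
    {
        if (argc < 3) return 1;
        FILE* in = fopen(argv[1], "rb");
        if (!in) return 1;
        printf("const unsigned char %s[] = {", argv[2]);
        int c, n = 0;
        while ((c = fgetc(in)) != EOF)
            printf("%s0x%02x,", (n++ % 12) ? " " : "\n    ", (unsigned)c);
        printf("\n};\nconst unsigned int %s_size = %d;\n", argv[2], n);
        fclose(in);
        return 0;
    }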

That’s not the real problem though. A modern game today might use, let’s say, 1GB of data. Can you imagine what would happen if we tried to make a 1GB executable? It’d be a mess.

Firstly you’re talking about pumping 1GB of data through the poor linker. Linkers should absolutely be able to handle that. But I wouldn’t like to bet any money on it. You’d get ‘weird’ errors. Segments too big, relocation offsets too big, who knows.

Secondly, even if you did manage to do it, the OS EXE/DLL loader wouldn’t be expecting it. Even on 64-bit Windows, you can’t make an EXE bigger than 4GB. I can imagine some virus checker kicking in every time you tried to run the game, making you wait minutes while it scanned this giant EXE. Even though there’s no difference between data coming from a file and data in an EXE, the virus checker would still want to have its way first.

Where does this leave us?

You can’t ship a large modern game using this method. Not because it wouldn’t work, it’d work fine. But because control over loading got taken away from us. It used to be that your EXE was king, you were given memory space with guaranteed properties, and within that space you controlled the entire system.

Nowadays that memory space isn’t really yours any more. Some systems (iOS, Xbox 360, etc) don’t allow you to even allocate executable memory areas. Everything has to go through the approval of the OS writers. If you want to do anything differently, you can’t. End of story.

We have a compiler. We have a symbol table. This table maps names to addresses! It’s everything we need, but we can’t use it, purely because loading is outside of our control. We can’t even write our own dynamic loader, as they’re taking that away too.

The modern method of having to maintain a separate resource manager sucks. It violates the DRY principle, with load management completely separated from the code trying to use it.

I know of few commercial games that manage their data like this. Jak & Daxter on the PS2, by virtue of having its own programming language, used its own dynamic code/data loader (somewhat similar to Unix shared libraries). The PS2 was one of the last consoles where you could get away with this, due to having full control over the machine.

So now we’re stuck in a world where we have this system for loading things, but we can’t use it because it got too specialized for the mainframe use case, and we’re not allowed to write our own version either.


On Building Living Worlds

14-Sep-14

I always found it fascinating how humans seem to have this in-built desire to create new worlds.

For the past 4 years, Miguel Cepero has been running his Procedural World blog, chronicling his exploration of world-building. From a simple voxel landscape, through to L-system grammars for making voxel buildings, it’s certainly an impressive effort, and worth going back over his post history to see it evolve.

No Man’s Sky, scheduled for 2015.

The upcoming game “No Man’s Sky” has been lauded recently for its use of procedurally-generated content. It certainly has some damn fine visuals. In this video they talk about how the game uses random seeds to build each planet, and how they use a set of algorithms to place content in the world.

Lords Of Midnight on the ZX Spectrum, 1984

This isn’t by itself a new technique by any stretch. The ZX Spectrum classic “Lords Of Midnight” (recently remade for iOS and Android with an amazing amount of respect and devotion to the original) is one of the earlier examples of using random numbers and simple algorithms to create a world bigger than the computer could otherwise store.

Mike Singleton crammed an entire world, with enemies, forests, mountains, towers, and more, into only 41KB. For reference, NOTEPAD.EXE is 189KB.

What sets Lords Of Midnight apart from countless other games of the time is its complex mythology. To the player, this isn’t just some randomly generated maze or fractal heightfield. The world is inhabited by characters, allies, foes. The back story dictates how these relationships came about, sets the reason for the quest, and lays out which of the multiple paths you’ll take to complete the game.

It’s not enough just to create a landscape. You need something to go in it too. You need a story. Lords Of Midnight, like many others, drew heavily on the works of Tolkien for its inspiration.

Many game creators think that a procedural world means generating a random landscape, then sticking random things on it at random. But a world is more than just a collection of things.

You’ll often see fantasy authors get that wrong. Did you ever read a book where the author creates some character called, perhaps, “Rag’na K’ptolth”, with apostrophes flung at random throughout like some kind of Photoshop Apostrophe Lens Flare?

Or perhaps you played a game where they created “Tolkeinesque” names by automatically throwing random letter-pairs together? This is the complete opposite to how Tolkien crafted his world. There was nothing random about a single word or language he created.

Tolkien was foremost a professor of languages, and only secondarily an author. Nothing he invented was random. He considered languages inseparable from the mythology associated with them.

He wrote many letters over the years explaining to people parts of his world, and how he went about building it. In his 1955 letter to the Houghton Mifflin Co. (his publishers), he writes:

J. R. R. Tolkien, 1955: (emphasis mine)

All the names in the book, and the languages, are of course constructed, and not at random…

… what is I think a primary ‘fact’ about my work, that it is all of a piece, and fundamentally linguistic in inspiration. The authorities of the university might well consider it an aberration of an elderly professor of philology to write and publish fairy stories and romances, and call it a ‘hobby’, pardonable because it has been (surprisingly to me as much as to anyone) successful. But it is not a ‘hobby’, in the sense of something quite different from one’s work, taken up as a relief-outlet.

The invention of languages is the foundation. The ‘stories’ were made rather to provide a world for the languages than the reverse. To me a name comes first and the story follows. I should have preferred to write in ‘Elvish’. But, of course, such a work as The Lord of the Rings has been edited and only as much ‘language’ has been left in as I thought would be stomached by readers. (I now find that many would have liked more.) But there is a great deal of linguistic matter (other than actually ‘elvish’ names and words) included or mythologically expressed in the book. It is to me, anyway, largely an essay in ‘linguistic aesthetic’, as I sometimes say to people who ask me ‘what is it all about?’

The way a language evolves over hundreds of years depends on what other cultures you have contact with. Nothing is named in isolation. For example, the town of York in England underwent several changes of name during its lifetime:

Originally it was Eborakon, from when the Briton people inhabited the mainland. Then, when the Anglo-Saxons invaded from Northern Europe, it became corrupted into Eoforwic (the Briton word Ebor being confused with the Anglo-Saxon word Eofor).

In the 9th century, the Danes took control of the north of England, and as the city fell under Viking rule, the spelling shortened to Jorvik. In the years since, this gradually modernized to York, still in use today.

Tolkien made over 20 different languages during his lifetime, each based on specific races and geographical areas. He created his world specifically to allow somewhere for his languages to evolve in.

One of the few games that seems to use this kind of history-driven approach to world-building is Dwarf Fortress.

Hundreds of years of history being played out.

Dwarf Fortress starts by doing the usual things. A fractal landscape is used as a base. They then add a temperature map, rainfall projection, drainage, vegetation. Erosion is applied and then the world is populated with animals and people.

But what’s somewhat unique is that the game then goes on to simulate an entire cultural history of the world. There’s an absolutely fascinating write-up of it at polygon.com, but basically it simulates the creation of towns, conflicts with neighbors, the rise and fall of civilizations, every week, for 250 years. All before you even think about starting to play the game.

From the polygon.com write-up:

For instance, when you travel to certain cities in the game and speak to a merchant they might tell you that their leather caps are made in an elvish city half a world away. And it will be true. They really were made there, during world creation, and traveled to this market for you to buy before you even started playing.

It seems to me like many of the more successful attempts at world-building follow this basic theme: Invent the culture and situation first, and then discover what kind of world would be born out of that.

I don’t know of any other procedural games that bring the same kind of internal depth to world-building. Let us know of any you come across!


The Winter Of Our Dis-content

13-Sep-14

I’m going to have to be a little mean here. I try not to be. It’s so easy to be mean and so hard to be constructive, and generally the world is a better place for everyone when people can be constructive.

But I’m afraid I need an example for this post, and so I’m going to have to pick on SourceForge.

SourceForge was basically the first company in the open-source hosting movement, and back in 1999 the novelty of simply having free hosting for a project was a good enough reason to use it.

But let’s have a look at their website now:

(Screenshot: a typical SourceForge project page.)

Wow. Just wow. Well there’s a lot of adverts, but let’s ignore that. What’s the real problem here?

Where’s the content?

Presumably you arrived at a SourceForge website because perhaps you wanted to:

  1. Read about the project and what it’s supposed to do.
  2. Poke around in the code and see how they do things.
  3. Contribute by reporting bugs or such.

This is the content. The project, the thing the creators have created. This needs to be the foremost thing on the website. And yet the lonely “Description” text gets crammed into a tiny text-box buried underneath a glacier of metadata.

Where’s the code? I mean, it’s open-source, right? The clue is in the name – there should be some source here somewhere. Hosting source code is the ONE THING this website is for.

Oh wait, there it is. In case you haven’t spotted it yet, there’s a tiny tiny little “Code” button hidden up on the top-right of the toolbar. OK, so SourceForge has terrible web design. But why does this anger me so much? There are other source-control websites out there people could be using, sure, but there’s a fundamental problem here that is worth studying.

Our glorious benefactor Gabe

I want to divert briefly to talk about Valve. Over the past 10 years Valve have been one of the companies most famous for trying to innovate in the field of online economics. The release of Steam as a distribution platform, launching Team Fortress 2 as a free-to-play game, the Hat Based Economy, and more.

Their founder Gabe Newell has talked at length on many occasions about what Valve tries to do as a company, why they exist, and how they interact with their customers.

Gabe gave a fascinating talk last year at the Lyndon B. Johnson School, where he describes the process Valve use to empower their customers in creating new value.

That by itself is an unusual idea to some businesses. Often you’ll hear a business talk about increasing “sales”. What Valve have figured out is that sales are simply a subset of value, and that increasing “value” is what you need to be doing.

The first thing that comes across is that Valve really care about what’s best for the customers:

(at 15:45)
Valve is not a publicly traded company. Being a publicly traded company adds a bunch of headaches, and it didn’t really solve any problems for us. It means that control and decision making now involves third-parties.

So for a developer or somebody building something at Valve, it’s like: there’s the customer, and they’re the person that you’re trying to make happy. [...]

You don’t go to board meetings where the board argues about what the third series of venture capitalists are worried about, dilution and hitting certain targets. There’s no notion of a distribution channel who says “we’re really looking for something that fits in this particular slot at this particular time”.

The whole point of being a privately held company is to eliminate another source of noise in the signal between the consumers and producers of a good.

That last line is the key part. The easy interchange of content between the consumer and producer is the single most important thing any company should be worrying about.

(at 33:25)
We’re also seeing this huge uptake in user-generated content.

To be really concrete, 10X as much content comes from the userbase of TF2 as comes from us. So we think that we’re super productive and kinda bad-ass at making TF2 content, but even at this early stage, we cannot compete with our own customers in production of content for this environment.

So the only company we’ve ever met that kicks our ass is our customers. Right, we’ll go up against Bungie, or Blizzard, or… anybody, but we won’t try to compete with our own userbase, because we already know that we’re going to lose.

Once we start building the interfaces for users to sell their content to each other, we start to see some surprising things.

An inquiry into value

Now let’s apply some of this advice to SourceForge. In 2008 GitHub launched as another of the many open-source project hosting sites.

Let’s look at a typical GitHub page:

(Screenshot: a typical GitHub project page.)

Ignoring all the adverts, toolbars, whatever, there is exactly one critical difference between the two. GitHub has the user’s content as the most important thing on the page. Right there, the first thing you see, is the source code. Directly underneath, the user places their documentation, laid out as if it were a website.

The link between the content creator and the content consumer is established directly. The trouble with SourceForge is that it’s under the impression that the content’s metadata is more important than the content.

Gabe mentioned how Steam actually acts as a bottleneck between content creators and consumers, something he’s actively trying to change. This is what SourceForge is – a bottleneck. A bottleneck designed to slow you down and make it harder to get what you want. By removing the barrier between content creators and users, by placing the content centered boldly on the project’s page, GitHub allows both their customers and users to directly create value. GitHub’s customers will do a better job of marketing the website than GitHub themselves could ever manage.

You are not your customer. Your advertisers and affiliates are not your customer. Your advertisers are there to fund the customer’s needs. For every decision you make you have to stop and ask yourself: Is This Best For The Customer?

I don’t know who SourceForge’s intended customer is, but I’m fairly sure it isn’t supposed to be the people trying to work with projects.

To paraphrase the classic Zen and the Art of Motorcycle Maintenance, neither the producer nor the consumer has value by itself. Value is what defines the producer and the consumer:

Value can’t be independently related with either the creator or the consumer but could be found only in the relationship of the two with each other. It is the point at which the creator and consumer meet.

If your customers can’t get at your content then your content can’t get at your customers, and you have no customers.


A Swift Step Forward

10-Sep-14

I’ve been following the progress of Apple’s new Swift programming language over the past 4 months since its announcement.

I have to admit I’m actually quite excited by it. It’s common nowadays for people to fling out new languages left-right-and-center, but most of them tend to just re-hash existing ideas and not provide anything new.

But I think Swift actually has the potential to go somewhere. Let’s break it down a little and see what Swift gets right and wrong.

It looks like C

Or to be more correct, it looks like JavaScript. It used to be that people referred to a language as having a C-like syntax, but you have to move with the times.

It’s a well-known fact that every time someone creates a new language, one of two things happens:

  1. They use curly bracket syntax, and the language is a great success.
  2. They use any other syntax; the language is heralded by computer scientists as being the best thing ever created, and then no-one uses it and it dies a slow, painful, and lingering death.

I’m glad to see they made the right choice.

It has native support for lists, strings, and hash tables

Well, that’s not exactly surprising; most languages do now. Python does, Perl does, Lua does, C# does, the list goes on (C being the elephant in the room here). Even C++ does, although having used Python’s elegant array slicing makes it painful to go back to the STL.
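
For comparison, a quick sketch of those built-in types in Swift:

    var languages = ["Python", "Perl", "Lua"]    // a native list (Array)
    var born = ["Python": 1991, "Lua": 1993]     // a hash table (Dictionary)
    born["Perl"] = 1987
    let firstTwo = languages[0..<2]              // slicing, Python-style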

No manual memory management

And there was much rejoicing.

But none of this really answers the question of why Swift would be a better language than, say, Python for writing an application. Well, there’s one extra little feature that almost all modern languages seem to want to avoid nowadays:

It’s statically typed

For those who don’t follow programming-language implementation, this means that the compiler is able to catch an enormous amount of errors before the program has even run.

I’ve never been able to understand the love for dynamically-typed languages. The argument is often made that removing typed variables allows the language to be more expressive, but Swift is able to follow Python’s lead, allowing simple manipulation of types without any syntactic mess:
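
A minimal sketch (my own illustration, not from Apple’s documentation):

    var score = 10            // inferred as Int -- statically typed, no annotation
    score += 5                // fine
    // score = "ten"          // compile-time error, caught before the program runs

    let names = ["Ann", "Bob"]      // inferred as [String]
    let more = names + ["Carol"]    // still statically typed throughout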

Because of its static typing, this is a natively compiled language. No bytecode, no VM. You can ship executables with this.

The only other viable contenders in this field right now are D and Go. D, despite being a very good effort, has never quite managed to gain the traction it deserved.

Go, however, has managed to gain an immediate foothold.

It remains to be seen as to which of the three will win out, or whether a fourth contender will step up. But the future certainly looks very rosy for Swift. Except for one small disadvantage…

It’s Apple only.

Yep, OSX/iOS only right now.

But I fully expect this to change in time. Either Apple may choose to open-source their Swift compiler, or, failing that, an open-source project may step up to fill the gap.

Seeing as Swift’s main architect is Chris Lattner, who is the creator of LLVM, and is generally considered to be an all-round good guy in terms of open-sourcing things, I personally think there’s a very good chance we’ll see an open-source release from Apple within the next few months.

I certainly hope this happens. Programming has spent too long stuck in the crack between C and JavaScript, and the future is crying out for something with the speed of C++ and the ease of Python.

Is Swift the answer? I don’t know, but I’m excited to find out.


Ted Talk A.I.

13-May-14

After listening to the Jeff And Casey Show where they talk about the TED Singularity, I felt society needed someone to solve this problem immediately.

I therefore submit my entry for the TED A.I. XPrize. For convenience, I’ve made it a web app instead of hooking up a Roomba to an iPad, but this can be addressed later.

Here it is, try it out and let me know what y’all think :-)

I eagerly await my $2 million prize.
