codersnotes

Disassembling Jak & Daxter January 12th, 2017

Disclaimer

I don't work at Naughty Dog, and I don't have any secret knowledge of Jak & Daxter, except what I figured out myself from the disc. So a lot of this may well be wrong. Take a pinch of salt with it.

The massive world of Jak & Daxter

I've always had a fascination with Jak & Daxter ever since it came out, way back in 2001 on the PlayStation 2. It was one of the first 3D games I'd actually seen run at 60Hz. I was used to PC games like Quake which were software rendered, or games like Mario 64 which I played (and completed!) on UltraHLE at a woefully poor framerate. But Jak & Daxter, hereafter referred to as J&D, blew me away.

Here was a game running at a silky-smooth 60Hz, never missing a beat, and somehow managing to draw more polygons than any other game out there. A game where you could see and explore a vast world with no boundaries. No load screens, no pauses, no LOD pops. It all just fit together perfectly. I had to know more.

The J&D engine may possibly be one of the greatest engines ever made. Of course, because it's not open source, it doesn't get the recognition that more famous engines like Quake get. There's not much information out there about it, but there's a few snippets here and there. The most important for you to read would be Stephen White's 2003 GDC talk, "The Technology Of Jak & Daxter", which I can't find a link to any more but I've mirrored here:

There's a lot to talk about, but today I'm going to ignore all the cool graphics tech and such, because I want to focus on the most amazing element: GOAL

GOAL, standing for "Game Oriented Assembly Lisp", is the programming language J&D is written in. All of it. I want to really nail this point home here -- some people think it's some kind of bytecoded scripting language. It isn't. GOAL is a fully natively-compiled general purpose programming language, and the whole game is written in it.

It's a hell of a thing, when you think about it. To create your own programming language, from scratch, and then write the entire game engine in it. I don't know of any other games out there that do that.

The only part of the game not written in GOAL is the loader/linker, which is written in C++. This is the equivalent of their DLL loader; it's just a simple stub program to load the rest. And this is what this article will be picking apart, as it forms the gateway to everything else.

Because GOAL has its origins in Lisp, one of it's hallmarks is that symbols are considered a first-class datatype. There is no offline linker, everything is dynamically linked. Therefore the symbol table must be present in the runtime data. What this means is that if we can understand the data format, we can get a complete symbolic disassembly of the whole game.

If you want to follow along at home, you'll need a copy of the PS2 J&D disc or ISO, and the source code I've posted at the end here.

The loader

Let's start at the loader. This is the first thing that loads, and is just a regular PS2 ELF compiled with gcc. For whatever reason (lack of attention possibly), Naughty Dog forgot to strip the ELF symbols from this file, which means we can open it up in any MIPS disassembler and follow its logic through.

The function of the loader is simple. It sits waiting, and when new data arrives from wherever, it allocates space on the heap and copies the data there. It then relocates the pointers to the new address, and patches up any references to the global symbol table. Once this is done the data is ready to go. That's basically it; there's no VM or interpreter here.

The loader generally tracks 3 different memory heaps -- the common heap, and two level heaps. The game has two levels loaded at any one time, each getting their own self-contained heap (this is what allows you to seamlessly walk from one to another). The common heap contains the core engine code and data. The loader just uses a simple bump allocator to throw new data on the end. Once the level is finished with, it just throws out the entire heap and starts again.

So where does the loader get its data from? First we need to examine the file formats used here. If you look on the disc, the main files used here are called "CGO" and "DGO". There's no actual difference between them, I presume the names just reflect the usage (code/data?). It tends to be that CGO files are loaded into the common heap, and DGO files are loaded into the level heaps. However, don't be confused by the names -- just because they imply code and data separately, the contents are in fact mixed and can contain both code and data interchangeably. In fact, as we'll see later, GOAL doesn't really distinguish the two concepts.

The DGO format is real simple. There's a simple header, and then it's just a bunch of binary files concatenated together. It's closer to a ZIP archive than it is to something like ELF. The DGO is just a way of delivering binary blobs one at a time.

In the retail version, the IOP (the PS2s input/output processor) delivers the DGO file as-is into the loader when requested. In the development builds however, they also listen on a network port for incoming binaries. This is how their development process works -- they can compile new source files on their development machine and stream them straight into the PS2 while it's running. The target PS2 doesn't even know where the files came from, it just loads them in like any other.

It's not really fair to call this "hotloading", it's really just "loading", as updating the game with new code/data is no different than loading it in the first place. This is made possible by the dynamic symbol table in the linker, which we'll look at later in the top-level section discussion.

So, we can easily write a little program to pop open a DGO and extract the files within. They don't have an official extension but I'll refer to them as ".go.bin", assuming they were compiled from a ".go" originally. But what do these binaries exactly contain?

A .go.bin is basically the equivalent of a .OBJ file from a normal C++ compiler. It contains 3 sections, as well as relocation tables to patch them with:

These are loaded in-place by the loader, which makes loading very efficient. The loader does not have to actually parse the data or even know what it is. As far as it's concerned, we're just loading 3 big chunks of data in and then patching them up afterwards.

The top-level segment

The top-level segment contains everything at global scope from the source file. Because this only executes once, the loader loads it, runs it, and then throws it straight away. Its main purpose is to register functions into the global symbol table. Because functions are first-class objects in GOAL (like most other things), they're just function pointers. So registering a function is simply a matter of storing a pointer to it in the symbol table.

This means that every single function call in J&D goes through a pointer (i.e. is a virtual function). People always used to tell me this was bad on the PS2 as, oh I don't know, DCACHE misses or something. Well seeing as J&D is one of the best performing games on the system (or any system), it doesn't seem to have done them any harm.

Here's an example excerpt of a top-level segment:

; eye

;==================================================================
    .segment top-level-segment
;==================================================================


;------------------------------------------------------------------
    .type function
goal-entry:
    lui v1, L35
    or v1, v1, L35
    sw v1, *eye-work*(s7)
    lui v1, L26
    or v1, v1, L26
    sw v1, render-eyes(s7)
    lui v1, L12
    or v1, v1, L12
    sw v1, update-eyes(s7)
    lui v1, L11
    or v1, v1, L11
    sw v1, get-eye-block(s7)
    lui v1, L10
    or v1, v1, L10
    sw v1, convert-eye-data(s7)
    lui v1, L1
    or v0, v1, L1
    sw v0, merc-eye-anim(s7)
    jr ra
    daddu sp, sp, r0

In this example, what we're doing is loading the address of a function (e.g. render-eyes) and storing it at an offset into the global symbol table (register s7). Note that GOAL symbol names can happily contain dashes, asterisks, and other Lispisms. GOAL keeps a pointer to the symbol table in register S7 at all times, and all global variables/functions are always referenced via this.

So when this file is loaded into the system, it simply registers the existence of 6 named global objects, and then returns.

I'll give you a brief example of how something like this might be generated. Imagine the following C function:

int my_function(int x, int y) { return x+y; }

In GOAL, you might write this in a more Lispy syntax: (I guessed at the syntax from snippets I've seen online, don't peek too closely)

(defun my_function ((x int) (y int))
    (return (+ x y)))

Looks similar on first inspection, but the key difference is defun (short in Lisp for "define function"). This is a macro. We're not actually defining the function ourselves here, what we're doing is passing our code into the "define function" macro. The macro will run at compile time and rewrite our code, and we'll end up with something like this:

// top-level segment:
symbols["my_function"] = &L1; // assign fn ptr to symbol
// main segment:
int L1(int x, int y) { return x+y; }

So effectively, the top-level segment is just doing at runtime the work that most compilers do at compile time. Instead of writing symbol tables into an ELF file, we set it up ourselves upon loading.

The main segment

The main segment contains the real work. Because this is just a binary chunk, it could contain anything. Typically it tends to be either a set of function bodies, or some data structures containing art data. Either way, the top-level segment will refer to these objects and patch them into the running system.

For me, this is one of the most beautiful aspects of the GOAL system; the mixing of code and data. The system just doesn't care. There's no code in GOAL to, for example, parse an XML file into a structure. It's just not needed. If you want to fill in data, you just embed that data right into the binary. I wrote about this way back in my old article, The Joy Of INCBIN. GOAL takes this idea to its extreme. Textures, models, etc, are all processed by offline tools which write out .go files, which the compiler then just packages up like any other data set.

As far as GOAL is concerned, a function body is just an object like any other, no different from an array or a struct. Each allocated GOAL object has a 4-byte header in front of it which contains a pointer to its type, used for both debugging and runtime polymorphism. Note that I didn't say "type debugging information", I said "type". This is because in GOAL, a type is itself an object as well. You can even create new types at runtime if you like.

The debug segment

Lastly we have the debug segment. The game doesn't normally load this in a retail build, it just skips over it. During development however, it contains assorted extra information used for debugging. In a regular ELF binary, you might expect to find encoded DWARF data or some other format that described all the symbols and types within. Well that's not how GOAL works.

GOAL can't list the symbols within the file because there aren't any. Remember, we patched the symbols into the system ourselves in the top-level segment? The code itself decides what to assign to what symbol at runtime, not compile time.

It also can't list the type information, because types are a purely runtime construct too. That's not to say that this is a dynamically-typed language like Lisp or Python, because it isn't. The compiler emits static native code assuming the layout of types, just like a C++ compiler would. The key to understanding GOAL is to realize that types are just something the runtime lets you create at will, and the compiler is just making use of that facility in advance.

So if there's no type info here and no symbol table here, what is here? Well, it tends to just contain a bunch of functions (probably compiler-generated) that know how to print out and inspect the contents of the types the compiler made.

GOAL never had a source-line debugger, as far as I know, although there's no reason it couldn't given some effort to write one. (edit: wrong, see here) So it seems like the main mechanism Naughty Dog used to debug with was to pull up a console and enter commands to print and inspect various data themselves. (Remember, the system can compile a new single-line function and download it straight into the loader at any time! This gives you a free debug console.)

That's not actually as bad as it sounds; bear in mind this is a fully live-editable system, so if there's a function you want to investigate you can easily just patch in a printf while the game's running and mess around with it.

Getting a disassembly

So that's pretty much all there is to loading compiled GO files. You can disassemble the ELF loader to figure out the exact file format, and what with having the ELF symbols for the loader available, it's not that hard to replicate the function of the loader ourselves.

So I did.

Below is a small program that loads DGOs, disassembles them back into MIPS assembly, and writes it all out. The assembly format isn't quite standard, as there's a few things the .go.bin format needs that a regular assembler wouldn't provide. In particular it would need to understand how to write the symbol names used into the relocation table, and how to prefix each stored object with a pointer to its type.

I suspect most people won't be that interested to actually pick through it all themselves, but still, here it is. Let me know if you find any of it interesting!

Written by Richard Mitton,

software engineer and travelling wizard.

Follow me on twitter: http://twitter.com/grumpygiant