When you look at the disassembly of an unknown binary, you are usually greeted with an overwhelming amount of information. Even a small binary of a few kilobytes contains a large number of instructions and likely relies on external resources (DLLs, assets, operating system calls, etc.). The obvious question is: where do I start?
Where to start
For reverse engineering I rely mostly on Ghidra. It comes with a really good set of features, and its UI is approachable even for beginners. In my specific example I wanted to understand a file called SPELL.BIN from the Spellbinder: The Nexus Conflict demo. This file contains the spells, effects and runes used by the game logic. The file itself is just data, so some DLL or EXE must be parsing it. So let’s start by identifying our target: where is the code probably located?
Locating your binary
In the case of the Spellbinder demo we have two relevant files: spell.exe and game.dll.
The first one is only about 90 KB, the latter about 1.7 MB. As the demo already came with an auto-updater, I assumed that all of the game logic lives in game.dll, so that it can be replaced along with the assets. Also, spell.exe does not reference game.dll in its import table (it is not linked against it), so it can perform tasks before loading the code in game.dll.
Importing your binary
The first important step is to import your binary properly. Try to make all the DLLs referenced by the target available in the library search paths. An example for game.dll can be seen in the screenshot below:
Using this you quickly gain insight into which third-party components your target relies on. In the example here you can see that it relies on DDRAW.DLL, which indicates that it likely uses DirectDraw for rendering. So if you are interested in the rendering stack, you’d know what to look for.
Analyzing the binary
When you open your target in the CodeBrowser for the first time, Ghidra will offer to analyze it for you. Do so! If you suspect your target uses C++, I highly recommend installing [Ghidra-Cpp-Class-Analyzer](https://github.com/astrelsky/Ghidra-Cpp-Class-Analyzer) first. Select whichever analyzers you prefer. I usually go with all of them, except for “Aggressive Instruction Finder” and “MS DIA-SDK”.
Scoping your analysis
Now you are at the point of manual labour. The most important thing is to keep the scope as small as possible. You will likely never understand everything that happens inside the binary, but the good thing is: you don’t have to. We are interested in reading spell.bin, so the filename likely occurs in the code. Searching for it, we find it as a string constant:
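Outside of Ghidra, the same search can be sketched in a few lines of Python. This is only an illustration of the idea, not how Ghidra does it: the function name `find_string` is my own, and it assumes the filename is stored as plain ASCII (a UTF-16 string would need a different pattern).

```python
from typing import List


def find_string(data: bytes, needle: str) -> List[int]:
    """Return every offset at which needle occurs in data.

    The match is ASCII case-insensitive, because filenames are often
    stored in a different case than the one seen on disk.
    """
    haystack = data.lower()  # lowercases ASCII letters only, offsets stay valid
    pattern = needle.lower().encode("ascii")
    offsets = []
    pos = haystack.find(pattern)
    while pos != -1:
        offsets.append(pos)
        pos = haystack.find(pattern, pos + 1)
    return offsets


if __name__ == "__main__":
    # Tiny stand-in for the contents of a real binary.
    sample = b"\x00\x00spell.bin\x00junk SPELL.BIN\x00"
    print(find_string(sample, "spell.bin"))
```

On a real file you would pass the result of `open(path, "rb").read()` as `data`. The offsets are file offsets, not the virtual addresses Ghidra displays.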
The references all pass it into FUN_0045ede0, which seems to be the method that loads Spellbinder’s BIN files. Let’s rename this method to LoadSpellBinFile. All occurrences get passed the same constant string, so even if it could do more, it’s only used to read the spell.bin file. Interestingly, I found an additional method, FUN_004571d0, which gets passed a spell.dat that seems to contain the same data in a different format.
In our method LoadSpellBinFile we see lots of local data, as can be seen in the following screenshot:
Ghidra is able to generate data structures automatically, but for now we’ll do it manually. The first data structure in use is at puVar2, which is the same as ppuVar11, which is the same as ppuVar6. And where does this come from?
We see a method getting passed 344 and returning a pointer, likely an allocation of 344 bytes. This is awesome: when looking at spell.bin we see a nice repeating pattern of 344 bytes. So what happens to these allocated bytes? When the allocation succeeds, the memory it points to gets assigned PTR_LAB_0056f458. What is that supposed to be? Likely a virtual function table. So this might be object-oriented code after all, which is either C++ that was not detected as such or some other object-oriented framework on top of C.
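A quick way to test the 344-byte hypothesis against the file is to check how cleanly its size divides into records. The sketch below is mine, not part of the game. `split_records` is a made-up name and `header_size` is an assumption, since I have not confirmed whether spell.bin starts with a header.

```python
from typing import List, Tuple

RECORD_SIZE = 344  # the allocation size seen in LoadSpellBinFile


def split_records(data: bytes,
                  record_size: int = RECORD_SIZE,
                  header_size: int = 0) -> Tuple[List[bytes], int]:
    """Split data into fixed-size records.

    Returns the list of complete records and the number of leftover
    bytes. A leftover of 0 supports the fixed-record hypothesis; any
    other value hints at a header, a footer, or a wrong record size.
    """
    body = data[header_size:]
    count, leftover = divmod(len(body), record_size)
    records = [body[i * record_size:(i + 1) * record_size]
               for i in range(count)]
    return records, leftover


if __name__ == "__main__":
    fake_file = bytes(RECORD_SIZE * 3)  # stand-in for spell.bin
    records, leftover = split_records(fake_file)
    print(len(records), leftover)
```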
Vtables - Virtual function tables
So we’ve figured out that PTR_LAB_0056f458 is a vtable. Now we need to assign a proper type to it:
Once this is in place (you need to create the necessary function definitions for it), the memory at 0056f458 will look like this:
Using your newly created Vtbl you can also create a matching structure to hold it:
Unfortunately the code of LoadSpellBinFile does not reveal what the individual fields of our newly created struct mean. We can see that it likely consists of 4-byte fields, but that’s pretty much it.
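While the meaning of the fields is unknown, it can still help to look at a record as a list of 4-byte values next to the decompiler output. Here is a minimal sketch: `record_words` is a name I made up, and it reads the values as unsigned little-endian integers, which matches x86 byte order but says nothing about whether a field is really an integer, a float or a pointer.

```python
import struct
from typing import List, Tuple


def record_words(record: bytes) -> List[Tuple[int, int]]:
    """Return (offset, value) pairs for each complete 4-byte word.

    Values are decoded as unsigned 32-bit little-endian integers.
    Trailing bytes that do not fill a whole word are ignored.
    """
    usable = len(record) - len(record) % 4
    return [(offset, struct.unpack_from("<I", record, offset)[0])
            for offset in range(0, usable, 4)]


if __name__ == "__main__":
    # Three made-up words standing in for the start of a record.
    sample = struct.pack("<3I", 1, 0x0056F458, 7)
    for offset, value in record_words(sample):
        print(f"0x{offset:02x}: 0x{value:08x}")
```

The offsets printed this way line up with the field offsets Ghidra shows in the structure editor, which makes cross-checking easier.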
Stepping back instead of diving too deep
Do you remember FUN_004571d0, which works in a similar area and likely reads the spells in a different format? It’s only called once, so I renamed it to LoadSpellDatFile. Opening this method shows that it works with the same structures as LoadSpellBinFile. And it uses lots of string constants:
- “Loading Spell #%d\n”
- “Error Loading ‘type’ from section ‘%s’ file: %s\n”
This immediately gives us more insight. Once we tell Ghidra that piVar5 is a pointer to the 344-byte structure, it shows us that the field at offset 0x1c is fatigue. The format of spell.dat seems to be INI-based.
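To make the INI idea concrete, here is what reading such a file could look like. This is a guess at the shape of the data, not the real format. The keys `type` and `fatigue` come from the strings and field name above, while the section name `Spell1`, the values, and the function name `load_spells` are invented for illustration.

```python
import configparser
from typing import Dict

# Hypothetical sample; a real spell.dat may look different.
SAMPLE = """
[Spell1]
type = 3
fatigue = 12
"""


def load_spells(text: str) -> Dict[str, Dict[str, str]]:
    """Parse INI-style text into {section: {key: value}}.

    All values stay strings; converting them is left to the caller.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


if __name__ == "__main__":
    for name, fields in load_spells(SAMPLE).items():
        print(name, fields)
```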
Now the real grunt work begins: naming each field and repeating the steps above for the other data structures.
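The structure as known so far can also be written down outside of Ghidra, for example with Python’s ctypes. Treat this as a working draft under assumptions: the class name `Spell`, the placeholder field names, and the 32-bit unsigned type for fatigue are mine. Only the total size of 344 bytes, the vtable pointer at the start and the fatigue field at offset 0x1c come from the analysis above. It models the in-memory object. Whether the records in spell.bin share this exact layout is not confirmed.

```python
import ctypes

RECORD_SIZE = 344


class Spell(ctypes.LittleEndianStructure):
    """Draft layout of the 344-byte spell object (32-bit target)."""
    _fields_ = [
        ("vtable", ctypes.c_uint32),                  # 0x00: PTR_LAB_0056f458
        ("unknown_04", ctypes.c_uint8 * 0x18),        # 0x04..0x1b: not yet named
        ("fatigue", ctypes.c_uint32),                 # 0x1c: type is an assumption
        ("unknown_20", ctypes.c_uint8 * (RECORD_SIZE - 0x20)),  # rest, not yet named
    ]


if __name__ == "__main__":
    raw = bytearray(RECORD_SIZE)
    raw[0x1C] = 25  # made-up fatigue value
    spell = Spell.from_buffer_copy(bytes(raw))
    print(ctypes.sizeof(Spell), hex(Spell.fatigue.offset), spell.fatigue)
```

Each time a field gets a name in Ghidra, the matching unknown block can be split here as well, and the size check tells you immediately if an offset slipped.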
If your interest is reading files, look for functions specific to that. For example, to read a file you need to open it first. In Win32 applications one common way to do this is OpenFile (others include CreateFile and the C runtime’s _open). So browsing the imports we find the following:
Using the references to the function you will find the places that, well, open a file.
In this specific example there are two occurrences, both in FUN_00516b60. After renaming local variables to get a better understanding of the code, this is what the code looks like:
What can be seen is that it tries to read at least 768 bytes, and if it cannot, it returns. So we figured out that _read is used to load data. We also learned that FUN_00535f80 allocates memory (or provides access to already allocated memory). As the memory is not released in the method, FUN_00535f80 likely keeps track of it.
It is also worth noting that there are other functions for reading files, such as ReadFile.
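The read-and-bail-out pattern described here is easy to mirror when writing your own parser for the file. The following is my reading of the control flow, not a translation of the decompiled code. `read_header` is a made-up name, and I do not know yet what the 768 bytes contain.

```python
import io
from typing import BinaryIO, Optional

MIN_SIZE = 768  # minimum number of bytes the game code insists on


def read_header(stream: BinaryIO, size: int = MIN_SIZE) -> Optional[bytes]:
    """Read exactly size bytes, or return None if the stream is too short.

    This mirrors the early return seen in FUN_00516b60. For in-memory
    and buffered file streams, a short result means end of data.
    """
    data = stream.read(size)
    if len(data) < size:
        return None
    return data


if __name__ == "__main__":
    print(read_header(io.BytesIO(bytes(100))))        # too short
    print(len(read_header(io.BytesIO(bytes(1000)))))  # long enough
```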
Another thing to keep in mind is that not everything is linked statically or through the import table. Sometimes DLLs are loaded at run time by program code. You should always check for references to GetProcAddress. This gives you insight into the actual functions hidden behind pointers. In the case of game.dll, all socket-related functions can be provided by two different DLLs, and the code loads whichever it prefers. So all network-related code is hard to find until you assign a proper data type and a proper name to the function pointers. In the example below closesocket has been properly assigned, while WSAAsyncSelect has not yet.
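Because GetProcAddress is usually called with the function name as a plain string, those names tend to show up as string constants in the binary even though they are missing from the import table. A rough way to get a first list is to extract printable strings and compare them against names you expect. The helper names below are mine, and the candidate list is only an example. Lookups by ordinal instead of by name will not be found this way.

```python
import re
from typing import Iterable, List


def printable_strings(data: bytes, min_len: int = 4) -> List[str]:
    """Extract runs of printable ASCII of at least min_len characters."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


def dynamic_import_candidates(data: bytes, names: Iterable[str]) -> List[str]:
    """Return the given API names that occur as whole strings in data."""
    found = set(printable_strings(data))
    return sorted(name for name in names if name in found)


if __name__ == "__main__":
    # Stand-in for a binary; "example.dll" is a placeholder name.
    sample = b"\x00closesocket\x00\x01WSAAsyncSelect\x00example.dll\x00"
    print(dynamic_import_candidates(sample, ["closesocket", "WSAAsyncSelect", "recv"]))
```

Every hit is a good starting point in Ghidra: follow the references to the string, find the GetProcAddress call, and name and type the pointer that receives the result.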