Sometimes you have a very good idea of what's in a binary file; you might even have the source code for a reader and writer. This tutorial guides you through the steps of writing the layout code needed to view HHA (Handmade Hero Asset) files with BEdit.
This is written for version 0.0.2 of the viewer. To see which version you have, run `bedit.exe -version`; if the option isn't recognized, you have version 0.0.1.
The file format we're investigating here is HHA version 0, and I'm using the `handmade_file_formats.h` source code from day 221. The asset files for version 0 can be found in `handmade_hero_legacy_art.zip`. You don't need the files to follow this tutorial, but you do if you want to try it out yourself.
Some file formats are protected by license. Before viewing or attempting to reverse-engineer a file, make sure you have the right to do so.
We will be using `handmade_file_formats.h` in this wiki entry; that file carries the notice (C) Copyright 2015 by Molly Rocket, Inc. All Rights Reserved. I have written permission from Molly Rocket, Inc to share code from `handmade_file_formats.h` in this tutorial.
Let's start by creating `hha.bet`. This file will contain our definition of the file type, written in the layout language.
If you have access to the Handmade Hero source code, you can see that the file format is mostly specified in `handmade_file_formats.h`. BEdit's layout language is very similar to C, so we can copy-paste most of what we need directly.
After copy-pasting the entire contents of `handmade_file_formats.h` into `hha.bet`, we can run BEdit and see what errors we get:
```
> bedit.exe -layout hha.bet -data v0_hhas\intro_art.hha
Invalid tokens in layout file:
1: #if !defined(HANDMADE_FILE_FORMATS_H)
   ~
75: #define HHA_CODE(a, b, c, d) (((uint32)(a) << 0) | ((uint32)(b) << 8) | ((uint32)(c) << 16) | ((uint32)(d) << 24))
    ~
77: #pragma pack(push, 1)
    ~
81: #define HHA_MAGIC_VALUE HHA_CODE('h','h','a','f')
    ~
84: #define HHA_VERSION 0
    ~
180: #pragma pack(pop)
     ~
182: #define HANDMADE_FILE_FORMATS_H
     ~
183: #endif
     ~
```
The number on the left shows the line number of the error.
BEdit does not have a preprocessor. Dealing with `#pragma pack(push, 1)` and `#pragma pack(pop)` is easy: BEdit already assumes all structs are tightly packed, so we can simply delete those lines. The include guard isn't needed either, so we delete it too.
If your file format relies on C struct alignment, you must add the padding members manually when translating the types.
The macro `HHA_CODE(a, b, c, d)` takes four 1-byte values and packs them into one 4-byte integer, with `a` in the lowest byte. BEdit has this functionality built in: a string literal such as `"hhaf"` can be used as a 4-byte integer value.
Delete all `#define` and `#pragma` instances in the code and add

```
enum
{
    HHA_VERSION = 0,
    HHA_MAGIC_VALUE = "hhaf",
};
```

to the top of the file. Let's run again and see what errors we get:
```
> bedit.exe -layout hha.bet -data v0_hhas\intro_art.hha
83: u32 MagicValue;
    ~~~
ERROR Expected type.
```
The `u32` used by the Handmade Hero code is not a BEdit type, and neither are C's built-in scalar types, since their size and byte order aren't fixed. An unsigned 32-bit integer displayed in decimal is declared as `u(4)`, a signed 64-bit integer as `s(8)`, and a 32-bit float as `f(4)`.
At this point we could go through all instances of `u32` and replace them with `u(4)`, but it's easier (and less typing) to add typedefs:
```
typedef u(4) u32;
typedef u(8) u64;
typedef f(4) r32;
typedef u(4) bitmap_id; // NOTE(Jens): We could also do `struct bitmap_id { u32 Value; };`
```
Let's run again and see what we get.
```
> bedit.exe -layout hha.bet -data v0_hhas\intro_art.hha
177: union
     ~~~~~
ERROR Expected type.
```
C types are static; BEdit types are not. A struct in BEdit may depend on the contents of the data file, and as a consequence unions are not supported (this may change in the future).
For now, let's replace the union with "untyped" bytes, which we do by declaring a `raw` scalar of the same size:
```
struct hha_asset
{
    u64 DataOffset;
    u32 FirstTagIndex;
    u32 OnePastLastTagIndex;
    /* TODO(Jens): We'll soon see how to get this behavior back.
    union
    {
        hha_bitmap Bitmap; // sizeof(hha_bitmap) == 4*4
        hha_sound Sound;   // sizeof(hha_sound) == 3*4
        hha_font Font;     // sizeof(hha_font) == 5*4
    };
    // sizeof union is 20 bytes
    */
    raw(20) DataHeader;
};
```
Let's run again.
```
> bedit.exe -layout hha.bet -data v0_hhas\intro_art.hha
WARNING No members exported, this can be due to missing 'layout' or because type had no members.
```
So far we have defined a bunch of types, but we haven't told BEdit which type describes the file itself. We do this with the `layout` keyword. The file starts with an `hha_header`, so at the bottom of the file, add
```
layout hha_header;
```
and run again.
```
> bedit.exe -layout hha.bet -data v0_hhas\intro_art.hha
hha_header
     0h  MagicValue      1 717 659 752
    04h  Version         0
    08h  TagCount        109
    0Ch  AssetTypeCount  21
    10h  AssetCount      55
    14h  Tags            44
    1Ch  AssetTypes      916
    24h  Assets          1 168
```
The left column shows the member's location in the file (in hexadecimal), the middle column its name, and the right column its value.
It's not very helpful to see data like `MagicValue` in decimal form, so let's change that.
BEdit types generally have scalar-type, size and radix specifiers. You can also use `string` for ASCII strings. To see `MagicValue` as a 4-byte string, change it to `string(4) MagicValue;`; if you want to see it as hexadecimal (in byte order), use `raw(4) MagicValue;`. `Tags`, `AssetTypes` and `Assets` are locations in the file, and since addresses are displayed in hexadecimal on the left side, let's show these members the same way by changing the header to
```
struct hha_header
{
    string(4) MagicValue;
    u32 Version;

    u32 TagCount;
    u32 AssetTypeCount;
    u32 AssetCount;

    u(8, hex) Tags;       // hha_tag[TagCount]
    u(8, hex) AssetTypes; // hha_asset_type[AssetTypeCount]
    u(8, hex) Assets;     // hha_asset[AssetCount]
};
```
and check the output
```
> bedit.exe -layout hha.bet -data v0_hhas\intro_art.hha
hha_header
     0h  MagicValue      "hhaf"
    04h  Version         0
    08h  TagCount        109
    0Ch  AssetTypeCount  21
    10h  AssetCount      55
    14h  Tags            2Ch
    1Ch  AssetTypes      3 94h
    24h  Assets          4 90h
```
We can confirm that `MagicValue` and `Version` are what we expected, but visual confirmation can only take us so far.
BEdit evaluates the types while reading the actual data, which lets us write things like assertions:
```
struct hha_header
{
    string(4) MagicValue;
    assert(MagicValue == HHA_MAGIC_VALUE);

    u32 Version;
    assert(Version == HHA_VERSION);

    u32 TagCount;
    u32 AssetTypeCount;
    u32 AssetCount;

    u(8, hex) Tags;       // hha_tag[TagCount]
    u(8, hex) AssetTypes; // hha_asset_type[AssetTypeCount]
    u(8, hex) Assets;     // hha_asset[AssetCount]
};
```
If we run it again, we get the same result, since the assertions aren't triggered. But if we try to load something from the `v2_hhas` folder, we get
```
> bedit.exe -layout hha.bet -data v2_hhas\intro_art_v2.hha
hha_header
     0h  MagicValue  "hhaf"
    04h  Version     2
92: assert(Version == HHA_VERSION);
    ~~~~~~
Assertion triggered!
```
At this point we can see the header, but what about the rest of the file? Members of BEdit structs don't have to be next to each other; you can specify their absolute addresses. We could modify `hha_header` to include these members, but I personally prefer to create a new type; let's call it `hha_file`:
```
struct hha_file
{
    hha_header Header;

    @(Header.Tags)       hha_tag        Tags[Header.TagCount];
    @(Header.AssetTypes) hha_asset_type AssetTypes[Header.AssetTypeCount];
    @(Header.Assets)     hha_asset      Assets[Header.AssetCount];
};

layout hha_file; // Remember to remove `layout hha_header`
```
The address specifier `@(...)` gives the absolute address of the member in the file. A member without an address specifier starts where the previous member ended.
Now we see the following (I have removed some entries to make it less verbose):
```
> bedit.exe -layout hha.bet -data v0_hhas\intro_art.hha
hha_file
hha_header (.Header)
      0h  MagicValue             "hhaf"
     04h  Version                0
     08h  TagCount               109
     0Ch  AssetTypeCount         21
     10h  AssetCount             55
     14h  Tags                   2Ch
     1Ch  AssetTypes             3 94h
     24h  Assets                 4 90h
hha_tag (.Tags[0])
     2Ch  ID                     0
     30h  Value                  0.000000
hha_tag (.Tags[1])
     34h  ID                     5
     38h  Value                  1.000000
...
hha_asset_type (.AssetTypes[0])
  03 94h  TypeID                 0
  03 98h  FirstAssetIndex        0
  03 9Ch  OnePastLastAssetIndex  0
hha_asset_type (.AssetTypes[1])
  03 A0h  TypeID                 0
  03 A4h  FirstAssetIndex        0
  03 A8h  OnePastLastAssetIndex  0
... (2nd to last entry of AssetTypes has non-zero data)
hha_asset (.Assets[0])
  04 90h  DataOffset             0
  04 98h  FirstTagIndex          0
  04 9Ch  OnePastLastTagIndex    0
  04 A0h  DataHeader             0x0000000000000000000000000000000000000000
hha_asset (.Assets[1])
  04 B4h  DataOffset             3 148
  04 BCh  FirstTagIndex          1
  04 C0h  OnePastLastTagIndex    3
  04 C4h  DataHeader             0x29090000800400000000003F0000003F00000000
...
```
If you have trouble seeing the entire output, redirect it to a file, or increase the buffer size in your console settings.
One thing to notice is that `Tags.ID` is supposed to be an enum, and the same goes for `AssetTypes.TypeID`. In C (at least before C23) you can't specify the size of an enum, which is why Casey declared them as fixed-width integer types instead. BEdit does support sized enums, so let's replace those member types with `name_of_enum(size_in_bytes)`. While we're at it, let's also make `hha_asset.DataOffset` hexadecimal:
```
struct hha_tag
{
    asset_tag_id(4) ID;
    r32 Value;
};

struct hha_asset_type
{
    asset_type_id(4) TypeID;
    u32 FirstAssetIndex;
    u32 OnePastLastAssetIndex;
};

...

struct hha_asset
{
    u(8, hex) DataOffset;
    u32 FirstTagIndex;
    u32 OnePastLastTagIndex;
    /* TODO(Jens): We'll soon see how to get this behavior back.
    union
    {
        hha_bitmap Bitmap; // sizeof(hha_bitmap) == 4*4
        hha_sound Sound;   // sizeof(hha_sound) == 3*4
        hha_font Font;     // sizeof(hha_font) == 5*4
    };
    // sizeof union is 20 bytes
    */
    raw(20) DataHeader;
};
```
The output now looks like this:
```
> bedit.exe -layout hha.bet -data v0_hhas\intro_art.hha
hha_file
hha_header (.Header)
      0h  MagicValue             "hhaf"
     04h  Version                0
     08h  TagCount               109
     0Ch  AssetTypeCount         21
     10h  AssetCount             55
     14h  Tags                   2Ch
     1Ch  AssetTypes             3 94h
     24h  Assets                 4 90h
hha_tag (.Tags[0])
     2Ch  ID                     Tag_Smoothness
     30h  Value                  0.000000
...
hha_asset_type (.AssetTypes[0])
  03 94h  TypeID                 Asset_None
  03 98h  FirstAssetIndex        0
  03 9Ch  OnePastLastAssetIndex  0
...
hha_asset (.Assets[0])
  04 90h  DataOffset             0h
  04 98h  FirstTagIndex          0
  04 9Ch  OnePastLastTagIndex    0
  04 A0h  DataHeader             0x0000000000000000000000000000000000000000
hha_asset (.Assets[1])
  04 B4h  DataOffset             C 4Ch
  04 BCh  FirstTagIndex          1
  04 C0h  OnePastLastTagIndex    3
  04 C4h  DataHeader             0x29090000800400000000003F0000003F00000000
...
```
Wait what? It ends now? But we're just getting started!
We have only just started, and we'll continue in the next part! But at this point you can already specify a lot of different file formats, so download the command line reader if you haven't already and play around with it.
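The key take-aways from this wiki entry, when you already have a file format described in C, are:
- BEdit has no preprocessor: remove `#if`, `#define` and `#pragma` lines, and turn constants into an `enum`.
- Structs are tightly packed; add explicit padding members if the format relies on C alignment.
- Replace C scalar types with sized BEdit types such as `u(4)`, `s(8)` and `f(4)`, most conveniently through typedefs.
- Unions aren't supported yet; replace them with `raw` bytes of the same size for now.
- Use `layout` to tell BEdit which type describes the file, `@(...)` to place members at absolute addresses, and `assert` to validate the data.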
In the next part we'll figure out how to deal with that union, and we'll see how dynamic the BEdit layout language actually is.