Tutorial - HHA version 0 - Part 1 A Static Language

Sometimes you have a very good idea of what's in a binary file; you might even have the source code for a reader and writer. This tutorial guides you through writing the layout code needed to view HHA (Handmade Hero Asset) files with BEdit.

This is written for version 0.0.2 of the viewer. To see which version you have, run bedit.exe -version. If the option isn't recognized, you have version 0.0.1.

The file format we're investigating here is HHA version 0, and I'm using the handmade_file_formats.h source code from day 221. The asset files for version 0 can be found in handmade_hero_legacy_art.zip. You don't need the files to follow this tutorial, but you do need them if you want to try it out yourself.

Follow the law

Some file formats are protected by license. Before viewing or attempting to reverse-engineer a file make sure you have the right to do so.

We will be using handmade_file_formats.h in this wiki entry. That file carries the notice (C) Copyright 2015 by Molly Rocket, Inc. All Rights Reserved.

I have written permission from Molly Rocket, Inc. to share code from handmade_file_formats.h in this tutorial.

Copy and paste

Let's start by creating hha.bet. This file will contain our definition of the file type, written in the layout language.

If you have access to the source code of Handmade Hero, you can see that the file format is mostly specified in handmade_file_formats.h. The layout language of BEdit is very similar to C, so it's fairly easy to copy-paste what we need directly.

After copy-pasting the entire content of handmade_file_formats.h into hha.bet, we can run BEdit and see what errors we get.

> bedit.exe -layout hha.bet -data v0_hhas\intro_art.hha
Invalid tokens in layout file:
1: #if !defined(HANDMADE_FILE_FORMATS_H)
~
75: #define HHA_CODE(a, b, c, d) (((uint32)(a) << 0) | ((uint32)(b) << 8) | ((uint32)(c) << 16) | ((uint32)(d) << 24))
~
77: #pragma pack(push, 1)
~
81: #define HHA_MAGIC_VALUE HHA_CODE('h','h','a','f')
~
84: #define HHA_VERSION 0
~
180: #pragma pack(pop)
~
182: #define HANDMADE_FILE_FORMATS_H
~
183: #endif
~

The number on the left shows the line number of the error.

Replace preprocessor directives

BEdit does not have a preprocessor. Replacing #pragma pack(push, 1) and #pragma pack(pop) is easy: all structs are already assumed to be tightly packed, so we can just delete those lines. The include guard isn't needed either, so we delete that too.

If your file format assumes C struct alignment you must manually add padding members when translating the types.
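To see why padding matters, here is a small Python sketch (Python is only used for illustration, it is not part of BEdit). It compares the tightly packed size of the hha_header field list, five 4-byte integers followed by three 8-byte integers, with the size you get under the native alignment rules of the machine running the script. The format string is my own shorthand for that field list.

```python
import struct

# Field list of hha_header: five 4-byte unsigned ints, then three 8-byte unsigned ints.
FIELDS = "5I3Q"

# "<" means little-endian, standard sizes and no padding (like #pragma pack(push, 1)).
packed_size = struct.calcsize("<" + FIELDS)

# "@" means native sizes and native alignment, so padding may be inserted.
native_size = struct.calcsize("@" + FIELDS)

print("packed:", packed_size)  # 44
print("native:", native_size)  # platform-dependent, never smaller than packed
```

The packed size of 44 bytes is 2Ch, which is exactly where the header says the tag array begins in the output further down. If the native size is larger on your machine, the difference is the padding you would have to add by hand for a format that relies on C struct alignment.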

The macro HHA_CODE(a, b, c, d) takes four 1-byte integers and produces a 4-byte integer. BEdit has this functionality built in, in the form of string literals.
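As a rough illustration of why a string literal can stand in for the macro, the following Python sketch transcribes HHA_CODE and writes the result out as a little-endian 32-bit integer. The function name hha_code and the little-endian assumption are mine; the shifts are taken from the macro.

```python
import struct

def hha_code(a, b, c, d):
    """Python transcription of the HHA_CODE macro: pack four characters into a u32."""
    return (ord(a) << 0) | (ord(b) << 8) | (ord(c) << 16) | (ord(d) << 24)

magic = hha_code('h', 'h', 'a', 'f')

# Stored as a little-endian u32 (an assumption), the integer becomes the bytes "hhaf".
magic_bytes = struct.pack("<I", magic)

print(magic)        # 1717659752
print(magic_bytes)  # b'hhaf'
```

In other words, the first character ends up in the lowest byte, so the bytes in the file read "hhaf" in order.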

Delete all #define and #pragma instances in the code and add

enum
{
    HHA_VERSION = 0,
    HHA_MAGIC_VALUE = "hhaf",
};

to the top of the file. Let's run again and see what errors we get.

>bedit.exe -layout hha.bet -data v0_hhas\intro_art.hha
83:     u32 MagicValue;
        ~~~
ERROR Expected type.

Replace scalar types

C scalar types are not supported in BEdit since their size and byte order are unknown. An unsigned 32-bit integer displayed in decimal is defined using u(4), a signed 64-bit integer using s(8), and a 32-bit float using f(4).

At this point we could go through all instances of u32 and replace them with u(4), but it's easier (and less typing) to add typedefs.
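A short Python sketch of why size and byte order have to be pinned down: the same four bytes give different numbers depending on how they are read. The byte values are made up for the example.

```python
import struct

raw = bytes([0x6D, 0x00, 0x00, 0x00])  # four example bytes as they might sit in a file

little = struct.unpack("<I", raw)[0]    # unsigned 32-bit, least significant byte first
big = struct.unpack(">I", raw)[0]       # unsigned 32-bit, most significant byte first
as_float = struct.unpack("<f", raw)[0]  # the same bytes read as a 32-bit float

print(little)  # 109
print(big)     # 1828716544
print(as_float)
```

A plain C `int` or `long` says nothing about which of these readings is meant, which is the ambiguity the explicit scalar types avoid.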

typedef u(4) u32;
typedef u(8) u64;
typedef f(4) r32;
typedef u(4) bitmap_id; // NOTE(Jens): We could also do `struct bitmap_id { u32 Value; };`

Let's run again and see what we get.

>bedit.exe -layout hha.bet -data v0_hhas\intro_art.hha
177:    union
        ~~~~~
ERROR Expected type.

Replace union

C types are static; BEdit types are not. A struct in BEdit may depend on the contents of the data file. As such, unions are not supported (this may change in the future).

For now, let's replace the union with "untyped" bytes. This is done by defining a scalar with raw.

struct hha_asset
{
    u64 DataOffset;
    u32 FirstTagIndex;
    u32 OnePastLastTagIndex;

    /* TODO(Jens): We'll soon see how to get this behavior back.
    union
    {
        hha_bitmap Bitmap; // sizeof(hha_bitmap) == 4*4
        hha_sound Sound; // sizeof(hha_sound) == 3*4
        hha_font Font; // sizeof(hha_font) == 5*4
    }; // sizeof union is 20 bytes
    */
    raw(20) DataHeader;
};
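The 20 in raw(20) comes from the largest union member. As a sanity check, this Python sketch models the union with ctypes, using placeholder byte arrays whose sizes are taken from the comments above (the real member layouts are not needed for this), and adds up the size of one hha_asset.

```python
import ctypes

# Placeholder member types: only the sizes from the struct comments matter here.
class DataHeaderUnion(ctypes.Union):
    _fields_ = [
        ("Bitmap", ctypes.c_ubyte * 16),  # sizeof(hha_bitmap) == 4*4
        ("Sound", ctypes.c_ubyte * 12),   # sizeof(hha_sound) == 3*4
        ("Font", ctypes.c_ubyte * 20),    # sizeof(hha_font) == 5*4
    ]

# A union is as large as its largest member.
union_size = ctypes.sizeof(DataHeaderUnion)

# DataOffset (8) + FirstTagIndex (4) + OnePastLastTagIndex (4) + union
asset_size = 8 + 4 + 4 + union_size

print(union_size)  # 20
print(asset_size)  # 36
```

36 bytes is 24h, which matches the distance between consecutive hha_asset entries (4 90h and 4 B4h) in the full listing later in this tutorial.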

Let's run again.

>bedit.exe -layout hha.bet -data v0_hhas\intro_art.hha
WARNING No members exported, this can be due to missing 'layout' or because type had no members.

Specify the layout

So far we have defined a bunch of types, but we haven't told BEdit which type describes the file as a whole. We do this with the layout keyword. The file starts with hha_header, so at the bottom of the file, add

layout hha_header;

and run again.

>bedit.exe -layout hha.bet -data v0_hhas\intro_art.hha
hha_header
 0h     MagicValue 1 717 659 752
04h        Version             0
08h       TagCount           109
0Ch AssetTypeCount            21
10h     AssetCount            55
14h           Tags            44
1Ch     AssetTypes           916
24h         Assets         1 168

The left column shows where in the file the member is located, the middle column is the member name, and the right column is its value.
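To make the three columns concrete, here is a Python sketch that rebuilds the 44 header bytes from the values printed above and walks through them the same way. It assumes little-endian, tightly packed fields; the print format only loosely imitates BEdit's output.

```python
import struct

HEADER_FORMAT = "<5I3Q"  # assumed: little-endian, no padding

# Rebuild the header bytes from the values BEdit printed.
header_bytes = struct.pack(HEADER_FORMAT, 1717659752, 0, 109, 21, 55, 44, 916, 1168)

names = ["MagicValue", "Version", "TagCount", "AssetTypeCount",
         "AssetCount", "Tags", "AssetTypes", "Assets"]
sizes = [4, 4, 4, 4, 4, 8, 8, 8]
values = struct.unpack(HEADER_FORMAT, header_bytes)

offset = 0
rows = []
for name, size, value in zip(names, sizes, values):
    rows.append((offset, name, value))
    print(f"{offset:02X}h {name:>14} {value}")
    offset += size  # each member starts where the previous one ended
```

The offsets come out as 00h, 04h, 08h, 0Ch, 10h, 14h, 1Ch and 24h, the same as the left column above.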

It's not very helpful to see data like MagicValue in decimal form so let's change that.

Tweak scalar display

BEdit types in general have scalar-type, size and radix specifiers. You can also use string for ASCII strings. To see MagicValue as a 4-byte string, change it to string(4) MagicValue;. If you want to see it as hexadecimal (in byte order), use raw(4) MagicValue;. Tags, AssetTypes and Assets are locations in the file. Since addresses are displayed in hexadecimal in the left column, let's display these in hexadecimal too and change the header to

struct hha_header
{
    string(4) MagicValue;

    u32 Version;

    u32 TagCount;
    u32 AssetTypeCount;
    u32 AssetCount;

    u(8, hex) Tags; // hha_tag[TagCount]
    u(8, hex) AssetTypes; // hha_asset_type[AssetTypeCount]
    u(8, hex) Assets; // hha_asset[AssetCount]
};

and check the output

>bedit.exe -layout hha.bet -data v0_hhas\intro_art.hha
hha_header
 0h     MagicValue "hhaf"
04h        Version      0
08h       TagCount    109
0Ch AssetTypeCount     21
10h     AssetCount     55
14h           Tags    2Ch
1Ch     AssetTypes  3 94h
24h         Assets  4 90h

We can confirm that MagicValue and Version are what we expected, but this visual confirmation can only take us so far.
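If you want to double-check the display changes by hand, this Python sketch shows the same header values in the different forms used so far: decimal, ASCII string, raw bytes in file order, and hexadecimal offsets. It is only a cross-check of the numbers, not a description of how BEdit formats things internally.

```python
import struct

magic_bytes = b"hhaf"
magic_value = struct.unpack("<I", magic_bytes)[0]  # assumed little-endian

as_decimal = str(magic_value)            # what the decimal view showed
as_string = magic_bytes.decode("ascii")  # what a 4-byte string view shows
as_raw_hex = magic_bytes.hex().upper()   # the bytes in file order, as hex digits

# File offsets from the header, converted from decimal to hexadecimal.
offsets = {name: format(value, "X")
           for name, value in [("Tags", 44), ("AssetTypes", 916), ("Assets", 1168)]}

print(as_decimal, as_string, as_raw_hex)  # 1717659752 hhaf 68686166
print(offsets)  # {'Tags': '2C', 'AssetTypes': '394', 'Assets': '490'}
```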

Add asserts

BEdit evaluates the types once it has the data, which lets us write things like assertions.

struct hha_header
{
    string(4) MagicValue;
    assert(MagicValue == HHA_MAGIC_VALUE);

    u32 Version;
    assert(Version == HHA_VERSION);

    u32 TagCount;
    u32 AssetTypeCount;
    u32 AssetCount;

    u(8, hex) Tags; // hha_tag[TagCount]
    u(8, hex) AssetTypes; // hha_asset_type[AssetTypeCount]
    u(8, hex) Assets; // hha_asset[AssetCount]
};

If we run it again we get the same result, since the assertions are not triggered. But if we now try to load something from the v2_hhas folder we'll get

>bedit.exe -layout hha.bet -data v2_hhas\intro_art_v2.hha
hha_header
 0h MagicValue "hhaf"
04h    Version      2
92:     assert(Version == HHA_VERSION);
        ~~~~~~
Assertion triggered!
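The same idea, checking the data while reading it, looks like this in a hand-written reader. This is a minimal Python sketch; check_header and the two test buffers are hypothetical and only mirror the two asserts in the layout.

```python
import struct

HHA_MAGIC_VALUE = b"hhaf"
HHA_VERSION = 0

def check_header(data):
    """Return (magic, version) or raise ValueError, mirroring the two asserts."""
    magic, version = struct.unpack_from("<4sI", data, 0)
    if magic != HHA_MAGIC_VALUE:
        raise ValueError(f"bad magic: {magic!r}")
    if version != HHA_VERSION:
        raise ValueError(f"unsupported version: {version}")
    return magic, version

v0 = struct.pack("<4sI", b"hhaf", 0)  # start of a version 0 file
v2 = struct.pack("<4sI", b"hhaf", 2)  # start of a version 2 file

print(check_header(v0))  # (b'hhaf', 0)
try:
    check_header(v2)
except ValueError as error:
    print(error)         # unsupported version: 2
```

As in the BEdit output above, the magic value passes for both files and only the version check stops the version 2 file.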

Specify entire file

At this point we see the header, but what about the rest of the file? Members in BEdit structs don't have to be next to each other; you can specify their absolute addresses. We could modify hha_header to include the extra members, but I personally prefer to create a new type. Let's call it hha_file.

struct hha_file
{
    hha_header Header;

    @(Header.Tags) hha_tag Tags[Header.TagCount];
    @(Header.AssetTypes) hha_asset_type AssetTypes[Header.AssetTypeCount];
    @(Header.Assets) hha_asset Assets[Header.AssetCount];
};

layout hha_file; // Remember to remove `layout hha_header`

The address specifier @(...) specifies the absolute address of the member in the file. If a member does not have an address specifier it starts where the previous member ended.
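For comparison, this is roughly what "read a count and an offset from the header, then jump there" looks like when done by hand. The Python sketch below builds a tiny synthetic file with a cut-down header (only TagCount and Tags, which is my simplification) and the first two tag entries from the tutorial's data, then reads the tags back from their absolute address.

```python
import struct

# Synthetic file: cut-down header (TagCount, Tags offset), a gap, then the tags.
tags = [(0, 0.0), (5, 1.0)]  # (ID, Value) pairs
tags_offset = 0x2C           # same offset the real header reports
header = struct.pack("<IQ", len(tags), tags_offset)
body = b"".join(struct.pack("<If", tag_id, value) for tag_id, value in tags)
data = header.ljust(tags_offset, b"\x00") + body

# Read the header first, then jump to the absolute address it names.
tag_count, offset = struct.unpack_from("<IQ", data, 0)
decoded = [struct.unpack_from("<If", data, offset + 8 * i) for i in range(tag_count)]

for i, (tag_id, value) in enumerate(decoded):
    print(f"{offset + 8 * i:02X}h Tags[{i}] ID={tag_id} Value={value}")
```

Each hha_tag is 8 bytes, so the second entry lands at 34h, right after the first one at 2Ch.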

Running BEdit now gives the following (I have removed some entries to make it less verbose)

>bedit.exe -layout hha.bet -data v0_hhas\intro_art.hha
hha_file
hha_header (.Header)
    0h            MagicValue                                     "hhaf"
   04h               Version                                          0
   08h              TagCount                                        109
   0Ch        AssetTypeCount                                         21
   10h            AssetCount                                         55
   14h                  Tags                                        2Ch
   1Ch            AssetTypes                                      3 94h
   24h                Assets                                      4 90h

hha_tag (.Tags[0])
   2Ch                    ID                                          0
   30h                 Value                                   0.000000

hha_tag (.Tags[1])
   34h                    ID                                          5
   38h                 Value                                   1.000000

...

hha_asset_type (.AssetTypes[0])
03 94h                TypeID                                          0
03 98h       FirstAssetIndex                                          0
03 9Ch OnePastLastAssetIndex                                          0

hha_asset_type (.AssetTypes[1])
03 A0h                TypeID                                          0
03 A4h       FirstAssetIndex                                          0
03 A8h OnePastLastAssetIndex                                          0

... (2nd to last entry of AssetTypes has non-zero data)

hha_asset (.Assets[0])
04 90h            DataOffset                                          0
04 98h         FirstTagIndex                                          0
04 9Ch   OnePastLastTagIndex                                          0
04 A0h            DataHeader 0x0000000000000000000000000000000000000000

hha_asset (.Assets[1])
04 B4h            DataOffset                                      3 148
04 BCh         FirstTagIndex                                          1
04 C0h   OnePastLastTagIndex                                          3
04 C4h            DataHeader 0x29090000800400000000003F0000003F00000000

...

If you are having trouble seeing the entire output, you can pipe it to a file or go to the console settings and increase the buffer size.

One thing to notice is that Tags.ID is supposed to be an enum, and the same goes for AssetTypes.TypeID. In C you can't specify the size of an enum, hence Casey declared them as fixed-width integer types instead. BEdit does support sized enums, so let's replace those member types with name_of_enum(size_in_bytes). While we're at it, let's also change hha_asset.DataOffset to be hexadecimal.

struct hha_tag
{
    asset_tag_id(4) ID;
    r32 Value;
};

struct hha_asset_type
{
    asset_type_id(4) TypeID;
    u32 FirstAssetIndex;
    u32 OnePastLastAssetIndex;
};

...

struct hha_asset
{
    u(8, hex) DataOffset;
    u32 FirstTagIndex;
    u32 OnePastLastTagIndex;
    /* TODO(Jens): We'll soon see how to get this behavior back.
    union
    {
        hha_bitmap Bitmap; // sizeof(hha_bitmap) == 4*4
        hha_sound Sound; // sizeof(hha_sound) == 3*4
        hha_font Font; // sizeof(hha_font) == 5*4
    }; // sizeof union is 20 bytes
    */
    raw(20) DataHeader;
};

Output now looks like

>bedit.exe -layout hha.bet -data v0_hhas\intro_art.hha
hha_file
hha_header (.Header)
    0h            MagicValue                                     "hhaf"
   04h               Version                                          0
   08h              TagCount                                        109
   0Ch        AssetTypeCount                                         21
   10h            AssetCount                                         55
   14h                  Tags                                        2Ch
   1Ch            AssetTypes                                      3 94h
   24h                Assets                                      4 90h

hha_tag (.Tags[0])
   2Ch                    ID                             Tag_Smoothness
   30h                 Value                                   0.000000

...

hha_asset_type (.AssetTypes[0])
03 94h                TypeID                                 Asset_None
03 98h       FirstAssetIndex                                          0
03 9Ch OnePastLastAssetIndex                                          0

...

hha_asset (.Assets[0])
04 90h            DataOffset                                         0h
04 98h         FirstTagIndex                                          0
04 9Ch   OnePastLastTagIndex                                          0
04 A0h            DataHeader 0x0000000000000000000000000000000000000000

hha_asset (.Assets[1])
04 B4h            DataOffset                                      C 4Ch
04 BCh         FirstTagIndex                                          1
04 C0h   OnePastLastTagIndex                                          3
04 C4h            DataHeader 0x29090000800400000000003F0000003F00000000

...
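The sized-enum idea above, a fixed number of bytes that is shown by name instead of by number, can be sketched in Python as well. The enum below contains only the one member visible in the output (the real enum has more), and falling back to the plain number for unknown values is my own choice for the sketch, not a statement about what BEdit does.

```python
import enum
import struct

class AssetTagId(enum.IntEnum):
    # Only the member visible in the tutorial output; the real enum has more.
    Tag_Smoothness = 0

def read_tag_id(data, offset=0):
    """Decode a 4-byte little-endian ID and map it to a name when one is known."""
    (raw_id,) = struct.unpack_from("<I", data, offset)
    try:
        return AssetTagId(raw_id).name
    except ValueError:
        return str(raw_id)  # unknown values fall back to the plain number

print(read_tag_id(struct.pack("<I", 0)))  # Tag_Smoothness
print(read_tag_id(struct.pack("<I", 5)))  # 5
```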

Conclusion

Wait what? It ends now? But we're just getting started!

We have only just gotten started, and we'll continue in the next part! But at this point you can already specify a lot of different file formats. Download the command line reader if you haven't already and play around with it.

The key takeaways from this wiki entry, for when you already have a file format described in C, are:

  • Follow the law
  • Copy and paste structs and enums to bet-file
  • Replace preprocessor directives
  • Replace scalar types
  • Replace any unions (more on this next part)
  • Specify the layout
  • Tweak scalar display
  • Add asserts (can't hurt at least)
  • Specify the entire file

In the next part we'll figure out how to deal with that union, and we'll see how dynamic the BEdit layout language actually is.


Edited by Jens on