Post-summer Part #1 Optimizing (from 2 seconds to 2 milliseconds)

As I started writing this, I noticed I have a lot to cover, so I split it into several categorized parts - I am a programmer after all. But TL;DR: a new update is coming soon, with a bunch of everything.

I hope you have all had a nice summer so far. For me it's been too hot, with not enough time at the computer doing fun stuff! Thankfully it's been getting cooler, and I've had time to almost complete the next update for BEdit, which has taken a bit longer than expected.

Initially I was going to do a quick update for the GUI version only, as one of the BEdit GUI users noted a crash and horrible performance in some of the views due to a large string being displayed. Sadly, I started with the optimizations.

There are several reasons why the tree view and the by-address view are slow (or were, if you're coming here after the next release!).
1. Iteration scheme (regarding tree view only).
2. Data loading.
3. Clipping.

If you have ever used Dear ImGui, the tree view API looks like:

if (ImGui::TreeNode("Text"))
{
    // Tree is open, populate with stuff.
    ...
    ImGui::TreePop(); // <- call only if TreeNode returned true.
}

The BEdit tree API looks like:

TreeNode("Text");
... // Populate with stuff, it's ignored if tree is not open.
TreePop();

I designed it this way so I can iterate a tree structure without any bookkeeping on the usage side. For example:

for (Node* node = myTree; node; node = node->next)
{
    if (node->type == NodeType_Push)
        TreeNode(node->text);
    else if (node->type == NodeType_Pop)
        TreePop();
    else
        Text(node->text);
}

This doesn't require the user of the tree API to keep track of what is open and what is not (something the Dear ImGui API does require), at the cost of iterating over nodes that are never shown. Sadly, that cost was about 10 ms in this case. That's nowhere near two seconds, but still too much. The problem gets bigger, though: the tree can be very large, so which tree node do we start rendering from when the view is scrolled down? That depends on which tree nodes are open and which are not, and the only way to know is to iterate the tree from the top until it reaches the first visible line, which makes performance worse the further you scroll down.

In the end I decided to scrap the tree hierarchy from the UI system; the only place that used it was the tree view (no big surprise there). The tree view now works by taking the tree data and turning it into an array of text lines. As you expand or collapse a node, it simply figures out what the new lines are and does a memmove to make the array smaller, or a couple of memcpy calls to make it larger. This way the view can simply start iterating at the first visible line and continue until the last visible line.
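A minimal sketch of the flat line array idea, not BEdit's actual code. `Line`, `InsertLines`, `RemoveLines`, `CountDescendantLines` and `VisibleRange` are hypothetical names, and it grows the array in place with a memmove plus a memcpy rather than the "couple of memcpy" mentioned above. It assumes every line has the same pixel height.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical flattened line: which tree node it shows and how deep it is.
// It must stay trivially copyable so memmove/memcpy are valid.
struct Line
{
    int node;
    int depth;
};

// Collapse: close the gap left by `count` lines starting at `first`.
void RemoveLines(std::vector<Line>& lines, size_t first, size_t count)
{
    assert(first + count <= lines.size());
    size_t tail = lines.size() - (first + count);
    if (tail)
        std::memmove(lines.data() + first, lines.data() + first + count, tail * sizeof(Line));
    lines.resize(lines.size() - count);
}

// Expand: open a gap at `first` and copy the new lines into it.
// `src` must not point into `lines`, since resize may reallocate.
void InsertLines(std::vector<Line>& lines, size_t first, const Line* src, size_t count)
{
    assert(first <= lines.size());
    size_t tail = lines.size() - first;
    lines.resize(lines.size() + count);
    if (tail)
        std::memmove(lines.data() + first + count, lines.data() + first, tail * sizeof(Line));
    if (count)
        std::memcpy(lines.data() + first, src, count * sizeof(Line));
}

// Lines belonging to an open node are the ones directly after it with a larger depth.
size_t CountDescendantLines(const std::vector<Line>& lines, size_t index)
{
    assert(index < lines.size());
    size_t count = 0;
    for (size_t i = index + 1; i < lines.size() && lines[i].depth > lines[index].depth; ++i)
        ++count;
    return count;
}

// Half-open range [first, last) of lines touching the view. No tree walk needed.
// Assumes scrollY >= 0 and lineHeight > 0.
void VisibleRange(size_t lineCount, int scrollY, int viewHeight, int lineHeight,
                  size_t* first, size_t* last)
{
    assert(scrollY >= 0 && viewHeight >= 0 && lineHeight > 0);
    size_t f = (size_t)(scrollY / lineHeight);
    size_t l = (size_t)((scrollY + viewHeight + lineHeight - 1) / lineHeight);
    *first = f < lineCount ? f : lineCount;
    *last = l < lineCount ? l : lineCount;
}
```

The point of the design is that finding the first visible line becomes a division instead of a walk over every node above it, so the cost no longer depends on how far down the view is scrolled.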

The data loading part is trickier. There are several "member editors" in BEdit: one for float members, one for hex, octal, string, enums, you name it! To keep things very simple, I call these editors with an array of bytes, which is the data to represent and potentially modify. However, BEdit is a file editor, so you can add and remove bytes if you like (currently only through the hex editor). Because of this, the binary file is not stored as a contiguous sequence of bytes but rather as a binary tree of memory blocks. This means that whenever you edit a member, the executable first has to copy the bytes from the block(s), then call the member editor, and, if the editor made any changes, copy the data back. This works fine for small data, but for large data the copy is too expensive. So I made a special case for strings. If the string is not going to fit the available space in the UI, it only loads as much as can fit, shows "..." at the end, and disables editing. Instead, there's a special popup when you click a large string that shows the entire string (assuming UTF-8 or a compatible encoding), with scrolling and all. This popup does not have editing capabilities yet, but they are planned for a future update.
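A rough sketch of the "load only what fits" special case, under simplifying assumptions: the blocks are an ordered list instead of BEdit's binary tree, the available space is given as a byte count instead of a pixel width, and truncation ignores UTF-8 code point boundaries. `Block`, `CopyRange`, `StringPreview` and `LoadStringPreview` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical memory block; the file is the concatenation of all blocks in order.
struct Block
{
    std::vector<char> bytes;
};

// Copies up to `count` bytes starting at file offset `offset` into `dst`.
// Returns the number of bytes copied (less than `count` if the file ends early).
size_t CopyRange(const std::vector<Block>& blocks, size_t offset, size_t count, char* dst)
{
    size_t copied = 0;
    for (const Block& block : blocks)
    {
        if (copied == count)
            break;
        if (offset >= block.bytes.size())
        {
            offset -= block.bytes.size(); // range starts in a later block
            continue;
        }
        size_t n = std::min(count - copied, block.bytes.size() - offset);
        std::memcpy(dst + copied, block.bytes.data() + offset, n);
        copied += n;
        offset = 0; // following blocks are read from their start
    }
    return copied;
}

struct StringPreview
{
    std::string text;
    bool editable;
};

// Loads a string member of `length` bytes, but never more than `maxVisible` bytes.
// A truncated string gets "..." appended and is marked read-only.
StringPreview LoadStringPreview(const std::vector<Block>& blocks, size_t offset,
                                size_t length, size_t maxVisible)
{
    bool fits = length <= maxVisible;
    size_t want = fits ? length : maxVisible;

    StringPreview preview;
    preview.text.resize(want);
    size_t got = CopyRange(blocks, offset, want, &preview.text[0]);
    preview.text.resize(got);
    if (!fits)
        preview.text += "...";
    preview.editable = fits;
    return preview;
}
```

With this shape the copy cost is bounded by what the UI can show, not by the size of the member, which is what matters for a string that is megabytes long.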

A funny thing about the renderer in BEdit is that it is the only part that handles out-of-memory conditions (although the long-term plan is for BEdit to handle binary files that are terabytes in size, it's not there yet). As the string above was huge, the rendering memory required was huge as well. This can be seen in 0.2.1 and earlier versions, where the renderer simply stops rendering triangles after a point. That's better than crashing, I suppose, but still far from good. This raises the question: "why aren't the triangles culled before being pushed to the render queue?" That was exactly my question, as I remembered adding some code just for that, and this is the code I found:

fun B32 PassesCulling(RenderGroup* group, Rect2 drawArea)
{
#if 0
    // NOTE This fights with "relocating" algorithm that's done at end of pass.
    
    Rect2 groupArea = {};
    
    groupArea.dim.w = GetWidth(group);
    groupArea.dim.h = GetHeight(group);
    
    B32 overlaps = Overlaps(groupArea, drawArea);
    return overlaps;
#else
    return true;
#endif
}

(To be frank, I added the comment afterwards.)

The "relocating" algorithm I mention there refers to a neat little feature. If the UI is allowed to relocate the area it lays out, and that area is partially outside the window, it will move it inside the window. This can be seen for many popups in BEdit. For example, if you go to Settings and click one of the color settings, you will notice that the color picker is always inside the window, or rather, it was laid out outside but was "relocated" to be within the window. This is also true for mouse overlays. If a part of that popup had been culled, that part wouldn't be visible after relocating. The memory issue was fixed by the member editor not loading the entire data, and I think all issues related to culling should be solved (in BEdit) by simply not pushing those triangles to the renderer - so this implementation detail stays! (Though I might as well delete the PassesCulling calls.)
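To make the relocation step concrete, here is a small sketch of moving a popup rectangle back inside the window. It is an illustration only: `Rect` and `RelocateInside` are hypothetical names, and BEdit's own `Rect2` type and relocation code may look quite different.

```cpp
#include <cassert>

// Hypothetical axis-aligned rectangle: top-left corner plus size.
struct Rect
{
    float x, y, w, h;
};

// Moves `area` so it lies inside `window` without changing its size.
// If `area` is larger than the window on an axis, its top-left edge wins.
Rect RelocateInside(Rect area, Rect window)
{
    // Pull back from the right and bottom edges first...
    if (area.x + area.w > window.x + window.w)
        area.x = window.x + window.w - area.w;
    if (area.y + area.h > window.y + window.h)
        area.y = window.y + window.h - area.h;

    // ...then clamp to the left and top edges, which take priority.
    if (area.x < window.x)
        area.x = window.x;
    if (area.y < window.y)
        area.y = window.y;

    return area;
}
```

Because the final position is only known after the whole area has been laid out, anything culled against the window during layout would be missing once the area is moved, which is the conflict the comment in PassesCulling describes.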

These changes took BEdit from 2 seconds per frame to (a capped) 60 FPS, even in debug builds - at least on my computer, and as BEdit only uses (at most) two cores, I suspect on your computer too.

It is a bit odd to run a UI application at a "regular" frame rate; it could just be updated whenever there is a need to redraw or something has changed. The reasons why BEdit GUI gets refreshed on a regular basis are: 1. well... I kinda program games mostly, and 2. I plan on having animations for the UI transitions, which should get updated at 60 FPS.

With the optimizations done, and also some UI improvements, I decided it was time to tackle the crash issue, which I will cover in the next post!

Besides the optimizations and the crash fix, the next version will add some new features to the language itself.
