With the optimizations done "good enough for now", I moved on to fix the crash.

Now, I did test the BEdit GUI application on both Linux and Windows 10 - but I didn't test it on Windows 7, an OS that BEdit is expected to run on. So getting a crash report was sad and, as I had made no changes to the OS abstraction layer since the previous update, also surprising.

Thankfully I had a minidump (very appreciated, if you have a crash please send a minidump to [email protected]) so I could get some information about where the crash happened... sadly, I forgot to save the debug symbols. If you haven't seen a minidump from Windows before, you can open it up with Visual Studio and it will display the stack trace and (mostly) everything you expect to be available when you normally debug - assuming you have the debug symbols. So here is a lesson I will at least teach myself: save the pdb-files after making a release! The sad part is that this is a lesson I've already taught myself in another project, I just haven't had the time to port that work to BEdit.

So what do you do if you don't have the debug symbols?

Thankfully the difference between different builds of the same source code is not that big. Combine this with the fact that the minidump contains the actual assembly that was being executed, and you can do some "pattern matching" to figure out where the crash actually was. I built BEdit (after checking out the release commit) and started debugging in release mode. Now you can compare the instructions of the released binary with the ones in the binary you built locally. Based on the stack trace and the instructions generated around the crash I found where the issue was: loading the path of the layout file while drawing a UTF-8 string. I also learnt that the Visual Studio compiler doesn't inline as much as I'd hoped for, but maybe that is for a good reason.

The crash had to do with reloading the layout text file, something I had tested quite a lot, and after testing it again I could indeed not reproduce the issue. The OS layer does the conversions between wchar_t* and UTF-8 for paths, so I figured: "Well, the user has a different operating system, and this does deal with the OS API, that must be it!"

At this point I install VirtualBox and Windows 7, which sadly doesn't come with OpenGL 3 support out-of-the-box. So now I add an OpenGL 3 dll that emulates it using a software renderer... As if VirtualBox didn't make the performance slow enough, that did! I run BEdit, load the exact same binary file and the exact same layout file - no crash yet. I add a space to the text file (to force reloading) and... no crash. I do it again, no crash. At this point I decide, "obviously it's a UTF-8 issue, not a reload notification issue", so I rename the folder where I store the files to "nån-ÄSCII-pöth" and try again. Then I quickly add another character to force reloading and... CRASH! It's about text encoding! I start looking at the OS calls that convert wchar_t* to UTF-8 and... I get confused. The crash happened while displaying a text string (the path to the file), but the debugger said it was unable to read the memory - that sounds more like reading deallocated memory to me. So, I do what I always do: the exact same thing as before, expecting the exact same outcome (the reason I do it is that I don't really expect the same outcome). I go to the Windows 7 VirtualBox, go back to the text file, add a character, save it aaaaand.... no crash.

Odd, the end-user was able to reproduce it 100%. You do A and B happens, then you do A and B doesn't happen... Well, if it happened once, just do it again? So I repeat the process, once, twice, and on the third time it reproduced the crash. I've been programming enough to know what causes that pattern: timing issues.

There is one thing and one thing only that spawns a thread in BEdit, and that is evaluating the instructions of the layout file. Combine this with: if the layout file has been modified, reload it - and you'll quickly discover the reason. Concurrency. Now, I tested reloading of the layout file many times, but I never tested it "fast enough", and had I not started with the optimization pass this would have been a trivial bug to find. The only reason I found it was because the VirtualBox Windows 7 with software OpenGL rendering was so slow! (Well, to be honest the only reason I found the bug was because a user reported it.)

The way to reproduce it 100%, independent of the OS, was to add a Sleep(100) in the thread loop. The actual fix was 3 lines of code: just clear the memory if the thread is active.
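To make the pattern concrete, here is a minimal C++ sketch of this kind of reload race and one common way to close it. It is not BEdit's actual code or its actual three-line fix: `Layout`, `LayoutEvaluator` and `demo_rapid_reloads` are hypothetical names, and joining the worker before freeing the old layout is simply one assumed way to guarantee the thread never reads deallocated memory. The artificial 100 ms delay plays the role of the Sleep(100) above: it keeps the worker "mid-evaluation" long enough that every rapid reload lands inside the dangerous window.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <memory>
#include <string>
#include <thread>
#include <utility>

// Hypothetical stand-in for the parsed layout file.
struct Layout {
    std::string path;
};

// Hypothetical evaluator: one worker thread reads the current layout.
class LayoutEvaluator {
public:
    ~LayoutEvaluator() { stop(); }

    // Swap in a new layout. The worker is stopped and joined first, so it
    // can never touch the old Layout after it has been freed. Dropping the
    // stop() call here would recreate the use-after-free described above.
    void reload(std::unique_ptr<Layout> next) {
        stop();
        assert(!worker_.joinable());
        layout_ = std::move(next);  // the old layout is freed here
        running_ = true;
        worker_ = std::thread([this] { evaluate(); });
    }

    void stop() {
        running_ = false;
        if (worker_.joinable()) worker_.join();
    }

    // Only valid while the worker is stopped.
    const std::string& last_drawn() const {
        assert(!worker_.joinable());
        return last_drawn_;
    }

private:
    void evaluate() {
        do {
            // Artificial delay that widens the race window on any OS.
            std::this_thread::sleep_for(std::chrono::milliseconds(100));
            last_drawn_ = layout_->path;  // reads memory owned by layout_
        } while (running_);
    }

    std::unique_ptr<Layout> layout_;
    std::string last_drawn_;
    std::atomic<bool> running_{false};
    std::thread worker_;
};

// Reload three times in quick succession, the way a fast file-change
// notification would, and report what the worker saw last.
std::string demo_rapid_reloads() {
    LayoutEvaluator evaluator;
    evaluator.reload(std::make_unique<Layout>(Layout{"first.layout"}));
    evaluator.reload(std::make_unique<Layout>(Layout{"second.layout"}));
    evaluator.reload(std::make_unique<Layout>(Layout{"third.layout"}));
    evaluator.stop();
    return evaluator.last_drawn();
}
```

Calling `demo_rapid_reloads()` returns the path of the last layout, because each reload waits for the in-flight evaluation to finish before releasing the memory it is reading. Joining is the simplest option for a sketch; a real application might prefer to defer the free or signal the thread to abandon its work, so the UI never blocks on the worker.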

With the optimizations done and the crash fixed, I could've stopped there. But I don't like making updates without new features when there are many features to implement - so I started implementing new features. As usual I put the new features on GitHub "issues" and managed to add enough to make the list two pages long, a new record! (And hopefully the biggest record.) I'll present these features in the next part.