Alex de Sousa
Posted on April 22, 2020
Some bugs pull you to the abyss and leave you there. They're the Balrog to our Gandalf. Once you beat them though, you're wiser and more powerful!
I was building a small plugin in C++ for an MT4 Server (ForEx trading server). The output of the project was a Windows DLL. Using the server's protocol, I managed to get JSON strings and parse them with RapidJSON. Everything ran smoothly in my virtual machine and in some development servers. Even Valgrind couldn't find memory leaks. I thought the plugin was ready for production...
Oh boy was I so wrong!
Note: RapidJSON is an amazing library and if I need to parse JSON in C/C++ again, I would use it without hesitation.
Debugging The Problem
Once I deployed the plugin, everything seemed fine... until the next day! A nasty segmentation fault killed the server. The plugin made us loose some money and I had to roll back the deploy.
After several days of testing, I realized the production server always died with the same set of data. I was able to pinpoint the error to RapidJSON. Something weird was happening when the memory was allocated, but none of the tools I was using to debug this were reporting any problems.
I was desperate, so I compiled the DLL with debug symbols and then I de-compiled it using OllyDBG.
I started reading the DLL assembly code ... for a week and a half! Reading assembly was horrible. I considered switching careers. But then I got to the instruction that failed! Eureka! I couldn't believe it! It felt good to finally understand the bug!
The Bug!
The problem was that RapidJSON's custom allocator:
- Compressed the data in memory.
- Allocated only what it needed.
The production machine architecture:
- Allocated the memory RapidJSON asked for.
- Ignored the way RapidJSON wanted the data to be structured.
It's easier to see with an image:
If RapidJSON needed to store an integer, a string and a boolean value, then it could fail randomly depending on the length of the string. e.g. given the integer 42
and the boolean true
:
- For the string
Hey
, id would succeed:
- For the string
Hi
, it would fail:
A debugging nightmare!
The Solution
I just made RapidJSON use the machine's allocator instead of the custom one. Spent two weeks debugging something and changed just a single word in the code!
Conclusion
Now I avoid C/C++ at all costs!
I hope whatever bug you're dealing with at the moment gets solved soon!
Cover photo by Jez Timms
Posted on April 22, 2020
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.