My Capstone Project: Deep Learning to Detect Bugs in Code using Graph Based Neural Networks
Mukund Raghav Sharma (Moko)
Posted on January 16, 2022
Over this long weekend, I decided to revisit details of my capstone project for my Master's in Data Science (graduated May 2020) using Deep Learning to detect bugs in code. The paper can be found here.
The paper is a comparative study of the efficacy (measured by test accuracy) of Gated Graph Neural Networks (GGNNs) vs. Relational Graph Convolutional Networks (RGCNs) on the task of automatically detecting the misuse of a variable in the top 25 trending C# repositories on GitHub.
The results showed that RGCNs outperformed GGNNs in all cases, albeit by less than 5%. I did a considerable amount of randomized hyperparameter tuning, but it did not change the outcome.
My work was based on multiple papers by Microsoft Research (particularly https://lnkd.in/gd5kTEEv) and used TensorFlow to conduct the analysis. In a nutshell, the training data was a modified version of the Abstract Syntax Tree generated by the Roslyn compiler.
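To give a flavor of what "relational" means here, below is a minimal sketch of one RGCN-style message-passing step over a tiny graph with typed edges. It is not the TensorFlow code from my project: it uses NumPy, the toy graph and the edge type names (`child`, `next_token`) are made up for illustration, and the weights are random rather than learned. The idea it shows is that each edge type gets its own weight matrix, and each node averages the incoming messages per edge type.

```python
import numpy as np

# Hypothetical toy graph: 4 nodes, two edge types (names are illustrative only).
EDGES = {
    "child": [(0, 1), (0, 2)],       # (source, target)
    "next_token": [(1, 2), (2, 3)],
}


def rgcn_layer(h, edges, w_self, w_rel):
    """One RGCN-style step: self term plus normalized, per-relation messages."""
    out = h @ w_self  # self-connection term for every node
    for rel, pairs in edges.items():
        # Count incoming edges of this type per target node for normalization.
        in_degree = np.zeros(h.shape[0])
        for _, dst in pairs:
            in_degree[dst] += 1
        # Each source sends a message transformed by the relation's own matrix.
        for src, dst in pairs:
            out[dst] += (h[src] @ w_rel[rel]) / in_degree[dst]
    return np.maximum(out, 0.0)  # ReLU


rng = np.random.default_rng(0)
num_nodes, dim = 4, 8
h0 = rng.normal(size=(num_nodes, dim))          # random initial node states
w_self = rng.normal(size=(dim, dim))            # random, not learned
w_rel = {rel: rng.normal(size=(dim, dim)) for rel in EDGES}
h1 = rgcn_layer(h0, EDGES, w_self, w_rel)
```

A GGNN differs mainly in how the aggregated messages are used: instead of a plain nonlinearity, it feeds them into a GRU-style gated update and repeats the step several times with shared weights.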
Some lessons I picked up from this experience are:
- Sticking with a white paper even if you don't understand any of it in the beginning.
- Digesting material through different media is a good way to switch it up: I relied heavily on YouTube videos of Deep Learning conferences (https://lnkd.in/gzcTRGhG) to get a more lecture-based approach to bolster my learning.
- Treating data as a first-class citizen. Using version control or backing up data wherever possible is definitely a lesson I learnt through this. Clobbering my old weights was something I did more times than I like to admit.
- I am a big advocate of self-describing code. However, since this was such a new space for me with a steep learning curve, commenting as much as I could made a significant difference.
- Testing on small, bite-sized chunks saved me a considerable amount of time down the road: doing a quick prototype E2E run first surfaced the runtime errors that would otherwise have cost me countless hours.
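On the point about clobbering weights, here is a small sketch of the habit I wish I had started with: never write a checkpoint to a fixed filename. The helper name `save_checkpoint`, the file naming scheme, and the raw-bytes payload are all assumptions for illustration; in a real project the bytes would come from whatever serialization your framework provides.

```python
import time
from pathlib import Path


def save_checkpoint(data: bytes, directory: str, prefix: str = "weights") -> Path:
    """Write data to a new, uniquely named file; never overwrite an existing one."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    counter = 0
    while True:
        path = out_dir / f"{prefix}-{stamp}-{counter:03d}.bin"
        try:
            # Mode "xb" raises FileExistsError instead of clobbering the file.
            with open(path, "xb") as f:
                f.write(data)
            return path
        except FileExistsError:
            counter += 1  # same timestamp already used, try the next suffix
```

Calling it twice in a row yields two separate files, so an earlier set of weights is always still on disk if a later run turns out worse.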
Any feedback would be greatly appreciated! Happy to answer any questions.