GSoC 20: Week 2

kniraj

Niraj Kamdar

Posted on June 15, 2020

Hello everyone!

It's Niraj again. Today, I will share my code contributions from the second week of GSoC.

Background

As I discussed in my earlier blogs, we have a scanner module that recursively scans every binary file in the given directory, parses strings from each binary file, and forwards them to every checker. The checkers determine the vendor, product, and version and pass them back to the scanner, which then looks them up in the local copy of the NVD database, finds all the vulnerabilities associated with the given product, and displays them.

Here, the scanner module uses the Unix file command-line utility if it exists, or the file module otherwise, to check whether a given file is a binary executable. Likewise, it uses the GNU strings command-line utility if it exists, or the strings module otherwise, to parse strings from the binary file.
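
To make the fallback idea concrete, here is a minimal sketch. It is not the actual code of the strings module; the function names are hypothetical, and the regular expression only approximates what the GNU utility treats as a printable string (runs of at least 4 printable ASCII characters).

```python
import re
import shutil
import subprocess
from typing import List

# Assumption: a "string" is a run of 4 or more printable ASCII characters.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def parse_strings_from_bytes(data: bytes) -> List[str]:
    """Pure-Python fallback: extract printable ASCII runs from raw bytes."""
    return [match.decode("ascii") for match in PRINTABLE_RUN.findall(data)]


def parse_strings(path: str) -> List[str]:
    """Use the strings utility when it is on PATH, otherwise fall back."""
    if shutil.which("strings"):
        result = subprocess.run(["strings", path], capture_output=True, check=True)
        return result.stdout.decode("ascii", errors="replace").splitlines()
    with open(path, "rb") as f:
        return parse_strings_from_bytes(f.read())
```

The `shutil.which` check is what decides between the external utility and the Python fallback, so the tool keeps working on systems where the utility is not installed.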

We also provide an option to extract archives and scan the binary files inside them. For that, we have an extractor module that supports extraction of common packaging archives such as rpm, deb, tar, exe, and msi.

What did I do this week?

I have started working on my GSoC task of improving the concurrency of the CVE Binary Tool. We are going to use asyncio for IO-bound tasks such as reading and writing files or downloading from the internet, and concurrent.futures.ProcessPoolExecutor for CPU-bound tasks.
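
The split between the two kinds of work could look roughly like the sketch below. It is only an illustration under my own assumptions: `io_bound_fetch`, `cpu_bound_digest`, `scan_all`, and `main` are hypothetical names, the sleep stands in for real file or network IO, and the hash stands in for real CPU-heavy work.

```python
import asyncio
import concurrent.futures
import hashlib


def cpu_bound_digest(data: bytes) -> str:
    """Stand-in for CPU-heavy work (hypothetical example)."""
    return hashlib.sha256(data).hexdigest()


async def io_bound_fetch(name: str) -> bytes:
    """Stand-in for IO-bound work; the sleep simulates waiting on disk or network."""
    await asyncio.sleep(0.01)
    return name.encode()


async def scan_all(names, executor):
    loop = asyncio.get_running_loop()
    # IO-bound coroutines run concurrently on the event loop.
    payloads = await asyncio.gather(*(io_bound_fetch(n) for n in names))
    # CPU-bound functions are handed to the executor so they don't block the loop.
    futures = [loop.run_in_executor(executor, cpu_bound_digest, p) for p in payloads]
    return await asyncio.gather(*futures)


def main():
    # A process pool sidesteps the GIL for the CPU-bound part.
    with concurrent.futures.ProcessPoolExecutor() as pool:
        return asyncio.run(scan_all(["a.bin", "b.bin"], pool))
```

When trying this as a script, call `main()` under an `if __name__ == "__main__":` guard, because worker processes may re-import the main module.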

I have converted the IO-bound synchronous functions of the extractor, strings, and file modules into asynchronous coroutines. These modules need support for async file IO, and asyncio doesn't have this functionality built in, so I started searching for external libraries, but I couldn't find a single library with all of the functionality I needed. So I decided to build one from scratch. After 2-3 days of research and coding, I created an asynchronous FileIO class with all the methods that a synchronous file object in Python provides, and I also implemented asynchronous alternatives to tempfile's TemporaryFile, NamedTemporaryFile, and SpooledTemporaryFile classes.
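
One common way to get this behaviour is to run the blocking file methods in a thread pool. The sketch below shows that idea only; it is not my FileIO class, it covers just `read` and `write`, and `AsyncFileSketch` and `roundtrip` are hypothetical names made up for this example.

```python
import asyncio
import functools
import os
import tempfile


class AsyncFileSketch:
    """Hypothetical wrapper: runs blocking file methods in the default thread pool."""

    def __init__(self, path, mode="r"):
        self._path = path
        self._mode = mode
        self._file = None

    async def _call(self, func, *args, **kwargs):
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(None, functools.partial(func, *args, **kwargs))

    async def __aenter__(self):
        self._file = await self._call(open, self._path, self._mode)
        return self

    async def __aexit__(self, exc_type, exc, tb):
        await self._call(self._file.close)

    async def read(self, size=-1):
        return await self._call(self._file.read, size)

    async def write(self, data):
        return await self._call(self._file.write, data)


async def roundtrip(text):
    """Write text to a temporary file and read it back through the wrapper."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        async with AsyncFileSketch(path, "w") as f:
            await f.write(text)
        async with AsyncFileSketch(path) as f:
            return await f.read()
    finally:
        os.remove(path)
```

For example, `asyncio.run(roundtrip("hello"))` returns the text that was written, while the event loop stays free to run other coroutines during the blocking calls.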

Note: aiofiles is a well-known async file IO module, but it lacks interfaces for tempfile and shutil, and it also has many issues and PRs that have been open for more than a year.

Since we use subprocess in many places, I have also created an async run_command coroutine that runs a command in a non-blocking manner. I have also converted the synchronous unit tests to asynchronous ones by using pytest's pytest-asyncio plugin.
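
A coroutine of this kind can be built on `asyncio.create_subprocess_exec`. The sketch below is an assumption about the general shape, not the real run_command from the project: the name `run_command_sketch` and its return value are made up for illustration.

```python
import asyncio
import sys


async def run_command_sketch(*args):
    """Run a command without blocking the event loop.

    Returns (stdout, stderr, returncode); this signature is hypothetical.
    """
    process = await asyncio.create_subprocess_exec(
        *args,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    # communicate() waits for the process to exit and collects its output.
    stdout, stderr = await process.communicate()
    return stdout, stderr, process.returncode


async def demo():
    # Use the current interpreter so the example does not depend on other tools.
    return await run_command_sketch(sys.executable, "-c", "print('ok')")
```

With pytest-asyncio, a test for such a coroutine is written as an `async def` test function marked with `@pytest.mark.asyncio`, so it can `await` the coroutine directly.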

If you would like to know how I implemented the above async utilities, check out my async_utils module.

What am I doing this week?

I am going to split the scanner module into two separate modules: 1) version_scanner and 2) cve_scanner. I am thinking about calling the latter cve_fetcher to avoid misunderstanding, but since I have mentioned cve_scanner in my proposal and issues, let's keep that name for now. I will be merging the get_cves methods of the cvedb and scanner modules into one module called cve_scanner, which uses cvedb. This will make the code more maintainable and readable once I convert it to asynchronous code.

I am also thinking about making my own library as an alternative to aiofiles, one that also implements other high-level file operations like those in shutil, and publishing it on PyPI so that other developers can benefit from it.
