Insolita
Posted on May 2, 2020
Inspired by the article "Fast Web Scraping With ReactPHP", I decided to benchmark ReactPHP against some other popular libraries: Guzzle, which can also send asynchronous requests via curl multi handles, and Amphp, another non-blocking PHP framework that includes an HTTP client.
I didn't want to make a synthetic benchmark, so I chose a more practical task for the test: load real URLs from a predefined list (some of them may be broken), scrape their titles, and save the results to a file.
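To make the task concrete, here is a minimal sketch of the "scrape the title" step using only built-in PHP functions. The function names `extractTitle` and `formatResultLine` are hypothetical and chosen for illustration; the clients in the repository may extract titles differently (for example with a DOM parser).

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: returns the text of the first <title> tag,
 * with entities decoded and whitespace collapsed, or null if there is none.
 */
function extractTitle(string $html): ?string
{
    // Case-insensitive, and "s" lets "." match newlines inside the tag.
    if (preg_match('~<title\b[^>]*>(.*?)</title>~is', $html, $matches) !== 1) {
        return null;
    }
    $decoded = html_entity_decode($matches[1], ENT_QUOTES | ENT_HTML5, 'UTF-8');
    $title = trim(preg_replace('~\s+~', ' ', $decoded) ?? '');

    return $title === '' ? null : $title;
}

/**
 * Hypothetical helper: one tab-separated output line per URL.
 * Broken URLs or pages without a title are recorded as "N/A".
 */
function formatResultLine(string $url, ?string $title): string
{
    return $url . "\t" . ($title ?? 'N/A') . PHP_EOL;
}
```

Each line could then be appended to the output file with `file_put_contents($path, $line, FILE_APPEND)`, where `$path` is whatever file the benchmark writes to.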
During development I ran into some difficulties that kept me from making the test clients completely equivalent. Each client has its own specifics, especially Amphp, and I don't have much experience with async libraries such as ReactPHP and Amphp.
You can find the repository with the test code and benchmark results here: https://github.com/Insolita/php-async-benchmarks
All tests were written for PHP 7.4. Each check was run 10 times, and I publish the minimum, maximum, and average execution time. Note that the concrete numbers are not that important, because they depend on internet speed, server configuration, and so on, and you may get different results. Only the relative differences matter.
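The measurement procedure described here (repeat a run several times, then report min, max, and average) can be sketched as follows. This is an illustration under my own assumptions, not the harness from the repository: `benchmark` and `summarize` are hypothetical names, and wall-clock time is taken with `microtime(true)`.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: reduces a list of durations (in seconds)
 * to the three numbers reported in the benchmark.
 */
function summarize(array $times): array
{
    if ($times === []) {
        throw new InvalidArgumentException('At least one measurement is required.');
    }

    return [
        'min' => min($times),
        'max' => max($times),
        'avg' => array_sum($times) / count($times),
    ];
}

/**
 * Hypothetical helper: runs $task $runs times and summarizes
 * the wall-clock duration of each run.
 */
function benchmark(callable $task, int $runs = 10): array
{
    $times = [];
    for ($i = 0; $i < $runs; $i++) {
        $start = microtime(true);
        $task();
        $times[] = microtime(true) - $start;
    }

    return summarize($times);
}
```

A call such as `benchmark(function () { /* fetch all URLs with one client */ })` would give one min/max/avg triple per client, which can then be compared across clients for the same URL list.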
The first results really surprised me. ReactPHP was 2 times slower than Guzzle. I rechecked it again and again, but the numbers stayed the same. However, as the number of requests grew, its performance got better and better. Amphp, on the other hand, got slower and slower, and I even excluded it from the last measurement. (This comes down to its specifics: in the documentation I found only one way to make concurrent requests, https://amphp.org/http-client/concurrent. There is probably a better way, or additional libraries that queue promises more smartly (like clue/reactphp-mq), but I have not found them.)
In summary, ReactPHP can be a good choice when you need to fetch many thousands of URLs, especially when you run it as a separate worker that receives tasks via a socket, Redis, or an HTTP API. Amphp can be good when you need to fetch a small batch of URLs (5, 10, or 50) as fast as possible. It may also get better with additional wrappers. Guzzle is awesome.
UPD: Just see the power of the open source community in action! Thanks to my benchmark, one of the Amphp maintainers, Niklas Keller, found and fixed a bug. And now, thanks to the help of Dmitry Balabka and Niklas Keller, the performance of the Amphp http-client has improved significantly!
UPD2: The outsider becomes the winner! Thanks to the improvements, Amphp's metrics now look good on small batches as well as big ones! I'm intrigued: will the ReactPHP team offer improvements to speed up their HTTP client?
UPD3: Yes! The ReactPHP team accepted the challenge and has also started working on a fix!
Interesting! It looks like your list of servers contains a bunch of entries that don't strictly conform to HTTP specs. I'm currently working on fix for @ReactPHP here: https://t.co/1gwIR2s4fp
— Christian Lück (@another_clue) May 4, 2020