6502 - Some Thoughts on Open Source Projects

qiwenyu

Qiwen Yu

Posted on October 1, 2021

Here I pick two open source packages to discuss and compare: Pandas vs. Prefect.

The two packages use different licenses: Pandas is licensed under the BSD 3-Clause "New" or "Revised" License, and Prefect is licensed under the Apache License 2.0.

Pandas is a large, active open source project in the data engineering world. It is hosted on GitHub, and its developers hold regular Zoom meetings on the second Wednesday of each month at 18:00 UTC. As for Prefect, the core dev team has founded a start-up company. The project remains a public open source project on GitHub, but its contributors are more closely connected through Slack.

I believe Pandas is more open to new (random) contributors than Prefect. Below, I describe how the two projects differ in the "code patch update" process, that is, the steps from submitting an issue to merging a pull request on GitHub.

Using Pandas issue #43659 as an example, the whole process consists of submitting the issue, reviewing and updating the issue, assigning the issue, submitting a pull request, reviewing and updating the pull request, and finally merging the pull request. The process usually involves more than one reviewer, one contributor, and the person who found the issue. This particular issue concerns the rename function failing to find index values, and it took about 10 days to fix. All participants were very responsive and professional throughout.

First, @bustawin found and submitted the issue, after checking that it had not been reported before and that it occurred in the latest version of Pandas. Then wojtek2kdev picked up the issue by submitting a pull request with a unit test that reproduced it, and fixed it in another commit. Next, a member of the core Pandas team asked him to confirm the bug on the master branch. Meanwhile, two other code reviewers checked the contributor's code and asked him, several times, to fix the code style and write more concise test cases. Eventually, the bug was fixed by adding an if-else condition in the right place to get the indexer, together with a corresponding unit test case. At this stage, the contributor was asked to update the whatsnew document. This was the contributor's first pull request to the project. From my point of view, the community and core members of Pandas were very helpful during the process.

The Prefect community is also very active. Interestingly, most of the project's discussion has moved to Slack. The code update process itself is very similar to the Pandas process described above. Take a recently closed issue as an example: it was generated automatically by marvin-robot from the chat history in Slack, and it was also closed automatically by marvin-robot. In this scenario, the whole code update process is not very open to new contributors.

Both projects are very professional and popular. To contribute successfully, contributors need to follow the development documentation on each project's website.
