Data Science as a Scientific Triathlon
Evgeniya Dontsova
Posted on July 25, 2019
Data science is a relatively new multidisciplinary field that emerged from the need to digest the enormous amount of data that people produce nowadays. It lies at the intersection of computer science, mathematics & statistics, and domain knowledge. Realistically, it is nearly impossible to be an absolute leader in all of these disciplines simultaneously; typically one leads in only one. A good data scientist, however, should have a sufficient level of expertise in every field rather than being a leader in only one. I would like to draw a comparison with triathlon. Classic triathlon is a multidisciplinary sport that includes swimming, biking, and running. A good triathlete typically does not excel in any individual sport, but rather focuses on being uniformly prepared in all three. This is the power of multidisciplinary fields!
In my view, computer science is analogous to swimming. In computer science, knowing the right tools is extremely important, just as swimming is all about technique rather than brute force. I believe that the necessary programming skills, including programming languages, can be picked up relatively quickly if you are serious and passionate about it. In addition, I argue that having a background in other disciplines makes learning faster, in athletic and scientific disciplines alike. To demonstrate this, I would like to continue the parallel with swimming. I learned to swim freestyle from scratch in approximately half a year, and reached a level sufficient to complete an Ironman race, where one needs to swim 2.4 miles in open water. I believe this was possible thanks to my experience in other sports, such as running, which gave me basic physical fitness and an understanding of how to train. The key is to be interested and persistent. The same holds for scientific disciplines: an existing background helps a lot in picking up new things quickly.
Next comes mathematics & statistics. I would like to compare it with biking, where the tools you use (bike, helmet, shoes, clothing, etc.) are very important alongside your fitness level. Similarly, math and statistics are tools, and one should have up-to-date ones to perform at one's best. For instance, choosing an appropriate pre-processing technique and the best-suited machine learning method can make a big difference. Experience in choosing, or even developing, a method comes with practice, as in biking: the more you ride, the better it gets.
Domain knowledge is the hardest part since it is strongly tied to your educational degree and work experience. It requires a substantial time investment and is typically hard to acquire. I would like to compare it to running, which, in my opinion, is the most strenuous of the three, especially because it is the last leg of the triathlon. To become a better runner you need years of training to build the proper muscles and even to adapt on a physiological level. This, again, is easier if one has experience in other disciplines. Nevertheless, it is crucial to have and maintain domain knowledge, precisely because it takes so long to acquire.
I love doing triathlons because the main idea is to pace yourself: rather than pushing as hard as possible in any single discipline, you focus on improving in all of them simultaneously. From my point of view, this makes the whole process more challenging but, at the same time, more enjoyable. With a strong background in STEM and the mindset of a triathlete, I am excited about becoming a data scientist, a scientific triathlete!