Getting prepped with Machine Learning skills for a hackathon.

hintiiita

Hack In The North

Posted on February 11, 2020

Introduction

Gone are the days when tons of lines of code had to be written just to make a simple neural network. With the rising interest of tech enthusiasts in the field of Artificial Intelligence, a plethora of libraries and tools is now available to get your machine learning model up and running in no time (well... let’s skip the training part, that does take a lot of time xD). Recent developments in these libraries and tools have made building a machine learning model so easy that open-source platforms like GitHub and tech blogs are overflowing with projects.

Tweaking the model architecture and hyperparameters is one thing; actually applying machine learning knowledge in a hackathon is a totally different domain. This, I believe, is where most machine learning enthusiasts get stuck while approaching a hackathon problem. Well, if you face the same issue, this blog is for you!
In the coming sections, I discuss everything from “how to arrive upon an idea” to deploying your machine learning model on the web or in an Android app.

How to arrive upon a “Machine Learning Idea” for a Hackathon

These days you will mostly find two types of hackathons: one where you build a solution for a listed theme or problem statement, and another where you solve a problem statement closely aligned to the organizer’s business. In either case, one must be a jack of all trades and at least a king of one. A machine learning practitioner generally finds no difficulty in building a model but may lack visualization and application skills.

One may not be familiar with frontend and backend frameworks, and thus it becomes hard to see one’s work as a complete product, which is what a hackathon expects. This is where choosing a team of people with complementary skills comes in, so that you can learn from one another and together come up with solutions that can be built into a complete product. From my hackathon experience, if you want to generate good hackathon ideas, your team must have at least one member from the frontend and backend domains so that they can transform your complex-looking machine learning code into a product easily usable by the general public.

Being a machine learning practitioner myself, I will break the process down into the following stages:

I am a Beginner, how shall I start:

Well, no worries if you are a beginner. You shall be delighted to have chosen a field that would save humans from humans by serving the one true purpose of destroying everything and everyone we ever loved, graciously letting humans walk hand in hand into extinction and bringing about a new world order governed by unbiased, sagacious and pure artificially intelligent robots (too soon?).

Anyway, for a newbie, it may be hard to come up with a project that uses machine learning at its core, but thanks to GitHub and the creative minds who push their projects there and make them available to the world, it’s child’s play these days. I recommend exploring GitHub projects and getting an idea of the problems people have solved using machine learning. Going through different projects is an essential part of learning, as it helps you know the limitations of different frameworks and choose the one that best suits you and your team.

For example, the most commonly used frameworks in machine learning / deep learning are TensorFlow and PyTorch, but it may not be possible to use either of them for every task. If you are developing an Android app, you might prefer TensorFlow over PyTorch for the sole reason that Google developed TensorFlow and has provided elegant tools to integrate TensorFlow models into an Android app. You get a feel for the different frameworks once you start exploring different projects on the internet.

I have made a couple of projects, how do I come up with a new one:

If you have made projects before, I am sure that by now you are already loyal to that one library you consider the key to unlocking the most appalling, petrifying yet necessary secrets of this universe, so I’ll skip explaining the pros and cons of different frameworks and jump right to the next part.

Well, if you have it in you to build a powerful model, you just need the right data to feed into your code. Websites like Kaggle, Analytics Vidhya, HackerEarth and many more host regular competitions and make data available to the public. You need not participate in the competition directly; you can use the data available on these websites to build a solution that solves a real-world problem. Looking at the datasets and other people’s code on Kaggle has always helped people come up with something new, and that’s what I always do before finalizing a hackathon project.

Challenges with machine learning in hackathons

Machine learning models are no good lying in IPython notebooks or scattered Python scripts. To solve a real-world problem, the model needs to be deployed in a way that makes it usable by the general public and not just by domain experts. The main challenges you may face while creating an ML-driven product are:

  1. Finding the right dataset:

    While you may have the technical know-how to create the model in Keras/PyTorch, finding the right dataset, one with the features your product demands, can be tricky. Moreover, even if you find something that partly overlaps with your needs, you may need to understand its structure and preprocess it to fit your specifications. Add the limited time of a hackathon and the training time of your model, and all this presents a big challenge.

  2. Finding the right architecture:

    Congratulations on getting your dataset in time, but are you confident that your run-of-the-mill neural net will give you presentable results? Did you try the Inception model? Is SSD fast enough for the demo? There are some standard models you can try for each domain, but they all come with their own pros, cons, and tradeoffs: good accuracy, for instance, tends to come with slow speed, big size, etc. This is also something you might want to discuss with your team.

  3. Source code:

    If you are experienced in building models, you might quickly get a model up and running and ready for the show. If you are experienced in hackathons, however, you might not want to write it “during” the hackathon, especially from scratch. While “taking reference” from the internet is the default go-to for most people, the main challenge is getting that code to run on your system. If the code is fairly new, chances are this will not be a problem, but if it is written in some other library, or for a deprecated version of your favorite library, you might have a hard time (CTRL+V)ing it.

  4. Making it useful:

    This is perhaps the biggest challenge when it comes to using machine learning in your project. You have a great model (check) and well-processed data that represents your practical domain (wink wink, but check); the only problem is that you need its output somewhere beyond the realms of a Jupyter notebook, like on a website or, more dangerously, a mobile app. This is where you have to bring out the hacker in you. There is no default solution for this: you may have to deploy a local Flask server that acts as an ML backend, host your model on Azure/AWS (which is costly), or use TensorFlow Lite. Each of these has its own pros and cons. Be sure to think about this beforehand, since it takes some practice to get it running smoothly during a hackathon.
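
To make the “local Flask server as an ML backend” option a bit more concrete, here is a minimal sketch. Everything specific in it is an assumption made for illustration: the `/predict` route, the three-number `features` payload, and the `predict` function, which is only a hypothetical stand-in for whatever trained model you would actually load.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict(features):
    """Hypothetical stand-in for a trained model: a fixed linear score."""
    weights = [0.5, -0.25, 1.0]  # made-up numbers, not learned from data
    score = sum(w * x for w, x in zip(weights, features))
    return {"score": score, "label": "positive" if score > 0 else "negative"}


@app.route("/predict", methods=["POST"])
def predict_endpoint():
    # silent=True returns None instead of raising on a missing or bad JSON body.
    payload = request.get_json(silent=True)
    features = payload.get("features") if isinstance(payload, dict) else None
    if not isinstance(features, list) or len(features) != 3:
        return jsonify(error="expected JSON with a 'features' list of 3 numbers"), 400
    return jsonify(predict(features))
```

The frontend or mobile app would then POST JSON such as `{"features": [2.0, 4.0, 1.0]}` to `/predict` and render the response. For a local demo, calling `app.run()` starts Flask’s development server; swapping the stub for a real model only changes the body of `predict`.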

Perks of an ML project in Hackathons

Over the years, machine learning has found its way into every possible domain. You can find it in use in almost every industry, be it healthcare, education, automation, etc. That gives it its biggest advantage: you can opt for any track in the hackathon and look for a way to include machine learning, which makes it all the more enchanting, natural yet seemingly impossible, making your hack more appealing to judges and users alike.

Coming back to hackathons, we must all agree on one point: hackathons are all about presentation, and being a machine learning practitioner gives you the power to find relations among data and plot complex-looking graphs and charts. Analyzing a problem using past data and building on it helps you arrive at a concrete solution, makes your hackathon pitch sound more confident and brings your end goal nearer.

With the advancements in computational power, deep learning has become a favored topic among researchers and open source enthusiasts. People have developed amazing libraries and tools which make it very fast to get a machine learning model up and running. There are a lot of ready-to-use machine learning projects available on websites like Kaggle and GitHub which can be used directly, or after a little tweaking, for a hackathon project.

Having participated in a fair number of hackathons myself, I have seen the need for a machine learning practitioner and have observed that almost every team has that one machine learning guy. Yeah, so being “the ML guy” in a hackathon does make you a core contributor to the project, and you get to enjoy the hackathon while doing comparatively less work (it’s the biggest perk according to me 😉).

Apart from the mythically formidable pheromones that “the ML guy” exudes, the person also enjoys more creative space than fellow teammates. Many times, the judges are sold and swept off their feet for cheap, owing to seemingly magical end results.

Technologies to learn

Here is a list of the best open-source AI technologies you can use to take your machine learning projects to the next level.

  • If you haven’t gotten your hands dirty writing machine learning code, I suggest you first get familiar with Python tools like NumPy, pandas and matplotlib. NumPy is one of the libraries most used by machine learning / deep learning frameworks, and mastering it will definitely help you get along with these frameworks. Pandas and matplotlib are mainly used for storing and visualizing data respectively; you can use them for presentation purposes in hackathons.

  • Scikit-learn is an open-source library developed for machine learning. This framework for traditional machine learning is written in Python and features algorithms for classification, regression, clustering, and dimensionality reduction. If you are not into building heavy deep learning models or want to develop a solution with a small amount of data, you may use this library to build your solution.

  • Coming back to where we started, these days you just need a couple of lines of code to get your neural network running, all thanks to modern frameworks like TensorFlow, Keras and PyTorch. Keras is basically a high-level API that serves as a wrapper over TensorFlow (and Theano) and saves you from horrifying, mind-racking TensorFlow code. Keras is easy and intuitive and takes fairly little time to learn. PyTorch, developed by Facebook, is an alternative to Google’s TensorFlow or Keras which serves the same purpose of building machine learning models quickly and optimally.

  • Well, if you are one who doesn’t like high-level APIs and wants to dive deeper into the core, TensorFlow 1.x / TensorFlow 2.x is for you. With TensorFlow, you can create your own low-level functions, write parallel algorithms, build your own architecture from scratch and what not. TensorFlow used this way is mainly for research purposes, and I personally don’t see its use in hackathons, given the ease of using Keras and the likes of it.
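
To give a feel for how little code the scikit-learn route needs when data is scarce, here is a small sketch. The dataset (the built-in iris data), the model (logistic regression) and the split settings are all choices made for this example, not something a particular hackathon problem would dictate.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Small built-in dataset: 150 flowers, 4 numeric features, 3 classes.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the rows to estimate how the model generalizes.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# max_iter is raised so the solver has room to converge on unscaled features.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"held-out accuracy: {accuracy:.2f}")
```

Most scikit-learn estimators share the same `fit` / `predict` interface, so trying a different model is usually a one-line change, which is handy when the clock is ticking.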

I am a Beginner, how shall I start:

I’ll say this again:

Hmmm, so here are some resources that have helped shape most of the modern hotshot ML experts, and you, my dear reader, will be next to avail yourself of this treasure.

  1. Machine Learning by Andrew Ng: It’s one of the most popular courses, and people usually start with it to understand how machine learning works. The course is taught by Andrew Ng, an adjunct professor at Stanford University and a pioneer in machine learning and deep learning. The course has no prerequisites except basic mathematics and programming knowledge; it is meant for beginners and teaches everything from scratch. The only things missing from the course content are some popular machine learning algorithms like Naive Bayes, Random Forest, Decision Trees, etc., but don’t worry, I have got all of these covered in the next course.

    Course Link
  2. Machine Learning A-Z: If you have gone through the machine learning course by Andrew Ng, you probably now want to see those algorithms working and build some applications using them. This course is mostly focused on practical machine learning: unlike Andrew Ng’s course, it doesn’t dive deep into mathematics and focuses more on intuitive understanding and programming aspects. The course instructor, Kirill Eremenko, walks through each line of code and helps you understand how to use different machine learning libraries. This course will leave you with one or two machine learning projects which you can put in your CV and can even use in hackathons.

    Course Link
  3. Deep Learning specialization: Coming to the modern machine learning technique of deep learning, this specialization takes you from the very basics of neural networks to modern architectures like CNN, RNN, etc. The specialization consists of a set of 5 courses which may be taken individually or as a complete specialization. All of the courses are taught by Andrew Ng and have very well structured programming assignments that get you comfortable with TensorFlow and Keras. Along with the programming exercises, the courses also get into the depth of the mathematics and the core working of the deep learning algorithms.

    Course Link
  4. Deep Learning A-Z: Don’t have the patience to complete those 5 long, exhausting courses? No worries, try this course. From the very start you will get your hands on modern deep learning frameworks and use them throughout the course to build some amazing applications. This course will teach you how to set up a deep learning environment on your own machine and will have you install important libraries and tools which will help you throughout your machine learning journey.

    Course Link
  5. TensorFlow: Data and Deployment Specialization: Now that you are backed up with the theory and know how to write machine learning architectures in Python, R, MATLAB, etc., it’s time you put your skills to use developing an actual application that is readily usable by the general public. The specialization consists of a set of 4 courses, teaching you how to deploy your model in the browser, on Android / iOS, and on Raspberry Pi and microcontrollers. You will get an understanding of how to use a suite of tools in TensorFlow to more effectively leverage data and train your model.

    Course Link

Carefully choosing the right set of courses and keeping a continuous gradient flow in your head to learn this amazing technology will definitely bring out the machine learning expert in you and will help you outperform others in hackathons.

About the authors

We are information technology undergraduates studying at IIITA. Feel free to contact us via the following channels:

  1. Dhruv Agarwal
  2. Shivansh Beohar