How to win a Data Science Hackathon

KDAG IIT KGP
Oct 8, 2021

Data hackathons are one of the best ways to prove your data science skills and to learn a lot in a short amount of time. Different hackathons are held for different reasons: some for a company’s own profit, some for social or environmental causes, and some to solve a general problem in a particular domain. They may set different goals for participants, such as building an MVP, getting the best score on a Kaggle leaderboard, or deriving inferences from data to propose solutions. Whatever the type of hackathon, here are some important tips that will serve as guidelines for winning.

Know your Team

Having a great team can make or break the overall experience, whether you’re new to hackathons or a seasoned developer. Why?

Because hackathons can be a lot of fun but also a lot of work. Teams get together for a few days or a weekend to develop a viable product, often working around the clock with little sleep and few breaks.

The process will be less daunting and more interesting if you have a healthy, cohesive, and multi-functional team.

5 pointers for forming a successful hackathon team:

1. Start looking for teammates as soon as you decide to participate in a hackathon:

An early search is the best head start you can give yourself.

Don’t wait till you have an idea; once you’ve chosen to participate in a hackathon, get the word out and see who’s interested.

2. Team members should be diverse:

It’s no secret that diverse teams tend to outperform homogeneous groups. Make it a point to seek out people who are different from you. You want a team with varied strengths and complementary skill sets.

3. Assign responsibilities to people based on their skill sets:

Once you’ve assembled your team, it’s critical that you understand everyone’s abilities and working styles. Most hackathon teams have a few distinct roles to fill, and you can save a lot of time by allocating those roles based on everyone’s primary strengths.

By distributing duties in this manner, each team member has the opportunity to focus on specific areas where they excel.

4. Find people with whom you get along:

This is perhaps the most important advice of all. Good team chemistry matters in any situation, but hackathons add a layer of pressure, stress, and tiredness, all of which are amplified if you’re stuck with a team you don’t get along with.

Make a list of people you believe you’ll get along with and make sure everyone contributes something useful.

5. Discover the hackathon’s message:

With this simple practice, you can set your team apart by miles. It’s also a good way to see how capable each team member is at digging deep and thinking outside the box.

Gather your team and brainstorm the hackathon’s message during the ideation phase.

What are the requirements of the businesses? What are their actual wants and needs? Who are the event’s sponsors? What is their relationship to the hackathon’s organizer?

Understand the PS

The most important part of any hackathon is understanding the Problem Statement (PS). All the hard work you put into the competition may go to waste if you do not concentrate on this step. A data analytics problem statement has a few parts.

The application:

  • Most data hackathons have real data from some company or organization. You need to understand what they aim to achieve from your solution. Do they want to identify potential locations to increase their market reach? Do they want to achieve a more sustainable way of doing something? Knowing this will help you keep these questions in focus throughout your data analysis, which in turn will help you showcase the usefulness of your solution in your presentation.
  • Decide if it is an inference-based or prediction-based competition. Inference-based competitions need you to find associations between various factors and the problem. Prediction-based competitions need you to make accurate models that can predict something using the data to solve the problem.

The data:

  • For tabular data, try to understand every column and what it means in practice. If there are multiple tables, identify how they connect to each other. For textual and image data, look at some instances yourself to get a feel for their nature and their relation to the PS.
  • Thoroughly read the data description if one is provided, and look for mentions of the data in the problem statement.
  • Most importantly, for competitions that need you to build ML models, work out whether this is a supervised, unsupervised, or semi-supervised learning problem.

All of this will help with Exploratory Data Analysis (EDA) and Feature Engineering in later stages.

The metrics:

  • What good is your solution if the problem demands high “recall at K” and you provide high accuracy but low recall at K? Know the metrics the competition asks for. Any model you develop should ultimately be judged using those metrics.
  • Also, understand why that particular metric matters to the company’s aim. It won’t hurt to present other metrics, but the requested ones should be given prime importance.
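To see how accuracy and recall at K can disagree, here is a minimal sketch in plain Python. The data and the helper names (`recall_at_k`, `accuracy`) are made up for illustration, and the definition of recall at K used here (the share of all positives that land in the K highest-scored items) is one common convention; always check the exact definition your competition uses.

```python
def recall_at_k(y_true, scores, k):
    """Share of all positives that appear among the k highest-scored items."""
    total_positives = sum(y_true)
    if total_positives == 0:
        return 0.0
    # Indices of the k items with the highest scores
    top_k = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sum(y_true[i] for i in top_k) / total_positives


def accuracy(y_true, scores, threshold=0.5):
    """Share of items whose thresholded prediction matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == t for p, t in zip(preds, y_true)) / len(y_true)


# Toy data (illustrative only): 10 items, the first two are positives
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
# The model predicts "negative" for everything and ranks the positives last
scores = [0.05, 0.10, 0.40, 0.35, 0.30, 0.25, 0.20, 0.15, 0.12, 0.11]

acc = accuracy(y_true, scores)          # 8 of 10 labels matched
rec = recall_at_k(y_true, scores, k=3)  # no positive in the top 3

print(f"accuracy = {acc:.2f}, recall@3 = {rec:.2f}")
```

On this toy data the model looks decent on accuracy simply because most items are negative, yet it would be useless to an organizer who only acts on the top K items.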

If you are confused about any of the above points, then ask questions. Most organizing teams are cooperative and will help you out. Data science needs domain knowledge, and if you do not have it, don’t be ashamed to ask even basic questions about the domain (provided they are relevant to the PS). Not everyone is an expert in every domain, so do your research, and if you don’t get something, simply ask.

Data Analysis and Modeling

This is the major part of your work in a hackathon. Apart from developing the necessary skills for this, there are some pointers you need to keep in mind in a hackathon.

1. Feature engineering is more important than model selection: Feature engineering is one of my favorite aspects of a data science hackathon. I get to use my imagination, and what data scientist doesn’t like that? Feature engineering is the process of extracting additional information from the data you already have. You’re not adding any new information here; instead, you’re making the existing information more useful to the model.

  • Assume you’re trying to forecast footfall at a shopping center based on dates. You may not be able to draw relevant insights if you use the raw dates directly, because the day of the month typically has less impact on footfall than the day of the week. The day of the week is already implicit in each date; to improve your machine learning model, you must bring it out as an explicit feature.
  • The quality of the features in the dataset used to train a predictive model has a significant impact on its performance. Performance will improve if you are able to build new features that give the model more information about the target variable.
  • Devote a significant amount of time to feature engineering, pre-processing and exploratory data analysis. You should pay close attention to this because it can have a significant impact on your score.
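The footfall example can be sketched with nothing but the standard library. The record layout (`date`, `footfall`) and the function name `add_calendar_features` are assumptions made for this illustration, and the footfall numbers are invented.

```python
from datetime import date


def add_calendar_features(rows):
    """Return copies of the rows with features derived from the 'date' field."""
    enriched = []
    for row in rows:
        d = date.fromisoformat(row["date"])
        enriched.append({
            **row,
            "day_of_week": d.weekday(),           # Monday = 0 ... Sunday = 6
            "is_weekend": int(d.weekday() >= 5),  # Saturday or Sunday
            "day_of_month": d.day,
        })
    return enriched


# Hypothetical footfall records
rows = [
    {"date": "2021-10-08", "footfall": 1200},  # a Friday
    {"date": "2021-10-09", "footfall": 2100},  # a Saturday
]

features = add_calendar_features(rows)
for f in features:
    print(f)
```

The raw date string carries the weekday only implicitly; after this step the model sees it as its own column and can learn a weekend effect directly.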

2. Ensemble models: Ensemble modeling is a useful tool for improving your model’s performance. It’s the art of combining the predictions of several models to improve stability and predictive power. Top-finishing solutions in data science hackathons almost always include ensemble models.
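The simplest ensemble is a (weighted) average of the predictions of several models. Below is a minimal sketch with two hypothetical regression models; the numbers are chosen so that their errors partly cancel, which is the best case. How much you gain in practice depends on how different the models’ errors really are.

```python
def average_ensemble(prediction_sets, weights=None):
    """Blend per-model predictions with a weighted average (equal weights by default)."""
    n_models = len(prediction_sets)
    if weights is None:
        weights = [1.0 / n_models] * n_models
    n_items = len(prediction_sets[0])
    return [
        sum(w * preds[i] for w, preds in zip(weights, prediction_sets))
        for i in range(n_items)
    ]


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative targets and predictions from two hypothetical models
y_true = [10.0, 20.0, 30.0]
model_a = [12.0, 18.0, 33.0]  # errors: +2, -2, +3
model_b = [8.0, 23.0, 28.0]   # errors: -2, +3, -2

blended = average_ensemble([model_a, model_b])

print("MSE model A:", mean_squared_error(y_true, model_a))
print("MSE model B:", mean_squared_error(y_true, model_b))
print("MSE blend:  ", mean_squared_error(y_true, blended))
```

Stacking and other more elaborate schemes follow the same idea, but a plain average validated on your local folds is usually a sensible first step.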

3. Trust your cross-validation: Don’t start building models by throwing data at the algorithms. While a quick baseline is helpful, you should take a step back and develop a complete validation framework. Without validation, you’re just guessing, and you will be at the mercy of overfitting, data leakage, and other evaluation problems.

  • By replicating the competition’s evaluation method locally and tracking your validation results, you can iterate faster and more reliably, and make sure your model is robust enough to perform well on different subsets of the data.
  • Build a strong local validation set and don’t rely too much on the public leaderboard; tuning to it can lead to overfitting and a significant drop in your private leaderboard rank.
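A local validation framework can be as small as the sketch below: a shuffled k-fold split plus a loop that fits on the training folds and scores on the held-out fold. All names here (`k_fold_indices`, `cross_validate`, the mean-predicting toy model) are illustrative assumptions; in a real competition you would plug in your own model and, importantly, the competition’s own metric. A plain shuffled split is also an assumption: time-ordered or grouped data needs a split that respects that structure.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, valid_idx) pairs; each sample is held out exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        valid_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, valid_idx


def cross_validate(X, y, fit, predict, metric, k=5):
    """Return one validation score per fold."""
    scores = []
    for train_idx, valid_idx in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = predict(model, [X[i] for i in valid_idx])
        scores.append(metric([y[i] for i in valid_idx], preds))
    return scores


# Toy "model" that always predicts the mean of the training targets
def fit_mean(X_train, y_train):
    return sum(y_train) / len(y_train)


def predict_mean(model, X_valid):
    return [model for _ in X_valid]


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


X = list(range(20))
y = [2 * x for x in X]
scores = cross_validate(X, y, fit_mean, predict_mean, mean_absolute_error, k=5)
print("per-fold MAE:", scores)
print("mean MAE:", sum(scores) / len(scores))
```

Looking at the spread of the per-fold scores, not just their mean, gives a rough sense of how much a leaderboard change could be noise.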

4. Code refactoring: This step matters most when the judges value a working MVP over the idea. Imagine living in a room where everything is a mess, with clothing strewn about, shoes stacked on shelves, and food spilled on the floor. It’s a real pain, isn’t it? The same may be said of your code.

  • When we first begin a competition, we are likely to write sloppy code, copy-paste from previous notebooks, and use some code from Stack Overflow. If this pattern is followed throughout the notebook, it will become cluttered.
  • Understanding your own code will then take up most of your time and slow everything down. The solution is to refactor your code on a regular basis.
  • Tidy up your code at regular intervals. This also makes it easier to team up with other participants and improves communication.
  • Keeping your code organized will also help you in future hackathons when you reuse your code.

Presentation

This is usually the final round of the hackathon, and in some cases it alone decides the winner. You go into the final round with the performance of your metrics or the quality of your solution. But here, you are also given a chance to show the amount of work you have done. Your solution or metrics may be a little worse, but your presentation may give the company or organization better insight into how to use your solution. Here are some tips to make a stunning and smooth presentation.

1. Visualization: You have only a few allotted minutes, and what better way to concisely convey what you mean than visualizations?

  • Use visualizations wherever they can replace text. The textual details should be in your presentation notes for you to speak, not on your slides.
  • Do not use the wrong kind of visualization for something. It’s better to have no visualization at all than to use a pie chart with 100 sectors (or a bar chart whose axis starts at 60, making an improvement from 65 to 70 look like a doubling in performance).
  • Image data is a good chance to add example images and textual data is a good chance to add word clouds.
  • Apart from that, flowcharts and diagrams to describe thoughts and processes are a must.
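The bar chart example above is simple arithmetic, sketched here with a hypothetical helper, `apparent_ratio`, that compares bar heights as they are drawn for a given axis start.

```python
def apparent_ratio(old, new, axis_start=0.0):
    """Ratio of the two bar heights as drawn when the axis starts at axis_start."""
    return (new - axis_start) / (old - axis_start)


old_score, new_score = 65.0, 70.0

true_ratio = apparent_ratio(old_score, new_score)                     # axis starts at 0
drawn_ratio = apparent_ratio(old_score, new_score, axis_start=60.0)   # axis starts at 60

print(f"real improvement: {100 * (true_ratio - 1):.1f}%")
print(f"bars drawn from 60: new bar is {drawn_ratio:.1f}x the old one")
```

A gain of under 8% ends up drawn as a bar twice as tall, which is exactly the kind of chart a careful judge will call out.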

2. Communication:

  • The last thing you want is to use 100 technical words in front of a non-technical judge. Know your audience (that is, judges) and make the presentation understandable to them.
  • Provide a concise description of everything you have done. Keep detailed descriptions reserved only for describing the most important contributions made by you.

3. Content: Other than concisely describing everything you have done from top to bottom, make your presentation complete with a few more discussions.

  • Analyze the errors if you are building a prediction model. For example, for image classification, show the worst misclassifications and identify any similarities among those images to show what kind of image the model fails on.
  • Talk about the challenges that may be faced when implementing your solution in practice.
  • Do not show your solution in a negative light while covering the two points above.
  • Tell them why your model works by showing an ablation study, a comparison with other models, etc.
  • Show possibilities for further improvements.
  • A question judges often ask is about model explainability. If your problem statement cares about inference (and not only predictions), then you need to show to what extent your model is explainable.
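For the error analysis point, one reasonable reading of “worst” is the mistakes the model was most confident about. The sketch below assumes each prediction is stored as a small record with `label`, `predicted`, and `confidence` fields; these names and the sample records are invented for illustration.

```python
def worst_mistakes(examples, top_n=3):
    """Return misclassified examples, most confidently wrong first."""
    wrong = [e for e in examples if e["predicted"] != e["label"]]
    return sorted(wrong, key=lambda e: e["confidence"], reverse=True)[:top_n]


# Hypothetical classifier outputs
examples = [
    {"id": "img_01", "label": "cat", "predicted": "cat", "confidence": 0.98},
    {"id": "img_02", "label": "dog", "predicted": "cat", "confidence": 0.91},
    {"id": "img_03", "label": "cat", "predicted": "dog", "confidence": 0.55},
    {"id": "img_04", "label": "dog", "predicted": "cat", "confidence": 0.87},
]

mistakes = worst_mistakes(examples, top_n=2)
for m in mistakes:
    print(m["id"], "true:", m["label"], "predicted:", m["predicted"], m["confidence"])
```

Putting the handful of images this returns on a slide, together with what they have in common, is usually more convincing than quoting one more metric.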

4. Questions: Keep a list of likely questions and their answers ready before the presentation. Keep your answers as concise as possible so that the judges have time to ask as many questions as they want. Follow proper etiquette when answering questions.

Hackathons invite developers, start-ups, scientific teams, and small businesses from across the world to solve a problem. On the surface, a hackathon appears to be a fun opportunity to test limits and think outside the box. But it has also become one of the most exciting kinds of gathering, a testing ground for fostering, celebrating, and recognizing innovative ideas.

Teams use data to conceptualize a problem or issue, establish a strategy, bounce ideas off each other, borrow other teams’ concepts and techniques, exchange knowledge and expertise, and often arrive at a solution that’s not really what they anticipated.

It’s an exciting way to work since it demands cooperation and encourages participants to inject the most important element of all: meaning. Hackathons have always been about invention, yet meaningful ideas are required if they are to impact people’s lives. It’s pointless to create things that are brilliant yet meaningless.

That kind of thinking is crystallized in the hackathon. Despite its chaotic, fast, and short-lived character, with many teams working through sleepless nights, it challenges participants to focus on a problem and consider new ideas with a wide range of inputs.

Hackathons are an exciting new way to collaborate and keep us all on our toes. Keep these pointers in mind, and you will see miracles happening. Hope you enjoyed the blog. We shall come up with some more useful and exciting content in the next one. Till then, stay tuned!

You can follow us on:

  1. Facebook
  2. Instagram
  3. LinkedIn
  4. YouTube

KDAG IIT KGP

We aim to provide ample opportunity & resources to all the AI/ML enthusiasts out there that are required to build a successful career in this emerging domain.