TAMU DATATHON
GitHub Link: https://github.com/revanth-reddy/tamudatathon
Hackathon's Dataset in Hackathon
This dataset was built as part of TAMU Datathon 2021. It contains details of hackathons, including title, location, start date, end date, prize money, number of participants, host, and themes.
As part of the TAMU Datathon 2021 challenges, we chose the TD Data Synthesis Challenge. We approached it by synthesizing a dataset of hackathons held around the world.
The data was obtained by web scraping with Python and Selenium.
- Number of hackathons per year.
- Trend in hackathon prize money over the years.
- Rise of themes over the years.
- Participation in hackathons across years and themes.
- And many more ...
- to predict prize money based on other features such as participation, themes, and location.
- to estimate user participation based on location, prize money, themes, etc.
- to analyze the relationships between the above features.
Interactive Visualization: https://datastudio.google.com/reporting/269afbc3-e982-4fef-aee5-08010b427070 (open it to interact with the visualization playground)
Interactive graphs and charts for analyzing the hackathon dataset




- While scraping the website, we found that its content is not rendered as static content.
- Because the content is dynamic, it is hard to scrape.
- The website paginates based on page scrolling: initially `x` items are loaded, on scrolling another `x` items are loaded, and so on.
- Unless we scroll, we can't scrape the entire dataset.
- The dataset can be used to build a machine learning model, for example to help pick a theme, estimate prize money or participation, and analyze hackathon trends.
- Scraping can be automated with job schedulers (CI/CD) so that data is collected periodically.
- More data can be scraped from various sources.
- The scraping process can be further optimized by using multithreading.

## Team DaHack
