Where to Find Data?
It’s often a struggle to find the good data you really need to enhance your data projects. To help, let’s keep building a comprehensive list of sources that will hopefully give you a wealth of datasets to work with.
Data Analysis Project Datasets:
You can click to download these files in your browser:
- Hotel Revenue Dataset: This dataset lets you analyze hotel revenue data.
- E-commerce Dataset: This dataset shows e-commerce sales and products with dates and customer IDs.
- Holiday Dataset: This dataset covers more than 50 years of holidays and can be merged with your existing data.
- Hospital Wait Times: This is an actual clinic dataset with patient consultation and wait times.
- Digital Marketing: This dataset contains website campaign conversions by channel.
- Mall Customer Dataset: This dataset has mall customers with spending, income, and other dimensions.
- Advertising Channels: This dataset contains budgets and sales for 3 marketing channels.
- Absenteeism at Work: This includes 3 datasets covering employee records, compensation, and tracked absences.
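The Holiday Dataset is most useful once it is joined onto another dataset by date. Below is a minimal sketch of that merge using only the Python standard library. The column names (`date`, `revenue`, `holiday_name`) and the sample rows are hypothetical placeholders, not taken from the actual files, so adjust them to match whatever the downloaded CSVs contain.

```python
from datetime import date

# Hypothetical sample rows; the real files' columns and values may differ.
sales = [
    {"date": date(2023, 12, 24), "revenue": 1200.0},
    {"date": date(2023, 12, 25), "revenue": 300.0},
    {"date": date(2023, 12, 26), "revenue": 950.0},
]
holidays = [
    {"date": date(2023, 12, 25), "holiday_name": "Christmas Day"},
]


def left_join_holidays(sales_rows, holiday_rows):
    """Left-join holiday names onto sales rows by date.

    Every sales row is kept; rows with no matching holiday get None.
    If two holidays share a date, the last one listed wins.
    """
    # Index holidays by date so each lookup is a dict access.
    by_date = {h["date"]: h["holiday_name"] for h in holiday_rows}
    merged = []
    for row in sales_rows:
        name = by_date.get(row["date"])
        merged.append({**row, "holiday_name": name, "is_holiday": name is not None})
    return merged


merged = left_join_holidays(sales, holidays)
for row in merged:
    print(row["date"], row["revenue"], row["is_holiday"], row["holiday_name"])
```

A left join keeps every sales day, which is what you usually want when comparing holiday and non-holiday performance. If you work in pandas instead, the equivalent is `sales.merge(holidays, on="date", how="left")`.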
Follow along and build data analysis portfolio projects based on the YouTube playlist for:
If you are interested in the data analysis built around the datasets above, check out:
Datasets and Sources
- https://www.kaggle.com – One of the best sources for data science datasets. Many datasets come with full code notebooks and analyses that can prompt you to discover and build new insights.
- https://github.com/rfordatascience/tidytuesday – Lots of datasets for analysis from the weekly TidyTuesday project
- https://opendata.cityofnewyork.us – NYC datasets
- https://fred.stlouisfed.org/ – Economic data about the US
- https://www.data.gov/ – Data repository from the US government
- https://datasetsearch.research.google.com/ – Search engine for datasets
- https://github.com/OpportunityInsights/EconomicTracker – Includes COVID lockdown dates, changes in local policy, and unemployment changes at the state and local levels, as well as employment, consumer spending, and education-related statistics, and Google/Apple mobility reports.
- https://paperswithcode.com/datasets – Datasets indexed by Papers with Code
- https://datahub.io/collections – Data collections with a lot of finance data
- https://archive.ics.uci.edu/ml/datasets.php – Your source for standard ML benchmark datasets, such as MNIST, Iris, and Titanic, among plenty of others
- https://www.earthdata.nasa.gov/learn/find-data – Earth Science Data
- https://apps.who.int/gho/data/node.home – WHO global health data
- https://data.fivethirtyeight.com/ – Data behind FiveThirtyEight's stories on US politics and sports
- https://github.com/BuzzFeedNews – Source data from BuzzFeed News
- https://github.com/awesomedata/awesome-public-datasets – A curated list of public datasets, organized by topic
- https://snap.stanford.edu/data/ – Several social network and social media datasets
- https://research.google.com/youtube8m/ – 8 million categorized YouTube videos
- https://research.atspotify.com/datasets/ – Lots of music- and podcast-related data
- https://huggingface.co/docs/datasets/index – Lots of great text datasets