## Datasets
Here are some datasets that you could use for the project:
* [Kaggle](https://www.kaggle.com/datasets)
* [AWS datasets](https://registry.opendata.aws/)
* [UK government open data](https://data.gov.uk/)
* [GitHub archive](https://www.gharchive.org)
* [Awesome public datasets](https://github.com/awesomedata/awesome-public-datasets)
* [Million Song Dataset](http://millionsongdataset.com)
* [Some random datasets](https://components.one/datasets/)
* [COVID Datasets](https://www.reddit.com/r/datasets/comments/n3ph2d/coronavirus_datsets/)
* [Datasets from Azure](https://docs.microsoft.com/en-us/azure/azure-sql/public-data-sets)
* [Datasets from BigQuery](https://cloud.google.com/bigquery/public-data/)
* [Dataset search engine from Google](https://datasetsearch.research.google.com/)
* [Public datasets offered by different GCP services](https://cloud.google.com/solutions/datasets)
* [European statistics datasets](https://ec.europa.eu/eurostat/data/database)
* [Datasets for streaming](https://github.com/ColinEberhardt/awesome-public-streaming-datasets)
* [Dataset for Santander bicycle rentals in London](https://cycling.data.tfl.gov.uk/)
* [Common Crawl data](https://commoncrawl.org/) (a large open archive of crawled web pages)
* [NASA's EarthData](https://search.earthdata.nasa.gov/search) (may require some introductory geospatial analysis)
* Collection Of Data Repositories
  * [part 1](https://www.kdnuggets.com/2022/04/complete-collection-data-repositories-part-1.html) (from agriculture and finance to government)
  * [part 2](https://www.kdnuggets.com/2022/04/complete-collection-data-repositories-part-2.html) (from healthcare to transportation)
* [Data For Good by Meta](https://dataforgood.facebook.com/dfg/tools)

PRs with more datasets are welcome!

You don't have to use a dataset from this list. Any dataset you want is fine.