Public Datasets
Contents
Agriculture
Biology
Climate+Weather
ComplexNetworks
ComputerNetworks
DataChallenges
EarthScience
Economics
Government
The world's most comprehensive, [...]
This is a repository of various data, broken down by US county. While most of [...]
48,592 disaster events from four US government sources — NTSB plane [...]
Healthcare
It's a project which provides non-processed datasets [...]
Medical [...]
US Water Quality Data by ZIP Code - Water quality violations, lead and copper levels, radon [...] <https://github.com/artakulov/us-water-quality-data>
This is the data [...]
The New York Times is releasing a series [...]
Yahoo Knowledge Graph COVID-19 Datasets - The Yahoo Knowledge Graph team at Verizon Media is [...] <https://github.com/yahoo/covid-19-data>
ImageProcessing
MachineLearning
Contains 13,322 Asian face images distributed across all ages (from 2 [...]
The B3FD dataset is a [...]
Free Music Archive <https://github.com/mdeff/fma>
Iranis - A Large-scale Dataset of Farsi/Arabic License Plate Characters <https://alitourani.github.io/Iranis-dataset/>
LLVIP - This dataset contains 30976 images, or 15488 pairs, most of which were taken at very [...] <https://bupt-ai-cz.github.io/LLVIP/>
Museums
NaturalLanguage
Software
Identifiers
Sports
Pro Kabaddi seasons 1 to 7 - Pro Kabaddi League is a professional-level Kabaddi league in India. [...] <https://github.com/ranganadhkodali/Pro-Kabadi-season-1-7-Stats>
Transfermarkt Datasets - Clean, structured and automatically updated football (soccer) data [...] <https://github.com/dcaribou/transfermarkt-datasets>
USA Soccer Teams and Locations - USA soccer teams and locations. MLS, NWSL, and USL [...] <https://github.com/gavinr/usa-soccer>
NFL play-by-play data - NFL play-by-play data sourced from: [...] <https://www.dolthub.com/repositories/Liquidata/nfl-play-by-play>
Pinhooker: Thoroughbred Bloodstock Sale Data <https://github.com/phillc73/pinhooker>
Tennis database of rankings, results, and stats for ATP <https://github.com/JeffSackmann/tennis_atp>
TimeSeries
Transportation
NYC Uber trip data April 2014 to September 2014 <https://github.com/fivethirtyeight/uber-tlc-foil-response>
Open Traffic collection <https://github.com/graphhopper/open-traffic-collection>
eSports
SocialSciences
Accessibility Atlas - 62 datasets on disability in the US and globally — demographics, [...] <https://github.com/lukeslp/accessibility-atlas>
Joshua Project Global Peoples Dataset - 16,382 people groups across 238 countries with 7,134 [...] <https://github.com/lukeslp/joshua-project-data>
Meteorites vs UFOs Detection Bias - 1,279 records comparing meteorite fall/find reports with [...] <https://github.com/lukeslp/meteorites-ufos-detection-bias>
FIPS-keyed county-level US inequality data spanning food deserts, [...]
SocialNetworks
Weekly cross-platform attention tracking for 2025 combining Wikipedia [...]
Air Travel Reviews Dataset <https://github.com/quankiquanki/skytrax-reviews-dataset>
Daily datasets with tweets of 1100+ accounts associated [...]
Energy
Entertainment
Finance
GIS
Collection of open 3D semantic city and region models.
The repository contains the accumulated shadow information for New York [...]