Empirical Software Engineering
Evidence-based research on software systems.
Repositories
All data used in the openly available book Evidence-based Software Engineering
Collaborative collection and analysis of free/libre/open source project data.
About 20 datasets related to software engineering research.
Software-artifact infrastructure repository; Java, C, C++, and C# software together with test suites and fault data.
Zenodo
Software data collections in CERN's open-access repository.
http://zenodo.org/communities/seacraft
https://zenodo.org/communities/empirical-software-engineering/
https://zenodo.org/communities/msr/
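The community pages above can also be queried programmatically. The following is a minimal sketch, assuming Zenodo's public REST API exposes a records endpoint at `https://zenodo.org/api/records` that accepts a `communities` query parameter. The helper names `community_slug` and `community_query_url` are hypothetical and are not part of any Zenodo client library. The code only builds the request URL and does not perform a network call.

```python
from urllib.parse import urlencode, urlparse

# Assumed endpoint of Zenodo's public records search API.
ZENODO_RECORDS_API = "https://zenodo.org/api/records"


def community_slug(community_url):
    """Extract the community identifier from a Zenodo community page URL.

    Expects a path of the form /communities/<slug>, with or without a
    trailing slash.
    """
    parts = [p for p in urlparse(community_url).path.split("/") if p]
    if len(parts) < 2 or parts[0] != "communities":
        raise ValueError(f"not a Zenodo community URL: {community_url}")
    return parts[1]


def community_query_url(slug, size=10, page=1):
    """Build a search URL for the records of one community (hypothetical helper)."""
    params = {"communities": slug, "size": size, "page": page}
    return f"{ZENODO_RECORDS_API}?{urlencode(params)}"


if __name__ == "__main__":
    for url in (
        "http://zenodo.org/communities/seacraft",
        "https://zenodo.org/communities/empirical-software-engineering/",
        "https://zenodo.org/communities/msr/",
    ):
        print(community_query_url(community_slug(url)))
```

The resulting URLs can be passed to any HTTP client. The structure of the JSON response is not described here and should be checked against the Zenodo API documentation.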
Data Sets
Collection of 395 reproducible bugs assembled to advance software testing research.
Set of FindBugs reports for the Java projects of the Maven repository.
The Linux Kernel 4.21 Call Graphs produced using CScout.
Collection of software complexity & sizing metrics for the Maven Repository.
Multi-extract and multi-level dataset of Mozilla issue tracking history.
Analysis results of 5 open source software quality tools (eslint, escomplex, nsp, jsinspect, and sonarjs) for 2000 popular npm packages, ranked by stars and downloads.
Data set of 9188 OCL expressions originating from 504 EMF meta-models in 245 systematically selected GitHub repositories.
Git repository covering 46 years of Unix evolution.
Bug Dataset of 15 Java open-source projects characterized by static source code metrics.
Data set containing a collection of engineered software projects from GHTorrent.
Tools
Library and tool for mining of path-based representations of code and other data derived from ASTs.
Multi-language tokenizer for extracting identifiers from source code.
A Java framework for analyzing code changes and mining instances of change patterns from Git repositories.
Mine GitHub activity and market cap data for cryptocurrency projects.
Extract embedded SQL statements and detect database schema smells.
Compute source code metrics and detect a variety of implementation and design smells for Java.
Agile Ruby tool to analyze Git repositories.
Code evolution analysis for Git repositories.
Java tools and infrastructure to resolve the whole dependency graph of the artifacts hosted in Maven Central and represent it as a Neo4j graph.
Fetch repository data from tens of back-ends.
Detect configuration smells in Puppet code.
Python framework to analyze Git repositories.
Python tool to compute a score for a repository from GHTorrent. The score quantifies the extent to which the project in the repository is engineered.
Library/API for detection of refactorings in changes of Java code.
Java framework for automatically collecting commits that fix vulnerabilities reported in NVD (links NVD with Git).