(Re)Evaluating Artifacts for Understanding Resource Artifacts
Project Idea Description
- Topics: Virtualization, Containerization, Profiling, Reproducibility
- Skills: C, Python, and DevOps experience
- Difficulty: Medium
- Size: Large; 350 hours
- Mentors: Tanu Malik
This project aims to characterize computer-science-related artifacts that are either submitted to conferences or deposited in reproducibility hubs such as Chameleon. We aim to classify experiments into different types and understand the reproducibility requirements of this rich data set, possibly leading to a benchmark. We will then study packaging requirements, especially those of distributed experiments, and aim to instrument a package archiver to reproduce a distributed experiment. Finally, we will use the learned experiment characteristics to develop a classifier that identifies alternative resources where an experiment can be easily reproduced.
Project Deliverable
Specific tasks include:
- A pipeline consisting of a set of scripts to characterize artifacts.
- Packaged artifacts and an analysis report with open-sourced data about the best guidelines to package using Chameleon.
- A classifier system based on artifact and resource characteristics.