Too many images on DockerHub! How different are images for the same system?
Md Hasan Ibrahim · Mohammed Sayagh · Ahmed E Hassan
© Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract Containerization is a technique used to encapsulate a software system and its dependencies into one isolated package, which is called a container. The goal of these containers is to deploy or replicate a software system on various platforms and environments without facing any compatibility or dependency issues. Developers can instantiate these containers from images using Docker, one of the most popular containerization platforms. Furthermore, many of these images are publicly available on DockerHub, where developers can share their images with the community, which in turn can leverage such publicly available images. However, DockerHub contains thousands of images for each software system, which makes the selection of an image a nontrivial task. In this paper, we investigate the differences among DockerHub images for five software systems and 936 images, with the goal of helping Docker tooling creators and DockerHub better guide users in selecting a suitable image. We observe that users tend to download the official images (images that are provided by Docker itself) even when a large number of image choices exist for each software system among the community images (images that are provided by community developers), which are in many cases more resource efficient (have fewer duplicate resources) and have fewer security vulnerabilities. In fact, we observe that 27% (median), 35% (median), 6% (median), and 9% (median) of the DockerHub Debian, CentOS, Ubuntu, and Alpine based images, respectively, are identical to another image across all the studied software systems. Furthermore, 26% (median), 49% (median), and 8% (median) of the Alpine, Debian, and Ubuntu based community images, respectively, are more resource efficient than their respective official images across all five studied software systems. In addition, 7% (median) of the community Debian based images have fewer security vulnerabilities than their respective official images across the four studied software systems for which an official Debian based image exists. Unfortunately, the descriptions of 78% of the studied images do not guide users when selecting an image (the description does not exist at all or does not highlight the particularities of the image). We suggest that Docker tooling creators and DockerHub design approaches to distinguish DockerHub images and help users find the most suitable images for their needs.
Communicated by: Nachiappan Nagappan
Extended author information available on the last page of the article.
Empirical Software Engineering
Keywords Docker · Docker images · DockerHub · Containerization
1 Introduction
One can package a software system with all of its dependencies and required libraries into one isolated container using Docker (2019). Docker is a platform used to instantiate Docker images on containers,