Why do we want a package manager for BOSH?

Everybody who has worked with a Unix-based operating system or a modern build tool has come into contact with a package manager. According to Ian Murdock, founder of the Debian project, package management is the single biggest advancement Linux has brought to the industry. It makes it easier to build and develop software in a modular way.
BOSH already uses packages as containers for the software a release engineer wants to deploy. Currently, however, these packages are shared between releases by copy and paste. This means that if a new Java version needs to be installed in all releases of a BOSH instance, the operator has to update every single one of them manually. This repetitive task gets even worse if the package depends on others, as they need to be copied as well. These kinds of issues can be addressed by implementing a package management system for BOSH packages, bringing release engineering a little closer to commands like ‘apt-get update’ that all Linux users have learned to love.

How do we build a package manager for BOSH?


As mentioned, BOSH already uses a package structure with dependencies, so we have a solid base to build upon. Each of these packages is described by a spec file containing a name, a list of files and their locations, and the names of the packages it depends upon. These spec files are written in schema-free YAML, which makes it possible to extend them with additional fields without any issues. The added fields are the name of the publisher of the package, a SemVer-conformant version number, a description, a URL for more information about the packaged software and a field for data about the stemcell on which it is supposed to run. To identify a package unambiguously, its name, version and publisher are all required, which is why the dependency list is extended with the version and publisher name of each referenced package. This data is stored in an Elasticsearch cluster, together with additional information for access control and for locating the package, which makes it much easier to search through.
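
To make the identification rule more concrete, here is a minimal Python sketch of how publisher, name and version could be combined into one package identity. It is not taken from the actual implementation: the class names Coordinate and PackageSpec, the attribute names and the simplified version pattern are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass, field

# Simplified SemVer check: MAJOR.MINOR.PATCH with an optional pre-release
# and build suffix. Illustrative only, not the full SemVer grammar.
SEMVER = re.compile(r"\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?")


@dataclass(frozen=True)
class Coordinate:
    """Publisher, name and version together identify exactly one package."""
    publisher: str
    name: str
    version: str

    def __post_init__(self):
        if not (self.publisher and self.name):
            raise ValueError("publisher and name must not be empty")
        if not SEMVER.fullmatch(self.version):
            raise ValueError(f"not a SemVer version: {self.version!r}")

    def __str__(self):
        return f"{self.publisher}:{self.name}:{self.version}"


@dataclass
class PackageSpec:
    """Hypothetical in-memory form of an extended spec file."""
    coordinate: Coordinate
    files: list = field(default_factory=list)
    dependencies: list = field(default_factory=list)  # list of Coordinate
    description: str = ""
    url: str = ""
    stemcell: str = ""


# Example values are made up for illustration.
java = Coordinate("example-org", "openjdk", "11.0.2")
kafka = PackageSpec(
    coordinate=Coordinate("example-org", "kafka", "2.1.0"),
    files=["kafka/kafka.tgz"],
    dependencies=[java],
)
print(kafka.coordinate)
```

Because the coordinate is immutable and hashable, it can serve directly as a lookup key, which is one reason to treat the three fields as a single value.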

The software itself is stored as a structured zip file in an Amazon S3 bucket and can be accessed by its file name. This name is a unique identifier that is stored with the other information about the package in the Elasticsearch cluster. The zip file contains the spec file itself, the packaging script and the software, either as source code or as a binary file. The S3 bucket is secured by Amazon IAM and can only be accessed with a temporary security token.

Handling the authentication of users is a delicate and time-consuming matter, which is why Keycloak is used for it. The service provides a public client that can be used to create JSON Web Tokens, which grant the user access to certain packages or allow them to upload their own. Users register themselves in the Keycloak instance.

To upload and download packages, the operator uses a CLI written in Go. This language was chosen to make integration into the BOSH CLI as easy as possible. The application mainly communicates with the REST API and, once it has been granted an access token, with the S3 bucket.

The central communication hub of the manager is a REST API, written in Kotlin with Spring Boot. It orchestrates all requests and handles authorization based on the bearer token the users hold. Additionally, it sends search requests to the Elasticsearch cluster and interacts with Amazon IAM to create temporary access tokens for the S3 bucket, so that users can store and download packages.

How do we use the BOSH package manager?

To interact with the BOSH package manager, users install the package manager CLI tool on their computer, alongside the BOSH CLI. The package manager CLI commands follow the same design principles as the BOSH CLI, so they integrate well into the BOSH workflow. The basic operations available to an operator are described below.

Operators use the package manager CLI by navigating into the release they wish to build. There they can download and install the desired package with the command bpm download {publisher}:{package-name}:{version}. Packages that are already installed are skipped to avoid conflicts. The package manager downloads the package with all its dependencies and sub-dependencies and places their contents, such as scripts, binaries and source code, in the correct place in the release.
Since the author of a package can restrict access to it, the operator may have to log in first by running bpm login.
It is also possible to list packages in a download-spec YAML file, similar to build tools like Maven or Gradle; these packages are then downloaded with the command bpm create-release. Again, already installed packages are skipped.
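
The download behaviour described above (fetch a package together with its dependencies and sub-dependencies, skip what is already installed) can be sketched as a depth-first walk over the dependency graph. This is only an illustration in Python, not the CLI's actual Go code: the function names, the registry dictionary standing in for the backend's metadata, the dependencies-first ordering and the cycle check are all assumptions.

```python
def parse_coordinate(text):
    """Split 'publisher:name:version' into its three parts."""
    parts = text.split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected publisher:name:version, got {text!r}")
    return tuple(parts)


def resolve(root, registry, installed):
    """Return the packages to download for `root`, dependencies first.

    `registry` maps a coordinate string to the list of coordinates it
    depends on (a stand-in for the metadata the backend would return).
    `installed` is the set of coordinates already present in the release.
    """
    order = []
    visiting = set()
    done = set(installed)

    def visit(coord):
        if coord in done:
            return  # already installed or already scheduled: skip it
        if coord in visiting:
            raise ValueError(f"dependency cycle at {coord}")
        parse_coordinate(coord)  # reject malformed coordinates early
        if coord not in registry:
            raise KeyError(f"unknown package {coord}")
        visiting.add(coord)
        for dep in registry[coord]:
            visit(dep)
        visiting.discard(coord)
        done.add(coord)
        order.append(coord)

    visit(root)
    return order


# Made-up example data.
registry = {
    "acme:kafka:2.1.0": ["acme:openjdk:11.0.2", "acme:zookeeper:3.4.13"],
    "acme:zookeeper:3.4.13": ["acme:openjdk:11.0.2"],
    "acme:openjdk:11.0.2": [],
}
print(resolve("acme:kafka:2.1.0", registry, installed=set()))
print(resolve("acme:kafka:2.1.0", registry, installed={"acme:openjdk:11.0.2"}))
```

In the second call the Java package is already part of the release, so only ZooKeeper and Kafka remain in the result.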

For a package to be available for download, it needs to be uploaded by an author first. To do so, the author needs to be registered and be a member of the publisher in whose name they want to upload. They also need to add all mandatory fields, such as name, publisher and version, to the package spec file, and if the package has dependencies, they need to fill in these fields in the spec of the dependencies as well. Additionally, they can add the optional fields to make the package more attractive to other operators. Below is an example of a Kafka spec file with all possible fields.
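
The mandatory-field requirement can be illustrated with a small check that an author might run before uploading. This is a hypothetical Python sketch: the dictionary mirrors what a spec could look like once loaded from YAML, only name, publisher and version are named as mandatory in this post, and every other key, every value and the helper name missing_fields are assumptions. It also assumes that dependencies are listed as entries carrying their own name, publisher and version.

```python
MANDATORY = ("name", "publisher", "version")


def missing_fields(spec):
    """Return a list of problems; an empty list means the spec may be uploaded."""
    problems = []
    for key in MANDATORY:
        if not spec.get(key):
            problems.append(f"{key} is missing")
    # Each dependency must be fully identified as well.
    for i, dep in enumerate(spec.get("dependencies", [])):
        for key in MANDATORY:
            if not isinstance(dep, dict) or not dep.get(key):
                problems.append(f"dependencies[{i}].{key} is missing")
    return problems


# Hypothetical spec contents; key names beyond the mandatory ones are guesses.
kafka_spec = {
    "name": "kafka",
    "publisher": "example-org",
    "version": "2.1.0",
    "description": "Distributed streaming platform",
    "files": ["kafka/kafka.tgz"],
    "dependencies": [
        {"name": "openjdk", "publisher": "example-org", "version": "11.0.2"},
    ],
}
print(missing_fields(kafka_spec))
```

Collecting all problems instead of stopping at the first one lets the author fix the spec in a single pass.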

With everything set up, the author can now upload this Kafka package with the command bpm upload kafka from the root folder of the release. The backend gives clearance for the upload if no other package with that name and version has been uploaded in the name of the publisher. All dependencies are handled the same way as the root package. The backend signs the package with the user ID of the author to make sure its origin is traceable.
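
The clearance rule for uploads boils down to a uniqueness check on publisher, name and version. The Python sketch below illustrates that rule with an in-memory dictionary; it is not the Kotlin backend's code. The class PackageIndex and its methods are invented for this example, and "signing" is reduced to recording the author's user ID next to the entry.

```python
class PackageIndex:
    """In-memory stand-in for the backend's package metadata store."""

    def __init__(self):
        self._entries = {}

    def upload(self, publisher, name, version, author_id):
        """Register a package, or refuse if the coordinate is already taken."""
        key = (publisher, name, version)
        if key in self._entries:
            raise ValueError(f"{publisher}:{name}:{version} already exists")
        # Record who uploaded the package so its origin stays traceable.
        self._entries[key] = {"uploaded_by": author_id}
        return key

    def uploaded_by(self, publisher, name, version):
        """Return the user ID stored with the package."""
        return self._entries[(publisher, name, version)]["uploaded_by"]


# Made-up example data.
index = PackageIndex()
index.upload("example-org", "kafka", "2.1.0", author_id="user-42")
index.upload("example-org", "kafka", "2.2.0", author_id="user-42")  # new version: accepted
print(index.uploaded_by("example-org", "kafka", "2.1.0"))
```

A second upload of example-org:kafka:2.1.0 would raise an error, while the same name and version under a different publisher would be accepted.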

The source code of the backend can be found here, the CLI tool here and information on BOSH here.

Feel free to ask any question in the comments section below!