Transparency and user agency as principles for distributing and consuming open source software packages

More people are building more kinds of software that is consumed more often and in more places than ever, and the likelihood of that software being open source or having an open source dependency is much higher than it was just a decade ago.

These forces (software publishers, software kinds and software use cases), along with network evolution, changes in how software is monetized, and the ways people and organizations use technology, put pressure on software distribution technologies and techniques such as software packages, package managers, and software repositories.

I've been researching Linux and open source package management for a while and I'm very excited about many of those technologies and their applications, from distri and systemd/mkosi to libostree and spack. Unsurprisingly, many of these are prompting us to revise how we think about distributions.

Inspired by The Laws of Software Installation, here's an attempt to elaborate on the "contract" between software authors and users, particularly in open source software where the volume and composition possibilities create a vast and complex ecosystem of its own.

Our expectations of transparency for software packages have evolved. What used to be minimal metadata regarding the publisher, a description of the software, and signatures has been enhanced with things like licensing information and is evolving into a full Software Bill of Materials (SBOM): the Nutrition Facts label that allows every software package to explain how it came to be, where it comes from, and who and what was involved in making it.

In an open source world, consent builds upon transparency. We have been influenced by certain use cases, such as mobile apps, where "permissions" (both stated by the developer and enforced by the platform) have become the expectation.

All of this means different things to different people. Some users might want to know whether the package lays itself out in the filesystem according to the Filesystem Hierarchy Standard (FHS). Others need to know whether the application will self-update or install publisher certificates, needs network egress to ship telemetry at runtime, pulls additional dependencies out-of-band into a local cache, changes environment variables, starts automatically on boot, needs to run in a privileged container, or ships with and relies on Linux Security Module (LSM) integration, among other things.

In general, we lack a standardized mechanism to carry behavioral ("permissions") information in a package, let alone to prove those claims or to make policy decisions, such as allowing a pull into a build or into production, based on the organization's choices around, say, network egress or vendoring.
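To make the idea concrete, here is a minimal sketch of what a policy check over declared behaviors could look like. Everything in it is hypothetical: the `BehaviorManifest` and `Policy` types, the behavior names such as "network-egress", and the package name are invented for illustration and do not correspond to any existing package format or tool. It also only compares what a package declares against what an organization denies; it does nothing to prove the declaration is truthful.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BehaviorManifest:
    """Hypothetical behavioral metadata a package could declare."""
    name: str
    behaviors: frozenset = field(default_factory=frozenset)


@dataclass(frozen=True)
class Policy:
    """Hypothetical organizational policy: behaviors that block a pull."""
    denied: frozenset


def evaluate(manifest, policy):
    """Return the sorted list of declared behaviors that the policy denies."""
    return sorted(manifest.behaviors & policy.denied)


def allow_pull(manifest, policy):
    """A pull is allowed only when no declared behavior is denied."""
    return not evaluate(manifest, policy)


# Illustrative data only: the package and behavior names are assumptions.
pkg = BehaviorManifest(
    "example-agent",
    frozenset({"network-egress", "self-update", "start-on-boot"}),
)
build_policy = Policy(frozenset({"network-egress", "out-of-band-dependencies"}))

if __name__ == "__main__":
    print(pkg.name, "allowed:", allow_pull(pkg, build_policy))
    print("denied behaviors:", evaluate(pkg, build_policy))
```

A real mechanism would also need a shared vocabulary for behaviors and some way to attest to them, which is exactly the part we lack today.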

Interfaces are another interesting aspect of software packages that can be easily overlooked. Beyond the most basic operations (install, remove and maybe update), package managers don't share semantics, let alone a harmonized approach to automation, that is, to letting the system drive them in user-defined ways.

Hooks, triggers and other artifacts are regularly abused to achieve certain automation goals, such as preseeding configuration or performing certain provisioning steps right after install, sometimes overreaching in their use of administrative privileges, with broad security implications.

Package managers inevitably grow in sophistication to meet certain user needs (see dpkg triggers). In general, though, whether you are a software publisher, integrator, IT organization, developer or system administrator, you are expected to learn the inner workings and nuances of each package system instead of relying on standardized interfaces. The result is wildly varying user experiences and thinly spread resources.

A good way to illustrate the complexity of this today is to read through the several thousand lines of Ansible code devoted to dealing with APT or DNF, or to look at how basic operations such as listing Linux packages or Go modules are handled.
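As a small illustration of the listing problem, here is a sketch of a single "list installed things" function layered over three different tools. The `Package` type, the `BACKENDS` table and the `list_packages` function are my own hypothetical names, not part of any existing library. The commands themselves (`dpkg-query -W`, `rpm -qa --queryformat`, `go list -m all`) are the usual ways to get this information, but each needs its own flags and output format, and the parsing here is deliberately naive (for example, it ignores architectures, epochs and Go module replacements).

```python
import subprocess
from typing import NamedTuple


class Package(NamedTuple):
    name: str
    version: str


def parse_lines(output, sep=None):
    """Parse 'name<sep>version' lines; a line without a version gets ''."""
    packages = []
    for line in output.splitlines():
        parts = line.strip().split(sep, 1)
        if not parts or not parts[0]:
            continue  # skip blank lines
        version = parts[1].strip() if len(parts) > 1 else ""
        packages.append(Package(parts[0], version))
    return packages


# Hypothetical table: one command and one field separator per backend.
BACKENDS = {
    "dpkg": (["dpkg-query", "-W", "-f=${Package}\\t${Version}\\n"], "\t"),
    "rpm": (["rpm", "-qa", "--queryformat",
             "%{NAME}\\t%{VERSION}-%{RELEASE}\\n"], "\t"),
    "go": (["go", "list", "-m", "all"], None),  # whitespace separated
}


def list_packages(backend):
    """Run the backend's listing command and return parsed packages."""
    cmd, sep = BACKENDS[backend]
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return parse_lines(result.stdout, sep)
```

Calling `list_packages("dpkg")` on a Debian-based system, or `list_packages("go")` inside a Go module, would return the same shape of data. Even this one read-only operation needs per-tool knowledge, and install, upgrade or configuration operations diverge far more.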

These two attributes, transparency and interfaces, are only worth investing in if they give users agency. Of course, there are other attributes that make a high quality software package, particularly a judicious use of resources and testing for alternative configurations while providing sensible defaults.

Finally, I think the jury's still out on at least two topics: vendoring and component duplication, and automatic upgrades. While there will always be very good arguments against both in several use cases, I expect more of both as a result of growing software "kinds" in a world where there isn't a lot of contention for storage or networking.

It seems there's only one thing we love more than the open source components that we use and build upon daily, and that's the package manager that we use to acquire those components. There's so much exciting activity happening in this space that it is easy to focus on technical differences such as package formats or installation mechanics when looking at package systems. With this post, I've tried to make the case for considering user agency, quality, transparency and interfaces as key tenets of the "contract" between software publishers, distributors and users.

Summary

  • Growth of software publishers, software kinds and software use cases along with changes in software monetization and network evolution are changing what users expect from software packages
  • While fun and necessary innovation is happening, package formats, implementation choices and the intricacies of how a software package is installed aren't necessarily where we'll meet future user expectations or how we give them agency
  • Packages should not only describe what the software is, but how it was made and where it came from; packages should also describe their behaviors: metadata becomes a contract
  • Package operations (across the entire lifecycle and well beyond installation) should be automatable via interfaces and said interfaces should allow for user-defined options: alternatives, configurations, installation paths, etc.
  • Quality remains critical: packages must guarantee their removability, keep promises across the lifecycle and provide users with control over mutating logic (e.g., triggers) and resource usage, noting that users' attention is also a finite resource that increases security risk when depleted

Are you a software publisher or distributor? What steps are you taking to make your packages more transparent and give users more control? Are you a developer? I always love to hear new things that people have learned about software packaging and distribution. Input is always welcome!
