Open source to the rescue — again

Michael Dorner
Oct 19, 2018

There is arguably no other engineering field as greatly affected by the not-invented-here syndrome (or its facet, the reinventing-the-wheel syndrome) as software development. This disease affects not only the artefacts of software development (meaning code and software components), but also the software development process itself.

While we have started to use open source components instead of rewriting yet another message-passing component or ORM framework, we are constantly reinventing the wheel of software development practices instead of reusing established practices from open source.

We came up with Scrum, XP, and many more. And then, because none of them scaled well enough, with Scrum of Scrums, SAFe, Large-Scale Scrum (LeSS), and what have you. All of them try to reinvent the wheel, and most of them come with a certification system, so they can serve as money-printing machines.

But we already have a software development framework that has proven its robustness, its scalability, and its capability to produce great software: open source software development!

Please do not get me wrong: not all open source projects and their software development practices are stellar, and there is not one single open source development process. Open source is very diverse. However, all successful open source projects have in common that their software development is

  • egalitarian: Everyone can contribute (rather than people being locked out)
  • meritocratic: Decisions are merit-based (rather than status-based)
  • self-organizing: People adjust processes to their needs (rather than the other way around)

The open source principles

Let me explain each of these principles of open collaboration in this section.

Egalitarian: Everyone can contribute

Open source is inclusive: nobody is excluded. As an open source contributor, it does not matter where I am from, how old I am, which country or time zone I am currently in, who my employer is, or what my background or motivation for contributing is.

Meritocratic: Decisions are merit-based

In open source, any decision is based on the merits of the argument and the value it brings to the project. There are no hierarchies as there are in companies, where decisions are based on a person’s status and where, after some escalation steps, people with high status decide on topics in which they have no expertise or no insight into the underlying technical problem. You will not find a CEO or CTO of OpenStack, Python, or the Apache web server anyway. This leads us to the last principle.

Self-organizing: People adjust processes to their needs

Each open source community organizes itself and adapts its processes as it needs. Open source projects are very diverse: from low-level hardware drivers to web servers, from small utility methods of a few lines to large operating systems, from a group of students to several thousand developers. There is not one practice that fits them all. Without contradicting the open source principles, there are many manifestations of software development practices.

Implications

These principles have some implications, which I want to describe in this section.

Open communication

Obviously, it is not enough to open the code. Everything must be open, including the communication. Open communication is communication that is public, written, archived, asynchronous, and complete. With some exceptions, the communication is in English. Only if all of those criteria are fulfilled is no potential contributor excluded because, for example, he or she is in another time zone.

This means: no standups, no meetings, no face-to-face chats, and no phone calls, because these lock out people from other countries or time zones, other employers, etc. (see the egalitarian principle). And this works: there are no meetings at OpenStack, with its roughly 1000 monthly contributors.

Roles

Two roles follow from the egalitarian and meritocratic principles. Of course, the fact that everybody can contribute does not mean that every contribution is waved through without any quality assurance. If you doubt that, try to make any contribution to Kubernetes or the Linux kernel. This is why open source differentiates between two roles: contributor and committer.

A contributor makes contributions to a project. A contribution ranges from a simple comment or a bug report to code introducing new features. The contributor sends code contributions in the form of patches (in the original meaning) or pull/merge requests (in GitHub or GitLab speak) to a committer. A committer is a developer who has write access to the repository and who reviews and accepts or rejects code contributions from contributors. This implies strong code ownership, and I am aware of how controversial this statement is.

Usually there are several committers to avoid a low bus factor. Sometimes one of the committers is a BDFL (benevolent dictator for life), as in Linux (Linus Torvalds) or Scala (Martin Odersky). These committers were usually the original authors of the project. However, if there is a BDFL in an open source project, the benevolence is very important for a functional and healthy community. Several open source projects broke apart because the BDFL acted as a dictator and neglected the benevolence.

Forks

Forks are an important option for self-organizing in open source. Everybody can fork the project and drive the development of the forked project in a certain direction or change the rules. Usually this happens only for good reasons, because a fork splits the code base and the community. But forks also spawn competing projects that fight for the best developers, the best processes, and the best development.

Forking happens regularly in open source (e.g., ownCloud → Nextcloud, OpenOffice → LibreOffice), and only rarely does a fork merge back into the original project (e.g., GCC → EGCS → GCC).

Benefits

So why is this a big deal? There is no free lunch: open communication, for example, takes significant effort.

From our research, I picked the benefits that are most important to me and that are not easily achieved by other software development methods (such as Scrum).

Higher code quality

Code quality profits from this openness. Or, in the words of Linus's law, which Eric S. Raymond formulated and named after Linus Torvalds:

Given enough eyeballs, all bugs are shallow.

I hope no one would dispute that. Or do you know any modern cryptographic standard which is not open source?

Committers are not appointed by management. They are contributors who originate from the community and have shown their merit to the project through non-trivial contributions. With this dedicated committer role, we empower the smartest developers available, who have invested a large portion of their time, energy, and heart into a project, to decide on the project and its progress.

Improved code review

Closely related to the higher code quality is the improved code review process. In open source, committers alone decide on the acceptance or rejection of changes.

The only question a committer must answer to decide on the acceptance or rejection of a change is rather simple: Am I able to maintain this change? In the worst case, a contributor may contribute one change and then never be seen again. There is also no common boss to apply pressure to the contributor. In the end, the committer has to maintain the code himself, find someone who can (but again, no pressure is possible), or rewrite the code. None of these is desirable. Being aware of that ensures that code review is taken seriously and not considered a necessary evil.

This simple initial question leads to further questions: Do I understand the change (good documentation)? Is the change atomic and small (no large monster commits)? Are there sufficient tests (if required)? Does the change fit our software architecture?

Only if all questions can be answered positively does a change get accepted; otherwise, the change gets rejected.
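
To make this checklist concrete, here is a minimal sketch of how a committer's decision could be written down. The `Change` fields and the function names are hypothetical and made up for this illustration; they do not describe the tooling of any real project.

```python
from dataclasses import dataclass


@dataclass
class Change:
    """Hypothetical summary of a proposed change, as judged by the committer."""
    understandable: bool       # Do I understand the change (good documentation)?
    atomic_and_small: bool     # Is the change atomic and small?
    tests_required: bool       # Does this kind of change need tests at all?
    sufficiently_tested: bool  # Are there sufficient tests?
    fits_architecture: bool    # Does the change not contradict our architecture?


def open_questions(change: Change) -> list[str]:
    """Return the review questions that cannot be answered positively."""
    checks = {
        "understandable": change.understandable,
        "atomic and small": change.atomic_and_small,
        # Tests only count against the change if they are required.
        "sufficiently tested": change.sufficiently_tested or not change.tests_required,
        "fits the architecture": change.fits_architecture,
    }
    return [question for question, ok in checks.items() if not ok]


def is_maintainable(change: Change) -> bool:
    """Accept only if every question is answered positively."""
    return not open_questions(change)


# Example: a well-documented change that is too large for one commit.
patch = Change(
    understandable=True,
    atomic_and_small=False,
    tests_required=False,
    sufficiently_tested=False,
    fits_architecture=True,
)
print(open_questions(patch))   # ['atomic and small']
print(is_maintainable(patch))  # False
```

Returning the open questions instead of a bare yes or no is a deliberate choice in this sketch: the list is exactly the feedback the committer would give the contributor.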

There is rarely a plain rejection. Usually, the committer thanks the contributor for the contribution (even a one-time contributor may become a regular contributor if he or she is treated nicely) and then gives feedback on the change. Thanks to the open communication, the committer can reference old discussions or decisions and does not have to explain everything a thousand times.

Committers can be wrong, too. This is why the feedback is also publicly accessible (e.g., in GitHub pull requests), so others can disagree and join the discussion.

This makes code review simple yet important, and thereby more powerful. Code review becomes a discussion board about code and code only (you will not discuss your lunch options or budget planning in your code review tooling).

There are many more things to say about code review. Because code review and its measurement are my primary research topic, I am planning to write an article series on code review specifically.

Passive documentation

Because all communication is complete, archived, and written (which also means searchable), the project accumulates documentation over time with minimal effort.

Circumstances and requirements may change. Sometimes a decision made four years ago is obsolete, sometimes it is still valid. With open communication, everybody can easily verify this decision, which would never be possible if the communication had happened only in a phone call or a meeting. In the best case, there is a written summary of the meeting or phone call somewhere (can you find it?), but in open source you get not only the decision itself and who made it, but also the discussion leading to the decision, the involved parties, the circumstances, etc. And sometimes this is the essential information!

Inner source — open source within a company

As you can easily imagine, this is nothing I just invented (→ reinventing the wheel): the use of open source software development best practices and the establishment of an open source-like culture within organizations is called inner source and is by now well established.

Lots of companies have reported practicing inner source: Google, HP, Philips, Adobe, Nokia, Bosch, Microsoft, SAP, PayPal, IBM, and many more. Some don’t call it inner source but use the terms corporate or progressive open source; others just do it without naming it (Google, Facebook, GitHub, …) due to their deep relationship with open source. Some of the companies opened up single components, others their entire code base. Some are closer to the “open source gold standard”, others further away.

Although tooling is a very important aspect, it is not sufficient to use GitHub Enterprise or GitLab within the confines of a company. By the same logic, one could say “I uploaded my code to GitHub/Bitbucket/GitLab/whathaveyou, so I am running an open source project”. No, you are not: you are just dumping code somewhere, which is necessary but not sufficient.

Conclusion

Sometimes summarizing is complex, but not in this article: Do not reinvent the wheel, do use open source {software|practices}!

Everybody is crying for ML, AI, IoT, <please add more IT buzzwords here>, ignoring that all of these fields require significant software development effort. And as long as we are developing the 29047th logging framework with the 5345th software development approach, we will never get to the point where machines and programs become really smart.

May I add a personal wish at the end: please do not let inner source become a buzzword! Do it, but do it right!

Michael Dorner

Software developer and software engineering researcher