Interpreting the Drupal Core Commit History

(This article was first posted on lullabot.com.)

According to Drupal’s community documentation, “The Benevolent Dictator for Life (BDFL),” Dries Buytaert, is the “chief decision-maker for the [Drupal] project.” In practice, as Dries has pointed out, he wears “a lot of different hats: manager of people and projects, evangelist, fundraiser, sponsor, public speaker, and BDFL.” And he is chairman and chief technology officer of a company that has received $173,500,000 in funding. Recently I was wondering, how often does Dries wear his Drupal core code committer hat?

In this article, I will use data from the Drupal Git commit history, as well as other sources, to demonstrate how dramatically the Drupal core “code committing” landscape has changed. I do not intend to tell the entire story about power structures in the Drupal community in a single article. I believe that issue credits, for instance, offer more clues about power structures. Rather, my analysis below argues that committing code to Drupal core is a far more complex process than some might assume of a project with a BDFL.

Understanding Drupal Core Committers

Whereas Dries used to commit 100% of the core code, he now leads a team of core committers who “are the people in the project with commit access to Drupal core.” In other words, core committers are the people who can make changes to Drupal core. We can get an idea about the work of core committers from sites such as Open Hub or the GitLab contributor charts, but those charts omit key details about this team. In this analysis, I’d like to offer more context.

The Drupal core committer team has grown considerably since the start of the Drupal codebase more than 19 years ago. At present there are 12 core committers for Drupal 8, and from what I can tell, these are the dates that each new core committer was announced:

Unsurprisingly, one task of a core committer is to commit code. For a Drupal core committer to consider a change to Drupal, the proposed change must advance through a series of “core gates,” including the accessibility gate and the performance gate. Code must pass Drupal’s automated tests and meet Drupal’s coding standards. But even after making it through all of the “gates,” only a core committer can add, delete, or update Drupal code. At any given time, there might be 100 or more Drupal core issues that have (presumably) gone through the process of being proposed, discussed, developed, tested, and eventually, “Reviewed & tested by the community,” or RTBC.

Core committers can provide feedback on these RTBC issues, review and commit code from them, or change their status back to “Needs work” or “Needs review.” Just because core committers have the power to commit code does not necessarily mean they view their role as deciding what code gets into core and what does not. For example, Alex Pott told me, “I feel that I exert much more influence on the direction of core in what I choose to contribute code to than in what I commit.” He said that he views the RTBC queue more as a “TODO list” than a menu from which he can select what he wants to commit.

Many people might not realize that core committers do a lot more than just commit code. As Dries shared with me, “The hard work is not the actual committing – that only takes a few seconds. The hard work is all the reviews, feedback, and consensus building that happens prior to the actual commit.” Indeed, core committers contribute to the Drupal project in many ways that are difficult to measure. For instance, when core committers offer feedback in the issue queue, organize initiative meetings, or encourage other contributors, they do not get any easily measured “credit.” It was Jess who suggested that I work on the Configuration Management Initiative (CMI) and I will be forever grateful because her encouragement likely changed the course of my career.

The core committers play significant roles in the Drupal project, and those roles are not arbitrary. Each core committer has distinct responsibilities. According to the community documentation (a “living document”), “the BDFL assigns [core committers] certain areas of focus.” For instance, within the team of core committers, a Product Manager, Framework Manager, and Release Manager each has different responsibilities. The “core committers are a group of peers, and are expected to work closely together to achieve alignment, to consult each other freely when making difficult decisions, and to bring in the BDFL as needed.”

Part of my goal here is to show that the commit history can only tell part of the story about the team of core committers. I’d also like to point out that in this article I limit my focus to Drupal 8 core development, and not, for instance, the work of the Drupal 7 core committers, the maintainers of the 43,000+ contributed modules, the Drupal documentation initiative, conference selection committees, or any of the other groups of people who wield power in the Drupal community.

This work is one component of my larger project to evaluate publicly-available data sources to help determine if any of them might be beneficial to the Drupal community. I acknowledge that by counting countable things I risk highlighting trivial aspects of a thoughtful community or shifting attention away from what the Drupal community actually values. Nevertheless, I believe that interpreting Drupal’s commit history is a worthwhile undertaking, in part because it is publicly-available data that might be misinterpreted, but also because I think that a careful analysis reveals further evidence of a claim that Dries and I made in 2016: Drupal “is a healthy project” and “the Drupal community is far ahead in understanding how to sustain and scale the project.”

Who Commits Drupal Core Code?

The Git commit history cannot answer all of our questions, but it can answer some questions. As one GitLab employee put it, “Git commit messages are the fingerprints that you leave on the code you touch.” Commit messages tell us who has been pushing code and why. The messages form a line-by-line history of the Drupal core codebase, from the very first commit, to the “birth” of Drupal 1.0.0, to today.

The commit history can answer questions such as, “Who has made the most commits to Drupal core?” Unsurprisingly, the answer to that question is “Dries”:

However, since 2015 Dries has dramatically reduced his core commits. In fact, he only has 4 commits since October 2015:

If someone just looked at the contributor charts or a graph like the one above, they might not realize that Dries is as committed to Drupal as ever. He spends less time obsessing about the code and architecture and more time setting strategy, helping the Drupal Association, talking to core committers, and getting funding for core initiatives and core committers. In recent years he has dedicated considerable time to communication and promotion, and he has been forthcoming with regard to his new role. He has been writing more in-depth blog posts about the various Drupal initiatives as well as other aspects of the project. In other words, he has intentionally shifted his focus away from committing towards other aspects of the project, and his “guiding principle” is to “optimize for impact.”

Another part of the reason that Dries has had fewer commits stems from the recent shift in effort from Drupal core to contrib. Overall commits to Drupal core have decreased since their highest point in 2013, and have been down considerably since the release of Drupal 8 in 2015:

But once again we must interpret these data carefully. Even if the total number of commits to Drupal core has declined since 2015, the Drupal project continues to evolve. Since Drupal 8.0.0, BigPipe, Workflows, Migrate, Media, and Layout Builder are just a few of the core modules that have become stable, and the list of strategic initiatives remains ambitious. So while the data may seem to suggest that interest in Drupal core has waned, I suspect that, in fact, the opposite is true.

We can, on the other hand, use the commit history to get a sense of how the other core committers have become involved in committing code to Drupal core. We can visualize all commits by day over the entire history of the Drupal codebase for each (current) individual core committer:

We get a better sense of the distribution of work by looking beyond total commits to the percentage of core commits per committer for each year. Using percentages better demonstrates how the work of committing code has become far more distributed (in this chart, “colorful”) than it was during the early years of Drupal’s lifespan:

You might notice that the chart above does not include past core committers such as the Drupal 5 core committer, Neil Drumm (406 commits), or the Drupal 4.7 core committer, Gerhard Killesreiter (193 commits). I’m more interested in recent changes.
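The per-year percentages described above could be computed along the following lines. This is only a minimal sketch, not the code behind the chart: the function name `commit_share_by_year` and the input shape (one `(year, committer)` pair per commit) are assumptions made for illustration. Such pairs could be extracted from something like `git log --date=format:%Y --format="%cd %cn"`.

```python
from collections import Counter, defaultdict


def commit_share_by_year(commits):
    """Return {year: {committer: percentage of that year's commits}}.

    `commits` is an iterable of (year, committer) pairs, one per commit.
    The input shape is a hypothetical choice for this sketch.
    """
    per_year = defaultdict(Counter)
    for year, committer in commits:
        per_year[year][committer] += 1

    shares = {}
    for year, counts in per_year.items():
        total = sum(counts.values())
        shares[year] = {
            name: 100.0 * count / total for name, count in counts.items()
        }
    return shares
```

Working with percentages rather than raw counts keeps a quiet year and a busy year on the same scale, which is what makes the shift from one committer to many visible.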

When we shift back to looking at total commits (rather than percentages) we can watch the team grow over the entire history of the Drupal project in the following animation, which stacks (ranks) committers by year based on their total number of commits:

One fact that caught my attention was that Alex Pott’s name topped the list for 6 of the last 7 years. But I’d like to stress again that this visualization can only tell part of the story. For instance, those numbers don’t reflect the fact that Alex quit his job in order to work on Drupal 8 (before becoming a core committer) or his dedication to working on “non technical” issues, such as a recent change that replaced gendered language with gender-neutral language in the Drupal codebase. I admit to a particular bias because I have had the pleasure of giving talks as well as working with him on the Configuration Management Initiative (CMI), but I think the correct way to interpret these data is to conclude simply that Alex Pott, along with Nathaniel Catchpole and Angie Byron, are a few of the members of the core committer team who have been spending more of their time committing code.

We find a slightly different story when we look beyond just the number of commits. The commit history also contains the total number of modified files, as well as the number of added and deleted lines. Each commit includes entries like this:

2 files changed, 4 insertions(+), 15 deletions(-)

Parsing the Git logs in order to measure insertions and deletions reveals a slightly different breakdown, with Nathaniel Catchpole’s name at the top of the list:

Differences in the ranking are largely the result of just a few issues that moved around more than 100,000 lines of code and significantly affected the totals, such as removing external dependencies, moving all core files under /core, converting to array syntax, not including vendor test code, and removing migrate-db.sh.
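The insertion and deletion totals discussed above can be tallied from summary lines like the one shown earlier. The sketch below is one possible approach, not necessarily the one used for this analysis. It assumes a log in which each commit begins with a line such as `committer:NAME` (a hypothetical marker that could be produced with `git log --shortstat --format="committer:%cn"`), followed by that commit's summary line.

```python
import re
from collections import Counter

# Matches lines such as "2 files changed, 4 insertions(+), 15 deletions(-)".
# Git leaves out the insertions or deletions clause when its count is zero.
SHORTSTAT = re.compile(
    r"(\d+) files? changed"
    r"(?:, (\d+) insertions?\(\+\))?"
    r"(?:, (\d+) deletions?\(-\))?"
)


def parse_shortstat(line):
    """Return (files, insertions, deletions), or None for other lines."""
    match = SHORTSTAT.search(line)
    if match is None:
        return None
    files, insertions, deletions = match.groups()
    return int(files), int(insertions or 0), int(deletions or 0)


def lines_changed_by_committer(log_lines, marker="committer:"):
    """Sum insertions plus deletions per committer.

    The "committer:" marker is an assumption of this sketch; it must match
    whatever format string was used to generate the log.
    """
    totals = Counter()
    current = None
    for line in log_lines:
        if line.startswith(marker):
            current = line[len(marker):].strip()
            continue
        stats = parse_shortstat(line)
        if stats is not None and current is not None:
            _, insertions, deletions = stats
            totals[current] += insertions + deletions
    return totals
```

Because a single commit that moves a large directory can add more than 100,000 lines to a committer's total, it may be worth inspecting the largest commits separately before reading much into the ranking.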

The commit history contains a wealth of additional fascinating data points that are beyond the scope of this article, but for now I’d like to discuss just one more that suggests how the landscape of core committing has changed: commit messages. Every core commit includes a message that follows a prescribed pattern and includes the issue number, a comma-separated list of usernames, and a short summary of the change. The syntax looks like this:

Issue #52287 by sun, Dries: Fix outdated commit message standards
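A message in this pattern can be split into its three parts with a regular expression. The following is a minimal sketch: the function name `parse_commit_subject` and the returned dictionary keys are illustrative choices, and older commits that predate this convention would simply return `None`.

```python
import re

# "Issue #<number> by <comma-separated usernames>: <summary>"
COMMIT_SUBJECT = re.compile(
    r"^Issue #(?P<issue>\d+) by (?P<users>[^:]+): (?P<summary>.+)$"
)


def parse_commit_subject(subject):
    """Split a commit subject into issue number, usernames, and summary.

    Returns None when the subject does not follow the pattern.
    """
    match = COMMIT_SUBJECT.match(subject.strip())
    if match is None:
        return None
    users = [u.strip() for u in match.group("users").split(",") if u.strip()]
    return {
        "issue": int(match.group("issue")),
        "users": users,
        "summary": match.group("summary"),
    }
```

Applied to the example above, this yields issue `52287`, the usernames `sun` and `Dries`, and the summary `Fix outdated commit message standards`.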

Combining all commit messages and removing English language “stopwords” – such as “to,” “if,” “this,” and “were” – results in a list of words and usernames, with one core committer near the top of the list, alexpott (Alex Pott’s username):

Only one other user, Daniel Wehner (dawehner), is mentioned more than Alex Pott. I find it mildly interesting to see that “dawehner” and “alexpott” appear in more commit messages than words such as “tests,” “use,” “fix,” “entity,” “field,” or even “drupal.” It also caught my attention that the word “dries” did not make my top 20 list. Thus, I would suggest that a basic ranking of the words used in commit messages does not provide much value and is not even a particularly good method to determine who is contributing code to Drupal – DrupalCores, for instance, does a much better job.
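A word ranking of this kind can be approximated in a few lines. The sketch below is not the code used for the chart. Its stopword set is a tiny illustrative sample rather than a full English list, and adding "issue" and "by" to it (because they appear in every message) is an assumption of this example.

```python
import re
from collections import Counter

# A small illustrative sample; a real analysis would use a complete list.
# "issue" and "by" are included here as an assumption, since the commit
# message pattern puts them in every message.
STOPWORDS = {"to", "if", "this", "were", "the", "a", "and", "of", "issue", "by"}


def top_words(messages, stopwords=STOPWORDS, limit=20):
    """Return the `limit` most common non-stopword tokens with their counts."""
    counts = Counter()
    for message in messages:
        # Lowercase tokens starting with a letter; bare issue numbers are skipped.
        words = re.findall(r"[a-z][a-z0-9_]*", message.lower())
        counts.update(word for word in words if word not in stopwords)
    return counts.most_common(limit)
```

Note that this counts usernames and ordinary words together, which is exactly why “dawehner” and “alexpott” can outrank words such as “tests” or “fix.”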

Nonetheless, I mention the commit messages because they are part of the commit history and because those messages remind us once again that core committers like Alex Pott do a lot more than commit code to the Drupal project – they also contribute a remarkable amount of code. Alex Pott, Jess, Gábor Hojtsy, Nathaniel Catchpole, and Alex Bronstein are each (as of this writing) among the top 20 contributors to Drupal 8. Moreover, this list of words brings us back to questions about the suitability of a term such as “BDFL.”

BDFL Comparisons

While Dries could still legitimately don a hat that reads “Undisputed Leader of the Drupal Project,” it seems clear that the dynamics of committing code to Drupal core have shifted and that core committers assume a variety of key roles in the success of the Drupal project. During the process of writing this article, someone even opened an issue on Drupal.org to “Officially replace the title BDFL with Project Lead.” Whatever his official title, the evolving structure of the core committer team has allowed Dries to focus on the overall direction of the Drupal project and spend less time involved in choices about the code that gets committed to Drupal core on a daily basis. And it’s a considerable amount of code – since Drupal 8 was released there have been more than 5719 commits to Drupal core, or roughly 4.42 commits per day.

While other well-known free software projects with a BDFL, such as Vim, only have one contributor, numerous other well-known projects have moved in a direction comparable to Drupal. As of this writing, Linus Torvalds sits at #37 on the list of contributors to the Linux kernel. Or perhaps more related to Drupal, Matt Mullenweg, who calls himself the BDFL of WordPress, is not listed as a core contributor to the project and is not the top contributor to the project – that honor goes to Sergey Biryukov, who has held it for a while.

Further, one could reasonably conclude that Drupal’s commit history calls into question a concern that many people, including me, have raised regarding the influence of Acquia (Dries’s company) in the Drupal community. Acquia sponsors a lot of Drupal development, including core committers. Angie Byron, Jess, Gábor Hojtsy, Lauri Eskola, and Alex Bronstein are all paid by Acquia to work on Drupal core full-time. However, I still believe what Dries and I wrote in 2016 when we stated that we do not think Acquia should “contribute less. Instead, we would like to see more companies provide more leadership to Drupal and meaningfully contribute on Drupal.org.” On this topic, the commit logs indicate positive movement: since Drupal 8 was released, Alex Pott and Nathaniel Catchpole – the two most active core committers – have made 72% of the commits to Drupal core, and neither of them works for Acquia. So while everyone in the Drupal community owes a debt of gratitude to Acquia for their sponsorship of the Drupal project, we should also thank the companies that sponsor core committers like Alex Pott and Nathaniel Catchpole, including Thunder, Acro Media, Chapter Three, Third and Grove, and Tag1 Consulting.

And the other core committers? Well, I can’t possibly visualize all of the work that they do. They are helping coordinate core initiatives, such as the Admin UI & JavaScript Modernisation initiative and Drupal 9 initiative. They are working on Drupal’s out-of-the-box experience and ensuring consistency across APIs. They are helping other contributors collaborate more effectively and efficiently. They are coordinating with the security team and helping to remove release blockers. The core committers embody the spirit of the phrase that appears on every new Drupal installation: “Powered by Drupal.” I am grateful for their dedication to the Drupal project and the Drupal community. The work they do is often not highly visible, but it’s vital to the continued success of the project.

A deeper appreciation for the work of the Drupal core committers has been just one of the positive consequences of this project. My first attempts at interpreting Drupal’s commit history were somewhat misguided because I did not fully understand the inner workings of the team of core committers. But in fact nobody can completely understand or represent what the core committers do, and I personally believe that the “Drupal community” is little more than a collection of stories we choose to believe. However, we live in a time where people desire to understand the world through dashboards that summarize data and where we gloss over complexities. Consequently, I feel more motivated than ever to continue my search for data that more accurately reflect the Drupal community for which I have so much respect. (Incidentally, if you are a statistician with an interest in free software, I would love to collaborate.) If we want a deeper understanding of who contributes to Drupal, we need more and better sources of information than Drupal’s “contributors” page. I accept that I will never concoct the magical visualization that perfectly represents “Drupal,” but I am enjoying the search.

Code for this project is available on GitLab. I would like to thank Cathy Theys, Megh Plunkett, Dries Buytaert, and Alex Pott for their thoughtful feedback on earlier drafts of this article.
