As part of a recent customer engagement, we were tasked with defining a naming convention for GitHub repositories. Up to this point, each project team had named its repositories using whatever convention (or none) it liked, which left the GitHub organization without any consistency.
I had my own ideas around what conventions made sense, although these are influenced by the type of work I do. As I tend to spend some of my engineering time working with Python, there are conventions around naming formats that make sense to me. These, of course, might not make sense to individuals in other fields.
GitHub by default prevents certain characters from being included in the repository name, but how do we use those that are left to provide meaning to engineers, QA staff, and others who use the code base and GitHub day in and day out?
Enter semantics.
Semantics is the study of meaning when applied to linguistics, semiotics, programming languages, and logic. You might be familiar with the application of semantics in a programming language, as distinct from syntax. Furthermore, software developers and DevOps engineers will likely have come across semantic versioning (semver.org), which aims to provide meaning to application version numbers.
Perhaps we can use a semantic approach to repository naming?
How others have tackled the problem
GitHub maintains a style guide standard but does not explicitly list a standard for the repository name itself.
Investigating other articles, guides, repositories, and Stack Overflow suggested the consensus was to use hyphens to separate portions of the repository name. Some organizations have taken the step to formalize this.
One example is the British Columbia Policy Framework for GitHub Document Naming Repos.
This document enumerates criteria for a repository name, which include:
- Descriptive
- Readable
- Consistent
- Contextual
- Future-friendly
- Extensible
- Reusable
- Brief (short/succinct)
All of these seem like very reasonable suggestions. Using this as a guideline, we broke down our initial draft standard into three sections separated by hyphens:
section1-section2-section3
This format consisted of sections to define the project name, project purpose, and the framework or language. An example of what this might look like would be:
project1-restapi-python
Each of these sections could be further hyphenated if two words needed to be split, for example, “rest-api”.
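The draft format can be sketched in a few lines of Python. This is only an illustration of the idea, not tooling we used on the engagement: the function names `slugify_section` and `build_repo_name` are hypothetical, and lowercasing every section is an assumption made to match the examples above.

```python
import re


def slugify_section(text: str) -> str:
    """Lowercase one section and join its words with hyphens."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return "-".join(words)


def build_repo_name(project: str, purpose: str, language: str) -> str:
    """Join the three sections of the draft convention with hyphens."""
    sections = [slugify_section(s) for s in (project, purpose, language)]
    if not all(sections):
        raise ValueError("every section needs at least one letter or digit")
    return "-".join(sections)


print(build_repo_name("Project1", "restapi", "Python"))   # project1-restapi-python
print(build_repo_name("Project1", "REST API", "Python"))  # project1-rest-api-python
```

The second call shows a section that is itself split into two words, which simply adds another hyphen to the final name.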
Debating conventions
Having drawn up a semantic-based naming structure that drew on my own experience and general research, I proposed it to the team. After some back-and-forth debate on the structure, we eventually settled on a format that consisted of:
section1-section2
The team decided that including the framework or programming language wasn’t that useful, as many repositories mixed languages, for example PHP and JavaScript.
This format was thus the product/project name and the repository purpose. For example:
project1-rest-api
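A convention like this is easier to keep when it can be checked automatically. The sketch below is one possible check, assuming names are lowercase alphanumeric words joined by single hyphens; the pattern and the `follows_convention` helper are hypothetical. Because a section may itself contain a hyphen, the check cannot tell where the project name ends and the purpose begins, so it only verifies the overall shape.

```python
import re

# Lowercase alphanumeric words separated by single hyphens, at least two words.
REPO_NAME_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)+")


def follows_convention(name: str) -> bool:
    """Return True if the name has the hyphen-separated shape described above."""
    return REPO_NAME_PATTERN.fullmatch(name) is not None


for candidate in ("project1-rest-api", "Project1_RestApi", "project1"):
    print(candidate, follows_convention(candidate))
```

A check of this kind could run in a script that audits an organization's repositories, although that step is outside the scope of this post.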
While this works for our customer, which is what matters, I couldn’t get rid of the nagging feeling that we were missing an element of meaning from the naming convention. Perhaps this wouldn’t work if we applied it to other companies and scenarios? Now there might not be one true convention to rule them all, but I was certainly interested in doing a bit more research. This would involve gathering data from individuals across roles at Modus who use GitHub to understand the zeitgeist. How do they name repositories? What’s customary in their open source circles?
It was time to fall back on the good old-fashioned method of a questionnaire.
Measuring the Zeitgeist
My first step in coming up with a questionnaire was to dig through the shelves in my library to find an old copy of “Survey Methods in Social Investigation” by Moser and Kalton. I forget how I ended up with this but remembered it was gathering dust, so perhaps it was time to crack it open and put it to use.
If I was going to investigate the problem, I wanted a questionnaire that was as rigorous as possible, given the small population size and my lack of formal training in sociology or linguistics.
The important things to capture were: enough data to derive useful information from the feedback, ensuring a diverse set of job roles, and ensuring that the questions weren’t leading (even though the questionnaire involved a small subset of all possible naming conventions).
The first step was to define an objective. I settled on “To understand which semantic naming format from a small subset of available formats was preferable to the broadest group of GitHub users”.
Next was defining the target population versus the survey population. The target population was all Modus employees who use GitHub. As the survey was to be sent out to the whole company, it was important to give individuals a way to filter themselves out of the results, namely by starting with the question “Do you use GitHub?”.
We could exclude individuals who answered No from further questions.
The survey population would ultimately be the subset of staff who not only used GitHub but were in a position to actively participate in the survey. Staff who were on vacation or had time constraints would fall outside of the survey population.
As many Modus team members wear multiple hats, a multi-choice question was offered, allowing them to select their job roles. The categories were:
- Project Management
- Product Management
- DevOps
- Backend Engineering
- Frontend Engineering
- Design
- Technical Support and SysAdmin
- Information Security
- QA
- Management
If the individual had answered no to using GitHub, the questionnaire ended for them. Otherwise, having ascertained what roles the individual plays, the next multiple-choice question covered name formatting and separators.
We asked respondents to choose which of the following separators they preferred in repository names:
- Hyphens (-) e.g. my-repo
- Underscores/Snake Case (_) e.g. my_repo
- None e.g. myrepo
- Camel Case e.g. myRepo
- Pascal Case e.g. MyRepo
GitHub does not support spaces in the repository name (but does in the display name), so we excluded them from the list.
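To make the five styles concrete, here is a small Python sketch that renders the same words in each of them. The `to_styles` function and its dictionary keys are hypothetical names chosen for this illustration.

```python
def to_styles(words: list[str]) -> dict[str, str]:
    """Render a list of words in each separator style from the survey."""
    if not words:
        raise ValueError("at least one word is required")
    lower = [w.lower() for w in words]
    capitalized = [w.capitalize() for w in lower]
    return {
        "hyphens": "-".join(lower),
        "underscores": "_".join(lower),
        "none": "".join(lower),
        "camel": lower[0] + "".join(capitalized[1:]),
        "pascal": "".join(capitalized),
    }


for style, name in to_styles(["my", "repo"]).items():
    print(f"{style}: {name}")
```

For the words "my" and "repo" this produces my-repo, my_repo, myrepo, myRepo, and MyRepo, matching the examples in the list.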
Next was conventions around what information should be included in the name itself, i.e., what information should the separators be separating? Based on our research, we presented three options and asked respondents to rank them from 1 to 3.
These were as follows:
- [product/project name]-[purpose]-[framework/language] e.g. myproject-api-rails
- [product/project name]-[purpose] e.g. myproject-rest-api
- [language/framework]-[product/project] e.g. python-security-scripts
If the respondent found none of the options useful or wished to provide further feedback, they had a text field to do so.
The questionnaire, as you can see, was simple. Respondents were asked to rank three options, and nothing in the questionnaire indicated which format I (or prior research) considered best. If the respondents didn’t find any option useful, they could explain why.
Results
In total 68 individuals responded to the questionnaire. The breakdown of job roles respondents performed was:
Out of these 68 people, 88.2% were GitHub users, and 11.8% weren’t.
Having excluded the 11.8% from the follow-up questions, the 60 people that fell into the category of staff members we were interested in presented us with the following information.
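The percentages and head counts above are consistent with each other, as this quick Python check shows. The figure of 8 non-users is not stated directly and is derived here from the 68 responses and the 60 GitHub users.

```python
total_responses = 68
github_users = 60
non_users = total_responses - github_users  # derived: 8 respondents

user_share = round(100 * github_users / total_responses, 1)
non_user_share = round(100 * non_users / total_responses, 1)

print(f"GitHub users: {github_users} ({user_share}%)")
print(f"Non-users: {non_users} ({non_user_share}%)")
```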
In response to the question: “Which separator between words in a repository name do you prefer? Note this is the repository name, not the display name.” we discovered the following breakdown.
The overall results here show that hyphens are by far the most popular separator. This gelled with what I had seen on the web during my research.
Following this, the results for the proposed naming conventions were reviewed.
The first option, [product/project name]-[purpose]-[framework/language], was ranked 2 by 50% of our respondents, so it was neither strongly liked nor opposed. Only 15% ranked it a 3 and, equally, another 15% a 1. Overall, this was the most neutral of the naming conventions.
The second option, [product/project name]-[purpose], was far less popular, with 60% of respondents ranking it a 1. Only 15% ranked it a 3, with the remaining 25% giving it a neutral rating.
The final option, [language/framework]-[product/project], had similar results to our first proposed convention, with marginally more respondents ranking a preference for it (28.8% giving it a 3 versus 25% for the first convention).
Conclusion
Compiling the results, we can see that hyphens were by far the most popular convention for separators, with 81.7% preferring them. Option three was marginally more popular than option one, so either of these options would likely be acceptable to a project team as long as we used it consistently.
To sum up, convention three would likely be more suitable for open source projects, where defining the language is useful, as an individual may be looking for a Python tool or JavaScript tool. On the other hand, convention one may be better suited to a project team or department where multiple products exist and are made up of sub-components, such as microservices.
This small research project for the blog has provided some valuable insights into how people approach repository naming conventions and how teams think about consistency on projects.
There may be more than one correct approach, but what is important is that teams pick a convention that works for them and stick to it. Overall, this helps maintain repository hygiene and helps engineers who deal with dozens of repositories find what they are looking for quickly.
Of course, naming conventions do not end here with the repository. Adopting a semantic approach for your GitHub organizations and tagging is another area to consider and one we will explore in future blog posts.
As an official GitHub partner, Modus Create provides consulting to support rollout and adoption on both private GitHub Enterprise servers and public hosting. If you’d like to take control of your installation and achieve Git mastery, talk to Modus.