“startups - are you on a monorepo or multi-repo set up?”
Summary of results
I use a monorepo approach all the time in my work, and it works great for me. The disadvantages show up when working with others: a lot of my peers get confused by it, and it isn’t the best for, say, granular access control compared with having everything in separate repos.
> 1. One app - one repo.
There is a U-shaped utility to monorepos. Most projects fall in the middle somewhere, largely ones managed by independent engineering teams that have no centralized platform/devops/whatever-we-are-calling-it-these-days team.
I maintain the web monorepo at Uber, so I think I can give some context.
Monorepos allow us to centralize important dependency upgrades. E.g. fixing log4j vulns is a lot easier when you can patch everything simultaneously. Same for tzdata (2022g gave very little heads up). Auditing for npm supply chain attacks was a lot simpler in a monorepo than in microrepos. Etc.
Monolithic version control doesn't have to mean monolithic everything.
Our web projects can be deployed independently of each other, and we leverage tools like yarn workspaces focus and Bazel for granular installs and builds/tests/etc.
It doesn't have to mean monoversions either. We support multiple versions of libraries, though we prefer coalescing them as much as possible to facilitate effort centralization. Finding out that your library change will break downstreams before you land the change is a feature.
We had microrepos before and the main problem is that to this day I still get some random team coming to me for help w/ some rediscovered 7 year old repo that doesn't even build anymore cus lockfiles weren't a thing back then.
At a large enough org, you'll inevitably see the full spectrum of team quality, from the really good teams to the one intern/contractor getting thrown into the deep end of some unloved ancient thing. You want a common denominator that lets you do things like patch vulns in unstaffed projects.
I've done fairly large migrations both to and from monorepos. Each has pros and cons. For us and companies like Google, monorepos work well with our organization model. For others it may not.
Having worked at a place with a monorepo, it was one of the things I actually really liked about working there. It took a lot more tooling, but I found it to be much better than a multi-repo would have been, even with a similar amount of tooling. I didn't have to be checking different projects out all the time, and changes across projects weren't too bad (large-scale ones still took a lot of managing, but not as much as changes across a multi-repo).
I was always hoping that one of the big cloud providers would offer a monorepo and invest a lot in making tooling for it to actually be usable.
To me, monorepo vs multi-repo is not about the code organization, but about the deployment strategy. My rule is that there should be a 1:1 relation between a repository and a release/deployment.
If you do one big monolithic deploy, one big monorepo is ideal. (Also, to be clear, this is separate from microservice vs monolithic app: your monolithic deploy can be made up of as many different applications/services/lambdas/databases as makes sense). You don't have to worry about cross-compatibility between parts of your code, because there's never a state where you can deploy something incompatible, because it all deploys at once. A single PR makes all the changes in one shot.
The other rule I have is that if you want to have individual repos with individual deployments, they must be both forward- and backwards-compatible for long enough that you never need to do a coordinated deploy (deploying two at once, where everything is broken in between). If you have to do coordinated deploys, you really have a monolith that's just masquerading as something more sophisticated, and you've given up the biggest benefits of both models (simplicity of mono, independence of multi).
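To make the forward- and backwards-compatible rule concrete, here is a minimal Python sketch of a "tolerant reader" on the consuming side. The field names `name` and `display_name` are made up for illustration; the idea is only that a consumer accepts both the old and the new payload shape and ignores keys it does not know, so producer and consumer never have to deploy at the same moment.

```python
from typing import Any, Mapping


def read_user_name(payload: Mapping[str, Any]) -> str:
    """Accept both old and new payload shapes; unknown keys are ignored."""
    # Hypothetical migration: new producers send "display_name",
    # old producers still send "name".
    if "display_name" in payload:
        return str(payload["display_name"])
    if "name" in payload:
        return str(payload["name"])
    raise ValueError("payload has neither 'display_name' nor 'name'")


# An old producer and a new producer can both talk to this consumer.
old_payload = {"name": "Ada"}
new_payload = {"display_name": "Ada", "name": "Ada", "locale": "en"}

print(read_user_name(old_payload))
print(read_user_name(new_payload))
```

Once every producer sends the new field and every consumer reads it, the old field can be dropped in a later, separate deploy.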
Consider what happens with a monorepo with parts of it being deployed individually. You can't check out any specific commit and mirror what's in production. You could make multiple copies of the repo, check out a different commit on each one, then try to keep in mind which part of which commit is where -- but this is utterly confusing. If you have 5 deployments, you now have 4 copies of any given line of code on your system that are potentially wrong. It becomes very hard to not accidentally break compatibility.
TL;DR: Figure out your deployment strategy, then make your repository structure mirror that.
Think they’re using a monorepo?
The company I work for has recently transitioned from many small repos to a single monorepo. I can tell you that the development experience is MUCH better. 99% of issues required cross-repo coordination, which was a nightmare, and things could easily get out of sync.
With a monorepo, you don’t need to think about which commits work with which other commits. One commit ID is a full description of every subcomponent.
Guess it depends on your definition of a mono-repo vs multi-repo. I'd consider what we have as a mono.
We have one repo which is our main web application (user dashboard, landing page, etc..), our API, and our scheduled tasks. With how much code is shared between these services it just makes sense to keep them together.
We then have separate repos for other services that aren't critical or a part of what was mentioned above.
What are you looking for in a monorepo solution?
Out of curiosity, has your org ever evaluated a monorepo approach?
Would be interesting to know what issues you faced when considering a monorepo and what pushed you to have multiple repos.
Use a monorepo, but organize your code as if it will someday be split into many repos.
We use monorepos in my company too, but we use one for each product, and it works really well for that. But for the whole company to be in a single monorepo just seems like it would require a lot of effort to maintain.
Not at a startup, but at a small company (<10 developers) we regretted not doing a mono-repo.
We had our own GitLab instance and we were all open source enthusiasts and building micro-ish services, so it seemed natural for us to do multi-repo.
Eventually we realized it was creating a lot of unnecessary overhead, as we often were submitting patches for a single ticket to multiple repos which all required separate reviews.
All the pitfalls of a monorepo can disappear with some good tooling and regular maintenance, so much so that devs may not even realize that they are using one. The actual meat of the discussion is – should you deploy the entire monorepo as one unit or as multiple (micro)services?
Do you have a monorepo? Test infrastructure?
Started to use a monorepo + worktrees to keep related but separated developments all together with different checkouts. Anybody else on the same path?
The past few companies I was at, we discussed whether we wanted a single or multiple repos. But that was a separate conversation from microservices, so I don't think it's unusual to have a monorepo with microservices.
I myself am skeptical of monorepos, and I have only worked with polyrepos, but I am curious what strategies you use that help make you effective with multiple code bases?
I prefer a monorepo for tightly coupled systems (like a game server and client), and multiple repos for loosely coupled systems (like a web API and mobile app). The main problem with monorepos is that without the resources to build proper tooling, you end up with useless commit logs and lose track of context while working, but once you're on a larger team the advantages may outweigh that. However, if you have multiple teams working on different parts, multiple repos might be better depending on how coupled those parts are; for example, one team for your API and a different team for your website.
Either way, it doesn't really matter that much. In the end, it's mostly just a personal preference.
I strongly prefer the simplicity of a monorepo, but I once worked on a project that used three repos, and kept them in sync by having IntelliJ keep the branches in sync. Make a new branch, and you make it in all three repos simultaneously. Switch branch, and you switch in all three. That made it very convenient.
The project I'm currently working on just switched from polyrepo to monorepo. Interestingly, front and back end were in a single repo, but there was another repo with a bunch of definitions and datatypes, and a third with a frontend component library that was meant to be shared with another team, but that never happened. And that just made development really awkward.
I think polyrepo only makes sense if you actually have multiple teams with clearly separated responsibilities. But then each team still effectively works monorepo, don't they?
Two monorepos:
Primary monorepo – single versioned packages for libraries and services that are deployed as a whole.
Secondary monorepo – individually versioned packages for shared libraries and independent (micro)services.
Working on a monorepo where we have hundreds (possibly thousands) of projects each with a different version and release schedule. It actually works quite well, the dependencies are always in a good state, it's easy to see the ramifications of a change and to reuse common components.
The problem isn't per se monorepo v multi-repo, it's ensuring that your lines of communication between components, ie your APIs, are shared reliably and coherently.
When teams take the stability & versioning of their APIs seriously, the need to use monorepos to share that info is greatly reduced. A multi-repo approach is perfectly feasible when all components are working to established APIs, which also alleviates the issues mentioned in the article.
I think the big issue around monorepo is when a company puts completely different projects together inside a single repo.
In this article almost everything makes sense to me (because that's what I have been doing most of my career), but they put their OTP app inside, which suddenly makes no sense. You can see the problem in the CI: they have dedicated files just for this app and probably very little code in common with the rest.
IMO you should have one monorepo per project (api, frontend, backend, mobile, etc. as long as it's the same project) and if needed a dedicated repo for a shared library.
I think it's worth calling out that there are different types of monorepos.
For example, I've worked in a monorepo that was one giant binary, but I've also worked in a monorepo that contained 4-ish independent services (all in a single git repo).
Having been down the route of repos for every service I would always choose monorepo in the future. I could see separate repos for libraries. There is just too much overhead trying to manage multiple repos. With a single repo it's possible to build a package that represents all of your software vs being forced to version everything. Tasks almost always touch multiple services unless you are so big you have a team per service.
As with many words, monorepos can mean different things to different people.
Some people use it for a repo where a single product is developed and typically deployed/published together, but happens to be distributed through separate packages. The Babel repository would be one example of this: https://github.com/babel/babel
Many “design system monorepos” fall in this category as well.
I would say the difference between one team building a single product in one repo or many might be interesting to some, but it’s a completely different problem from a multi-product multi-team repository. At some companies this would be a single repo for all source code.
Building software like this can have a profound impact on your entire engineering culture - for the positive, if you ask me. The single-product monorepos are unlikely to have a similar impact.
In our case we abandoned individual repos and went back to a monorepo to solve this issue. In theory the separation of code was nice, but in practice it was a real pain when a service added new APIs and you wanted to update another service to use them.
All of our services do also print out in their startup logs what version they are based on git branch name and commit. Monorepo or not this was useful.
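For anyone wanting to try the startup-log idea, a rough Python sketch follows. It is not the commenter's implementation: the service name `billing-api` and the log format are invented, and it assumes the process runs from a git checkout. In a packaged deployment you would more likely bake these values in at build time. If git is missing or the directory is not a repository, it falls back to "unknown".

```python
import subprocess


def git_output(*args: str) -> str:
    """Run a git command and return trimmed stdout, or "unknown" on failure."""
    try:
        result = subprocess.run(
            ["git", *args], capture_output=True, text=True, check=True
        )
        return result.stdout.strip() or "unknown"
    except (OSError, subprocess.CalledProcessError):
        # git not installed, or not running inside a repository
        return "unknown"


def format_version(service: str, branch: str, commit: str) -> str:
    """Build the one-line banner written to the startup log."""
    return f"{service} starting: branch={branch} commit={commit}"


def startup_banner(service: str) -> str:
    branch = git_output("rev-parse", "--abbrev-ref", "HEAD")
    commit = git_output("rev-parse", "--short", "HEAD")
    return format_version(service, branch, commit)


print(startup_banner("billing-api"))  # hypothetical service name
```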
The only problem I have with a monorepo is that sometimes I need to share code between completely different teams. For example, I could have a repo that contains a bunch of protobuf definitions so that every team can consume them in their projects. It would be absurd to shove all of those unrelated projects into one heaping monorepo.
If everything is tightly coupled (which it sounds like from your other comments), then go ahead with a monorepo. Should be fine.
I feel like I've read about several big companies using monorepos, but I've never understood why. It feels like the source-control equivalent of writing your code in one big file.
Does anyone have any good resources for why and how best to implement a monorepo?
Monorepos are appropriate for a single project with many sub parts but one or two artifacts on any given release build. But they fall apart when you have multiple products in the monorepo, each with different release schedules.
As soon as you add a second separate product that uses a different subset of any code in the repo, you should consider breaking up the monorepo. If the code is "a bunch of libraries" and "one or more end user products" it becomes even more imperative to consider breaking down stuff.
Having worked on monorepos where there are 30+ artifacts, multiple ongoing projects that each pull the monorepo in to different incompatible versions, and all of which have their own lifetime and their own release cycle - monorepo is the antithesis of a good idea.
The point of a monorepo is that all the dependencies for a suite of related products are all in a single repo, not that everything your company produces is in a single repo.
I've worked at companies with monorepos and it pushed me pretty hard in the opposite direction. My philosophy has been to go through the packaging and dependency process early and often, and split off as many things into their own repo as possible.
Meta also has a massive monorepo accessed primarily through cloud devservers.
When several of the world’s most successful software companies use this approach, it’s hard to argue that it’s inherently bad. Of course it’s sensible to discuss what lessons apply to smaller companies who don’t have the luxury of dedicated tooling teams supporting the monorepo and dev environment.
I'm sorry to break it to you, but monorepos are extremely common. Doesn't mean they have to be as large but every company I've been at had a monorepo.
And as soon as you have to manage PRs for multiple repos with a new cross-cutting feature or scheduling changes in the correct order you understand why they are so appealing.
The process for rolling out breaking API changes is the same for monorepos as it is for multi-repos since, during a deploy, multiple versions of each service will be running simultaneously. The only advantage of a mono-repo is the atomic commit across multiple services. It's definitely possible through a combination of convention and tooling to do something similar with a multi-repo, but as of yet this is a less explored paradigm.
I like the concept of a monorepo, but have found it challenging to implement because most developers are only responsible for their part - and there is often a big productivity benefit to keeping them narrowly focused. One trick has been to have a monorepo for CI, rather than a monorepo for code. When one of the smaller packages gets updated the CI monorepo is triggered and all of the systems are tested for interoperation. GitHub makes this particularly easy with repository dispatches. It's been a wonderful "canary in the coalmine" for early problem detection. Bonus: The monorepo for CI becomes deployment documentation and can easily have its own set of tests that are specific to interop.
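As a rough illustration of the repository-dispatch trigger, here is a Python sketch that builds the request a package repo could send to the CI repo. The endpoint (`POST /repos/{owner}/{repo}/dispatches` with `event_type` and `client_payload`) is GitHub's REST API for repository dispatch events. The org and repo names, the event type, the payload fields and the token placeholder are all made up, and the sketch only builds the request without sending it.

```python
import json
import urllib.request


def build_dispatch_request(owner, repo, token, event_type, payload):
    """Build (but do not send) a repository_dispatch request for the CI repo."""
    url = f"https://api.github.com/repos/{owner}/{repo}/dispatches"
    body = json.dumps(
        {"event_type": event_type, "client_payload": payload}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical names: a package repo tells "ci-monorepo" that it changed.
request = build_dispatch_request(
    owner="example-org",
    repo="ci-monorepo",
    token="<token>",
    event_type="package-updated",
    payload={"package": "billing-client", "sha": "abc1234"},
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

On the receiving side, a workflow in the CI repo would listen for the `repository_dispatch` event and run the interop tests against the package and commit named in the payload.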
I'm shocked that there are so many repos. To be sure, are these all really separate repos and not just directories?
I've only worked at monorepo companies, and when I see the "monorepo vs. multiple repo" debates, I always picture in my mind that we're arguing about 1 vs. maybe 5 or 6 repos--like a repo for each major project. But thousands of repos, one for every little nugget??? That is totally wild. Is this an actual industry practice?
In a multi-repo setup you can upgrade gradually though, tackling the services that need the upgrade the most first. Can you do that in a monorepo setup?
It's kind of funny that the wisdom in software development is to program against interfaces and not implementations.
And yet here we are with monorepos, doing the big-ball of mud approach.
I've worked on several multi-repo systems and several monorepos. I have a weak preference for monorepos for some of the reasons given, especially the spread of pull requests, but that's almost a 'code smell' in some respects.
Monorepos that I've contributed to that have worked well: mostly one language (but not always), a single top-most build command that builds everything, complete coverage by tests in all dimensions, and the repo has tooling around it (ensure code coverage on check in, and so on).
Monorepos that I've contributed to that haven't: opposites of the previous points.
Multi-repos that have worked well: well abstracted and isolated, some sort of artefact repository (Nexus, JFrog, whatever) as the layer between repos, clear separation of concerns.
Multi-repos that have not worked well: again, opposites of the previous, including git submodules (please, just don't), code duplication, fragile dependencies where changing any repo meant all had to change.
Re: monorepos, I think we're talking about 2 different things. I usually hear the term "monorepo" discussed in the context of how it is practiced at places like Google and Facebook: having the code for all the company's services (micro or not) stored in a single source control repository.
A monorepo really doesn't have anything to do with how code components are deployed - your comment seemed to be contrasting a monolith architecture with a microservices one.
On this Star Wars day, another take on monorepos!
But on earth, we have now seen several instances where teams have moved from a polyrepo setup to a monorepo. Although "monorepo vs polyrepo" is always a debated topic, and it's hard to scale a monorepo, large companies like Stripe, Canva, Cruise, DoorDash have been able to manage monorepos by building strong tooling and automation to handle the scale.
I would call that a micro-service architecture using a mono-repo.
I think a lot of people here are conflating mono-repo/poly-repo with a mono-deployment. You can easily add in extra entrypoint executables to a single mono-repo. That allows initiating parts of the system on different machines, allowing scaling for skewed API rates across your different request handlers.
Similarly, you can create a monolith application building from a poly-repo set of dependencies. This can be easier depending on your version control system, as I find git starts to perform really poorly when you enter multi-million SLOC.
At my job we have a custom build system for poly-repos that analyzes dependencies and rebuilds higher level leaf packages for lower level changes. Failing a dependency rebuild gets your version rejected, keeping the overall set of packages green.
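I can't share the real thing, but the gating idea can be sketched in a few lines of Python. Everything here is hypothetical: the package names, the `DEPS` graph and the `build` callback stand in for whatever the actual build system does. The sketch finds every package that depends on the changed one (directly or transitively), rebuilds them in dependency order, and rejects the change if any rebuild fails.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, Dict, List, Set


def affected_by(changed: str, deps: Dict[str, Set[str]]) -> Set[str]:
    """Return `changed` plus every package that depends on it transitively."""
    affected = {changed}
    grew = True
    while grew:
        grew = False
        for pkg, requires in deps.items():
            if pkg not in affected and requires & affected:
                affected.add(pkg)
                grew = True
    return affected


def rebuild_order(changed: str, deps: Dict[str, Set[str]]) -> List[str]:
    """Affected packages, ordered so dependencies come before dependents."""
    affected = affected_by(changed, deps)
    return [p for p in TopologicalSorter(deps).static_order() if p in affected]


def accept_change(
    changed: str, deps: Dict[str, Set[str]], build: Callable[[str], bool]
) -> bool:
    """Accept the new version only if every affected package still builds."""
    return all(build(pkg) for pkg in rebuild_order(changed, deps))


# Hypothetical graph: package -> packages it depends on.
DEPS = {
    "core": set(),
    "auth": {"core"},
    "api": {"auth", "core"},
    "docs": set(),
}

print(rebuild_order("core", DEPS))
print(accept_change("core", DEPS, build=lambda pkg: pkg != "api"))
```

In this toy graph a change to `core` that breaks `api` gets rejected, while `docs` is never rebuilt because nothing links it to `core`.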
Apologies; I meant to say monorepo, and specifically the idea of having one repo with multiple services deployed independently.
Multi. Always.
The question with monorepos is not if they will become a nightmare. It is when they will become a nightmare.
Making a multirepo feel like a single workspace is trivial.
Making a part of a monorepo feel like a repo from a multi repo is impossible.
My experience with monorepos is that they are excellent if, and only if, you have a team dedicated full-time to making sure the repo remains sane.
This is true for any programming language. (Also, successful monorepos can be polyglot.)
If you don't have a dedicated team, you will eventually end up with all the downsides of a monorepo and few of the benefits. Builds will break frequently, impacting many teams. Dependency management will become a nightmare.
Open-source tooling like Bazel will only get you so far -- you will need in-house tooling too, but more than that, you will need an in-house culture of behaving well in a monorepo. Unless most of your engineers have done it before, you will need strong leadership to build that culture.
If you can't dedicate a team to that purpose and really follow through with it, then don't even try having a monorepo. Do a repo per team, or a repo per project.
One thing I don't understand about monorepos: do people just stay on one platform and check in binaries? Or is it assumed that everything must be compiled and correct? I get that a branch can be compiled, tested and integrated, but how does that work with multiple teams? I mean, at what point does it become like week-long builds to make sure everything is accurate and correct?
Or is a monorepo more of a "place to put all the code", not necessarily correct or working?
I like multiple repos because it's easier to assume that the main branch of each is "correct and tested and excellent quality".
> If your organization puts all its code, and all assets that go along with that code into a single repository; you’ve got yourself a monorepo.
I'm not sure I agree with this. I suppose in the most technical sense, sure, but it's not really true.
We have a single repo with a bunch of microservices in it. Builds/tests are localized to a single microservice though. The beauty of git is that two people can work on two parts of the repo pretty much independently. So while technically there is only one repo, I feel like calling it a monorepo would just confuse people.
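To give a feel for what "localized" means in practice, here is a small Python sketch, not our actual CI. It assumes a made-up layout where each microservice lives under `services/<name>/`, and it maps the files changed in a commit to the set of services whose builds and tests need to run.

```python
from pathlib import PurePosixPath
from typing import Iterable, Set

SERVICES_ROOT = "services"  # assumed layout: services/<name>/...


def services_touched(changed_files: Iterable[str]) -> Set[str]:
    """Return the names of services that own at least one changed file."""
    touched = set()
    for path in changed_files:
        parts = PurePosixPath(path).parts
        # Only paths like services/<name>/<something> belong to a service.
        if len(parts) >= 3 and parts[0] == SERVICES_ROOT:
            touched.add(parts[1])
    return touched


changed = [
    "services/billing/app.py",
    "services/billing/tests/test_app.py",
    "services/search/index.py",
    "README.md",
]
print(sorted(services_touched(changed)))
```

Files outside `services/` (like the README here) trigger nothing in this sketch; a real setup would have to decide what shared files should rebuild.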
There’s also project repos where all the code pertaining to a project is under one repo. I think this setup is the best of both worlds.
If you’re a team that has a client, several micro services, DB, etc it’s way better to have that under a single repo than spread to multiple. Monorepos don’t have to be gigantic monstrosities; they can encapsulate products.
What if they become managed by separate teams? Or two projects in separate repos become managed by the same team? What about a service that basically everything else in the company relies on (for Google, accounts and auth for example)?
Better to just keep things in a monorepo IMO, even if they seem unrelated.
If there are multiple teams committing to the same repo you need controls over who has permission to commit to which directories and maybe a policy for handling merge conflicts across teams. I'm not sure what the tooling is like around that, but I could see the benefits as long as someone very high up was on board and had enough of a technical mind to keep order.
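The kind of per-directory control I have in mind could look roughly like this Python sketch. The team names and paths are invented, and the longest-prefix rule is a simplification I picked for illustration; it is not how any particular hosting platform's code-owners feature resolves patterns.

```python
from typing import Dict, Iterable, Set

# Hypothetical ownership table: directory prefix -> teams allowed to commit.
OWNERS: Dict[str, Set[str]] = {
    "services/billing/": {"team-payments"},
    "services/": {"team-platform"},
    "": {"repo-admins"},  # fallback for everything else
}


def owners_for(path: str) -> Set[str]:
    """Pick the owners of the most specific (longest) matching prefix."""
    best = max((prefix for prefix in OWNERS if path.startswith(prefix)), key=len)
    return OWNERS[best]


def may_commit(team: str, changed_files: Iterable[str]) -> bool:
    """A team may commit only if it owns every file in the change."""
    return all(team in owners_for(path) for path in changed_files)


print(owners_for("services/billing/app.py"))
print(may_commit("team-payments", ["services/billing/a.py", "services/search/b.py"]))
```

A change that crosses directories owned by different teams fails the check, which is where a cross-team review policy would kick in.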
As far as two repos becoming one or one repo becoming two, you can split and merge repos while keeping the commit history.
edit:
Do you have experience working somewhere with a monorepo that stretched across multiple teams? If so, what was it like?
In this case, I'd say it's the opposite: monorepo as an approach works amazingly well for small teams all the way up to huge orgs (with the right tooling to support it).
The difference is that past a certain level of complexity, the org will most certainly need specialized tooling to support massive codebases to make CI/CD (build, test, deploy, etc.) times sane.
On the other hand, multi-repos may work for massive orgs, but are always going to add friction for small orgs.
It doesn't matter if you have a mono-repo or multi-repo, you will need engineers on tooling to make it work if your project is large. There are pros and cons to both multi-repo and mono-repo with no one right answer (despite what some will tell you). They are different pros and cons, but which is best depends on your particular context.
What's the difference if they are in one or two repos if they produce two artifacts that are separated? You *will* have network calls between the two, unless you are marrying yourself to a deployment/operational platform that can run the two artifacts together. (ok, there could be a few but I really don't see how this is just using a "monorepo" instead of a "multirepo")
Monorepo is just one small part of the puzzle. If you want to actually achieve the dream state that is alluded to when someone says "monorepo", you have to be willing to endure a super-deep and honest evaluation of your tech stack and processes.
We have been successfully running a monorepo for ~5 years now using nothing more than the bare-ass GitHub PR process with some basic checkbuilds sprinkled on top. We intentionally avoided getting our hands dirty with CI automation as much as was feasible. Building our software manually is so simple it's not really worth dedicating a team to worrying about.
I would say the biggest key to our success was finding a way to minimize 3rd party dependencies. By making certain strategic technical choices, we were able to achieve this almost by default. Our repository is effectively hermetically sealed against outside changes in all of the ways that matter to the business owners. GitHub providing issues at the repository grain is also a very powerful synergy for us in terms of process - we started using issue numbers as the primary key for all the things.
With regard to technical choices - consider that some languages & ecosystems provide a lot more "batteries included" than others, which provides a substantially different landscape around how many 3rd parties may need to be involved and how.
I have my doubts about the mono repo approach. Off the top of my head:
1. It might increase the complexity of other processes on that repo: CI/CD configuration, makefiles, branching strategies, codeowners, etc.
2. The versioning might lose its meaning.
3. Blast radius in case of screwing things up accidentally.
I personally split repositories based on responsibilities: here the code base, here the IaC, here the configuration, here the manifests. Always using a standard and predefined naming convention for the repository names. That being said, as always, it depends. I might embrace the monorepo if the context demands it and it has been properly discussed and evaluated.
One repo is better because you can basically include the entire change set in 1 PR / gitref.
Downside is it requires you come up with a structure and a bit of discipline to maintain coherence.
If you use codeowners and your CI can include other files, then most of the downsides go away, besides the repo size itself.
Multirepo is a bit easier to understand per item. You can have one single CI file, and don't need codeowners, but you move a lot of burden to test, deploy and run.
If your SCM supports codeowners and CI supports includes, I'd stick closer to mono-repos, and only break out things when it's helpful and you obviously wouldn't change them together.
Think services rather than micro.
At sufficient scale, the problems of a monorepo tend to be more of a fixed overhead cost whereas those same problems at scale in a multi-repo approach tend to impact all engineers across the company. The fixed overhead can be addressed by hiring a dedicated team to manage the monorepo problems.
For multi repo I will need to build automation to manage all the repos and enforce a consistent experience across them, including syncing the repos, if we end up using stuff like submodules. And I need to do this now. We tried to "trust" every repo owner to do the right thing, but it was a cluster fuck.
With monorepo, I had to set up things once and go on my merry way. And I will be able to kick the monorepo-is-too-slow-can down the road for a few years from now.
I agree with the other commenter - in my view, a monorepo is the _best_ choice for a small company. I guess this depends on what tooling is available for your language / ecosystem of choice though. In my experience of TypeScript and Java with monorepos, you definitely need to know how to configure the tooling properly (which is certainly "overhead"), but it massively reduces the maintenance cost and increases the consistency of your tooling config. Spreading out over loads of repos means you need to share artefacts, which means package managers and package manager hosts, and a whole suite of release CI/CD which gets out of sync almost immediately.
It's also getting a lot better. Gradle works amazingly well for a monorepo, even with dozens of developers committing to it every day, thanks to shared caching, and Nx/Turborepo/others are making the story for front-end/TS much better too.
Organized as in... kept in different repos?
People live/die on this hill and it makes no sense. One or many repos, it's just an organizational question, not some Super Serious Big Decision.
I work at a small, early stage startup and I'm about to create a new repo for our Slack bot. I'm going to use the Docker image another of our repos generates as the base Docker image for this new repo, and it's going to be just fine.
Assuming you're just referring to repos: not really IMO.
As soon as you split 1 repo into 2 repos you need to start building tooling to support your 2 repos. If your infrastructure is sufficiently robust with 2 repos then you might as well have 3 or 4 or 10. If it's built to _only_ support 2 repos (or 3 or 4) then it's brittle out of the gate.
The value of a monorepo is that you completely eliminate certain classes of problems and take on other classes of problems. Classic trade off. Folks that prefer monorepos take the position that multirepo problems are much harder than monorepo problems most of the time.
Monorepo is not necessarily synced deployment, and even if it was, each deployment of a single component is usually racy with itself (as long as you're deploying to at least two nodes).
Which means that you've got to do independent backwards-compatible changes anyway, and that for anything remotely complex, you are better off with separate branches (and PR/MRs) anyway.
Monorepos mostly have a benefit for trivial changes across all repos (eg. we've decided to rename our "Shop" to "Shoppe"), where it doesn't really take much to explain with multiple repos, but is mostly a lot of tedious work to get all the PRs up and such.
Nobody is going to write a blogpost "we use multiple repos and it's ok for us". And that is the majority of companies.
Heck, even most individual developers use multiple repos.
Monorepos are usually ok (and even talking about team/project granularity) until you outgrow them.
Build systems like one-repo-per-project much better than monorepos. (Shipping a fix doesn't require rebuilding "the whole world", for example.)
I relate so much to this article.
I've had a previous experience with a backend monolith repo, and these days I have to deal with a backend that has 20+ repos. It is hell. Duplicated code, duplicated logic, duplicated tests, duplicated settings, hell to introduce newbies to the architecture, async calls to external basic APIs that could've been just simple method calls.
I think that the one major disadvantage of having a big monorepo is that with those multiple entry points, you might end up with a bunch of unused dependencies. But even that is manageable I think: you can have different package dependency definitions whilst using the same codebase.
I've always worked with small teams (max up to 5 or 6 developers) and that's another point in favor of monorepos. I understand that big companies might want to have different teams working on different repos, for organisation reasons.
A question for people here saying "use a monorepo", and coming from a different direction than that of the article. Say I want to use a monorepo for all code I write for personal use and development, but I'll have a folder with dozens of projects cloned from github often with small tweaks and custom branches. Is the solution submodules? Saving patches or just code-snippets in the monorepo and keeping the random misc repos isolated? Hardlink specific files of interest?
One giant codebase is fine. Monorepo is better than lots of scattered repos linked together with git hashes. And it doesn't really get in the way of each team managing when stuff gets rolled out.
At least Google and Meta are heavy into monorepos. I'm really curious what company is using a _repo per service_. That's insane.
What's the value of a monorepo if developers only ever check out a small subset of it? Wouldn't multiple repos allow greater scale without any practical reduction in utility?
For example, all the localisation files could live in a separate project (if we accept the need to commit them at all). Some tools would be needed to deal with the inevitable problem that developer working sets would not align with project boundaries, but that seems like an easier job than making git scale while maintaining response times.
How is a mono repo the simple solution compared to one repo per independently releasable component?
All the tooling is much easier to use when each application has its own repo.