“startups - are you on a monorepo or multi-repo set up?”
Summary of results
What are you looking for in a monorepo solution?
Out of curiosity, has your org ever evaluated a monorepo approach?
Would be interesting to know what issues you faced considering monorepo and what pushed you to have multiple repos.
Use a monorepo, but organize your code as if it will someday be split into many repos.
The company I work for has recently transitioned from many small repos to a single monorepo. I can tell you that the development experience is MUCH better. 99% of issues required cross-repo coordination, which was a nightmare, and things could easily get out of sync.
With a monorepo, you don't need to think about which commits work with which other commits. One commit ID is a full description of every subcomponent.
Having been down the route of repos for every service I would always choose monorepo in the future. I could see separate repos for libraries. There is just too much overhead trying to manage multiple repos. With a single repo it's possible to build a package that represents all of your software vs being forced to version everything. Tasks almost always touch multiple services unless you are so big you have a team per service.
Guess it depends on your definition of a mono-repo vs multi-repo. I'd consider what we have as a mono.
We have one repo which is our main web application (user dashboard, landing page, etc..), our API, and our scheduled tasks. With how much code is shared between these services it just makes sense to keep them together.
We then have separate repos for other services that aren't critical or a part of what was mentioned above.
We use monorepos in my company too, but we use one for each product, and it works really well for that. But for the whole company to be in a single monorepo just seems like it would require a lot of effort to maintain.
Not at a startup, but at a small company (<10 developers) we regretted not doing a mono-repo.
We had our own GitLab instance and we were all open source enthusiasts and building micro-ish services, so it seemed natural for us to do multi-repo.
Eventually we realized it was creating a lot of unnecessary overhead, as we often were submitting patches for a single ticket to multiple repos which all required separate reviews.
Do you have a monorepo? Test infrastructure?
The past few companies I was at, we discussed whether we wanted a single or multiple repos. But that was a separate conversation from microservices, so I don't think it's unusual to have a monorepo with microservices.
I myself am skeptical of monorepos, and I have only worked with polyrepos, but I am curious what strategies you use that help make you effective with multiple code bases?
I prefer monorepo for tightly coupled systems (like a game server and client), and multi repo for multiple, loosely coupled systems (like a web API and mobile app). The main problem with monorepos is that without the resources to build proper tooling, you end up with useless commit logs and losing track of context while working, but once you're on a larger team the advantages may outweigh those costs. However, if you have multiple teams working on different systems, a multi repo might be better; for example, if you have a team for your API and a different team for your website, then multiple repos might be better depending on how coupled they are.
Either way, it doesn't really matter that much. In the end, it's mostly just a personal preference.
I strongly prefer the simplicity of a monorepo, but I once worked on a project that used three repos, and kept them in sync by having IntelliJ keep the branches in sync. Make a new branch, and you make it in all three repos simultaneously. Switch branch, and you switch in all three. That made it very convenient.
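For anyone without that IDE feature, the same idea can be roughly sketched as a small script. This is only an illustration under assumptions: the repo paths are hypothetical, and it relies on the standard `git -C <path> switch [-c] <branch>` command (Git 2.23 or newer), not on anything IntelliJ does internally.

```python
import subprocess


def branch_commands(repos: list[str], branch: str, create: bool = False) -> list[list[str]]:
    """Return one git command per repo that switches to (or creates) `branch`."""
    commands = []
    for repo in repos:
        cmd = ["git", "-C", repo, "switch"]
        if create:
            cmd.append("-c")  # create the branch before switching to it
        cmd.append(branch)
        commands.append(cmd)
    return commands


def sync_branches(repos: list[str], branch: str, create: bool = False) -> None:
    """Run the commands in order; stop at the first repo where git fails."""
    for cmd in branch_commands(repos, branch, create):
        subprocess.run(cmd, check=True)
```

Calling `sync_branches(["frontend", "backend", "shared"], "feature/login", create=True)` would create the branch in all three checkouts, assuming those directories exist. Note that a failure halfway through leaves the repos on different branches, which is exactly the kind of drift a monorepo avoids.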
The project I'm currently working on just switched from polyrepo to monorepo. Interestingly, front and back end were in a single repo, but there was another repo with a bunch of definitions and datatypes, and a third with a frontend component library that was meant to be shared with another team, but that never happened. And that just made development really awkward.
I think polyrepo only makes sense if you actually have multiple teams with clearly separated responsibilities. But then each team still effectively works monorepo, don't they?
The problem isn't monorepo vs. multi-repo per se; it's ensuring that your lines of communication between components, i.e. your APIs, are shared reliably and coherently.
When teams take the stability & versioning of their APIs seriously, the need to use monorepos to share that info is greatly reduced. A multi-repo approach is perfectly feasible when all components are working to established APIs, which also alleviates the issues mentioned in the article.
I think it's worth calling out that there are different types of monorepos.
For example, I've worked in a monorepo that was one giant binary, but I've also worked in a monorepo that contained 4-ish independent services (all in a single git repo).
I use the monorepo approach all the time in my work, and it works great for me. The disadvantages show up when working with others: a lot of my peers get confused working with it, and it isn’t the best for, say, granular access compared to having everything in separate repos.
As with many words, monorepos can mean different things to different people.
Some people use it for a repo where a single product is developed and typically deployed/published together, but happens to be distributed through separate packages. The Babel repository would be one example of this: https://github.com/babel/babel
Many “design system monorepos” fall in this category as well.
I would say the difference between one team building a single product in one repo or many might be interesting to some, but it’s a completely different problem from a multi-product multi-team repository. At some companies this would be a single repo for all source code.
Building software like this can have a profound impact on your entire engineering culture - for the positive, if you ask me. The single-product monorepos are unlikely to have a similar impact.
In our case we abandoned individual repos and went back to a monorepo to solve this issue. In theory the separation of code was nice, but in practice it was a real pain when a service added new APIs and you wanted to update another service to use them.
All of our services do also print out in their startup logs what version they are based on git branch name and commit. Monorepo or not this was useful.
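A minimal sketch of that startup log line, assuming the service runs from a git checkout. The function names and the `branch@commit` format are made up for illustration; the two `git rev-parse` invocations are standard git.

```python
import subprocess


def git_output(*args: str) -> str:
    """Run a git command in the current directory and return its trimmed output."""
    result = subprocess.run(["git", *args], capture_output=True, text=True, check=True)
    return result.stdout.strip()


def format_version(branch: str, commit: str) -> str:
    """Build the string a service would print in its startup log."""
    return f"{branch}@{commit[:12]}"  # shorten the commit hash for readability


def current_version() -> str:
    """Read branch name and commit from the checkout this process runs from."""
    branch = git_output("rev-parse", "--abbrev-ref", "HEAD")
    commit = git_output("rev-parse", "HEAD")
    return format_version(branch, commit)
```

If the deployed artifact does not ship with its `.git` directory, the same two values would have to be captured at build time and baked into the image or package instead.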
The only problem I have with a monorepo, is that sometimes I need to share code between completely different teams. For example, I could have a repo that contains a bunch of protobuf definitions so that every team can consume them in their projects. It would be absurd to shove all of those unrelated projects into one heaping monorepo.
If everything is tightly coupled (which it sounds like from your other comments), then go ahead with a monorepo. Should be fine.
I feel like I've read about several big companies using monorepos, but I've never understood why. It feels like the source-control equivalent of writing your code in one big file.
Does anyone have any good resources for why and how best to implement a monorepo?
The point of a monorepo is that all the dependencies for a suite of related products are all in a single repo, not that everything your company produces is in a single repo.
> 1. One app - one repo.
There is a U-shaped utility to monorepos. Most projects fall in the middle somewhere, largely ones managed by independent engineering teams that have no centralized platform/devops/whatever-we-are-calling-it-these-days team.
Think they’re using a monorepo?
I've worked at companies with monorepos and it pushed me pretty hard in the opposite direction. My philosophy has been to go through the packaging and dependency process early and often, and split off as many things into their own repo as possible.
Meta also has a massive monorepo accessed primarily through cloud devservers.
When several of the world’s most successful software companies use this approach, it’s hard to argue that it’s inherently bad. Of course it’s sensible to discuss what lessons apply to smaller companies who don’t have the luxury of dedicated tooling teams supporting the monorepo and dev environment.
The process for rolling out breaking API changes is the same for monorepos as it is for multi-repos since, during a deploy, multiple versions of each service will be running simultaneously. The only advantage of a mono-repo is the atomic commit across multiple services. It's definitely possible through a combination of convention and tooling to do something similar with a multi-repo, but as of yet this is a less explored paradigm.
I'm sorry to break it to you, but monorepos are extremely common. Doesn't mean they have to be as large but every company I've been at had a monorepo.
And as soon as you have to manage PRs for multiple repos with a new cross-cutting feature or scheduling changes in the correct order you understand why they are so appealing.
I like the concept of a monorepo, but have found it challenging to implement because most developers are only responsible for their part - and there is often a big productivity benefit to keeping them narrowly focused. One trick has been to have a monorepo for CI, rather than a monorepo for code. When one of the smaller packages gets updated the CI monorepo is triggered and all of the systems are tested for interoperation. GitHub makes this particularly easy with repository dispatches. It's been a wonderful "canary in the coalmine" for early problem detection. Bonus: The monorepo for CI becomes deployment documentation and can easily have its own set of tests that are specific to interop.
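As a rough sketch of the trigger side: GitHub's REST API exposes a repository dispatch endpoint (`POST /repos/{owner}/{repo}/dispatches`) that takes an `event_type` and an optional `client_payload`. The owner, repo name, event type and payload below are hypothetical, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_dispatch_request(owner: str, repo: str, token: str,
                           event_type: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a repository_dispatch request for the CI repo."""
    url = f"https://api.github.com/repos/{owner}/{repo}/dispatches"
    body = json.dumps({"event_type": event_type, "client_payload": payload})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical example: the "billing" package repo notifies the CI monorepo.
request = build_dispatch_request(
    "acme", "ci-monorepo", "TOKEN_PLACEHOLDER",
    "package-updated", {"package": "billing", "ref": "main"},
)
```

Passing `request` to `urllib.request.urlopen` would send it. On the receiving side, a workflow in the CI repo would listen for the `repository_dispatch` event and run the interop tests.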
I'm shocked that there are so many repos. To be sure, are these all really separate repos and not just directories?
I've only worked at monorepo companies, and when I see the "monorepo vs. multiple repo" debates, I always picture in my mind that we're arguing about 1 vs. maybe 5 or 6 repos--like a repo for each major project. But thousands of repos, one for every little nugget??? That is totally wild. Is this an actual industry practice?
In a multi-repo setup you can upgrade gradually though, tackling the services that need the upgrade the most first. Can you do that in a monorepo setup?
Re: monorepos, I think we're talking about 2 different things. I usually hear the term "monorepo" discussed in the context of how it is practiced at places like Google and Facebook: having the code for all the company's services (micro or not) stored in a single source control repository.
A monorepo really doesn't have anything to do with how code components are deployed - your comment seemed to be contrasting a monolith architecture with a microservices one.
On this Star Wars day, another take on monorepos!
But on Earth, we have now seen several instances where teams have moved from a polyrepo setup to a monorepo. Although "monorepo vs polyrepo" is always a debated topic, and it's hard to scale a monorepo, large companies like Stripe, Canva, Cruise, Doordash have been able to manage monorepos by building strong tooling and automation to handle the scale.
To me, Monorepos are about a separation of concerns (or lack thereof). Something is a monorepo if/only if there is greater than one concern in the repo. (For example mobile + web, backend + frontend, service + server).
When I was at FB, I remember someone used the term polylith to describe our multiple monorepo situation... I suspect that's probably a pretty common situation for people that "have a monorepo"
I would call that a micro-service architecture using a mono-repo.
I think a lot of people here are conflating mono-repo/poly-repo with a mono-deployment. You can easily add in extra entrypoint executables to a single mono-repo. That allows initiating parts of the system on different machines, allowing scaling for skewed API rates across your different request handlers.
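To make the "extra entrypoints" point concrete, here is a toy sketch of one codebase exposing two roles that could be started on different machines. The role names and handler functions are invented for the example.

```python
import argparse


def run_api() -> str:
    """Hypothetical entrypoint that would start the request-serving process."""
    return "serving API requests"


def run_worker() -> str:
    """Hypothetical entrypoint that would start the background-job process."""
    return "processing background jobs"


# Both roles share one repo and one set of libraries; only the entrypoint differs.
ENTRYPOINTS = {"api": run_api, "worker": run_worker}


def main(argv: list[str]) -> str:
    parser = argparse.ArgumentParser(description="Start one role of the system")
    parser.add_argument("role", choices=sorted(ENTRYPOINTS))
    args = parser.parse_args(argv)
    return ENTRYPOINTS[args.role]()
```

Each role can then be deployed and scaled separately, for example many `api` processes and a few `worker` processes, while the code stays in one repo.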
Similarly, you can create a monolith application building from a poly-repo set of dependencies. This can be easier depending on your version control system, as I find git starts to perform really poorly when you enter multi-million SLOC.
At my job we have a custom build system for poly-repos that analyzes dependencies and rebuilds higher level leaf packages for lower level changes. Failing a dependency rebuild gets your version rejected, keeping the overall set of packages green.
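The general idea behind such a system can be sketched as follows. This is not the commenter's actual tool, just an illustration: the package names are hypothetical and `build` stands in for whatever really compiles and tests a package.

```python
from graphlib import TopologicalSorter
from typing import Callable


def rebuild_order(deps: dict[str, set[str]], changed: str) -> list[str]:
    """Return every package that depends on `changed`, in a valid rebuild order."""
    # Invert the graph: for each package, who depends on it?
    dependents: dict[str, set[str]] = {}
    for pkg, requires in deps.items():
        for dep in requires:
            dependents.setdefault(dep, set()).add(pkg)

    # Collect direct and transitive dependents of the changed package.
    affected: set[str] = set()
    stack = [changed]
    while stack:
        current = stack.pop()
        for pkg in dependents.get(current, ()):
            if pkg not in affected:
                affected.add(pkg)
                stack.append(pkg)

    # Order them so each package is rebuilt after the affected packages it needs.
    graph = {pkg: deps[pkg] & affected for pkg in sorted(affected)}
    return list(TopologicalSorter(graph).static_order())


def accept_change(deps: dict[str, set[str]], changed: str,
                  build: Callable[[str], bool]) -> bool:
    """Reject the new version if any downstream rebuild fails."""
    return all(build(pkg) for pkg in rebuild_order(deps, changed))


# Hypothetical package graph: each package maps to the packages it depends on.
DEPS = {"core": set(), "auth": {"core"}, "api": {"auth", "core"}, "web": {"api"}, "docs": set()}
```

With this graph, a change to `core` forces rebuilds of `auth`, `api` and `web` in that order, while a change to `docs` rebuilds nothing else.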
Apologies; I meant to say monorepo, and specifically the idea of having one repo with multiple services deployed independently.
Multi. Always.
The question with monorepos is not if they will become a nightmare. It is when they will become a nightmare.
Making a multirepo feel like a single workspace is trivial.
Making a part of a monorepo feel like a repo from a multi repo is impossible.
My experience with monorepos is that they are excellent if, and only if, you have a team dedicated full-time to making sure the repo remains sane.
This is true for any programming language. (Also, successful monorepos can be polyglot.)
If you don't have a dedicated team, you will eventually end up with all the downsides of a monorepo and few of the benefits. Builds will break frequently, impacting many teams. Dependency management will become a nightmare.
Open-source tooling like Bazel will only get you so far -- you will need in-house tooling too, but more than that, you will need an in-house culture of behaving well in a monorepo. Unless most of your engineers have done it before, you will need strong leadership to build that culture.
If you can't dedicate a team to that purpose and really follow through with it, then don't even try having a monorepo. Do a repo per team, or a repo per project.
One thing I don't understand about monorepos: do people just stay on one platform and check in binaries? Or is it assumed that everything must be compiled and correct? I get that a branch can be compiled, tested and integrated, but how does that work with multiple teams? I mean, at what point does it become like week-long builds to make sure everything is accurate and correct?
Or is monorepo more of a "place to put all the code", not necessarily correct or working?
I like multiple repos because it's easier to assume that the main branch of each is "correct and tested and excellent quality".
> If your organization puts all its code, and all assets that go along with that code into a single repository; you’ve got yourself a monorepo.
I'm not sure I agree with this. I suppose in the most technical sense, sure, but it's not really true.
We have a single repo with a bunch of microservices in it. Builds/tests are localized to a single microservice though. The beauty of git is that two people can work on two parts of the repo pretty much independently. So while technically there is only one repo, I feel like calling it a monorepo would just confuse people.
There’s also project repos where all the code pertaining to a project is under one repo. I think this setup is the best of both worlds.
If you’re a team that has a client, several micro services, DB, etc it’s way better to have that under a single repo than spread to multiple. Monorepos don’t have to be gigantic monstrosities; they can encapsulate products.
What if they become managed by separate teams? Or two projects in separate repos become managed by the same team? What about a service that basically everything else in the company relies on (for Google, accounts and auth for example)?
Better to just keep things in a monorepo IMO, even if they seem unrelated.
If there are multiple teams committing to the same repo you need controls over who has permission to commit to which directories and maybe a policy for handling merge conflicts across teams. I'm not sure what the tooling is like around that, but I could see the benefits as long as someone very high up was on board and had enough of a technical mind to keep order.
As far as two repos becoming one or one repo becoming two, you can split and merge repos while keeping the commit history.
edit:
Do you have experience working somewhere with a monorepo that stretched across multiple teams? If so, what was it like?
It doesn't matter if you have a mono-repo or multi-repo, you will need engineers on tooling to make it work if your project is large. There are pros and cons to both multi-repo and mono-repo with no one right answer (despite what some will tell you). They are different pros and cons, but which is best depends on your particular context.
Monorepo is just one small part of the puzzle. If you want to actually achieve the dream state that is alluded to when someone says "monorepo", you have to be willing to endure a super-deep and honest evaluation of your tech stack and processes.
We have been successfully running a monorepo for ~5 years now using nothing more than the bare-ass GitHub PR process with some basic checkbuilds sprinkled on top. We intentionally avoided getting our hands dirty with CI automation as much as was feasible. Building our software manually is so simple it's not really worth dedicating a team to worrying about.
I would say the biggest key to our success was finding a way to minimize 3rd party dependencies. By making certain strategic technical choices, we were able to achieve this almost by default. Our repository is effectively hermetically sealed against outside changes in all of the ways that matter to the business owners. GitHub providing issues at the repository grain is also a very powerful synergy for us in terms of process - we started using issue numbers as the primary key for all the things.
With regard to technical choices - consider that some languages & ecosystems provide a lot more "batteries included" than others, which provides a substantially different landscape around how many 3rd parties may need to be involved and how.
What's the difference if they are in one or two repos if they produce two artifacts that are separated? You *will* have network calls between the two, unless you are marrying yourself with a deployment/operational platform that can run the two artifacts together. (ok, there could be a few but I really don't see how this is just using a "monorepo" instead of a "multirepo")
I have my doubts about the mono repo approach. Off the top of my head:
1. It might increase the complexity of other processes on that repo: CI/CD configuration, makefiles, branching strategies, codeowners etc
2. The versioning might lose its meaning
3. Blast radius in case of screwing things up accidentally.
I personally split repositories based on responsibilities: here the code base, here the IaC, here the configuration, here the manifests. Always using a standard and predefined naming convention for the repository names. That being said, as always, it depends. I might embrace the monorepo if the context demands it and it has been properly discussed and evaluated.
One repo is better because you can basically include the entire change set in 1 PR / gitref.
Downside is it requires you to come up with a structure and a bit of discipline to maintain coherence.
If you use codeowners and your CI can include other files, then most of the downsides go away, besides the repo size itself.
Multirepo is a bit easier to understand per item. You can have one single CI file, and don't need codeowners, but you move a lot of burden to test, deploy and run.
If your SCM supports codeowners and CI supports includes, I'd stick closer to mono-repos, and only break out things when it's helpful and you obviously wouldn't change them together.
Think services rather than micro.
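For readers who haven't used codeowners, the mechanism mentioned above boils down to mapping paths to reviewers. The sketch below is a deliberately simplified model: directory prefixes only, no glob patterns, later rules override earlier ones. The team handles and directories are hypothetical, and real CODEOWNERS implementations have richer matching rules.

```python
# Each rule is (directory prefix, owners); later rules take precedence.
RULES = [
    ("/", ["@platform"]),
    ("/services/api/", ["@api-team"]),
    ("/web/", ["@web-team"]),
]


def owners_for(path: str, rules: list[tuple[str, list[str]]]) -> list[str]:
    """Return the owners of the last rule whose prefix matches `path`."""
    owners: list[str] = []
    for prefix, rule_owners in rules:
        if ("/" + path).startswith(prefix):
            owners = rule_owners  # a later match overrides an earlier one
    return owners


def required_reviewers(changed_files: list[str],
                       rules: list[tuple[str, list[str]]]) -> set[str]:
    """Collect every owner that must review a change touching these files."""
    reviewers: set[str] = set()
    for path in changed_files:
        reviewers.update(owners_for(path, rules))
    return reviewers
```

A single PR touching `services/api/handlers.py` and `web/index.ts` would then need a review from both `@api-team` and `@web-team`, which is how one repo can still enforce per-team ownership.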
At sufficient scale, the problems of a monorepo tend to be more of a fixed overhead cost whereas those same problems at scale in a multi-repo approach tend to impact all engineers across the company. The fixed overhead can be addressed by hiring a dedicated team to manage the monorepo problems.
For multi repo I will need to build automation to manage all the repos and enforce a consistent experience across them, including syncing the repos, if we end up using stuff like submodules. And I need to do this now. We tried to "trust" every repo owner to do the right thing, but it was a cluster fuck.
With monorepo, I had to set up things once and go on my merry way. And I will be able to kick the monorepo-is-too-slow-can down the road for a few years from now.
I agree with the other commenter - in my view, a monorepo is the _best_ choice for a small company. I guess this depends on what tooling is available for your language / ecosystem of choice though. In my experience of TypeScript and Java with monorepos, you definitely need to know how to configure the tooling properly (which is certainly "overhead"), but it massively reduces the maintenance cost and increases the consistency of your tooling config. Spreading out over loads of repos means you need to share artefacts, which means package managers and package manager hosts, and a whole suite of release CI/CD which gets out of sync almost immediately.
It's also getting a lot better. Gradle works amazingly well for a monorepo with shared caching, even with dozens of developers committing to it every day, and Nx/Turborepo/others are making the story for front-end/TS much better too.
Organized as in... kept in different repos?
People live/die on this hill and it makes no sense. One or many repos, it's just an organizational question, not some Super Serious Big Decision.
I work at a small, early stage startup and I'm about to create a new repo for our Slack bot. I'm going to use the Docker image another of our repos generates as the base Docker image for this new repo, and it's going to be just fine.
I maintain the web monorepo at Uber, so I think I can give some context.
Monorepos allow us to centralize important dependency upgrades. E.g. fixing log4j vulns is a lot easier when you can patch everything simultaneously. Same for tzdata (2022g gave very little heads up). Auditing for npm supply chain attacks was a lot simpler in monorepo than microrepos. Etc.
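A toy illustration of why that audit is simpler when everything is in one checkout: with all pinned versions visible in one place, finding affected projects is a single pass. The project names, package name and versions are hypothetical, and the input is assumed to be already parsed from lockfiles.

```python
def projects_pinning(pins_by_project: dict[str, dict[str, str]],
                     package: str, bad_versions: set[str]) -> list[str]:
    """List projects whose pinned version of `package` is in `bad_versions`."""
    return sorted(
        project
        for project, pins in pins_by_project.items()
        if pins.get(package) in bad_versions
    )


# Hypothetical data: project -> {package: pinned version}.
PINS = {
    "web/checkout": {"examplelib": "1.4.0"},
    "web/admin": {"examplelib": "1.6.2"},
    "web/legacy": {"otherlib": "0.9.0"},
}
```

Here `projects_pinning(PINS, "examplelib", {"1.4.0", "1.4.1"})` flags only `web/checkout`. In a multi-repo setup the same check first requires finding and cloning every repo, including the unstaffed ones.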
Monolithic version control doesn't have to mean monolithic everything.
Our web projects can be deployed independently of each other, and we leverage tools like yarn workspaces focus and Bazel for granular installs and builds/tests/etc.
It doesn't have to mean monoversions either. We support multiple versions of libraries, though we prefer coalescing them as much as possible to facilitate effort centralization. Finding out that your library change will break downstreams before you land the change is a feature.
We had microrepos before and the main problem is that to this day I still get some random team coming to me for help w/ some rediscovered 7 year old repo that doesn't even build anymore cus lockfiles weren't a thing back then.
At a large enough org, you'll inevitably see the full spectrum of team quality, from the really good teams to the one intern/contractor getting thrown into the deep end of some unloved ancient thing. You want a common denominator that lets you do things like patch vulns in unstaffed projects.
I've done fairly large migrations both to and from monorepos. Each has pros and cons. For us and companies like Google, monorepos work well with our organization model. For others it may not.
Monorepo is not necessarily synced deployment, and even if it was, each deployment of a single component is usually racy with itself (as long as you're deploying to at least two nodes).
Which means that you've got to do independent backwards-compatible changes anyway, and that for anything remotely complex, you are better off with separate branches (and PR/MRs) anyway.
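A tiny example of what "backwards-compatible" means in practice during a rolling deploy, when old and new nodes emit different payload shapes at the same time. The field names are hypothetical: assume `user_name` is being renamed to `display_name`.

```python
def read_display_name(message: dict) -> str:
    """Accept both payload shapes while old and new nodes run side by side."""
    if "display_name" in message:
        return message["display_name"]  # emitted by upgraded nodes
    return message["user_name"]  # still emitted by not-yet-upgraded nodes
```

The consumer ships first with this tolerant reader, then producers switch to the new field, and only after every node is upgraded can the fallback be removed. That sequence is the same whether the code lives in one repo or several.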
Monorepos mostly have a benefit for trivial changes across all repos (eg. we've decided to rename our "Shop" to "Shoppe"), where it doesn't really take much to explain with multiple repos, but is mostly a lot of tedious work to get all the PRs up and such.
Nobody is going to write a blogpost "we use multiple repos and it's ok for us". And that is the majority of companies.
Heck, even most individual developers use multiple repos.
Monorepos are usually ok (and even talking about team/project granularity) until you outgrow them.
Build systems like one-repo-per-project much better than monorepos. (Shipping a fix doesn't require rebuilding "the whole world" for example)
I relate so much with this article
I've had a previous experience with a backend monolith repo, and these days I have to deal with a backend that has 20+ repos. It is hell. Duplicated code, duplicated logic, duplicated tests, duplicated settings, a hell to introduce newbies to the architecture, async calls to external basic APIs that could've been just simple method calls.
I think that the one major disadvantage of having a big monorepo, is that with those multiple entry points, you might end up with a bunch of unused dependencies. But even that is manageable I think: you can have different package dependencies definitions whilst using the same codebase.
I've always worked with small teams (max up to 5 or 6 developers) and that's another point in favor of monorepos. I understand that big companies might want to have different teams working on different repos, for organisation reasons.
A question for people here saying "use a monorepo", and coming from a different direction than that of the article. Say I want to use a monorepo for all code I write for personal use and development, but I'll have a folder with dozens of projects cloned from github often with small tweaks and custom branches. Is the solution submodules? Saving patches or just code-snippets in the monorepo and keeping the random misc repos isolated? Hardlink specific files of interest?
One giant codebase is fine. Monorepo is better than lots of scattered repos linked together with git hashes. And it doesn't really get in the way of each team managing when stuff gets rolled out.
Having worked at a place with a monorepo it was one of the things I actually really liked about working there. It took a lot more tooling but I found it to be much better than a multi-repo would have been, even with a similar amount of tooling. I didn't have to be checking different projects out all the time and changes across projects weren't too bad (large-scale ones still took a lot of managing but not as much as changes across a multi-repo).
I was always hoping that one of the big cloud providers would offer a monorepo and invest a lot in making tooling for it to actually be usable.
What's the value of a monorepo if developers only ever check out a small subset of it? Wouldn't multiple repos allow greater scale without any practical reduction in utility?
For example, all the localisation files could live in a separate project (if we accept the need to commit them at all). Some tools would be needed to deal with the inevitable problem that developer working sets would not align with project boundaries, but that seems like an easier job than making git scale while maintaining response times.
How is a mono repo the simple solution compared to one repo per independently releasable component?
All the tooling is much easier to use when each application has its own repo.
Is it just me or are most of those "monorepo" articles about how the company switches to monorepo then just has to fight all the problems that come up with it?
Use multiple repos. It doesn't hurt, you know. And you can have proper granularity.
The costs of monorepo seem to be much higher than just dealing with inter-dependencies. And you can always group projects if it makes sense
> If done in a company with a monorepo I'd be especially interested in hearing more
Are there any big companies left which haven't adopted a monorepo?
Do people use a mono repo approach with different parts of the repo having different version numbers? The way I thought of it, the whole point of mono repo is not that.
Just curious, why is using a monorepo a useful trick? I would think it'd be better to have internal libraries that provide common functionality across services, and have a repo for each service. Otherwise, you're deploying code changes for one service that could, in theory, mess with another service that you don't maintain.
Two things.
One, teams that share a dependency version can still develop independently; they just share something in common. They already likely share other things in common: deployment target OS, cloud platform and services, shared authN/authZ frameworks, etc.
Two, a monorepo is just the SCM mechanism. As the parent was describing, it doesn't prescribe anything other than the code storage location and how branching, committing, etc. works. Yes, a lot of organizations prefer having a single version rule in their individual monorepos, but nothing about monorepos in general makes this a requirement. You can use multiple versions of the same dependency and still gain advantages from the single commit benefits, and even famous instances like Google's have exceptions where this is the case.
Monorepo comes with its own set of challenges. Git doesn't scale all that well but is the most popular and supported VCS.
Assuming you get as far as actually having code in one repo, the advantages of monorepo do not come for free. You either use a consolidated build system or you're still linking code using packages. A mono-build is no small task especially if your org is of any sort of complexity.
You'll almost certainly never get away from packages entirely unless you want to pull in the source code of all your dependencies. Not only are you merging it in but you're integrating it into your mono-build system. Doesn't sound feasible or enjoyable to me.
Some people consider mono-repo a fool's errand. At most places you can take the pragmatic approach of consolidating repos where it makes sense while keeping the decoupling advantages of packages where it makes sense. They both have trade-offs.
Most people use the "suite of related products" definition of monorepo, but some companies like Google and Meta have a single company-wide repository. It's unfortunate that the two distinct strategies have the same name.
Many-repo is better at places that can’t fund a large infrastructure team, mono-repo is better at places that can.
The natural state of a mono-repo is Twitter-like paralysis. It requires concentrated work to avoid that, but that work can make them better than many-repo.
Believe it or not, there is not only one way to do a monorepo.
It's still possible to have different jobs for different projects, just like before, with some kind of build filter in front of them. Those different build jobs can be managed however they are now. This is common. There is no need for a single giant build mechanism that knows all the things.
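The "build filter" mentioned above can be as small as a function that maps changed files to the project jobs that should run. The sketch assumes projects correspond to directories; the directory names are hypothetical.

```python
from pathlib import PurePosixPath


def jobs_to_run(changed_files: list[str], project_dirs: list[str]) -> list[str]:
    """Select the project directories that contain at least one changed file."""
    selected = set()
    for changed in changed_files:
        path = PurePosixPath(changed)
        for project in project_dirs:
            # Component-wise check, so "services/api-gateway" does not match "services/api".
            if path.is_relative_to(project):
                selected.add(project)
    return sorted(selected)


PROJECTS = ["services/api", "services/billing", "web"]
```

The list of changed files would typically come from something like `git diff --name-only` against the target branch. A real filter would also need to account for shared code that several projects depend on, which this sketch ignores.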
Packages are a separate concern. And of course you should use versioned packages, just like you always have. Why re-invent a solved problem? Trying to force library upgrades in n-many services all at once, automatically, is a hard problem. Why invent that, too?
Repo consolidation really shines for me in infrastructure and testing - the things that can touch multiple services at once. That's stuff most devs aren't really involved in day-to-day. I think that a lot of the monorepo hatred comes from not understanding other people's problems.
How would you handle versioning in a monorepo? Just a new directory for major versions?
If we have multiple teams, you can’t really refactor other teams' code, and sometimes you need to make breaking changes, so I imagine that some versioning must exist.
Most people arguing one way or the other are in such a small code base that they don't face the problems the argument is really about in the first place. If you only have a million lines of code then by all means use a monorepo as you probably aren't facing the downsides that make people reach for a multi-repo solution anyway.
Monorepos do require upkeep beyond that of single-product repositories. You need some form of discipline for how code is organized (is it by Java package? by product? etc). You need to decide how ownership works. You need to decide on (and implement) a common way to set up local environments. Crucially, you need to reevaluate all these decisions periodically and make changes.
On the other hand... this is all work you'd have to do anyways with multiple repositories. In the multi-repo scenario, it's even tougher to coordinate the dev environment, ownership, and organization principles - but the work isn't immediately obvious on checkout, so people don't always consider it.
Regarding auditing, I have always found that having all the code in one place is tremendously useful in terms of discoverability! Want to know where that class comes from? Guaranteed if it's not third-party, you know where it is.
Not to minimize the pain of poorly-managed monorepos - it's not a one-size-fits-all solution, and can definitely go sideways if left untended.
I have worked at places that have micro-repos, places that have polyrepos, and places that have monorepos. From my observation monorepos are a sign of ossification and resistance to new patterns and new technologies.