Web security — exposed .git folder in production

You have a frontend application that gets deployed to production for millions of people to use. The source code is private, your development workflow is all sorted out, and the build and release process works without any glitches. Then one day you get a vulnerability report from a security researcher: your source code can be recreated because your .git folder is reachable through the production URLs, like https://myapp.com/.git/. You may or may not have directory listing enabled, so the attacker might not be able to see the contents of this folder, but as we will see, that does not really matter to the attacker.

How can you end up in a situation like this?

This usually happens because of an incorrect deployment. Suppose you deploy the built code manually, for example by using S3 for static web hosting, where anyone can upload the build output to the S3 bucket. Someone might end up copying the .git folder to the bucket as well, and S3 web hosting would then serve it in production on the corresponding path. You could argue that during a manual deployment someone could just as easily copy the whole src folder, but that is less likely: developers generally understand the src folder and the implications of exposing it, whereas they might not understand the importance of the .git folder.

What is the .git folder?

The .git folder is where git keeps all its metadata. It contains every commit, every branch, every version of every file, and everything else related to your git repository. This folder alone is enough to recreate your whole git repository, with its history and every file ever committed. So, if an unauthorized person gets hold of your .git folder, they can recreate your private codebase.
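To make this concrete, here is a minimal Python sketch of how git stores a single file version (a "blob") as a loose object under .git/objects. It assumes the default SHA-1 object format; the helper names are hypothetical, chosen for illustration, although `hash_object` computes the same ID that `git hash-object` prints.

```python
import hashlib
import zlib


def _header(obj_type: str, data: bytes) -> bytes:
    # Every object starts with b"<type> <size in bytes>\0" before its content.
    return f"{obj_type} {len(data)}".encode() + b"\0"


def hash_object(data: bytes, obj_type: str = "blob") -> str:
    """The object ID is the SHA-1 of header + content (what `git hash-object` prints)."""
    return hashlib.sha1(_header(obj_type, data) + data).hexdigest()


def loose_object_path(sha: str) -> str:
    """A loose object lives at objects/<first 2 hex chars>/<remaining 38 chars>."""
    return f".git/objects/{sha[:2]}/{sha[2:]}"


def encode_loose_object(data: bytes, obj_type: str = "blob") -> bytes:
    """The zlib-compressed header + content, which is what sits at that path."""
    return zlib.compress(_header(obj_type, data) + data)


sha = hash_object(b"hello\n")
print(sha)                     # ce013625030ba8dba906f756967f9e9ca394464a
print(loose_object_path(sha))  # .git/objects/ce/013625030ba8dba906f756967f9e9ca394464a
```

Since the storage path is derived from the hash alone, anyone who learns a hash can request that exact object from the server by name.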

Why is this a big deal? Isn’t the JavaScript code available anyway through the browser dev tools or in the generated bundle files?

True, but that is still very limited information. For one, it is not the source code; it is the built code, which is very different from your source code. Your source code might contain tests, stories, dev dependencies, and dev-only code, and you might not have the strictest guidelines for dev-only code. For example, you might have some credentials hardcoded in your tests, a Zeplin token hardcoded in your Storybook setup, or an API call made during the build whose auth is hardcoded in your webpack config. All of this is exposed if your .git folder is exposed. The repository might also contain other files with sensitive information that are not part of the build but are checked in to git.

How can someone recreate the codebase using the .git folder?

The folder has a standard directory structure, so the attacker already knows where to look for what. It is exactly the data git itself uses to check out your files, for example right after a git clone.
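Because the layout is fixed, the attacker never needs a directory listing: the file names are known in advance. The Python sketch below shows the first step, following HEAD to the commit it points at. `resolve_head` and the `fetch` callable are hypothetical names; `fetch` stands in for an HTTP GET of https://myapp.com/.git/<path> and is assumed to return `None` for a missing file. The sketch assumes the usual formats of HEAD (a `ref:` line or a bare hash) and of `packed-refs`.

```python
from typing import Callable, Optional

# Hypothetical fetcher: given a path relative to .git/ (e.g. "HEAD"), return
# the file's bytes, or None if the server does not have it.
Fetch = Callable[[str], Optional[bytes]]


def resolve_head(fetch: Fetch) -> Optional[str]:
    """Follow .git/HEAD to the commit hash it currently points at."""
    head = fetch("HEAD")
    if head is None:
        return None
    head_text = head.decode().strip()
    if not head_text.startswith("ref: "):
        return head_text  # detached HEAD: the file holds the commit hash itself
    ref = head_text[len("ref: "):]  # e.g. "refs/heads/main"

    # Usually the branch is a small file containing the commit hash.
    loose = fetch(ref)
    if loose is not None:
        return loose.decode().strip()

    # Otherwise it may be in .git/packed-refs: "<sha> <refname>" per line,
    # plus '#' header lines and '^' lines for peeled tags, which we skip.
    packed = fetch("packed-refs")
    if packed is not None:
        for line in packed.decode().splitlines():
            if not line.strip() or line.startswith(("#", "^")):
                continue
            sha, _, name = line.partition(" ")
            if name == ref:
                return sha
    return None
```

For a quick offline experiment, a dictionary's `.get` method works as `fetch`, mapping paths like "HEAD" and "refs/heads/main" to their bytes.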

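  • Start with the content of the HEAD file inside .git. It usually names the current branch, for example ref: refs/heads/master.
  • Read the matching file under refs/heads (for example .git/refs/heads/master) to get the hash of the latest commit on that branch.
  • Look that commit up in the objects folder inside .git, where an object whose hash starts with ab is stored under objects/ab/, named after the remaining 38 characters. The commit points to a tree object, and walking the trees gives us the hash for each and every file that was part of the commit.
  • Now, we can download each of those objects into a local .git folder and use git cat-file -p <hash> to get the contents of each file.
  • This process can easily be automated, and the whole source code can be regenerated.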
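The steps above can be sketched in Python for the simple case where objects are stored loose, one zlib-compressed file per object. Repositories that have been packed keep most objects in objects/pack/*.pack instead, which this sketch deliberately does not handle. `fetch` is a hypothetical callable returning the bytes at a path relative to .git/ (or `None`), and all function names are illustrative.

```python
from __future__ import annotations

import zlib
from typing import Callable, Iterator, Optional


def read_loose_object(raw: bytes) -> tuple[str, bytes]:
    """Decompress a loose object file into (type, body)."""
    data = zlib.decompress(raw)
    header, _, body = data.partition(b"\0")  # header is b"<type> <size>"
    obj_type, size = header.decode().split(" ")
    if int(size) != len(body):
        raise ValueError("corrupt object: size mismatch")
    return obj_type, body


def commit_tree(commit_body: bytes) -> str:
    """A commit body starts with the line 'tree <sha>' naming the root directory."""
    kind, _, sha = commit_body.split(b"\n", 1)[0].decode().partition(" ")
    if kind != "tree":
        raise ValueError("not a commit object")
    return sha


def tree_entries(tree_body: bytes) -> list[tuple[str, str, str]]:
    """Parse a tree body: repeated b'<mode> <name>\\0' + 20 raw SHA-1 bytes."""
    entries, i = [], 0
    while i < len(tree_body):
        nul = tree_body.index(b"\0", i)
        mode, name = tree_body[i:nul].decode("utf-8", "replace").split(" ", 1)
        entries.append((mode, name, tree_body[nul + 1 : nul + 21].hex()))
        i = nul + 21
    return entries


def list_files(
    fetch: Callable[[str], Optional[bytes]], tree_sha: str, prefix: str = ""
) -> Iterator[tuple[str, str]]:
    """Recursively yield (path, blob_sha) for every entry under a tree."""
    raw = fetch(f"objects/{tree_sha[:2]}/{tree_sha[2:]}")
    if raw is None:
        raise LookupError(f"object {tree_sha} not found as a loose object")
    _, body = read_loose_object(raw)
    for mode, name, sha in tree_entries(body):
        if mode == "40000":  # subdirectory: another tree object
            yield from list_files(fetch, sha, prefix + name + "/")
        else:  # regular file, executable, symlink, or submodule entry
            yield prefix + name, sha
```

Passing the commit hash from HEAD through `read_loose_object` and `commit_tree` gives the root tree, and `list_files` then yields every path with its blob hash. Each blob can be fetched and decompressed the same way, which is essentially what `git cat-file -p` shows locally.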

What is the solution?

  • Improve your deployment process: only ever deploy the build files and nothing else, automate it, and restrict access for manual deployments.
  • Never assume that your source code cannot end up in production or be exposed. For example, never keep secrets in the source code.
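One way to automate the first point is a pre-deploy check that refuses to upload a build directory containing a .git folder. This is a minimal Python sketch, not a feature of any particular tool; `forbidden_paths`, `assert_clean` and the denylist are hypothetical and meant to be adapted to your own setup.

```python
from __future__ import annotations

import os

# Hypothetical denylist: names that must never be published. Extend as needed.
FORBIDDEN = frozenset({".git"})


def forbidden_paths(build_dir: str, forbidden: frozenset[str] = FORBIDDEN) -> list[str]:
    """Return every path under build_dir whose file or folder name is denylisted."""
    hits = []
    for root, dirs, files in os.walk(build_dir):
        # A .git entry can be a folder, or a plain file in worktrees and submodules.
        for name in dirs + files:
            if name in forbidden:
                hits.append(os.path.relpath(os.path.join(root, name), build_dir))
        # Do not descend into denylisted folders; reporting the folder is enough.
        dirs[:] = [d for d in dirs if d not in forbidden]
    return sorted(hits)


def assert_clean(build_dir: str) -> None:
    """Raise instead of deploying if the build output contains denylisted paths."""
    if not os.path.isdir(build_dir):
        raise FileNotFoundError(f"build directory {build_dir!r} does not exist")
    hits = forbidden_paths(build_dir)
    if hits:
        raise RuntimeError("refusing to deploy, found: " + ", ".join(hits))
```

Running `assert_clean` in CI right before the upload step (for example, before syncing the build folder to the S3 bucket) means a failed check stops the deployment instead of publishing the folder.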