Maximillian Laumeister

GitHub Takeout

In 2020, there still isn’t a great way to back up a GitHub account.

2021 Edit: It’s been pointed out to me that GitHub allows you to request an archive of your data, including repositories, issues, pull requests, etc. This is exactly the feature I was hoping for! To download a copy of your GitHub data, visit the Admin settings page on your GitHub account. The original article is preserved below.

This morning on Hacker News, I read about GitHub user yg, an active maintainer of the Gatsby project, whose account was disabled for no apparent reason. He says that it’s been nearly a week since GitHub suspended his account, and that in the meantime he can no longer review the 3-6 pull requests that he receives per day.

Another HN user chimed in, saying that his GitHub account had been “disappeared” too, a while back:

Even as a paying customer for many years, my account was disabled – without even receiving an email warning. I only discovered when browsing issue histories where I knew I’d left detailed comments, and noticing my comments gone without even a note about deletion, leaving threads nonsensically fragmented.

I’d paid them ~$600 over the previous 5 years, and still had an active subscription with working billing details. My account was nearly a decade old with a wide variety of contributions & comments. But still, an automated system with no apparent human review disappeared my account, without even generating a notification.

Gojomo on Hacker News

That got me thinking that I should probably back up my GitHub repos, just in case. I’m not the only one in the thread with that idea:

Always do a periodic full off-site cold backup of all your GitHub repos. It’s very easy to script and should compress down very small.

Smoothgrammer on Hacker News

Unfortunately it’s not really “very easy” to script a full account backup, unless you’re talking about just the bare repos without any of the GitHub metadata. Another user asks if there’s a script available. Here are the responses to that comment, in summary:

  1. There’s a script that almost does this, but only downloads forks (link)
  2. I use a VPS running Gitolite to mirror repos on a repo-by-repo basis (link)
  3. There’s a GitHub action that will push a repo to GitLab/BitBucket every 30 minutes (link)
  4. We just pay for a commercial service to do this (link)
  5. There’s a script that does repos, but doesn’t do any of the metadata (my own comment, link)

Mysteriously missing from that list is a way to just back up your entire account including its metadata in one go, like a GitHub version of Google Takeout.

A web search turns up a few scripts for backing up GitHub repos, but none of them ticks all the boxes.

Rodw’s backup-github.sh

Rodw’s backup-github.sh looked like the most promising script at first, with 262 stars. It backs up repositories only, though: it doesn’t include issues, pull requests, comments, or other social data.

Unfortunately, when I tried it, the first thing it did was try to back up a repository that I collaborate on but that isn’t in my GitHub account. It then aborted on the 404 it got when it looked for that repo under my account.

I could probably get it working by taking 15-30 minutes to fork the script and make it ignore collaborating repos (bash is not one of my best languages), or I could sort through the 123 current forks of the script to see if anyone else has done the same, but it shouldn’t be that hard to get a simple backup of my GitHub repositories.
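For what it’s worth, the “ignore collaborating repos” part is small when written against the GitHub REST API directly. Here is a minimal Python sketch of the idea. It is not a patch to Rodw’s script, and it makes some assumptions: you have a personal access token that can read your repos, git is on your PATH with credentials already set up for private repos, and the function names (fetch_owned_repos, owned_only, mirror_command, backup_all) are ones I made up for illustration.

```python
import json
import subprocess
import urllib.request
from pathlib import Path

API = "https://api.github.com"


def fetch_owned_repos(token):
    """Page through GET /user/repos, asking only for repos the user owns."""
    repos, page = [], 1
    while True:
        url = f"{API}/user/repos?affiliation=owner&per_page=100&page={page}"
        request = urllib.request.Request(url, headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        })
        with urllib.request.urlopen(request) as response:
            batch = json.load(response)
        if not batch:  # an empty page means we have seen everything
            return repos
        repos.extend(batch)
        page += 1


def owned_only(repos, login):
    """Extra safety: drop anything whose owner is not `login`."""
    return [r for r in repos if r["owner"]["login"] == login]


def mirror_command(repo, dest_root):
    """Build the `git clone --mirror` command for one repo."""
    target = Path(dest_root) / (repo["name"] + ".git")
    return ["git", "clone", "--mirror", repo["clone_url"], str(target)]


def backup_all(token, login, dest_root):
    """Mirror-clone every repo owned by `login` into `dest_root`."""
    Path(dest_root).mkdir(parents=True, exist_ok=True)
    for repo in owned_only(fetch_owned_repos(token), login):
        subprocess.run(mirror_command(repo, dest_root), check=True)
```

Calling backup_all(token, "your-login", "backup") would be the whole run. Note that this only covers a first run into an empty folder (git refuses to clone into a directory that already exists), and like the script above it grabs the bare repos and none of the metadata.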

Joey’s github-backup

Joey’s github-backup is easy to install since a copy lives in Ubuntu’s default apt repository.

Unlike the other tools, this one backs up “everything Github knows about the repository, including other forks, issues, comments, milestones, pull requests, and watchers.”

You are meant to run it in a git repo that you’ve already cloned, and then it only backs up the metadata. So in order to get a full backup of your GitHub account, you would have to combine this script with one of the other backup scripts to be able to grab the actual repos themselves.

Unfortunately, though, it doesn’t support private repos, so it cannot help back up your account completely.
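To give a sense of what “metadata” means at the API level, here is a rough Python sketch that pulls the issues and pull requests for one repo through the GitHub REST API and saves them as JSON. This is my own illustration, not how github-backup works internally. It assumes a personal access token with access to the repo (which is also what would be needed for private repos), and the function names are invented for the example.

```python
import json
import urllib.request

API = "https://api.github.com"


def fetch_all_issues(owner, repo, token):
    """Page through GET /repos/{owner}/{repo}/issues, open and closed."""
    items, page = [], 1
    while True:
        url = (f"{API}/repos/{owner}/{repo}/issues"
               f"?state=all&per_page=100&page={page}")
        request = urllib.request.Request(url, headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        })
        with urllib.request.urlopen(request) as response:
            batch = json.load(response)
        if not batch:  # an empty page means there is nothing left
            return items
        items.extend(batch)
        page += 1


def split_issues_and_pulls(items):
    """The issues endpoint also returns pull requests; they carry a
    `pull_request` key, so use that to tell the two apart."""
    issues = [i for i in items if "pull_request" not in i]
    pulls = [i for i in items if "pull_request" in i]
    return issues, pulls


def save_metadata(items, path):
    """Write issues and pull requests to one JSON file."""
    issues, pulls = split_issues_and_pulls(items)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump({"issues": issues, "pull_requests": pulls}, handle, indent=2)
```

Even this leaves gaps: comment threads, milestones, and watchers live behind separate endpoints, so a complete metadata backup means repeating the same paging loop for each of them.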

Clockfort’s GitHub-Backup

Clockfort’s GitHub-Backup is the one that ended up working the best for me. Armed with my GitHub personal access token, I ran the following command:

python3 ./github-backup.py -v all -a owner -m -f -p 757f37bd9695132937b1da1516de18e8310f954e maxlaumeister backup

(don’t bother checking, it’s revoked)

And I got a copy of all my account’s repos, which I then zipped up to store with my local backups. But it doesn’t include any of the social data - again, the issues, comments, wikis and pull requests associated with my repos.

Conclusion

The trickiest part of any backup process is verifying that all the data you expect to be backed up is actually backed up. Because of that, GitHub account backup should be a first-class feature tested and endorsed by GitHub themselves, not a janky process of cobbling together multiple third-party scripts and trusting them to cover all the data you wanted.
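As a small example of what that verification could look like, here is a Python sketch that compares the repo names you expect against what actually landed in the backup folder. It assumes the backup is a folder of mirror clones named <repo>.git, and that you already have the list of expected names from somewhere (the API, or your own notes). Both function names are hypothetical.

```python
from pathlib import Path


def backed_up_names(backup_dir):
    """Names of the `<repo>.git` directories found in the backup folder."""
    return {
        p.name[:-len(".git")]
        for p in Path(backup_dir).iterdir()
        if p.is_dir() and p.name.endswith(".git")
    }


def diff_backup(expected, present):
    """Return (missing, unexpected): repos that should be in the backup
    but are not, and repos in the backup that were not expected."""
    expected, present = set(expected), set(present)
    return sorted(expected - present), sorted(present - expected)
```

Something like diff_backup(expected_names, backed_up_names("backup")) would flag missing repos, but it only checks that a folder exists for each one. It says nothing about whether the issues, comments, and pull requests made it, which is exactly the part that is hard to verify from the outside.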

Much like Google Takeout, I should be able to go into my GitHub settings page and click a single button that zips up my entire account (repos and social data) and hands it to me. Even Facebook and Twitter have a button for that. So why doesn’t GitHub?

For all of Google’s issues, their Google Takeout tool does account backup right and is a pleasure to use. In comparison, backing up GitHub is all stinky. GitHub needs to get with the times and make a GitHub Takeout.
