n0toose

Why You Should Probably Move Away From GitHub

by n0toose, 24 Apr 2022

It's not because of ideological reasons. (I'm definitely too lazy for that sort of thing.)

I'm moving my open-sourced code from GitHub to Codeberg. But why would anyone do that?

Background #

How did GitHub make an impact? #

Okay, so, I'm going to start off this blog entry with an oversimplified explanation of what sort of gap GitHub managed to fill.

GitHub started as an easy place to put your code for the whole world to see. It is powered by Git. Git is a free "Distributed Version Control System" (in short, DVCS). Everyone can have their own copy of the source code of a program, while keeping track of the changes that were made to the code and by whom. It was initially created by Linus Torvalds. Alternatives existed and still exist, but Git ended up gaining popularity, mostly because it was free (as in "beer" and as in "freedom") and easy/flexible enough to use. However, using Git could still be hard for a beginner. You would have to use a command line, which is quite complex, and, for some projects, you would have to configure your email client so that you'd be able to send proposed changes using email. Because of how flexible Git is, that isn't a necessity anymore, and you can just use a website ("forge") that makes everything easier by abstracting it away for you.

As far as forges are concerned, one that was used a lot back in the day was SourceForge. It may arguably have been the best way to get people to use your software back then, but, from my experience, it wasn't quite that perfect for working with other people on software. I did try, but I never really ended up doing that. Nowadays, after being acquired by a dozen different companies so far, it's filled with ads from providers like Google, and they even bundled software together with adware at some point. I mean, someone in this chain does have to make money somehow, but I'm just trying to illustrate that the experience got worse than it was earlier. GitHub, on the other hand, eventually won the hearts of a lot of people that worked on open-source code because it made basic tasks, like requesting that a change you worked on be included in a project, or creating your own copy of a project that you can modify as you wish, as simple as pressing one, two, or three buttons. That's a huge difference. Other alternatives existed at the time, but GitHub was arguably the nicest.

Nowadays, other prominent alternatives also exist, such as GitLab, Bitbucket, SourceHut and Gitea. The alternative that I am moving to is Codeberg, which is based on Gitea.

Why are you ditching GitHub, then? #

GitHub will not always be able/willing to protect you. #

In 2021, there were two cases of takedown requests due to copyright reasons that generated quite a lot of buzz in open-source communities: a takedown request from Take-Two Interactive Software, Inc., the publisher of the Grand Theft Auto video game franchise, against an arguably legal remake of one of their games, as well as the takedown request for the tool youtube-dl. (UPDATE 2022-06-24: Or arguably illegal, for example: Directive 2009/24/EC in EU law, not a lawyer. However, a port of Super Mario 64 has not been taken down yet, despite Nintendo's reputation for going out of their way to protect their intellectual property.) These two cases generated a huge amount of outrage in open-source communities, but that is not what I want to talk about here. In GitHub's defense, they are headquartered in the United States and operated by a company in the United States, so it really isn't as simple as just keeping the repositories available. That still doesn't make it any less of a problem.

The latter program allows you to download just about any video on YouTube and other websites, which also includes copyrighted music, and it included an example of how to download a Taylor Swift song. Nevertheless, this situation was resolved, and GitHub made a blog post about how they heroically clicked the button on some admin portal to restore the repository, created a fund for it and implemented additional procedures in order to protect the developers that use their platform.

More recently, Russian users that used to work for companies that are now sanctioned as a result of the Russo-Ukrainian war have had their accounts suspended and repositories removed from the public, as a result of US sanctions. GitHub's centralized structure causes real data loss to open-source projects.

There was also a controversy between GitHub and the security research community at some point, as they prohibited people from sharing code that could be used to exploit other machines. Although I cannot easily find sources to back what I'm about to say, people essentially argued that this was harmful, hurting openness and the ability for vendors to fix the issue and that this decision was weirdly serving the interests of companies that did not take the necessary steps to protect their infrastructure, to the detriment of everyone else. Thankfully, they got more lax about this later on.

I think that it would be better for everyone if source code repositories were hosted on a federated model, even with the (primarily moderation-related) flaws that come with it. Thankfully, Gitea is working on that. (The only thing that's slightly concerning about this is how a lot of code may be lost, but that's another story.)

GitHub forces you to use GitHub. #

Since GitHub's bottom line depends on a lot of people using it, regardless of whether they are paying customers or not, I think that it's logical to be a bit cautious about the way they operate their business.

Microsoft argued to the European Commission that it would not have the ability to prevent people from migrating to competing platforms (see section 5.4.2.2 of the document I just hyperlinked), because GitHub just uses an open protocol (namely, Git, in case you haven't realized) and not just some proprietary API for it. However, GitHub doesn't always utilize Git "by the book".

Trailers #

For example, Git has a built-in feature called "trailers", which allows you to store structured information in your commits. For instance, Co-authored-by: lets you declare that you authored a change in your code with someone else. Reference-to: lets you declare that your change fixes a particular issue, which you would rather link back to instead of describing the entire issue.
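To make the idea of trailers a bit more concrete, here is a minimal sketch in Python that pulls "Key: value" trailers out of the last paragraph of a commit message. It is deliberately simplified and does not follow all of Git's actual parsing rules (Git ships its own `git interpret-trailers` command for that). The function name `parse_trailers`, the example message, the person and the URL are all made up for illustration.

```python
import re

# Simplified pattern for a "Key: value" trailer line. Real Git accepts more
# (for example, folded continuation lines), which this sketch ignores.
TRAILER_RE = re.compile(r"^([A-Za-z0-9-]+):\s*(.+)$")


def parse_trailers(message: str) -> list[tuple[str, str]]:
    """Return (key, value) pairs from the last paragraph of a commit message."""
    paragraphs = [p for p in re.split(r"\n\s*\n", message.strip()) if p.strip()]
    if len(paragraphs) < 2:
        # A message with only a subject line has no trailer block.
        return []
    trailers = []
    for line in paragraphs[-1].splitlines():
        match = TRAILER_RE.match(line.strip())
        if match is None:
            # Treat the paragraph as ordinary prose if any line is not a trailer.
            return []
        trailers.append((match.group(1), match.group(2).strip()))
    return trailers


# Hypothetical commit message; the name and the URL are placeholders.
EXAMPLE_MESSAGE = """Fix crash when opening empty files

The parser assumed at least one line of input.

Reference-to: https://example.org/project/issues/123
Co-authored-by: Jane Doe <jane@example.org>
"""

if __name__ == "__main__":
    for key, value in parse_trailers(EXAMPLE_MESSAGE):
        print(f"{key} -> {value}")
```

The point is that the information lives in the commit itself, so any tool that can read the repository can recover it without asking a particular website.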

In the latter case, GitHub just tends to use numeric references to issues and merge requests. For instance, if I were to merge a pull request called "Fixing a bug", GitHub would direct me to use "Fixing a bug (#123)" by default, where #123 is a reference to an incrementing number assigned to each new "Issue" and "Pull Request". On the web UI, #123 turns into a hyperlink which leads to the corresponding pull request. However, if you use the command git log -p to see an overview of the recent commits, the number #123 is going to appear in plain text, without any additional context. Oh, and to make things worse: If you choose to include the full link (which is probably a good strategy, as some sort of service like the Internet Archive may retain a recoverable copy of the page's contents in case the page itself disappears), the web UI will just condense the URL down to #123 for the people that use the web UI, so they will probably think that it's normal to just use numbers and not the full URLs.
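As a rough illustration of what "including the link" could look like if you automated it, the following Python sketch rewrites bare "#123" references in a commit message into full URLs. The function name `expand_references`, the example repository URL and the `/issues/<number>` path are assumptions made for the sake of the example, not something any forge prescribes.

```python
import re

# Matches "#123" when it is not glued to a preceding word character or slash,
# so strings like "repo#123" are left alone in this sketch.
BARE_REF_RE = re.compile(r"(?<![\w/])#(\d+)\b")


def expand_references(message: str, repo_url: str) -> str:
    """Replace bare "#N" references with full issue URLs under repo_url."""
    base = repo_url.rstrip("/")
    # The "/issues/<number>" layout is an assumed URL scheme for illustration.
    return BARE_REF_RE.sub(lambda m: f"{base}/issues/{m.group(1)}", message)


if __name__ == "__main__":
    # Hypothetical repository URL.
    print(expand_references("Fixing a bug (#123)", "https://example.org/owner/project"))
```

With something like this, the commit message stays meaningful in git log even if the web UI is not around to resolve the number. For commits inherited from another repository, passing that repository's URL keeps the reference pointing at the right tracker.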

Remember GitHub's Arctic Code Vault, where they buried copies of many open-source projects in the Arctic for people that may need them up to a thousand years from now? I think they presumed that GitHub would straight up not exist anymore by then. Well, these people will probably be pretty annoyed, unless they also preserved a copy of all of these issues and pull requests. (I doubt it.)

If you want to review a project's history, you have to go on GitHub. Considering that GitHub decided to deviate from the standards to be user-friendly in a way that ultimately benefits, well, (you guessed it!) GitHub, I think that it isn't extreme to assume that GitHub is basically holding the history of your entire, collaborative projects hostage, as a large part of it will disappear if you decide to take your project off GitHub. I'm wearing my tinfoil hat on this one.

This sort of practice also worked out badly for one of my projects called Tenacity, which was a modified version of Audacity. Commits that were originally made on Audacity's repository, and that contained references to issues and pull requests in their titles and messages, linked to our own repository and not Audacity's. This means that if there was a reference to issue #1 of the Audacity repository, it would always lead to issue #1 of our repository, as GitHub would not be able to understand the difference. I also tried to force others to use the Reference-to: trailer at some point for a project that I was working on with them, but most of them ultimately did not and even refused to do so, simply because it's objectively way too inconvenient and idiosyncratic. I probably turned a bit unpleasant with that too, sorry guys.

Unfortunately, other competitors seemingly attempting to imitate GitHub's ease of use have also implemented similar solutions to a problem that was already solved two decades ago.

GitHub Actions #

Remember that part in the Bible where some snake convinced Eve to eat some forbidden fruit, and then she in turn convinced Adam to also eat it, and long story short they were angrily banished to the living hell we call "Earth" today? Pretty sexist honestly, definitely a product of its time. Anyways, if you use GitHub Actions for anything, which GitHub makes very easy, it's very hard to move away from it and use something else. Now, I'm not going to argue that GitHub is anti-competitive simply because they have a pretty good product. The real problem here is that you also have to use GitHub for everything else if you want to take advantage of that service. One would assume that this is completely normal, but there are other, much smaller competitors that have done this differently. SourceHut, for instance, managed to create a similar service that you can use regardless of whether or not you host your code repositories on SourceHut. It just goes to show that things don't have to be the way they currently are.

GitHub Copilot #

One of GitHub's most controversial moves as a result of its acquisition is GitHub Copilot, a collaboration with OpenAI, which Microsoft is also heavily cooperating with. It's an AI that basically writes code for you. I've tried it, and it is pretty awesome to some degree. Felix Reda wrote an extensive blog post arguing that it is not illegal, and the arguments that he is making there are absolutely correct. On the other hand, however, its weaknesses are definitely a reason for concern, especially because at some point, it would just paste copyrighted source code for you without maintaining the conditions of the license, the most basic one being "Give me credit if you use my code". In its infancy, if you asked it to write a fast inverse square root algorithm, it would just print out the well-known Quake III Arena implementation attributed to John Carmack. (This was his response.)

I have recently been responsible for authoring a relatively complex project that also happens to be very original. And I am not saying that to brag, but if I were to remove something in a function, no matter the size, GitHub Copilot would then recommend that I use the code that I previously wrote to solve the problem that I did not manage to solve with the code that I previously wrote. (Phew, that was a mouthful.)

That definitely was a cause of concern to me, as I do not trust it enough to not get me in legal trouble. And even if I did, it's not like I could easily prove that it was GitHub Copilot's fault and that, therefore, I am not responsible. And even if I could, I already agreed to GitHub's Terms of Service, which essentially force me to indemnify GitHub in the event of a legal dispute. That seems to work out for them just fine!

UPDATE 2022-06-24: I tried using GitHub Copilot on a large, arguably very original project of mine. I noticed that when I deleted a specific part of my code to replace it with something else, GitHub Copilot recommended what I had written there earlier verbatim, if we are to exclude slight stylistic modifications. So, one could argue that it can also work as an AI-based Undo function that comes with a linter, provided that it works correctly. Yay.

GitHub has different interests from the ones of its users. #

Who would represent my interests better? If I had to choose between a billion-dollar company, which is a part of a trillion-dollar company with its own set of unique interests and which has me pay them to some degree so that I can access a larger set of features that allow me to be more productive, and a non-profit organization that just gives me everything I need, which I can also literally just pay to be a part of and have voting rights that decide how it is being managed, the choice seems pretty clear to me. It's the same as choosing between a lawyer and a group of like-minded people that do the same things as you do. The goals between me and GitHub may coincide, and they may have significantly more resources, but the intentions are not the same. Oh, and considering that the first analogy involves lawyers, the latter option seems much more attractive by default, even if they may not have as many resources.

I fully acknowledge that GitHub intends to target the enterprise sector for the most part, and I mean no offense towards anyone working on making GitHub a better platform to work with, but these interests also seem to end up affecting how the product is ultimately being designed.

By the way, not to bring up the elephant in the room here, but isn't it kind of silly how GitHub is not even open-source to begin with? I mean, come on, the biggest reason why it has any value in the first place is because most people prefer GitHub, and not necessarily because it is groundbreakingly innovative these days. In the one case where GitHub did come up with a cool piece of technology, GitHub Actions, they chose not to make it open source. They probably did that because there was already a market full of tools that basically did the same things, but that made it so cumbersome to work with that someone just ended up emulating it anyways. It was very well received too: the developer is being sponsored by Mercedes-Benz as of the time of this writing (it could be because of other projects that he has worked on, but I am just making a logical leap here because that one is the most popular one), which goes to show how much of a necessity that project was.

Lackluster moderation tools #

I ended up being one of the people in charge of a project that was deteriorating. Long story short, the previous maintainer got in a lot of trouble with a couple of trolls, and he handed over the helm to someone else without asking. We ended up carrying on with it for a good while, but GitHub did not help or provide the tools required for controlling the situation.

The sinking ship in question was called Tenacity, which got coverage on outlets like the German IT news site heise and The Register. Most of the news coverage the project got was fairly inaccurate, but that's another story.

Many people were basically using the bug report forms that we used to develop our project as if they were a comments section on Reddit to intimidate us, even after the person they were attacking to begin with departed our project. That was our workspace, which is why it got in the way of development. Many of us were not experienced with large-scale community management, specifically on GitHub. Some team members made moderation decisions and had those decisions publicly documented under their own names, and not on behalf of the organization. Transparency is normally preferable and I am a huge proponent of it, but these rules cannot always continue to apply when bad actors are weaponizing it to threaten the existence of the project.

It was not possible to provide moderators with the ability to act "on behalf of the organization". That meant that all the other people tackling other problems had to waste their time on this. It also isn't possible for moderators, as in, the people responsible for maintaining a safe and constructive environment for development, to impose restrictions, such as needing to have an account that's more than 2 hours old in order to participate in discussions. What could possibly go wrong if a moderator was alerted before making a decision that could affect them personally without them knowing, or if they temporarily controlled whether people who made an account, like, an hour ago get to comment on stuff? I had heard before that "GitHub's tools were not designed with abuse in mind", and, even if I cannot remember the arguments off the top of my head, I can see how that's true now. It's as if they consider this ability as destructive as controlling repositories and source code, so they basically force you into assigning additional responsibilities and dangerous permissions to people.

Small post mortem of Tenacity
That situation will probably never make sense to me -- consider this exaggerated example: You can't just enter a grocery store where, say, a child murderer used to work, and then cause even more of a ruckus when you are told to leave after you start screaming at the employees over it! Note: I'm using the example of a child murderer in order to portray how extreme that sounds, even in the hypothetical case of a child murderer. But hey, it's the Internet, so it shouldn't have to make sense. Oh, also, I was unpaid.

We were doing all of this voluntarily purely for the benefit of the general public, and we also had severed ties with the previous developer that the trolls were mostly targeting in the first place. Meep.

Some people even made new repositories, which they spammed with annoying and sometimes intimidating messages that were cluttering up our inbox, which was more than full already from managing a project that had gone (sort of) viral. Well, I guess that's a thing for GitHub to deal with. However, GitHub's reactions to reports were slow (up to 3 days of response time, although this happened during a company-wide holiday), and when they did react, they made decisions that tried to be neutral/laissez-faire, even on counts of bigotry directly targeted towards individual contributors. We could not always control the situation as much as GitHub could. We were also trying to deal with unexpected media attention.

We ultimately ended up pushing through most of the difficulties that we faced. Many of us, if not most, were highly motivated, shared a common goal and had a lot of projects with a good reputation in our portfolios. However, the working time that was taken away from development, largely because of GitHub's inadequate moderation tools and late responses at that time, was most likely more than a month. This ultimately resulted in most of us getting burned out, losing interest and moving on to other projects.

So, does GitHub suck? #

Surprisingly enough, GitHub isn't actually the epitome of all evil in the world right now. Even with its negatives, their user support tends to be very good in most cases, even if you're not a company that is actively paying them for it. If you try to report a problem and ask them if they can have an engineer take a look at this, they will most likely do so, as they have done that with me in the past. They also don't try to extract every single, last penny out of you to access basic features.

Small things like that are basically unheard of for most Silicon Valley companies with billion-dollar valuations nowadays. I would assume that GitHub would "lose money" in the long run by not providing good customer service. After all, GitHub managed to turn into the place for the development of open-source software and wants to stay that way. It probably isn't a surprise that it ended up getting sold to Microsoft for 7.5 billion US dollars. That probably goes to show that it is the easiest way to collaborate with other people on code, or at least has been for a long time.

Conclusion #

I think that I have field-tested the platform enough to see why this is not working out for me very well.

Who knows? Maybe I will have to move back to GitHub in the long-term future. It's not like I can realistically get rid of it completely, as I will have to use it for most projects if I want to contribute to them. And it's not like the alternative options solve most of the problems that I brought up. Copyright laws that could be abused exist in places other than the United States as well. And just because I used GitHub long enough to understand its weak points, that does not mean that these weak points do not also exist elsewhere. But to be honest, this is more about who I can trust here.

I will still have to use GitHub as I haven't migrated all of my projects away yet, and because many other projects still use and require you to use GitHub for collaboration. I hope that this will improve in the future.

