GitHub is the place to be for working on source code. It’s packed with tools that help you write code well, share it with your friends, and work together on a code base. In this tutorial, you’ll learn how to get started using GitHub, as well as where to go to find further information.
What’s so great about GitHub? Basically, it’s an easy way to use Git. Git is the core of a lot of activities when developing software, so let’s start with an example of why it’s so great. I’ll then introduce the basic concepts used when working with Git, followed by a short walkthrough of how to get started at GitHub. At the end, you’ll find pointers to further information. Git is a large topic and many people work with it for years without feeling like they’ve truly mastered it, so be patient with yourself and stay open to learning new things!
The need for Git: An example
Let’s say Alice is working on some software and Bob wants to help her out. She might copy the software onto a USB stick and give it to Bob. Bob works on the software, puts it back on the stick, and gives it to Alice. This works for a while, but it becomes problematic when Alice starts writing code while Bob has the stick. When he gives the stick to her, she can’t just copy the project onto her computer – it’ll overwrite her own changes! And going through Bob’s code and finding the changes, then merging them back into her code by hand, is very tedious and prone to errors.
The problem becomes more complicated when not only Alice and Bob, but more of their friends start working on the code, and it quickly becomes unsolvable. Throughout the years, many different solutions have been tried out, and the most successful solutions make use of so-called Version Control Systems (VCS). VCS solve a lot of the problems that arise when people work on software at the same time. The most popular one is Git, and for good reason.
Git has many advantages. People can work on the same files at the same time. It doesn’t require a server, so it’s easy to set up. And since most operations are done offline, there’s no waiting for the network to sync things up, which means that you can record the work that you’ve done flexibly and whenever you like. It also scales well to large projects. Because each save point in a project’s development – referred to in Git parlance as a “commit” – is identified by a cryptographic hash of its contents, corruption caused by hardware failures or even malicious tampering does not go unnoticed. That’s why Git is used to do all kinds of things, from making robotics software to developing the Linux kernel, from developing databases to massively multiplayer online games. We use it at Full Stack Embedded for just about everything.
GitHub is an excellent platform for working with Git because of the many tools it offers you in addition to what Git is capable of out-of-the-box. We recommend GitHub as a way of storing, securing and sharing your code.
Let’s get started with some concepts about Git. These apply no matter how you’re interacting with Git – via the command line, in your IDE, or within GitHub. Each of these commands is described in greater detail in the Git reference manual. The Git book also contains excellent resources on how to use Git in the real world and is very approachable and easy to read.
In Git, a repository is a folder containing the files which belong to a project. Generally the files and directories found in a repository are all that is needed to compile, test and install the software being developed within the project.
A repository can contain both files that are “tracked” by Git and files which Git ignores. If Git is tracking a file, it makes it possible for changes to that file to be traced back through time. Typically, all source code files and any test files are tracked, while files which are generated automatically from source code, such as compiled binaries, shared objects, etc., are not tracked. These may be generated in the course of your work within the repository, but because they are not tracked, they are not shared with anybody else who also works on the same project. Normally this is a good idea because the generated files might be system-specific and only work on the machine that they were created on.
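As a minimal sketch of ignoring files in practice: one common way to tell Git what to leave alone is a `.gitignore` file. The file names `main.c` and `main.o` and the throwaway directory are made up for illustration.

```shell
# Work in a throwaway directory so nothing real is touched
work=$(mktemp -d) && cd "$work"
git init -q project && cd project

# One source file and one generated file (hypothetical names)
echo 'int main(void) { return 0; }' > main.c
echo 'pretend this is compiled output' > main.o

# Tell Git to ignore every file ending in .o
echo '*.o' > .gitignore

# List files Git has noticed; main.o is left out because it is ignored
git status --short
```

The `.gitignore` file itself is usually tracked, so that everybody working on the project ignores the same generated files.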
When Alice wants to share her code with Bob, she initializes a repository in the directory which contains or will contain her source code. Once the repository is initialized, Git can track the contents that she adds. Bob can get a copy of the repository by cloning it. This creates a local copy of the repository on his computer that he can work with. When Bob has cloned Alice’s repository, the Git on Bob’s computer knows where Alice’s repository is located because it keeps track of her repository as a remote. If at a later time Alice wants to see Bob’s work, she will need to add Bob’s repository as a remote as well.
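On the command line, the steps above might look roughly like this. It is only a sketch: both "computers" are simulated as two directories inside a temporary folder, and the directory names, file name and e-mail address are invented. The `add` and `commit` steps are explained further below; they are needed here only so that there is something to clone.

```shell
work=$(mktemp -d) && cd "$work"

# Alice initializes a repository and records a first commit
git init -q alice
cd alice
git config user.name "Alice" && git config user.email "alice@example.com"
echo "hello" > README
git add README
git commit -q -m "Add README"
cd ..

# Bob clones it; here the "address" is just a local path
git clone -q alice bob

# Bob's clone remembers Alice's repository under the remote name "origin"
git -C bob remote -v

# For Alice to see Bob's work later, she adds his repository as a remote
git -C alice remote add bob ../bob
git -C alice remote
```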
Add / remove
After Bob clones Alice’s repository, he works on the code by
- changing existing files, e.g. by creating new functions, extending existing methods, etc.,
- creating new files and
- deleting files which are no longer needed.
Git tracks the state of each file in the repository, but it only records changes that Bob makes if he explicitly marks those changes. This keeps compiled files which do not need to be tracked, as well as mistakes, from becoming part of the repository’s history and being shared with all the other people working on the project. Bob marks these changes by adding them, or, in the case of deleted files, by removing them. If he doesn’t explicitly do these things, his changes can’t be made visible to Alice and applied in her repository.
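Here is a small sketch of marking all three kinds of change, using made-up file names in a throwaway repository:

```shell
work=$(mktemp -d) && cd "$work"
git init -q project && cd project
git config user.name "Bob" && git config user.email "bob@example.com"

# Starting point: two files that are already part of the history
echo "v1" > keep.txt
echo "obsolete" > old.txt
git add keep.txt old.txt
git commit -q -m "Initial files"

# Bob changes one file, creates another, and no longer needs the third
echo "v2" >> keep.txt
echo "brand new" > new.txt

git add keep.txt new.txt   # mark the modification and the new file
git rm -q old.txt          # mark the deletion (also deletes the file on disk)

# Show what is marked for the next commit
git status --short
```

In the short status output, the first column shows what has been marked: `M` for modified, `A` for added and `D` for deleted.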
After adding or removing files from his repository, Bob wishes to make his changes visible to Alice. In order to do this, he creates a commit. A commit creates a permanent snapshot of all the tracked items in a repository, as well as noting which changes were made since the last commit. It’s secured by creating a hash of the repository’s contents, which becomes the commit’s name, and by adding a commit message, which Bob has to supply. The commit message describes the changes made between the commit Bob is making and the previous commit, which was the base for his work up to that point.
Commits are also useful for grouping changes in a repository which belong together logically.
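A commit and its hash can be seen in a quick sketch like the following. The file name and the commit message are invented for illustration.

```shell
work=$(mktemp -d) && cd "$work"
git init -q project && cd project
git config user.name "Bob" && git config user.email "bob@example.com"

echo "print('hello')" > hello.py
git add hello.py

# Record the snapshot together with a message describing the change
git commit -q -m "Add hello script"

# The commit's name is a hash; this prints it in full
git rev-parse HEAD

# One line per commit: abbreviated hash plus the first line of the message
git log --oneline
```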
The state of a repository is frozen at each commit and can thus be recovered at any time in the future. This is done by checking out the commit. When a commit is checked out, all files within the repository that Git tracks are changed (added, removed or modified) to reflect the state of the repository from that commit. This allows Alice or Bob to browse through the repository’s history without having to worry about losing changes that have been made since a given point in the past.
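To see this time travel in action, here is a minimal sketch with a single made-up file that is committed twice:

```shell
work=$(mktemp -d) && cd "$work"
git init -q project && cd project
git config user.name "Alice" && git config user.email "alice@example.com"

echo "first version" > notes.txt
git add notes.txt && git commit -q -m "First version"
first=$(git rev-parse HEAD)          # remember the first commit's hash

echo "second version" > notes.txt
git add notes.txt && git commit -q -m "Second version"

# Go back in time: the file on disk returns to its old contents
git checkout -q "$first"
cat notes.txt

# "-" means "wherever I was before", i.e. the newest commit; nothing was lost
git checkout -q -
cat notes.txt
```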
When Bob is finished working on his code, or when he wants to show his work to Alice, he pushes his commits to a remote repository. In the simplest case, this is the original repository owned by Alice, which Bob cloned his project from. If Bob has write access to this repository, pushing to it will upload all of his commits to the repository so that they are available to Alice and anybody else who can read from the repository.
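A push can be tried out locally with a sketch like this one. One assumption to note: the shared repository is created as a "bare" repository (history only, no working files), because by default Git refuses pushes into the branch that a normal repository currently has checked out. Repositories hosted on GitHub behave like bare ones in this respect. All names are made up.

```shell
work=$(mktemp -d) && cd "$work"

# A bare repository stands in for the shared one
git init -q --bare shared.git

# Bob clones it, commits, and pushes
git clone -q shared.git bob
cd bob
git config user.name "Bob" && git config user.email "bob@example.com"
echo "feature" > feature.txt
git add feature.txt && git commit -q -m "Add feature"
git push -q origin HEAD       # upload the current branch under the same name
cd ..

# The commit is now in the shared repository for anyone who can read it
git --git-dir=shared.git log --oneline --all
```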
Pushing works well for a centralized workflow, in which there is only one central repository which everybody works on together. However, other workflows are available, which are based around pulling. We’ll explore this concept in a bit.
When Bob tells Alice that he’s completed a new feature in the project, of course she’ll want to be able to review his changes – and maybe even deploy them! In order to do this, she needs to fetch his changes. This is done by making sure that her repository is connected to Bob’s repository. Alice’s repository understands this connection by tracking Bob’s repository as a remote, meaning that Alice’s repository knows where Bob’s repository is located, and that Alice can read from this location. If this is the case, Alice can perform a fetch.
A fetch is when Git reads from another repository and compares the commits it finds there. Any new commits are registered and copied into the current repository. When fetch is performed, no files in the current repository are changed, but it becomes possible to check out commits that were recorded on the other repository or to merge these commits into the code base.
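The following sketch plays through a fetch with two local directories standing in for Alice’s and Bob’s computers (directory and file names are invented):

```shell
work=$(mktemp -d) && cd "$work"

git init -q alice
git -C alice config user.name "Alice"
git -C alice config user.email "alice@example.com"
echo "base" > alice/app.txt
git -C alice add app.txt
git -C alice commit -q -m "Base"

git clone -q alice bob
git -C bob config user.name "Bob"
git -C bob config user.email "bob@example.com"
echo "bob's feature" >> bob/app.txt
git -C bob commit -q -a -m "Add feature"

# Alice registers Bob's repository as a remote and fetches from it
git -C alice remote add bob ../bob
git -C alice fetch -q bob

# Her files are untouched...
cat alice/app.txt
# ...but Bob's commit is now available locally for inspection or merging
git -C alice log --oneline --all
```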
Let’s imagine that Alice and Bob are working at the same time, using a central repository to read from. Each of them has cloned this central repository to their local computer and is working there. After they have finished a feature, they intend to push their changes from their local repository to the central repository so that the other developer can access their changes.
Because Alice and Bob are working at the same time, one of them will push first; let’s say it’s Alice. Now, when Bob tries to push his changes, it doesn’t work – Git refuses the push because the central repository’s history contains commits that Bob’s history lacks. Since Alice’s commits aren’t in Bob’s history, it is Bob’s responsibility to merge her changes into his code before pushing back to the central repository.
When Git performs a merge, it does its best to combine the changes recorded in the commits of diverging histories into a single history. This works well if no commits in either of the histories Git is merging touched the same parts of the same files. In this case, Git combines the changes automatically and records the result in a merge commit. (If only one of the two histories contains new commits, there is nothing to combine, and Git simply moves forward to the newer commit; this is called a fast-forward.) Otherwise, conflicts arise, as multiple commits made different changes to the same parts of the same files. If a conflict is found, Git marks where it happened. So if Alice changed the same lines of code as Bob did, he has to resolve the conflict by opening the files which contain the conflicts, checking them and removing the conflict markers, and then adding the files to record the resolution. After having resolved all conflicts, he can perform a commit so that nobody else has to do this work again.
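A conflict is easiest to understand by provoking one. The sketch below is a simplification: rather than two separate repositories, it uses two lines of development inside one repository (a branch named `alice-work` stands in for Alice’s commits). The file name and values are made up.

```shell
work=$(mktemp -d) && cd "$work"
git init -q project && cd project
git config user.name "Bob" && git config user.email "bob@example.com"

echo "speed = 10" > config.txt
git add config.txt && git commit -q -m "Base"
main_branch=$(git rev-parse --abbrev-ref HEAD)

# Alice's line of work changes the line one way...
git checkout -q -b alice-work
echo "speed = 20" > config.txt
git commit -q -a -m "Alice: raise speed"

# ...and Bob's changes the same line another way
git checkout -q "$main_branch"
echo "speed = 5" > config.txt
git commit -q -a -m "Bob: lower speed"

# The merge stops and reports a conflict
git merge alice-work || echo "merge stopped: conflict in config.txt"

# The file now contains conflict markers (<<<<<<<, =======, >>>>>>>)
cat config.txt

# Bob resolves by writing the final content, adding the file, and committing
echo "speed = 15" > config.txt
git add config.txt
git commit -q -m "Merge alice-work, settle on speed 15"
```

Everything between `<<<<<<<` and `=======` is Bob’s version, and everything between `=======` and `>>>>>>>` is Alice’s. What the resolved file should contain is a decision for the developers, not for Git.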
Pull is simply shorthand for performing a fetch followed by a merge. This is often the easiest way to update a local repository to the most recent version of the central repository.
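As a rough sketch of a pull in the centralized setup, again with local directories standing in for the central repository and the two developers (all names invented):

```shell
work=$(mktemp -d) && cd "$work"
git init -q --bare central.git

git clone -q central.git alice
git -C alice config user.name "Alice"
git -C alice config user.email "alice@example.com"
echo "one" > alice/a.txt
git -C alice add a.txt
git -C alice commit -q -m "Add a.txt"
git -C alice push -q origin HEAD

# Bob clones after Alice's first push
git clone -q central.git bob

# Alice pushes more work
echo "two" > alice/b.txt
git -C alice add b.txt
git -C alice commit -q -m "Add b.txt"
git -C alice push -q origin HEAD

# Bob updates: fetch + merge in one step
git -C bob pull -q
ls bob
```

Because Bob has made no commits of his own here, the merge part of the pull has nothing to combine and simply moves his copy forward to Alice’s newest commit.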
Now Alice and Bob have done quite a bit of work on their project, so much so that they have a stable first version. Because of this, they decide to divide up the work into individual work packages which they will work on separately. These work packages may contain multiple commits and they should be independent of each other. Alice runs a tight ship and would like the stable version of the code to always be obvious, so she doesn’t want these intermediate commits to show up at the top of the repository’s history, perhaps exposing broken code, until a feature has been completely implemented and is stable.
A good strategy to do this is to use branches. A branch is made when a repository is split into multiple lines of development. The main line is traditionally called master (newer repositories, including those created on GitHub, often call it main instead), while other branches are named so that their purpose is obvious – e.g. next-version, feature/add-manned-flight, bugfix/use-only-metric-system, etc.
Branches can be treated exactly like other repositories, in that work contained in a given branch can be merged into another branch at any time, regardless of how many commits separate them. This is very handy for Alice, because now she can ensure that master stays stable and is only updated all at once when a new feature is implemented, tested and verified.
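Here is a minimal sketch of that workflow. It borrows the branch name feature/add-manned-flight from the examples above; the files are made up, and the name of the main line is read from Git rather than assumed.

```shell
work=$(mktemp -d) && cd "$work"
git init -q project && cd project
git config user.name "Alice" && git config user.email "alice@example.com"

echo "stable" > app.txt
git add app.txt && git commit -q -m "Stable first version"
stable=$(git rev-parse --abbrev-ref HEAD)   # name of the main line, e.g. master

# Start a feature branch and commit intermediate work there
git checkout -q -b feature/add-manned-flight
echo "cockpit" > cockpit.txt
git add cockpit.txt && git commit -q -m "Add cockpit"
echo "seat belts" >> cockpit.txt
git commit -q -a -m "Add seat belts"

# The main line has not moved: cockpit.txt does not exist there
git checkout -q "$stable"
ls

# Once the feature is verified, bring it in all at once
git merge -q feature/add-manned-flight
git log --oneline
```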
How do I use this with GitHub?
Creating a repository
Before you get started with GitHub, you have to sign up for a username. Don’t worry, this is easy and GitHub will guide you through the process. Then create a new repository to work with.
You’ll be asked some questions.
Afterwards, your repository is set up! You can clone it to your local machine and get to work.
The simplest way to get started now is to grant write access to the other collaborators in the project so that they can also push to the repository. Alternatively, they can fork the repository for themselves, clone their own version of the repository, and send you a pull request when they’re finished implementing something which should be included in the project.
There are many workflows for working with Git, and this tutorial barely touches the surface of what’s possible. Check out these pages as a starting point for getting additional information on Git.
- Official Git documentation
- Different workflows using Git
- Rebasing, rather than merging, pull requests for a cleaner commit history