Beginner's guide to teamwork in Git
My first encounter with Git was a disaster. I was working on a marketing intelligence project to standardize our model code. We asked a team of data scientists whether they had any useful code to share with us. Instead, they ‘helped’ us by introducing Git without much explanation. A few days of struggling later, we were drowning in a sea of branches and merge conflicts and had barely written any code.
Luckily, this was not my last experience with Git. On the next project, my team came prepared with a neat collaboration structure, and we have been reaping the rewards ever since. Here is what you need to consider if your team of coders is thinking about collaborating on a project using Git.
What is Git
Put simply, Git keeps track of a folder structure filled with your code files (e.g., Python files, notebooks). Combined with an online platform such as Gitlab or Github, that folder becomes easy to share and update. Git is a great way to make sure everyone has the same version of the code, without the hassle of sharing snippets of code with your colleagues and incorporating them by hand. The main purpose of Git is version control: you can easily go back to previous versions of the project. Git is also very useful for code deployment.
Choose your Git
There are a few platforms for hosting Git projects; the most popular are Gitlab and Github. Both offer free private repositories. We chose Gitlab for our project. You can find some extensive documentation here and here. Whichever platform you choose, you need to install Git on your computer.
How to use Git
Git is mainly used from the command line, where specific commands synchronize the changes you make on your computer with the changes others make. You can send (“push”) your local changes to the shared online environment, and you can download (“pull”) the changes others have made to your computer. Git integrates these changes automatically as long as they don't touch the same lines of code. When they do, you get a merge conflict (see the tips at the end of this article).
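To see push and pull in action without a Gitlab account, here is a minimal sketch that simulates two team members on one machine. It assumes Git and a POSIX shell are installed. A local bare repository (shared.git) stands in for the online project, and the names alice, bob and main.py are made up for the example.

```shell
set -eu
# Throwaway author details so the demo commits work
# (normally you set these once with git config --global user.name / user.email)
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)
cd "$work"

# A bare repository plays the role of the shared online project
git init -q --bare shared.git
git -C shared.git symbolic-ref HEAD refs/heads/master

# Alice creates the first version and pushes it to the shared project
git init -q alice
cd alice
git remote add origin "$work/shared.git"
echo "print('hello')" > main.py
git add main.py
git commit -q -m "first version"
git branch -M master
git push -q -u origin master
cd ..

# Bob downloads his own copy of the shared project
git clone -q "$work/shared.git" bob

# Alice improves the code and pushes again
cd alice
echo "print('hello team')" > main.py
git commit -q -am "friendlier greeting"
git push -q
cd ..

# Bob pulls Alice's change into his copy
cd bob
git pull -q
cat main.py
```

The last command prints Bob's copy of main.py, which now contains Alice's change.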
Choose a collaboration structure
If you use Git to work together on a project, it is useful to think about how to add your individual contributions without overwriting other people's code or breaking the project as a whole. Especially in programming projects, you can easily ruin a working project by pushing code that introduces bugs. A good collaboration structure can prevent that.
A common approach is to use different branches for different purposes. A branch is essentially a copy of the project code that you can change independently of the other branches. For example, a master branch holds the latest production-ready version of the project, and a development branch is used to add and test new features. On our last project, we used the structure depicted in the figure below (based mainly on this and this article).
Our way of working was as follows:
1. Create a feature branch by making a copy of the development branch, and name it after the feature you are going to build.
2. Pull the feature branch to your computer.
3. Add new code to the project.
4. Push your feature branch to the online project.
5. Create a merge request to merge your feature branch into the development branch.
6. Peer review the merge request and merge it into the development branch. The development branch now contains your new code.
7. Repeat steps 1-6.
As soon as the development branch with the new feature is tested and stable, merge the development branch into the master branch. In the next segment, we'll provide the command-line code you need to follow this structure.
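The release step, merging a tested and stable development branch into master, can be sketched on the command line as follows. This is an illustration under a few assumptions. A local bare repository (shared.git) stands in for Gitlab, and model.py is a hypothetical file. Your team may also prefer to do this step through a merge request on Gitlab.

```shell
set -eu
# Throwaway author details so the demo commits work
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)
cd "$work"
git init -q --bare shared.git
git -C shared.git symbolic-ref HEAD refs/heads/master

# Set-up: master holds version 1 of the (hypothetical) model code
git init -q project
cd project
git remote add origin "$work/shared.git"
echo "version = 1" > model.py
git add model.py
git commit -q -m "initial model"
git branch -M master
git push -q -u origin master

# Set-up: a tested feature has already been merged into development
git checkout -q -b development
echo "version = 2" > model.py
git commit -q -am "add imputation techniques"
git push -q -u origin development

# Release: update master, merge development into it and push the result
git checkout -q master
git pull -q
git merge -q --no-ff --no-edit development
git push -q origin master
```

The --no-ff flag always creates a merge commit, even when Git could simply move master forward. As a result, each release stays visible as a single point in master's history.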
Getting started
Now put your collaboration structure into practice.
- Make sure Git is installed
- Go to gitlab.com and add a project
- Add a development branch to your project
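If you prefer the command line over the Gitlab web interface for the last step, you can add the development branch by creating it locally from master and pushing it. The sketch below assumes a local bare repository (shared.git) as a stand-in for your new Gitlab project, with a default branch named master. The seed folder and README.md only exist to give the demo project a first commit.

```shell
set -eu
# Throwaway author details so the demo commit works
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)
cd "$work"

# Stand-in for a fresh Gitlab project with an initial commit on master
git init -q --bare shared.git
git -C shared.git symbolic-ref HEAD refs/heads/master
git init -q seed
cd seed
echo "# new_project" > README.md
git add README.md
git commit -q -m "initial commit"
git branch -M master
git push -q "$work/shared.git" master
cd ..

# The actual step: clone, create development from master, publish it
git clone -q "$work/shared.git" new_project
cd new_project
git checkout -q -b development
git push -q -u origin development
```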
Clone the project
Every team member can clone this repository and contribute to it by opening a command prompt and typing the following commands:
# go to the directory where you want to create a local copy of the project
cd your_directory
# copy the whole project from Gitlab to your computer (copy the exact clone URL from your project page)
git clone https://gitlab.com/name_of_your_project.git
# go to the folder of your project
cd name_of_your_project
You now have a local copy of the project. From now on, you can simply follow the steps of our way of working. In the examples below, your Git project is named ‘new_project’ and the feature you are working on is named ‘new_feature’.
Create a feature branch
Go to gitlab.com/new_project and create a new branch ‘new_feature’ based on the development branch. Of course, you should give it a more meaningful name that matches the purpose of the feature, for example ‘imputation_function’. The purpose of a branch can also be something other than a feature, for example a bug fix. If you already have a branch you are working on, just skip to the next step.
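You can also create the feature branch from the command line instead of on gitlab.com. The sketch below assumes a local bare repository (shared.git) stands in for the Gitlab project and already contains a master and a development branch. With this route, the branch already exists locally and tracks origin/new_feature. That means the git checkout -b command in the next step is not needed; it would fail, because the branch already exists.

```shell
set -eu
# Throwaway author details so the demo commit works
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)
cd "$work"

# Stand-in for the Gitlab project: master plus a development branch
git init -q --bare shared.git
git -C shared.git symbolic-ref HEAD refs/heads/master
git init -q seed
cd seed
echo "# new_project" > README.md
git add README.md
git commit -q -m "initial commit"
git branch -M master
git push -q "$work/shared.git" master master:development
cd ..
git clone -q "$work/shared.git" new_project
cd new_project

# The actual step: start from an up-to-date development branch ...
git checkout -q development
git pull -q
# ... create the feature branch from it and publish it
git checkout -q -b new_feature
git push -q -u origin new_feature
```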
Pull the feature branch
Open a command prompt and type:
# go to your project directory
cd your_directory/new_project
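# download the latest changes (this also fetches the new feature branch from Gitlab)
git pull
# create a local new_feature branch that tracks the online one, and switch to it
git checkout -b new_feature origin/new_feature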
You now have a local copy of your new feature branch, which is still identical to the up-to-date development branch.
Add new code or change existing code in the project
Add or modify files in your project locally. In our project, a new feature was often a piece of code (a new function) added to a Python file, but it can also simply be a change to a README. Make sure you test your new code (if applicable): everything should work as expected, and no other parts of the code should break because of your changes.
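Before committing, it helps to check exactly what you changed. Here is a minimal sketch in a throwaway repository, using the hypothetical files data_cleaning.py and NOTES.md. git status --short lists changed and new files, and git diff shows the changed lines in files Git already tracks.

```shell
set -eu
# Throwaway author details so the demo commit works
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)
cd "$work"
git init -q
printf 'def drop_missing(df):\n    return df.dropna()\n' > data_cleaning.py
git add data_cleaning.py
git commit -q -m "initial cleaning code"

# Your work on the feature: extend an existing file and add a new one
printf 'def fill_missing(df):\n    return df.fillna(0)\n' >> data_cleaning.py
echo "Imputation notes" > NOTES.md

# Short overview: " M" = modified file, "??" = new (untracked) file
git status --short
# Line-by-line changes in files Git already tracks
git diff
```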
Push your feature branch
As soon as your feature is complete, commit and push your code. From then on, all your team members can see the changes you've made in the feature branch. Next, share your new feature with your colleagues by submitting a merge request into the development branch. Your team can review the feature on gitlab.com and either accept or decline the merge request. First, push the changes you made locally to your feature branch in the shared online environment.
In your command prompt:
# stage all changed files (you can also stage specific files, e.g. git add data_cleaning.py)
git add .
# commit the staged changes; don't forget a short, meaningful commit message
# each commit is visible as a single change: the smaller the commit, the easier it is to roll back when something breaks
git commit -m "added imputation techniques to data_cleaning.py"
# push all existing commits to the online version of your new_feature branch
git push origin new_feature
Create a merge request to the development branch
Check the Gitlab repository to see whether your branch has been uploaded. If it has, click on Merge Requests > New merge request. Choose your new branch as the source and the development branch as the target, and add a description. Tick the checkbox to delete the new_feature branch after merging; this keeps your project clean and prevents “stale” (old/unused) branches.
Peer review the merge request
After submitting, tag a colleague to review the request. The reviewer either declines it and provides feedback, or merges it into the development branch, making your code available in the project.
You have now successfully contributed to the project! Hopefully…
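Once Gitlab has merged the request and deleted the online feature branch, your own computer still holds a local copy of that branch. The sketch below shows how to tidy up. A second local clone (named gitlab) imitates Gitlab accepting the merge request. That clone, the bare repository shared.git and the file imputation.py are assumptions for the demo only.

```shell
set -eu
# Throwaway author details so the demo commits work
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)
cd "$work"
git init -q --bare shared.git
git -C shared.git symbolic-ref HEAD refs/heads/master

# Set-up: your clone with master, development and a pushed feature branch
git init -q mine
cd mine
git remote add origin "$work/shared.git"
echo "# new_project" > README.md
git add README.md
git commit -q -m "initial commit"
git branch -M master
git push -q -u origin master
git checkout -q -b development
git push -q -u origin development
git checkout -q -b new_feature
echo "def impute(df): return df.fillna(0)" > imputation.py
git add imputation.py
git commit -q -m "add imputation"
git push -q -u origin new_feature
cd ..

# Stand-in for Gitlab accepting the merge request and deleting the branch
git clone -q "$work/shared.git" gitlab
cd gitlab
git checkout -q development
git merge -q --no-ff --no-edit origin/new_feature
git push -q origin development
git push -q origin --delete new_feature
cd ../mine

# The actual clean-up: update development, delete the local feature branch,
# and drop references to online branches that no longer exist
git checkout -q development
git pull -q
git branch -d new_feature
git fetch -q --prune
```

git branch -d refuses to delete a branch whose commits have not been merged, which makes it a safe default. The --prune option removes origin/new_feature, the local reference to the deleted online branch.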
Tips & tricks to smooth the collaboration
Use .gitignore
Not every file in your project has to be shared and merged in Git. For example, if you use Jupyter notebooks to test your code or keep datasets in your project folder, place a file named ‘.gitignore’ in your project and list all the files, folders and/or extensions Git should not synchronize. For example, ‘*.ipynb’ ignores all Jupyter notebooks anywhere in the project (use ‘/*.ipynb’ to ignore only those in the main folder), and ‘/Data’ ignores the folder named Data in the main folder, including all files in it. Note that .gitignore only affects files that Git is not tracking yet.
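Here is a sketch of the .gitignore rules described above, in a throwaway repository with a made-up folder layout. git check-ignore prints the paths that the rules exclude, which is a handy way to test your rules before committing anything.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q

# Hypothetical project layout
mkdir -p Data notebooks src
touch Data/raw.csv notebooks/explore.ipynb scratch.ipynb src/data_cleaning.py

# Lines starting with # are comments in .gitignore
cat > .gitignore <<'EOF'
# every Jupyter notebook, in any folder
*.ipynb
# the Data folder in the project root, with everything inside it
/Data
EOF

# Print which of these paths the rules exclude from Git
git check-ignore Data/raw.csv notebooks/explore.ipynb scratch.ipynb src/data_cleaning.py
# Only .gitignore and src/data_cleaning.py are left for Git to pick up
git status --short --untracked-files=all
```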
Use merge requests
Merge requests are a neat way to keep track of all new code entering the project, and they create a moment for peer review. To prevent your teammates from merging their branches directly into development (or, even worse, into master), you can RESTRICT this permission in their roles, for example by protecting these branches in Gitlab.
How to handle merge conflicts
When you and a colleague change the same lines of code, Git cannot combine the two versions automatically once they meet (for example, when you pull your colleague's changes or merge one branch into another). Because Git does not know which change is “the best”, a merge conflict occurs. Merge conflicts often cause headaches, so here is, in short, the best way to resolve them:
- Type “git status” in the command line and look for the conflicted file (marked “both modified”)
- Open the file in a text editor
- Look for the “<<<<<<<” marker in the file
- The two conflicting code parts are separated by “=======”; remove this line and the version you do not want to keep
- Remove the marker lines “<<<<<<< HEAD” and “>>>>>>> …” (the latter is followed by the name of the branch or commit being merged in)
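Save the file, mark it as resolved with “git add <file>” and complete the merge with “git commit”. Then you can proceed normally with a push or a merge request, as described above.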
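To practice resolving a conflict before it happens for real, the sketch below creates one on purpose in a throwaway repository. The file model.py and its threshold values are made up, and the branch names follow the article. The script writes the resolved version directly; normally you would edit the file by hand.

```shell
set -eu
# Throwaway author details so the demo commits work
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)
cd "$work"
git init -q

# Shared starting point, then a development branch
echo "threshold = 0.5" > model.py
git add model.py
git commit -q -m "base version"
git checkout -q -b development

# Your feature branch changes the line ...
git checkout -q -b new_feature
echo "threshold = 0.7" > model.py
git commit -q -am "raise threshold"

# ... while a colleague's work on development changes the same line
git checkout -q development
echo "threshold = 0.6" > model.py
git commit -q -am "tune threshold"

# Bringing development into your branch now stops with a conflict
git checkout -q new_feature
git merge development || true
git status --short   # "UU model.py" marks the conflicted file
cat model.py         # shows the <<<<<<< HEAD / ======= / >>>>>>> development markers

# Resolve: keep the version you want, without any marker lines ...
echo "threshold = 0.6" > model.py
# ... then mark the file as resolved and finish the merge
git add model.py
git commit -q --no-edit
```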
Drop your own changes and pull new changes
Sometimes a colleague has changed the same piece of code you were working on, and you want to “drop” your own changes to prevent merge conflicts. The easiest way to do this is the command “git stash”. It stores your uncommitted changes and restores your files to the last commit. Now you can pull the changes without conflicts! And rest assured: if you want your changes back, “git stash pop” re-applies them. For more details about stashing look here.
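The sketch below shows this stash-then-pull routine. It assumes a local bare repository (shared.git) stands in for Gitlab, and two local clones play you and a colleague. The file model.py and its contents are hypothetical. Without the git stash line, git pull would refuse to run, because it would overwrite your uncommitted change to model.py.

```shell
set -eu
# Throwaway author details so the demo commits work
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)
cd "$work"
git init -q --bare shared.git
git -C shared.git symbolic-ref HEAD refs/heads/master

# Set-up: your colleague publishes the project and you clone it
git init -q colleague
cd colleague
git remote add origin "$work/shared.git"
echo "threshold = 0.5" > model.py
git add model.py
git commit -q -m "base version"
git branch -M master
git push -q -u origin master
cd ..
git clone -q "$work/shared.git" you

# You edit model.py locally but do not commit
echo "threshold = 0.7" > you/model.py

# Meanwhile your colleague pushes a change to the same line
cd colleague
echo "threshold = 0.6" > model.py
git commit -q -am "tune threshold"
git push -q
cd ../you

# Put your uncommitted change aside and take the colleague's version
git stash
git pull -q
cat model.py      # threshold = 0.6
git stash list    # your change is still stored as stash@{0}
# git stash pop   -> re-applies your change (this may give a merge conflict)
# git stash drop  -> throws your change away for good
```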
I hope this article has made you enthusiastic about using Git for your projects. In any case, after reading this article your introduction to Git is likely to be less disastrous than my first attempt.
Still feel like you could use some guidance to start using Git after reading this? Just reach out to us at info@theanalyticslab.nl and we can tell you all about the Git training we give.
Curious about one of the projects we collaborated on using Git? Check out our very own Python package on Gitlab (available through pip install): tortoise.