Git subtree for tracking changes in upstream apps
Classification:
Git repo setup for tracking upstream changes
Notes
- GitHub recently changed the name of the default branch from master to main; be careful not to get caught out by that. Some git knowledge is required below.
- if the upstream repo is a repo that has a single chart only, you can skip the 2nd repo described below (the “upstream copy”)
Video tutorials
You can watch them here:
Usages
The following apps - not limited to - are managed with this method:
Assumptions
As an example, I’m going to use https://github.com/grafana/helm-charts/tree/main/charts/loki-distributed, which is one of the charts held in the upstream helm-charts repo. I want to easily track it, but also be able to submit patches to upstream. I also want to make some specific changes that should never go upstream.
Repository types and setup
We will be working with 3 git repositories per project. This covers the most complex scenario, where we track upstream repo that hosts multiple charts in a single repository, and we want to track all of them, but finally get just a single chart from there.
Please note, that for a simpler scenario, where you just want to track a subdirectory of an upstream repository and in general repository layout is very similar to what you have or when you don’t plan to submit patches to upstream, you can successfully use this method with just 2 repositories, “upstream” and “chart repo”.
The three repos we’re going to use are:
- Original upstream repo https://github.com/grafana/helm-charts/tree/main/charts/loki-distributed. Every time I say “upstream” repo, I mean this one. It is read-only for us and maintained by external organization.
- (Optional, but recommended) Our copy for tracking the “upstream”. We will fork the upstream to create what we call “upstream copy” repo. This repo will be used for easily contributing changes into the upstream repo. In this repo we prepare patches we hope can be accepted by upstream. Our workflow allows us to submit a patch to “upstream” and either use the patch before it is accepted or wait until it is accepted upstream and shows in one of the branches of “upstream”. If this repo is a multi-chart repo, where all the charts share some code, it is also a perfect place to apply patches to the shared code, whether we want to submit them to upstream or not.
- Chart repo. The “chart repo” is the one where we keep a single chart we want to build for the app platform. Here we apply all the changes and patches that we need to make it work, but also don’t really want to send to “upstream” (so, any Giant Swarm specific stuff).
Setting up repos
Everything here will be shown as an example based on the grafana helm charts repo. Please make sure to go there and have a look at how the repo is organized before you read on.
Upstream copy
Let’s start with creating “upstream copy”. Go on GitHub to the “upstream” repo and fork it.
Make sure to change
the default repo name into something meaningful and ending with “-upstream”. In my example, the
default repo name was giantswarm/helm-charts, but I changed it to
grafana-helm-charts-upstream.
Now, clone the “upstream copy” repo to your machine:
git clone git@github.com:giantswarm/grafana-helm-charts-upstream.git
cd grafana-helm-charts-upstream
We will use the “upstream copy” repo in the following way:
- main (or master) - this is the branch we will use in “chart repo”. It tracks “upstream”, but also includes all our patches to “upstream” that we hope will be accepted by the upstream project some day, but need to use right away.
- upstream-main - to directly track the main/master branch from “upstream”. This branch is read-only for us; we only use it to synchronize with “upstream”. We set it up by adding a new remote and setting merge config for it:
git checkout -b "upstream-main"
git remote add -f upstream https://github.com/grafana/helm-charts.git
git branch -u upstream/main
git push origin upstream-main
# now we can pull changes from "upstream"/main and merge to our "upstream copy"/upstream-main
or using the new Git syntax:
git remote add -f upstream https://github.com/grafana/helm-charts.git
git switch -c upstream-main upstream/main # creates branch and sets tracking
git push origin upstream-main
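To check that the tracking configuration ended up as intended, you can ask git which remote branch upstream-main follows. The sketch below is only an illustration under assumptions: it uses two throwaway local repositories (the paths under `$work` are hypothetical stand-ins for the real “upstream” and “upstream copy” repos) so it can be run without touching GitHub.

```shell
# Scratch area; all paths below are hypothetical stand-ins for the real repos.
work="$(mktemp -d)"

# A local repository that plays the role of "upstream", with a main branch.
git init -q "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/main
git -C "$work/upstream" -c user.name="Demo User" -c user.email="demo@example.com" \
    commit -q --allow-empty -m "upstream commit"

# A second repository that plays the role of the "upstream copy".
git init -q "$work/copy"
cd "$work/copy" || exit 1
git remote add -f upstream "$work/upstream" >/dev/null 2>&1

# Create the branch from the remote-tracking branch; git sets up tracking for it.
git checkout -q -b upstream-main upstream/main

# Ask git which remote branch upstream-main is configured to track.
tracking="$(git rev-parse --abbrev-ref 'upstream-main@{upstream}')"
echo "upstream-main tracks: $tracking"
```

In a real “upstream copy” clone, only the last `git rev-parse` line is needed.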
Chart repo
Now, it’s time to create our “chart repo” and reference the code we have in “upstream copy”.
Go to GitHub and create a new repo using the giantswarm-template-app template. I’ve created
https://github.com/giantswarm/loki-app.
Clone this repo to your local machine and set up “upstream copy” as a remote to track:
git clone git@github.com:giantswarm/loki-app.git
cd loki-app
git rm -r helm/APP-NAME-app && git commit -am "remove template chart" && git push # optionally remove the chart template
git remote add -f --no-tags upstream-copy git@github.com:giantswarm/grafana-helm-charts-upstream.git # add remote
Please note the --no-tags flag. If you add it, no tags from the remote repo will be added to yours. This means you won’t “pollute” your local repo, which is a good thing, but it also makes it impossible to check what upstream means by a specific tag. Being able to check that might be useful, especially when you’re migrating to the subtree workflow. The decision to include tags is yours, but as a rule of thumb it’s better not to include them unless you know they are useful for you.
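The effect of --no-tags can be seen in a small experiment. This is a sketch under assumptions: the two repositories are throwaway local ones (hypothetical paths under `$work`), and the tag v1.0.0 is made up for the demo. It shows that tags are not copied into the local repository, while they can still be listed straight from the remote when you need to look one up.

```shell
# Scratch area; paths and the v1.0.0 tag are hypothetical.
work="$(mktemp -d)"

# Stand-in for the "upstream copy" repo, with one release tag.
git init -q "$work/upstream-copy"
git -C "$work/upstream-copy" -c user.name="Demo User" -c user.email="demo@example.com" \
    commit -q --allow-empty -m "release"
git -C "$work/upstream-copy" tag v1.0.0

# Stand-in for the "chart repo"; add the remote without importing its tags.
git init -q "$work/chart"
cd "$work/chart" || exit 1
git remote add -f --no-tags upstream-copy "$work/upstream-copy" >/dev/null 2>&1

# No tags were copied into the local repository...
local_tags="$(git tag -l)"
# ...but they can still be listed directly from the remote.
remote_tags="$(git ls-remote --tags upstream-copy)"
echo "local tags: '${local_tags}'"
echo "remote tags: ${remote_tags}"
```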
Now, we add code from “upstream-copy” as a subtree. We have 2 options here:
- We add the whole “upstream-copy” repo, as it is in the main branch, as a subdirectory in the current repo:
git subtree add --prefix helm/loki-app upstream-copy main --squash
git push
That’s it, now your helm/loki-app has the same content as is present in the main branch of the upstream-copy remote. The --squash option squashes all the incoming commits into one big commit, which is a good thing, as otherwise you’ll put all the commits from upstream into your local repo and make the history really noisy.
- More complex scenario: we want to add only the charts/loki-distributed subdirectory from the main branch of upstream-copy. To do that, we first need to create a branch where we create the subtree, then we use the subtree split command to go over all commits and split only those that altered files in this directory into a temporary branch temp-split-branch. Then we add this branch as a subtree:
# Create a work branch
git checkout -b tmp
# Add remote in a subdir
# git subtree add --prefix [target directory] [git remote] [remote branch] --squash
git subtree add --prefix upstream-tmp upstream-copy main --squash
# create a new branch with only the contents of a path
# (the chart lives under the upstream-tmp prefix added above)
# git subtree split -P [path] -b [target branch]
git subtree split -P upstream-tmp/charts/loki-distributed -b temp-split-branch
# create a branch where you will actually update the remote
git checkout master
git checkout -b updates
# Put the extracted path in [path] in your new branch
# git subtree add --squash -P [path] [source branch]
git subtree add --squash -P helm/loki-distributed temp-split-branch
Important: here we use main from upstream-copy as the state we want. Most probably it makes more sense for you to use some other state of the upstream-copy, like a vX.Y.Z tag, which means a stable release of the chart. Here we’re tracking the cutting edge in the main branch.
Tip: When you have added the subtree to the repository, it might be worth adding a git note. Since the subtree command adds a commit, it’s not directly clear where it came from. This is even trickier to track when using tags rather than main/master. To add a note, straight after the git subtree add ... command, run the following:
git notes add -m "upstreamSync: https://github.com/someUser/SomeRepo/tree/v1.0.0"
The notes are freeform, but adhering to a structure can help in the future (for example if we add automation related to the notes).
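As an illustration of the note workflow, the sketch below attaches a note to a commit and reads it back. It assumes a throwaway repository, and the empty commit is only a stand-in for the commit that `git subtree add ...` would create; the URL is the placeholder from the tip above.

```shell
# Scratch repository so the demo does not touch real work.
work="$(mktemp -d)"
cd "$work" || exit 1
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

# Stand-in for the commit created by `git subtree add ...`.
git commit -q --allow-empty -m "Squashed 'helm/loki-app/' content"

# Attach the note to HEAD (the default target of `git notes add`).
git notes add -m "upstreamSync: https://github.com/someUser/SomeRepo/tree/v1.0.0"

# Read the note back.
note="$(git notes show HEAD)"
echo "$note"

# Notes are also displayed by `git log`, under a "Notes:" heading.
git --no-pager log -1
```

Keep in mind that notes are stored under refs/notes/commits and are not pushed by a plain `git push`; if teammates should see them, the notes ref has to be pushed explicitly, for example with `git push origin refs/notes/commits`.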
Pay attention
git subtree works by adding formatted comments in commit messages.
They include git-subtree-dir, git-subtree-mainline and git-subtree-split parts and look like this:
9c339fb [promtail] Add Network Policies to promtail
git-subtree-dir: helm/promtail
git-subtree-split: 73421663a95a43466e2e3e434caf3a50ff56227e
To be more specific, the git subtree module greps previous commit messages, looking for example for git-subtree-dir, using this command:
git log --grep="^git-subtree-dir: $dir/*\$"
As a result, you must work carefully to never delete such commit messages, as then git subtree will lose track of your previous subtree commands. Pay special attention when you merge a branch that includes subtree work, as in this case you often edit a long set of messages down to something shorter, so it’s easy to remove the git subtree comments.
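Before and after merging a branch with subtree work, it can help to check that the metadata is still in the history. The sketch below is a minimal illustration, not part of git subtree itself: it reuses the grep pattern quoted above in a throwaway repository, where the subtree commit is simulated by writing the trailers by hand (the helm/promtail prefix and the hash are the ones from the example message).

```shell
# Scratch repository so the demo does not touch real work.
work="$(mktemp -d)"
cd "$work" || exit 1
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

# Simulate a squashed subtree commit by writing the trailers by hand.
git commit -q --allow-empty -m "Squashed 'helm/promtail/' content from commit 7342166

git-subtree-dir: helm/promtail
git-subtree-split: 73421663a95a43466e2e3e434caf3a50ff56227e"

# A later commit that carries no subtree metadata.
git commit -q --allow-empty -m "Unrelated commit"

# Count the commits that still carry the metadata for a given prefix.
dir="helm/promtail"
trailer_count="$(git log --format=%H --grep="^git-subtree-dir: $dir/*\$" | wc -l | tr -d ' ')"
echo "commits with subtree metadata for $dir: $trailer_count"
```

In a real chart repo only the last three lines are needed. If the count drops to 0 after a merge, the subtree messages were most likely edited away.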
Workflows
I want to set up my local repos after they were already created for the first time
To set up “upstream-copy”:
git clone git@github.com:giantswarm/grafana-helm-charts-upstream.git
cd grafana-helm-charts-upstream
git checkout upstream-main
git remote add -f upstream https://github.com/grafana/helm-charts.git
To set up “chart repo”:
git clone git@github.com:giantswarm/loki-app.git
cd loki-app
git remote add -f --no-tags upstream-copy git@github.com:giantswarm/grafana-helm-charts-upstream.git # add remote
I want to update to the latest version from upstream
The steps below assume you want to get to the state of the main branch in upstream. If you want any other state,
replace upstream/main with any other branch or a tag such as vX.Y.Z (to see upstream tags, you need to
omit the --no-tags flag, as explained in the setup instructions above).
- In “upstream-copy” repo
- make sure your local “main” branch is up-to-date with origin “main”
- checkout “upstream-main” branch
- fetch changes from “upstream/main”, merge them with “upstream-main”
- checkout “main”, merge “upstream-main” to it, push “main”
- example commands:
git checkout main
git pull origin main
git checkout upstream-main
git fetch upstream
git merge upstream/main
git push origin upstream-main # push upstream changes to GitHub
git push origin [latest-tag-from-upstream]
git checkout main
git merge upstream-main
git push origin main
git push origin [latest-tag-from-upstream]
- In “chart repo”
- if the subtree is tracking the whole “upstream copy” repo
git fetch upstream-copy main
git subtree pull --prefix helm/loki-app upstream-copy main --squash
- if the subtree is tracking a subdir of “upstream copy”:
# Fetch the upstream tags as `upstream-`
git fetch upstream-copy refs/tags/:refs/tags/upstream-
# It's OK to be in detached head, we won't change anything
git checkout upstream-
git subtree split -P charts/loki-distributed -b temp-split-branch
git checkout master
git subtree merge --squash -P helm/loki-distributed temp-split-branch
git push
# Clean up temporary split branch and upstream tag
git branch -D temp-split-branch
git tag -d upstream-
I want to send non-urgent patch for upstream
Do this if you want to submit a patch for “upstream”, but are happy to wait until it is accepted by upstream (so your patch gets applied there and you then get it from “upstream” someday):
- go to “upstream copy”, update remote “upstream” and fetch changes into the “upstream-main” branch
- create a branch “my-feature” from “upstream-main”
- when ready, create a PR for “upstream”
- when PR is merged, remove local “my-feature” branch and update our dependencies as in normal upstream update
I want to send urgent patch for upstream and use it already
Do this if you want to submit a patch for “upstream” and you need to use it right away, without waiting for being accepted by upstream:
Pay attention
GitHub automatically closes the Pull Request in the upstream repository once you merge the subtree PR in the app’s repository. You can simply re-open it.
- go to “upstream copy” repository, update remote “upstream” and fetch changes into the “upstream-main” branch (Step 1 of “I want to update to the latest version from upstream”)
- create a branch “my-feature” from “upstream-main”
- when ready, create a PR (PR1) for “upstream”
- create another PR (PR2) to merge “my-feature” into “main”
- when PR2 is merged, update “chart repo” dependency on “upstream-copy/master” as in point 2 in normal upstream update
- when PR1 is merged, remove local “my-feature” branch and update our dependencies as in normal upstream update
I want to make changes that I don’t want to be ever sent to upstream
Do this if you want to make any Giant Swarm specific changes to the chart. We have two options about where to do that, and it’s up to you to think where it makes the most sense.
- In the “chart repo” - this should be your default.
- just do it - you can commit and change anything you want in the “subtree” directory and your changes won’t be lost when you update it.
- In the “upstream copy” repo - makes sense for cases where multiple charts include some shared sub-chart, and you want to patch it.
- go to “upstream copy”, checkout and update “main” branch
- create a branch “my-feature” from “main”
- when ready, create a PR from “my-feature” to “main”
- when PR is merged, update “chart repo” dependency on “upstream-copy/master” as in point 2 in normal upstream update
I want to switch from another way of tracking upstream to the git-subtree way
In general, we have two options here:
- Git-supported. It works like this: we start by figuring out a commit (tag, branch, anything) in our current repo that was an exact copy of a known upstream version. Let’s say this is represented by the vX.Y.Z tag. Now, I can save a diff between that clean state (a state of my repo when I got it from the “upstream” but before I applied any custom changes) and my current most recent state. The result should include everything we’ve changed since vX.Y.Z compared to upstream. Now we remove the code from our repo, then include it again in the exact same vX.Y.Z version, but this time using the git subtree command. Then, we apply our patch file on the subtree and commit it. From now on, we can do any update as described above.
Example: I want to switch my grafana-app repo to use git subtree from “upstream copy”. I know that my repo in the commit 1111111 has the same code as the “upstream” repo had in the grafana-6.1.3 tag. We also want to do the migration in a separate branch switch-to-subtree to be able to create a valid PR for the change and not work on master directly.
git checkout -b switch-to-subtree
git diff 1111111 -- helm/grafana-app > chart.diff
git rm -r helm/grafana-app
git commit -am "chart code deleted"
git checkout grafana-6.1.3
git subtree split -P charts/grafana -b temp-split-branch
git checkout switch-to-subtree
git subtree add --squash -P helm/grafana-app temp-split-branch # switched to subtree
git apply chart.diff # applied any custom changes
git commit -am "applied custom changes"
# you're ready to continue with updating to the current state
- Manual way. We copy our current state somewhere (outside the current git tree), then we remove all the code we want to get from “upstream” or “upstream-copy”. We add the code back using git subtree. Then we manually go over our backup copy and apply any changes needed by editing the code. Then we commit the changes. From now on, we can do any update as described above.
Example:
git checkout -b switch-to-subtree
cp -a helm/grafana-app /tmp
git rm -r helm/grafana-app
git commit -am "chart code deleted"
git checkout grafana-7.1.0 # a version I want to update to
git subtree split -P charts/grafana -b temp-split-branch
git checkout switch-to-subtree
git subtree add --squash -P helm/grafana-app temp-split-branch # switched to subtree
# the hard part: compare what you have in /tmp/grafana-app with the helm/grafana-app and apply
# all the missing changes
git commit -am "applied custom changes"
# you're ready to continue with updating to the current state
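The diff-and-reapply step of the Git-supported option can be tried out in isolation. The following sketch runs in a throwaway repository and rests on stated assumptions: the values.yaml file and its content are made up, and `git checkout <commit> -- <path>` is used only as a stand-in for re-adding the pristine chart with git subtree, so that the example needs nothing but core git.

```shell
# Scratch repository so the demo does not touch real work.
work="$(mktemp -d)"
cd "$work" || exit 1
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

# Pristine chart as it came from upstream (hypothetical file and content).
mkdir -p helm/grafana-app
echo "replicas: 1" > helm/grafana-app/values.yaml
git add helm
git commit -q -m "pristine upstream copy"
clean_commit="$(git rev-parse HEAD)"

# Our custom change on top of the pristine state.
echo "replicas: 3" > helm/grafana-app/values.yaml
git commit -q -am "custom change"

# Save everything we changed relative to the pristine state.
git diff "$clean_commit" -- helm/grafana-app > chart.diff

# Stand-in for re-adding the pristine chart via git subtree:
# restore the files exactly as they were in the clean commit.
git checkout -q "$clean_commit" -- helm/grafana-app

# Re-apply the custom changes on top of the pristine files.
git apply chart.diff
result="$(cat helm/grafana-app/values.yaml)"
echo "after apply: $result"
```

If the patch does not apply cleanly in a real migration, `git apply --check chart.diff` reports the problem without modifying any files.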
Fixing when original subtree-split commit is lost
For reasons we’re not completely sure of (but that certainly include history rewrites with git push --force), the commit that was merged as a subtree from the artificial sub-directory tree created with git subtree split might get a different commit ID, even though the set of changes is still the same. One way you can get out of this situation is:
- Figure out (from normal git log) in the main branch (or the working branch) when we did the last actual merge from upstream. This merge commit has the following message in it:
Squashed 'your/subdir' content from commit [short ID]
git-subtree-dir: your/subdir
git-subtree-split: [long ID]
Let’s assume in our case the commit was done on 03.01.2023.
- Do a normal git subtree split, then check out the resulting tree so you can inspect its log. Start with
git checkout temp-split-branch
then use git log. Now find the last artificial commit ID that we merged - its hash is different now, but we assume the set of patches is the same. In our case we look for the newest commit older than 03.01.2023 in the artificial history created by subtree split. Let’s assume this commit has ID 5a9c69f11f1466569e04c8a60cbb132617d2185f.
- git-subtree is just a bash script that greps git log in search of that hash, so we can make it believe that a certain hash was merged by doing an empty commit with git commit --allow-empty -m and including a commit message informing that it was merged:
Squashed 'your/subdir' content from commit 5a9c69f1
git-subtree-dir: your/subdir
git-subtree-split: 5a9c69f11f1466569e04c8a60cbb132617d2185f
To make it easier to find the needed hash ID in the artificial history created by subtree, you can try the following git alias:
last-subtree = "!f() { git log temp-split-branch --until \"$(git log main --grep='git-subtree-split' --pretty=format:\"%ad\" -1)\" --pretty=format:\"%H\" -1; }; f"
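The empty marker commit described above could be created along these lines. This is a sketch in a throwaway repository, using the your/subdir prefix and the example hash from the text; in a real repo you would run only the `git commit --allow-empty` part on your working branch, with your own prefix and the hash you found in the split history.

```shell
# Scratch repository so the demo does not touch real work.
work="$(mktemp -d)"
cd "$work" || exit 1
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"
git commit -q --allow-empty -m "existing history"

# Values taken from the example above.
prefix="your/subdir"
split_id="5a9c69f11f1466569e04c8a60cbb132617d2185f"
short_id="$(printf '%s' "$split_id" | cut -c1-8)"

# Each -m becomes its own paragraph, so the trailers end up
# separated from the subject line by a blank line.
git commit -q --allow-empty \
    -m "Squashed '$prefix' content from commit $short_id" \
    -m "git-subtree-dir: $prefix
git-subtree-split: $split_id"

# Verify what was recorded in the newest commit message.
recorded="$(git log -1 --format=%B | sed -n 's/^git-subtree-split: //p')"
echo "recorded split commit: $recorded"
```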