Squashing only results in a cleaner commit history if you're making a mess of the history on your branches. If you're structuring the commit history on your branches logically, squashing just throws information away.
I’m all ears for a better approach because squashing seems like a good way to preserve only useful information.
My history ends up being:
- add feature x
- linting
- add e2e tests
- formatting
- additional comments for feature
- fix broken test (ci caught this)
- update README for new feature
- linting
With a squash it can boil down to just “added feature x” with smaller changes inside the description.
You can always take advantage of the graph structure itself. With `--first-parent` git log just shows your integration points (top level merge commits, PR merges with `--no-ff`) like `Added feature X`. `--first-parent` applies to blame, bisect, and other commands as well. When you "need" or most want linear history you have `--first-parent` and when you need the details "inside" a previous integration you can still get to them. You can preserve all information and yet focus only on the top-level information by default.
It's just too bad not enough graphical UIs default to `--first-parent` and a drill-down like approach over cluttered "subway graphs".
If my change is small enough that it can be treated as one logical unit, that will be reviewed, merged and (hopefully not) reverted as one unit, all these followup commits will be amends into the original commit. There's nothing wrong with small changes containing just one commit; even if the work wasn't written or committed at one time.
Where logical commits (also called atomic commits) really shine is when you're making multiple logically distinct changes that depend on each other. E.g. "convert subsystem A to use api Y instead of deprecated api X", "remove now-unused api X", "implement feature B in api Y", "expose feature B in subsystem A". Now they can be reviewed independently, and if feature B turns out to need more work, the first commits can be merged independently (or if that's discovered after it's already merged, the last commits can be reverted independently).
If after creating (or pushing) this sequence of commits, I need to fix linting/formatting/CI, I'll put the fixes in a fixup commit for the appropriate and meld them using a rebase. Takes about 30s to do manually, and can be automated using tools like git-absorb. However, in reality I don't need to do this often: the breakdown of bigger tasks into logical chunks is something I already do, as it helps me to stay focused, and I add tests and run linting/formatting/etc before I commit.
And yes, more or less the same result can be achieved by creating multiple MRs and using squashing; but usually that's a much worse experience.
stacked diffs are the best approach and working at a company that uses them and reading about the "pull request" workflow that everyone else subjects themselves to makes me wonder why everyone is not using stacked diffs instead of repeating this "squash vs. not squash" debate eternally.
every commit is reviewed individually. every commit must have a meaningful message, no "wip fix whatever" nonsense. every commit must pass CI. every commit is pushed to master in order.
Not everyone develops and commits the same way and mandating squashing is a much simpler management task than training up everyone to commit in a similar manner.
Besides, they probably shouldn't make PR commits atomic, but do so as often as needed. It's a good way to avoid losing work. This is in tension with leaving behind clean commits, and squashing resolves it.
The solution there is to make your commit history clean by rebasing it. I often end my day with a “partial changes done” commit and then the next day I’ll rebase it into several commits, or merge some of the changes into earlier commits.
Even if we squash it into main later, it’s helpful for reviewing.
At work there was only one way to test a feature, and that was to deploy it to our dev environment. The only way to deploy to dev was to check the repo into a branch, and deploy from that branch.
So one branch had 40x "Deploy to Dev" commits. And those got merged straight into the repo.
Good luck getting 100+ devs to all use the same logical commit style. And if tests fail in CI you get the inevitable "fix tests" commit in the branch, which now spams your main branch more than the meaningful changes. You could rebase the history by hand, but what's the point? You'd have to force push anyway. Squashing is the only practical method of clean history for large orgs.
Also rebasing is just so fraught with potential errors - every month or two, the devs who were rebasing would screw up some feature branch that they had work on they needed and would look to me to fix it for some reason. Such a time sink for so little benefit.
I eventually banned rebasing, force pushes, and mandated squash merges to main - and we magically stopped having any of these problems.
We squash, but still rebase. For us, this works quite well. As you said, rebasing needs to be done carefully... But the main history does look nice this way.
True but. There's a huge trade-off in time management.
I can spend hours OCDing over my git branch commit history.
-or-
I can spend those hours getting actual work done and squash at the end to clean up the disaster of commits I made along the way so I could easily roll back when needed.
Squash loses the commit history - all you end up with is merge merge merge
It's harder to debug as well (this 3000line commit has a change causing the bug... best of luck finding it AND why it was changed that way in the first place.
I, myself, prefer that people tidy up their branches such that their commits are clear on intent, and then rebase into main, with a merge commit at the tip (meaning that you can see the commits AND where the PR began/ended.
"squash results in a cleaner commit history" Isn't the commit history supposed to be the history of actual commits? I have never understood why people put so much effort into falsifying git commit histories.
Here is how I think of it. When I am actively developing a feature I commit a lot. I like the granularity at that stage and typically it is for an audience of 1 (me). I push these commits up in my feature branch as a sort of backup. At this stage it is really just whatever works for your process.
When I am ready to make my PR I delete my remote feature branch and then squash the commits. I can use all my granular commit comments to write a nice verbose comment for that squashed commit. Rarely I will have more than one commit if a user story was bigger than it should be. Usually this happens when more necessary work is discovered. At this stage each larger squashed commit is a fully complete change.
The audience for these commits is everyone who comes after me to look at this code. They aren’t interested in seeing it took me 10 commits to fix a test that only fails in a GitHub action runner. They want the final change with a descriptive commit description. Also if they need to port this change to an earlier release as a hotfix they know there is a single commit to cherry pick to bring in that change. They don’t need to go through that dev commit history to track it all down.
There are several valid reasons to "falsify" commit history.
- You need to remove trash commits that appear when you need to rerun CI.
- You need to remove commits with that extra change you forgot.
- You want to perform any other kind of rebase to clean up messages.
I assume in this thread some people mean squashing from the perspective of a system like Gitlab where it's done automatically, but for me squashing can mean simply running an interactive (or fixup) and leaving only important commits that provide meaningful information to the target branch.
“Falsifying” is complete hyperbole.
Git commit history is a tool and not everyone derives the same ROI from the effort of preserving it. Also squashing is pretty effortless.