Some people just don't git it

Eric Sink recently blogged about git and how it violated "best practices" by allowing parts of a directory change-set to be committed, and history to be rewritten.

Something about the whole thing just didn't feel right. Here are the major points, as best I can summarise:

The rules and guidelines for a DVCS are different than the ones for a centralized system. [...] But some of the best practices are the same.

Okay, so Eric likes best practices and acknowledges that distributed VCSs differ from centralised ones, but holds that some best practices are common to both. All fair points.

Here's my off-the-cuff sloppy definition of a "best practice":

A best practice is a guideline that can be followed lots of times by lots of different people in lots of different situations with minimal likelihood of causing pain to the team.

He gives another definition, but I'll get back to it. A best practice is a guideline. Works for lots of people in many situations. Also a fair point.

I think "git add --p" is "really cool", but it doesn't qualify as a "best practice". It allows the developer to commit code they have never seen.

git add -p (yes, just a single dash, BTW; the long form is --patch) lets you add only some of the changes in a file to the index, which is what then goes into the commit. So the index acts as a "commit scratchpad", something that doesn't exist in CVS, Subversion or Mercurial (without a plugin).
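To make the "commit scratchpad" idea concrete, here is a toy model in Python. It is not how git actually stores the index (the real index is a full snapshot of the next commit, not a list of hunks); `Repo`, `add_patch` and `commit` are hypothetical names, and a "hunk" is just a string.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Repo:
    """Toy model of git's index; hypothetical, not how git stores data."""
    working_changes: list[str]                          # edited hunks, not yet staged
    index: list[str] = field(default_factory=list)      # the "commit scratchpad"
    commits: list[list[str]] = field(default_factory=list)

    def add_patch(self, keep: Callable[[str], bool]) -> None:
        """Roughly what `git add -p` does: answer y/n for each hunk."""
        staged, unstaged = [], []
        for hunk in self.working_changes:
            (staged if keep(hunk) else unstaged).append(hunk)
        self.index.extend(staged)
        self.working_changes = unstaged

    def commit(self) -> list[str]:
        """Record only what is staged; unstaged hunks stay in the working tree."""
        snapshot = list(self.index)
        self.commits.append(snapshot)
        self.index.clear()
        return snapshot


repo = Repo(["fix typo in README", "refactor parser", "debug print"])
repo.add_patch(lambda hunk: hunk != "debug print")
print(repo.commit())         # ['fix typo in README', 'refactor parser']
print(repo.working_changes)  # ['debug print']
```

This also shows exactly what Eric is worried about: the commit never contained the debug hunk, but the working tree you compiled and tested did.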

By "never seen", he means code that has never been compiled and tested independently of the other changes. At face value, it is yet another fair point.

But this is where I started to get uncomfortable. He says that altering history like this is not a best practice, but what is he really saying?

Is there a good outcome here?

Suppose I use "git add --p" to commit some code that doesn't even compile. What can happen?

  • Maybe this changeset never escapes my private repository instance. In that case, it has caused no harm. But it has also caused no benefit.

  • Maybe my next checkin fixes the build. So now the offending changeset is less likely to cause problems, because the fix will get pushed as well. But this scenario is equivalent to the centralized case where I break the build but fix it before anybody finds out. It's not very harmful, but it's not very helpful either.

  • Maybe I later use Git's history rewriting features to eliminate the offending changeset, replacing a chain of small changesets with one larger one that has been well-tested. In this scenario, I have eliminated all the potentially harmful effects, since the DAG [of commits] will not have any nodes that are "broken". But now I have other concerns.
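The third scenario is what `git rebase -i` does when you mark commits as `squash` or `fixup`. Here is a rough Python model of the idea, assuming a commit is nothing more than a dict of changed files plus a flag for whether it builds; `Commit` and `squash` are illustrative names, not git internals.

```python
from dataclasses import dataclass


@dataclass
class Commit:
    message: str
    changes: dict[str, str]  # path -> new file contents (simplified: no deletions)
    builds: bool             # did this snapshot compile and pass its tests?


def squash(chain: list[Commit], message: str) -> Commit:
    """Fold a chain of consecutive commits into a single commit.

    Later changes to the same path override earlier ones, so the result has
    the same final tree as the last commit in the chain, and therefore the
    same build status. The broken intermediate snapshots simply disappear.
    """
    if not chain:
        raise ValueError("nothing to squash")
    merged: dict[str, str] = {}
    for commit in chain:
        merged.update(commit.changes)
    return Commit(message, merged, builds=chain[-1].builds)


history = [
    Commit("WIP: new parser", {"parser.py": "v1"}, builds=False),
    Commit("Fix parser build", {"parser.py": "v2"}, builds=True),
    Commit("Add parser tests", {"test_parser.py": "t1"}, builds=True),
]
combined = squash(history, "Add new parser with tests")
print(combined.changes)  # {'parser.py': 'v2', 'test_parser.py': 't1'}
print(combined.builds)   # True
```

The point of the model: the only node left in the history is the tested tip, which is why Eric concedes this scenario removes the harm.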

What he's saying here, from what I can tell, is that there's no benefit to git add -p... when the committed code doesn't work. The implication is that git add -p does more harm than good, at least in these situations. And given all the talk of best practices, the implication is also that these scenarios are common: they make up a significant part of "most of the time". Therefore, in common situations, git add -p can do more harm than good; not always, but most of the time.

From that perspective, it'd be better if git add -p (and its more powerful cousin, git rebase -i) were made less visible in the docs, and described as last-resort, internal mechanisms: a power that should be used rarely, perhaps even hidden, and flagged as such whenever it is used.

The issue of rewriting history is perhaps my biggest philosophical objection to the way Git works.

[...]

Think about it. Even if you love Git's ability to rewrite history, does this sound to you like a "best practice"? Or does it sound like a quick way to get a bunch of geeks addicted to recreational pharmaceuticals?

Fair enough, rewriting history is bad from the perspective of best practices.

Let's see if I can summarise all this without stumbling over a logical fallacy:

  1. Best practices are good advice that applies to most people, most of the time.
  2. Rewriting history does not conform to best practices.
  3. Therefore, rewriting history is not good advice for most people, most of the time.

Hmm, maybe "rewriting history" doesn't belong in "best practices". It could belong to another group of practices.

The implication here is that rewriting history in git = mostly bad. Hide it, shove it, sweep it under the rug, just warn people not to use it because it is not a "best practice".

I may be arguing with a straw man here, but if that is the message, then I have to disagree. Rewriting history in git gives you power. A tremendous amount of power. It lets you do things that weren't even imaginable in other version control systems. Take git bisect, a feature that enabled Linux kernel developers to shorten bug-report-to-fix time from 4 days to 6 hours, and an Atlassian developer to find breakage in 18 commits in under 3 minutes. Bisect would lose most of its precision without clean, atomic commits, each of which builds and tests on its own. And clean, atomic commits are only possible when you can tweak history. Or if you're really a robot in a human suit.
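Under the hood, git bisect is a binary search over history: mark one commit good and one bad, and it keeps testing the midpoint until it isolates the first bad commit. The sketch below is a simplified model of that search, assuming a linear history where every commit after the bug is also bad. Commits that can't be tested (say, because they don't compile) are skipped, loosely like `git bisect skip`, although real git chooses its alternative commits differently. `bisect_history` is a hypothetical name.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def bisect_history(
    commits: Sequence[T], test: Callable[[T], Optional[bool]]
) -> tuple[list[T], int]:
    """Binary search for the first bad commit.

    Assumes commits[0] is known good, commits[-1] is known bad, and the
    history is monotone (good ... good, bad ... bad). `test` returns True
    for bad, False for good, and None if the commit can't be tested.
    Returns the remaining suspect commits and the number of tests run.
    """
    good, bad = 0, len(commits) - 1
    untestable: set[int] = set()
    tests = 0
    while bad - good > 1:
        # Prefer the midpoint, then its nearest neighbours.
        mid = (good + bad) // 2
        order = sorted(range(good + 1, bad), key=lambda i: abs(i - mid))
        candidates = [i for i in order if i not in untestable]
        if not candidates:
            # Everything between good and bad is untestable:
            # the culprit could be any of them, or the bad commit itself.
            return list(commits[good + 1 : bad + 1]), tests
        i = candidates[0]
        result = test(commits[i])
        tests += 1
        if result is None:
            untestable.add(i)
        elif result:
            bad = i
        else:
            good = i
    return [commits[bad]], tests


history = list(range(18))  # 18 commits, bug introduced in commit 11
print(bisect_history(history, lambda c: c >= 11))
# ([11], 4)
print(bisect_history(history, lambda c: None if c in (10, 11) else c >= 11))
# ([10, 11, 12], 5)
```

With every commit buildable, four tests pin the bug to commit 11. Make commits 10 and 11 unbuildable and the best the search can do is hand back three suspects, which is the practical case for commits that each build on their own.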

Call me old fashioned if you like, but I believe changesets and the history of the repository should be immutable.

"Should" = advice = best practice, according to the first definition Eric gave.

There was a constant current of "not best practice = bad" running through the whole article, and that made me really uncomfortable. The whole point of "best practices" is that they're advice. That they work for most people, most of the time, implies that they may not apply to some people, some of the time.

git gives you the power to rewrite history, and it trusts you to use that power responsibly. True, rewriting history is a powerful thing, and like all powerful things, it can hurt you. But without power, all that you're left with is safety scissors: can't do crap with them, and if you find yourself in a bad spot, you're screwed. Bet you wish you had that Swiss army knife now.

So "rewriting history" doesn't belong in "best practices". That's because it belongs to a far more important set: "better judgement". Better judgement, applied judiciously in a software project, will have a far better outcome than mere adherence to best practices. Best practices should begin and end as advice and advice alone: better judgement will trump them every time. Better judgement accumulates through experience and a constant effort to improve. That is perhaps why new coders and experienced developers can code in the same "cowboy coder" way yet get such different results, but that's a topic for another time and place.

Put your better judgement first. Experience and learn from success and (more importantly) failure. Best practices can be good advice, but that's all they are: advice.

For completeness, here's the second definition of "best practice":

Actually, I want to give TWO definitions. Here's another one, speaking as a source control vendor:

A best practice is a guideline that I can give to our customers to minimize the likelihood that they will need to call our tech support staff.

[...]

My own product supports an "Obliterate" feature and I hate it. I understand why it's there, but I still wish it wasn't.

I can't speak for Eric, but it seems like git add -p and its ilk are things he wishes weren't in git. If that's so, I disagree wholeheartedly. Just because you can't find a use for it doesn't mean that I can't.

One thing I've learned from twelve years of supporting version control products is that customers will find a way to misuse things.

I also can't support nerfing a utility just because the unwashed masses can't handle it. You can add protective measures, but they'll find a way around them. The ignorant are amazing in their ability to find ingenious ways to break things.

Give the people power. Yes, they will hurt themselves. Yes, they will get burned. Yes, you'll get a lot of complaints. And yes, they'll repeat their mistakes again and again. But without power, there is no progress.

And screwing up is a small price to pay for progress.