Find files that would be changed by git cherry

According to its man page, git cherry does some testing to determine if a commit should be cherry picked into another branch:

The equivalence test is based on the diff, after removing whitespace
and line numbers. git-cherry therefore detects when commits have been
"copied" by means of git-cherry-pick(1), git-am(1) or git-rebase(1).

I want to create a script that further reduces the list of cherry-pick candidates by removing all commits that would change only a certain file.

For example, if cherry-picking commit 1 (which changes files A, B and C) into my branch would change only file A, because B and C already contain its changes, I want the script to remove that commit from the list of candidates.

Is there an easy way to get this information out of Git?

CodePudding user response：

further minimize the list of cherry-pick candidates by removing all commits that would change only a certain file

You can use git diff-tree -p $commit | git apply --exclude=path/to/that/file --numstat (--numstat only reports on the patch, it does not apply it). If that lists any changes, the commit has changes in other files. But it's not clear what "would change" means here. Do you mean "would change" if you cherry-picked it again, regardless of whether its changes outside that file have already been applied?
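As a rough sketch, that first check can be wrapped in a small shell function. The name touches_other_files and its two arguments are made up for illustration, and it assumes ordinary single-parent, non-root commits.

```shell
# Hypothetical helper (not part of Git): succeeds if COMMIT's patch touches
# any path other than EXCLUDED. Nothing is applied; --numstat only reports.
touches_other_files() {
    commit=$1
    excluded=$2
    # One numstat line per file in the patch that survives the exclusion.
    stat=$(git diff-tree -p "$commit" |
        git apply --exclude="$excluded" --numstat) || return 2
    # Non-empty output means the commit changes something besides EXCLUDED.
    [ -n "$stat" ]
}
```

It could be called as touches_other_files "$commit" path/to/that/file && echo "$commit". Keep in mind this looks only at the patch itself, not at whether those changes are already present on your branch.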

The only way to find that out is to do a test run of the actual apply: git diff-tree -p $commit | git apply --exclude=path/to/that/file -3, then git diff --name-only HEAD to see if there are any changes pending (-3 implies --index, so cleanly applied changes are staged and a plain git diff would not list them), then git reset --hard before doing or not doing the whole cherry-pick. You can automate that check, but you're leaving a lot of questions open here.
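A minimal sketch of automating that test run, assuming a clean work tree and ordinary single-parent commits. The function name trial_apply_changes is hypothetical. It compares against HEAD because -3 implies --index, so whatever applies cleanly ends up staged.

```shell
# Hypothetical helper (not part of Git): prints the paths that a trial
# application of COMMIT would change, leaving EXCLUDED out of the patch.
trial_apply_changes() {
    commit=$1
    excluded=$2
    # Refuse to run on a dirty tree: the reset below would discard local work.
    if [ -n "$(git status --porcelain --untracked-files=no)" ]; then
        echo "work tree is not clean" >&2
        return 2
    fi
    # Trial apply with three-way fallback; messages are not needed here.
    git diff-tree -p "$commit" |
        git apply --exclude="$excluded" -3 >/dev/null 2>&1
    status=$?
    # List what differs from HEAD after the trial (staged changes included).
    git diff --name-only HEAD
    # Undo the trial so the branch is back where it started.
    git reset --hard -q
    return $status
}
```

Empty output with a zero exit status suggests the commit would change nothing outside the excluded file. A non-zero status means git apply reported a conflict or failure, so that commit needs a closer look.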

But cherry-picking a commit could make changes to the current upstream tip regardless of whether it's already been cherry-picked, if subsequent work reverted it or amended it. So if you don't care whether you're re-applying subsequently reverted changes, why are you starting from the git cherry list at all? Something isn't making sense here.

CodePudding user response：

(depending on how you interpret "easy" ...)

Git has a patch-id command that builds a hash from a diff, with some rules (ignoring line numbers and whitespace) intended to give it this property: two patches that introduce the same changes will have the same patch-id.

You can use it in the following way.

For each commit:

choose a way to generate the patch it produces on files other than that/file,
pipe that patch into git patch-id --stable.

Once you have computed all patch-ids in your range of commits, you can compare them to prune commits from your list.
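The steps above could be sketched like this. The function names and the way the two ranges are passed in are assumptions for illustration, and the exclude pathspec is just one possible way to generate the patch without that/file. It is meant to be run from the top of the repository.

```shell
# Hypothetical helper: prints "<patch-id> <commit>" for commit $1, computed
# from its diff with path $2 left out. Prints nothing if no other file changed.
patch_id_without() {
    git diff-tree -p "$1" -- . ":(exclude)$2" | git patch-id --stable
}

# Hypothetical helper: prints the commits in CANDIDATES_RANGE whose filtered
# patch-id is non-empty and does not occur among commits in APPLIED_RANGE.
prune_candidates() {
    applied_range=$1      # e.g. topic..main: commits already on the target
    candidates_range=$2   # e.g. main..topic: commits you might cherry-pick
    excluded=$3
    applied_ids=$(git rev-list --no-merges "$applied_range" |
        while read -r c; do patch_id_without "$c" "$excluded"; done |
        cut -d' ' -f1)
    git rev-list --reverse --no-merges "$candidates_range" |
    while read -r c; do
        id=$(patch_id_without "$c" "$excluded" | cut -d' ' -f1)
        # Empty id: the commit touches only the excluded file.
        [ -n "$id" ] || continue
        # Same filtered patch already exists on the target side.
        printf '%s\n' "$applied_ids" | grep -qxF "$id" && continue
        printf '%s\n' "$c"
    done
}
```

For example, prune_candidates topic..main main..topic that/file would print the remaining candidates, oldest first. This only catches commits whose remaining diff is the same patch as a commit on the other side. It does not detect changes that arrived there in a different form.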