In a git repo, I want to list the directories (and sub-directories) that contain tracked items, along with the number of items (tracked files only) in each of them.
The following command gives a list of directories:
$ git ls-files | xargs -n 1 dirname | uniq
and this one counts all tracked items in the repository:
$ git ls-files | wc -l
The following command counts files in all sub-directories:
$ find . -type d -exec sh -c "echo '{}'; ls -1 '{}' | wc -l" \; | xargs -n 2 | awk '{print $1" "$2}'
But it also counts sub-directories as entries and, of course, it does not care whether files are tracked. The example below illustrates this:
C:\ROOT
│   tracked1.txt
│
├───Dir1
│   ├───Dir11
│   │       tracked111.txt
│   │       tracked112.txt
│   │
│   └───Dir12
│           ignored121.tmp
│           tracked121.txt
│
└───Dir2
    │   ignored21.tmp
    │   Tracked21.txt
    │
    └───Dir21
            ignored211.tmp
            ignored212.tmp
Running
$ find root -type d -exec sh -c "echo '{}'; ls -1 '{}' | wc -l" \; | xargs -n 2 | awk '{print $2", "$1}'
gives the following result:
3, root
2, root/Dir1
2, root/Dir1/Dir11
2, root/Dir1/Dir12
3, root/Dir2
2, root/Dir2/Dir21
What I need is:
1, root
2, root/Dir1/Dir11
1, root/Dir1/Dir12
1, root/Dir2
where sub-directories and ignored items are not counted, and directories with no tracked items of their own (root/Dir1 and root/Dir2/Dir21) are not included. But I don't know how to pipe these commands together to get this result.
Answer:
git ls-files | awk '{$NF="";print}' FS=/ OFS=/ | sort | uniq -c
or, shorter,
git ls-files | sed 's,[^/]*$,,' | sort | uniq -c
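With either command, files at the top level of the repository produce an empty directory name, which uniq -c reports as a blank entry. A minimal sketch of the sed variant that labels that group ./ instead is shown below. The function fake_ls_files is a hypothetical stand-in for git ls-files that prints the tracked paths of the example (laid out as in the second answer's output, with a top-level .gitignore), so the pipeline can be tried outside a repository; count_files_per_dir is likewise a hypothetical name. Setting LC_ALL=C is an assumption made to get a byte-wise, locale-independent sort order.

```shell
#!/bin/sh
# Hypothetical stand-in for `git ls-files`: prints the tracked paths
# of the example repository, one per line.
fake_ls_files() {
  printf '%s\n' \
    .gitignore \
    Root/tracked1.txt \
    Root/Dir1/Dir11/tracked111.txt \
    Root/Dir1/Dir11/tracked112.txt \
    Root/Dir1/Dir12/tracked121.txt \
    Root/Dir2/Tracked21.txt
}

# Reads paths on stdin and prints, in uniq -c format, how many of them
# sit directly in each directory.
count_files_per_dir() {
  # 1st expression: delete the file name, keeping the trailing slash.
  # 2nd expression: a top-level file leaves an empty line; call it "./".
  sed -e 's,[^/]*$,,' -e 's,^$,./,' |
    LC_ALL=C sort |
    uniq -c
}

# In a real repository: git ls-files | count_files_per_dir
fake_ls_files | count_files_per_dir
```

The width of the count column printed by uniq -c differs between implementations, so scripts that parse this output should split on whitespace rather than on fixed columns.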
Answer:
The following code lists all files, groups them by directory name and prints the size of each group (asorti is a GNU awk extension, so this requires gawk):
$ git ls-files | xargs -n 1 dirname | awk '
  { filescount[$1] += 1 }   # one more file in this directory
  END {
    n = asorti(filescount, sortedpath)   # gawk: sort the directory names
    for (i = 1; i <= n; i++) print filescount[sortedpath[i]], sortedpath[i]
  }'
1 .
1 Root
2 Root/Dir1/Dir11
1 Root/Dir1/Dir12
1 Root/Dir2
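Because asorti exists only in GNU awk, the program above fails under mawk or the BSD awk shipped with macOS. The sketch below assumes only POSIX awk: it strips the last path component inside awk (instead of starting one dirname process per file), assigns top-level files to ".", and leaves the ordering to sort. fake_ls_files is a hypothetical stand-in for git ls-files printing the example paths, and count_files_per_dir is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical stand-in for `git ls-files`: the example tracked paths.
fake_ls_files() {
  printf '%s\n' \
    .gitignore \
    Root/tracked1.txt \
    Root/Dir1/Dir11/tracked111.txt \
    Root/Dir1/Dir11/tracked112.txt \
    Root/Dir1/Dir12/tracked121.txt \
    Root/Dir2/Tracked21.txt
}

# Reads paths on stdin and prints "COUNT DIRECTORY", sorted by directory.
# Only POSIX awk features are used, so gawk is not required.
count_files_per_dir() {
  awk '{
    dir = $0
    # Drop the last path component; if there is no slash, the file
    # sits at the top level of the repository.
    if (!sub("/[^/]*$", "", dir)) dir = "."
    count[dir]++
  }
  END { for (d in count) print count[d], d }' |
    LC_ALL=C sort -k2
}

# In a real repository: git ls-files | count_files_per_dir
fake_ls_files | count_files_per_dir
```

Since for (d in count) visits keys in an unspecified order, the final sort on the directory field is what makes the output deterministic; for the example paths it matches the listing printed by the gawk version.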
If you also need the total number of lines of code in each directory:
$ git ls-files | xargs -n1 wc -l | awk ' { sub("/[^/]*$", "/") } 1' |
  awk '
  { filescount[$2] += 1; linescount[$2] += $1 }   # per-directory totals
  END {
    n = asorti(filescount, sortedpath)
    for (i = 1; i <= n; i++)
      print filescount[sortedpath[i]], linescount[sortedpath[i]], sortedpath[i]
  }'
1 260 .gitignore
1 5 Root/
2 1 Root/Dir1/Dir11/
1 2 Root/Dir1/Dir12/
1 4 Root/Dir2/
The second command does not group the files in the top-level directory; it prints a separate line for each of them. The problem is the awk ' { sub("/[^/]*$", "/") } 1' step that extracts the directory from each path: the pattern only matches when the path contains a slash, so for a top-level file such as .gitignore it leaves the whole path unchanged.
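One way around this, sketched below under the assumption of a POSIX awk, is to test the return value of sub: it returns the number of substitutions made, so a result of zero marks a top-level file, which can then be assigned to "." explicitly. The sketch also removes the leading count with a regular expression instead of reading $2, so the awk step does not split paths on spaces (xargs without -0 still would), and it accepts the left-padded counts that some wc implementations print. count_lines_per_dir and fake_wc_output are hypothetical names; the input imitates the "LINES PATH" lines of wc -l, and its line counts (and the extra README.md) are made up for illustration.

```shell
#!/bin/sh
# Hypothetical input in the format printed by `wc -l FILE`;
# the numbers are invented for this example.
fake_wc_output() {
  printf '%s\n' \
    '10 .gitignore' \
    '   20 README.md' \
    '5 Root/tracked1.txt' \
    '3 Root/Dir1/Dir11/tracked111.txt' \
    '4 Root/Dir1/Dir11/tracked112.txt' \
    '2 Root/Dir1/Dir12/tracked121.txt' \
    '7 Root/Dir2/Tracked21.txt'
}

# Reads "LINES PATH" pairs on stdin and prints "FILES LINES DIRECTORY"
# for every directory, sorted by directory name.
count_lines_per_dir() {
  awk '{
    lines = $1
    path = $0
    sub("^ *[0-9]+ ", "", path)   # drop the leading line count
    dir = path
    # sub returns 0 when there is no slash: a top-level file.
    if (!sub("/[^/]*$", "", dir)) dir = "."
    files[dir]++
    total[dir] += lines
  }
  END { for (d in files) print files[d], total[d], d }' |
    LC_ALL=C sort -k3
}

# In a real repository: git ls-files | xargs -n1 wc -l | count_lines_per_dir
fake_wc_output | count_lines_per_dir
```

Both top-level files now end up in a single "." line (2 files, 30 lines in this made-up input) instead of one line each. Unlike the original command, the sketch prints directory names without a trailing slash.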