How to count number of tracked files in each sub-directory of the repository?

Time:04-02

In a git repo, I want to list the directories (and sub-directories) that contain tracked files, along with the number of tracked files in each of them.

The following command lists the directories that contain tracked files:

$ git ls-files | xargs -n 1 dirname | sort -u

And this one counts all tracked files in the repository:

$ git ls-files | wc -l

The following command counts files in all sub-directories:

$ find . -type d -exec sh -c "echo '{}'; ls -1 '{}' | wc -l" \; | xargs -n 2 | awk '{print $1" "$2}'

But it also counts the sub-directories as items and, of course, it does not care whether files are tracked. The example below illustrates the problem:

C:\ROOT
│   tracked1.txt
│
├───Dir1
│   ├───Dir11
│   │       tracked111.txt
│   │       tracked112.txt
│   │
│   └───Dir12
│           ignored121.tmp
│           tracked121.txt
│
└───Dir2
    │   ignored21.tmp
    │   Tracked21.txt
    │
    └───Dir21
            ignored211.tmp
            ignored212.tmp

Running the command $ find root -type d -exec sh -c "echo '{}'; ls -1 '{}' | wc -l" \; | xargs -n 2 | awk '{print $2", "$1}' gives the following result:

3, root
2, root/Dir1
2, root/Dir1/Dir11
2, root/Dir1/Dir12
3, root/Dir2
2, root/Dir2/Dir21

What I need is:

1, root
2, root/Dir1/Dir11
1, root/Dir1/Dir12
1, root/Dir2

Here sub-directories and ignored items are not counted, and directories with no tracked items are not included. I don't know how to pipe these commands together to get this result.

CodePudding user response:

git ls-files | awk '{$NF="";print}' FS=/ OFS=/ | sort | uniq -c

or, shorter,

git ls-files | sed 's,[^/]*$,,' | sort | uniq -c
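To try the sed variant without a repository, here is a minimal sketch that feeds it the tracked paths from the question's example tree. The function name count_per_dir and the hard-coded path list are illustrative assumptions; in a real repository the input would come from git ls-files. Note that a file at the top level of the repository is grouped under an empty directory name, because the substitution strips the whole path.

```shell
# Hypothetical helper: reads one path per line on stdin and prints
# "<count> <directory>/" for each directory that directly contains a file.
count_per_dir() {
    sed 's,[^/]*$,,' | sort | uniq -c
}

# Tracked paths from the example tree (a stand-in for `git ls-files`).
printf '%s\n' \
    root/tracked1.txt \
    root/Dir1/Dir11/tracked111.txt \
    root/Dir1/Dir11/tracked112.txt \
    root/Dir1/Dir12/tracked121.txt \
    root/Dir2/Tracked21.txt |
    count_per_dir
```

In a repository the equivalent call would be git ls-files | count_per_dir. The width of the count column printed by uniq -c varies between implementations.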

CodePudding user response:

The following code lists all files, groups them by their directory names and prints the size of each group:

$ git ls-files | xargs -n 1 dirname | awk ' { filescount[$1] += 1 }
     END {
         n=asorti(filescount, sortedpath);
         for (i = 1; i <= n; i++) print filescount[sortedpath[i]], sortedpath[i]
         }'
1 .
1 Root
2 Root/Dir1/Dir11
1 Root/Dir1/Dir12
1 Root/Dir2
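The command above relies on asorti, which is a GNU awk extension, and it starts one dirname process per file. As a possible alternative, the following sketch does the directory extraction inside awk and leaves the ordering to sort, so it should also run with other awk implementations. The function name count_tracked_dirs is a hypothetical name chosen for illustration, and the sample paths are taken from the question's example tree.

```shell
# Hypothetical helper: reads one path per line on stdin and prints
# "<count> <directory>" per directory; top-level files are counted under ".".
count_tracked_dirs() {
    awk -F/ '{
        dir = "."
        if (NF > 1) {               # path has at least one directory component
            dir = $0
            sub("/[^/]*$", "", dir) # strip the file name, keep the directory
        }
        count[dir] += 1
    }
    END { for (d in count) print count[d], d }' | LC_ALL=C sort -k2
}

# Sample input standing in for `git ls-files`.
printf '%s\n' \
    root/tracked1.txt \
    root/Dir1/Dir11/tracked111.txt \
    root/Dir1/Dir11/tracked112.txt \
    root/Dir1/Dir12/tracked121.txt \
    root/Dir2/Tracked21.txt |
    count_tracked_dirs
# prints:
# 1 root
# 2 root/Dir1/Dir11
# 1 root/Dir1/Dir12
# 1 root/Dir2
```

In a repository the equivalent call would be git ls-files | count_tracked_dirs.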

If you also need the total number of lines of code in each directory:

$ git ls-files | xargs -n1 wc -l | awk ' { sub("/[^/]*$", "/") } 1' |
awk ' { filescount[$2] += 1; linescount[$2] += $1 }
     END {
         n=asorti(filescount, sortedpath);
         for (i = 1; i <= n; i++)
            print filescount[sortedpath[i]], linescount[sortedpath[i]], sortedpath[i]
         }'
1 260 .gitignore
1 5 Root/
2 1 Root/Dir1/Dir11/
1 2 Root/Dir1/Dir12/
1 4 Root/Dir2/

The second command does not group files in the root directory; it prints a separate line for each of them. The problem is in the awk ' { sub("/[^/]*$", "/") } 1' part, which tries to extract the directory from each path. When a path has no parent directory (e.g. .gitignore), the pattern does not match and the whole path is returned unchanged.
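One way to work around this limitation, sketched here under a few assumptions, is to treat any path without a slash as belonging to a "./" group. The function name sum_lines_per_dir is hypothetical, and the sample records below are made-up numbers in the "<lines> <path>" format that wc -l prints for a single file; they are not results from a real repository.

```shell
# Hypothetical helper: reads "<lines> <path>" records on stdin and prints
# "<files> <lines> <directory>/" per directory, with top-level files under "./".
sum_lines_per_dir() {
    awk '{
        lines = $1
        path = $0
        sub("^[ \t]*[0-9]+[ \t]+", "", path)  # drop the count, keep the path
        if (path ~ "/") sub("/[^/]*$", "/", path)
        else path = "./"                      # no parent directory in the path
        files[path] += 1
        total[path] += lines
    }
    END { for (d in files) print files[d], total[d], d }' | LC_ALL=C sort -k3
}

# Made-up sample records standing in for `git ls-files | xargs -n1 wc -l`.
printf '%s\n' \
    '260 .gitignore' \
    '12 README' \
    '5 Root/tracked1.txt' \
    '4 Root/Dir2/Tracked21.txt' |
    sum_lines_per_dir
```

In a repository the equivalent call would be git ls-files | xargs -n1 wc -l | sum_lines_per_dir. Like the original command, this sketch does not handle file names that contain newlines.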
