How can I list (and find the number of files) of directories with a bash script?

Time:03-18

I have a tree of directories and subdirectories whose leaf directories contain files with some extension (say, .jpg). The structure of the tree is not fixed, so it can be something like:

top_directory
|__child1
|   |__one
|   |_two
|
|__child2
|   |_three
|
|__child3 
   |_child3_1
      |__four
      |__five
      |__six

How can I write a script that counts the files with that extension in each subdirectory where they exist?

In the past, when there was only one level of subdirectories, I did something like:

for entry in ./*/
do
echo "$entry"
ls "$entry"/*.jpg -l | wc -l
done

This iterated over all subdirectories with entry and counted the files. However, it obviously does not work when there are sub-subdirectories.

CodePudding user response:

Here's a not particularly clever way of doing it. It does effectively what yours does, but recursively, and it doesn't address the fact that a .jpg file name doesn't guarantee the file is actually a JPEG:

( find . -type d -print | while IFS= read -r line; do echo "$line" $( ls -1 "$line"/*.jpg 2>/dev/null | wc -l); done ) | grep -v ' 0$'
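The same per-directory idea can be written without parsing ls output. This is only a minimal sketch, assuming bash (for nullglob, arrays and read -d) and a find that supports -print0; the function name count_jpg_globs is made up for illustration:

```bash
#!/bin/bash
# count_jpg_globs is a hypothetical name; $1 is the top directory.
# The body runs in a subshell so the nullglob setting does not leak out.
count_jpg_globs() (
    # nullglob makes a non-matching glob expand to nothing instead of itself
    shopt -s nullglob
    while IFS= read -r -d '' dir; do
        files=("$dir"/*.jpg)
        # print only directories that hold at least one match
        if (( ${#files[@]} > 0 )); then
            printf '%s %d\n' "$dir" "${#files[@]}"
        fi
    done < <(find "$1" -type d -print0)
)
```

Called as count_jpg_globs . it prints one "directory count" line per directory that has matches, so the trailing grep -v ' 0$' is no longer needed. Like the ls version, it matches on the name only.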

Something quite similar to your request has been answered in detail on the Unix & Linux Stack Exchange site.

CodePudding user response:

This uses GNU find for -printf:

find /top/dir -type f -name '*.jpg' -printf . | wc -c

Unlike ls (which generally you should not use in scripts), it works even if a filename contains a newline.
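Where GNU find's -printf is not available, a similar count can be sketched with -exec and the printf utility instead. This is an assumption-laden variant, not part of the answer above; count_ext and its two-argument interface are hypothetical:

```bash
#!/bin/bash
# count_ext is a hypothetical helper: $1 = top directory, $2 = extension
# without the leading dot.
count_ext() {
    # printf consumes one argument per '%.0s.' and emits one dot for it,
    # so the byte count equals the number of matching files.
    find "$1" -type f -name "*.$2" -exec printf '%.0s.' {} + | wc -c
}

# Demo in a throwaway directory: two files, one with a newline in its name.
demo=$(mktemp -d)
touch "$demo/plain.jpg" "$demo/odd"$'\n'"name.jpg"
count_ext "$demo" jpg    # counts 2 files
rm -rf "$demo"
```

Because nothing here is counted by lines, the newline inside the second file name does not inflate the result, which is the same property the -printf version relies on.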

edit: Count files per sub-directory (asked in comment):

There are a few ways to do it, but maybe like this. It's good for interactive output (i.e. to display to a user): you will see each subdirectory and its count. However, directories containing zero .jpg files will not be listed (either a pro or a con, depending on the use case).

find /top/dir -type f -name '*.jpg' -exec dirname -z -- {} + |
sort -z |
uniq -zc |
sort -znk 1,1 |
tr '\0' '\n'

This requires GNU tools for the null delimiters (-z flags). The second sort sorts counts, low to high. Add -r (reverse) for high to low.
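If the -z flags of dirname, sort and uniq are not available, the tally can also be kept inside bash itself. The following is a rough sketch rather than a drop-in replacement: it assumes bash 4 or later (associative arrays) and a find that supports -print0, and count_per_dir is a made-up name:

```bash
#!/bin/bash
# count_per_dir is a hypothetical name: $1 = top directory, $2 = extension
# without the leading dot.
count_per_dir() {
    local file dir n
    local -A counts=()
    while IFS= read -r -d '' file; do
        dir=${file%/*}              # strip the file name, keep the directory
        n=${counts["$dir"]:-0}
        counts["$dir"]=$(( n + 1 ))
    done < <(find "$1" -type f -name "*.$2" -print0)

    # Print "count<TAB>directory", lowest count first.
    for dir in "${!counts[@]}"; do
        printf '%d\t%s\n' "${counts[$dir]}" "$dir"
    done | sort -n
}
```

As with the pipeline above, directories without matching files are not listed. The final output is line-based, so it is meant for display; a directory name containing a newline would still be counted correctly but printed across two lines.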

CodePudding user response:

@dan has a good approach. A similar one, using a helper script to count the files in each directory found, is another simple and reasonably efficient way to do this. Your find command recursively locates the subdirectories below a given directory, and you run the helper on each of them with:

find /top/dir -type d -print -exec ./helperf '{}' jpg \;

The -print above is optional and simply outputs the current directory name before the helper script (helperf) outputs the number of files in that directory. The jpg argument (or any file extension) is likewise optional; if it is omitted, all files in a given directory are counted. Since you invoke your helper script with -exec, you should make it executable (or include a full bash invocation for it).

The helper script, helperf, simply calls find much as @dan proposes, but limits -maxdepth to 1 so only files in that directory itself are counted. Your helper script could be:

#!/bin/bash

[ -d "$1" ] && {                                ## first param is directory
    if [ -n "$2" ]; then                        ## ext given as second param
        find "$1" -maxdepth 1 -type f -name "*.$2" -printf . 2>/dev/null | wc -c
    else                                        ## no ext given, count all files
        find "$1" -maxdepth 1 -type f -printf . 2>/dev/null | wc -c
    fi
}

Above:

  • [ -d "$1" ] serves as a simple validation ensuring the argument passed is a valid directory. If not, the script silently exits.
  • if [ -n "$2" ]; then checks whether a second (extension) argument was given and, if so, limits the find to files ending in that extension. Without it, all files in the directory are counted.
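If keeping a separate helperf file is inconvenient, the same per-directory count can be inlined with bash -c. This is a sketch under the same GNU find assumptions (-maxdepth, -printf); count_each_dir is a hypothetical wrapper, not something the answer defines:

```bash
#!/bin/bash
# count_each_dir is a hypothetical wrapper: $1 = top directory,
# $2 = optional extension without the leading dot.
count_each_dir() {
    # "*.ext" when an extension is given, otherwise "*" (every file)
    local pattern="*${2:+.$2}"
    # For each directory, count its direct files and print "directory count".
    find "$1" -type d -exec bash -c '
        n=$(find "$1" -maxdepth 1 -type f -name "$2" -printf . | wc -c)
        printf "%s %d\n" "$1" "$n"
    ' _ {} "$pattern" \;
}
```

Calling count_each_dir /top/dir jpg prints each directory and its count on one line, including directories whose count is 0, which can be easier to filter or sort than the two-line output of -print followed by the helper.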

Example Use/Output

Given my tmp directory on this box has the structure:

$ tree -d
.
├── awk
├── clamav
│   └── src
└── st

Counting all files in each directory results in:

$ find . -type d -print -exec ./helperf '{}' \;
.
40
./clamav
5
./clamav/src
0
./awk
2
./st
3

These are the correct file counts for each directory.

Now limiting to just .txt files (of which there are 6 in the parent directory only), you would have:

$ find . -type d -print -exec ./helperf '{}' txt \;
.
6
./clamav
0
./clamav/src
0
./awk
0
./st
0

This seems to be close to what you are looking for. Look it over and let me know if you have further questions.
