Script to find recursively the number of files with a certain extension

Time:11-17

We have a highly nested directory structure, where we have a directory, let's call it 'my Dir', appearing many times in our hierarchy. I am interested in counting the number of "*.csv" files in all directories named 'my Dir' (yes, there is a whitespace in the name). How can I go about it?

I tried something like this, but it does not work:
find . -type d -name "my Dir" -exec ls "{}/*.csv" \; | wc -l

CodePudding user response:

If you want the number of files matching the pattern '*.csv' under "my Dir", then:

  • don't ask for -type d; ask for -type f
  • don't ask for -name "my Dir" if you really want -name '*.csv'
  • don't try to ls *.csv on each match, because if there are N csv files in a directory, you could count each one N times
  • also beware of embedding {} in -exec code!
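If you do want to start from the directories named 'my Dir', one way to respect the last point is to pass the directory names to an inline script as arguments instead of pasting {} into the script text. The following is only a sketch: the function name count_csv_direct is made up for illustration, and it assumes you want just the files sitting directly inside each 'my Dir' (not in its subdirectories, and not hidden files).

```shell
#!/bin/sh
# Hypothetical helper: count *.csv files located directly inside every
# directory named 'my Dir' below a starting point (default: current directory).
count_csv_direct() {
    start=${1:-.}
    # The directory names reach the inline script as positional parameters,
    # never as part of the script text, so spaces or quotes in a path
    # cannot be misread as shell code.
    find "$start" -type d -name 'my Dir' -exec sh -c '
        for dir in "$@"; do
            for f in "$dir"/*.csv; do
                # An unmatched glob stays literal, so check the file exists.
                if [ -f "$f" ]; then
                    printf .    # one dot per file found
                fi
            done
        done
    ' sh {} + | wc -c           # count the dots
}
```

Calling count_csv_direct with no argument searches from the current directory; count_csv_direct /some/root searches from that path instead.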

For counting files from find, I like to use a trick I learned from Stéphane Chazelas on U&L (for example, from "Counting files in Linux"):

find "my Dir" -type f -name '*.csv' -printf . | wc -c

This requires GNU find, as -printf is a GNU extension to the POSIX standard.

It works by looking within "my Dir" (relative to the current working directory) for files that match the pattern. For each matching file, it prints a single dot (period). The dots are all piped to wc, which counts the number of characters (periods) that find produced -- that is, the number of matching files.
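The command above only searches a "my Dir" that sits in the current working directory. If the goal is every directory named 'my Dir' anywhere in the hierarchy, the same dot-counting idea can be wrapped in an outer find. This is a rough sketch rather than a definitive answer: the function name count_csv_in_my_dirs is hypothetical, it still needs GNU find for -printf, and it assumes files in subdirectories of 'my Dir' should be counted too.

```shell
#!/bin/sh
# Hypothetical helper: count *.csv files anywhere beneath every directory
# named 'my Dir' under a starting point (default: current directory).
count_csv_in_my_dirs() {
    start=${1:-.}
    # Outer find: locate each 'my Dir'. -prune stops it from descending
    # further, so a 'my Dir' nested inside another one is not visited twice.
    # Inner find: print one dot per matching regular file (GNU -printf).
    # {} is passed as a separate argument, not embedded in shell code.
    find "$start" -type d -name 'my Dir' -prune \
        -exec find {} -type f -name '*.csv' -printf . \; | wc -c
}
```

Running count_csv_in_my_dirs from the top of the hierarchy prints a single number; an optional argument names a different starting directory.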

CodePudding user response:

You would exclude all paths that are not under 'my Dir':

find . -type f -not '(' -not -path '*/my Dir/*' -prune ')' -name '*.csv'

CodePudding user response:

Another solution is to use the -path predicate to select your files.

find . -path '*/my Dir/*.csv'

Counting the number of occurrences could be a simple matter of piping to wc -l, though this will obviously produce the wrong result if some of the files contain newlines in their names. (This is slightly pathological, but definitely something you want to cover in production code.) A common arrangement is to just print a newline for every found file, instead of its name.

find . -path '*/my Dir/*.csv' -printf '.\n' | wc -l

(The -printf predicate is not in POSIX but it's not hard to replace with an -exec or similar.)
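As a sketch of what such a replacement might look like, the version below emits one line per match from a small inline script instead of using -printf. The function name count_csv_posix is made up for illustration, and the added -type f (to skip directories whose names happen to end in .csv) is an assumption about what should be counted, not part of the command above.

```shell
#!/bin/sh
# Hypothetical helper: count *.csv files beneath any 'my Dir' without
# relying on GNU -printf.
count_csv_posix() {
    start=${1:-.}
    # Matching paths are handed to the inline script in batches ({} +).
    # The script prints a fixed line per path and never the path itself,
    # so file names containing newlines cannot inflate the count.
    find "$start" -type f -path '*/my Dir/*.csv' -exec sh -c '
        for f in "$@"; do
            printf ".\n"
        done
    ' sh {} + | wc -l
}
```

With no argument the search starts in the current directory, as in the commands above.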
