How to find a directory containing a file with one extension and not containing a file with another

Time:02-01

I have to filter a huge number of directories/paths that contain files with the extension .spring (a compressed file of two fastq files). In some of these directories, .fastq.gz files are still present. I need the paths of those directories that contain only .spring files and no .fastq.gz files. (Specifically, I need the paths of those .spring files.)

I have tried using the find command, but it is not working as I intended. Please give some suggestions.

Also, how do I check whether a directory contains both kinds of files, .fastq.gz and .spring?

Thanks

I tried the following:

find $(find $PWD -name "*.spring" -printf '%h\n') -not -name "*.fastq.gz"

CodePudding user response:

Here is a bash script that returns the desired list:

#!/bin/bash
set -euo pipefail
IFS=$'\n\t'

# find all directories containing *.spring files
spring_dirs=$(find "$PWD" -type f -name '*.spring' -exec dirname {} \; | sort -u)

# within the directories containing *.spring files,
# find those directories that also contain *.fastq.gz files,
# but use -maxdepth 1 to not look any deeper than the *.spring file dir
fastq_dirs=$(find $spring_dirs -maxdepth 1 -type f -name '*.fastq.gz' -exec dirname {} \; | sort -u)

# concatenate the two sets of directories and only keep the ones
# that are not repeated
spring_only_dirs=$(printf "%s\n%s\n" "$spring_dirs" "$fastq_dirs" | sort | uniq -u)

# use the directories of *.spring files to get the full
# file names of the *.spring files
find $spring_only_dirs -maxdepth 1 -type f -name '*.spring'

This is not the fastest approach, but hopefully easy to understand and fairly short.
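
The same filtering can also be sketched as one function that checks the directory of each .spring file directly, which keeps paths containing spaces intact. This is only a sketch under assumptions: list_spring_only is a hypothetical name, not part of the script above, and it relies on find supporting -maxdepth, as the script above already does.

```shell
# Sketch: print every *.spring file whose own directory holds no *.fastq.gz file.
# list_spring_only is a hypothetical helper name used for illustration only.
list_spring_only() {
    # $1: directory to search (defaults to the current directory)
    find "${1:-.}" -type f -name '*.spring' -exec sh -c '
        for spring do
            # directory part of the path, e.g. /data/run1 for /data/run1/a.spring
            dir=${spring%/*}
            # print the file only when that directory has no *.fastq.gz file
            if [ -z "$(find "$dir" -maxdepth 1 -type f -name "*.fastq.gz")" ]; then
                printf "%s\n" "$spring"
            fi
        done
    ' sh {} +
}
```

It could be called as list_spring_only "$PWD". Because it runs one extra find per .spring file, it is not expected to be faster than the script above.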

CodePudding user response:

This is my approach, written as a function. It takes 2 arguments, the extensions to search for: $1 is the extension of the files that should exist, and $2 the extension of the files that should not exist in the folder. In this case they should be $1=spring and $2=fastq.gz.

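findExcludingIncludingDirs () {
    # $1: extension that must be present; $2: extension that must be absent
    # Note: directory names containing whitespace are not handled
    local Dirs=( $(find . -type d) )
    local Dir typesA typesB

    # "${Dirs[@]}" iterates over every element in both bash and zsh
    # (a plain $Dirs expands to only the first element in bash)
    for Dir in "${Dirs[@]}"; do
        if [[ $Dir = '.' ]]; then continue; fi

        # files with each extension directly inside Dir (not in subfolders)
        typesA=( $(find "${Dir}" -maxdepth 1 -name "*.${1}" -type f 2>/dev/null) )
        typesB=( $(find "${Dir}" -maxdepth 1 -name "*.${2}" -type f 2>/dev/null) )

        if [[ ${#typesA[@]} -gt 0 ]]; then
            if [[ ${#typesB[@]} -eq 0 ]]; then
                echo "${Dir} contains only ${1} Files"
            else
                echo "${Dir} contains both ${1} & ${2} Files"
            fi
        fi
    done
}; alias feid="findExcludingIncludingDirs"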
  • Dirs will hold all directories
  • types[A|B] will hold the files with the queried extensions in the folder of the current loop iteration (non-recursively, due to -maxdepth 1). These lines also discard any error messages from find by redirecting stderr with 2>/dev/null

Just call it from the path from which you want to search.

The alias feid just saves typing the full command name. My approach has only been tested under zsh on macOS.
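
The per-directory test at the core of the function can also be tried on a single folder. The following is a minimal sketch, not part of the function above: classify_dir is a hypothetical helper name, and it assumes a find that supports -maxdepth.

```shell
# Sketch: report which of two extensions occur directly inside one directory.
# classify_dir is a hypothetical helper name used for illustration only.
# The body runs in a subshell ( ... ) so its variables do not leak to the caller.
classify_dir() (
    dir=$1   # directory to inspect
    ext_a=$2 # first extension, e.g. spring
    ext_b=$3 # second extension, e.g. fastq.gz

    # non-empty output means at least one matching file exists
    has_a=$(find "$dir" -maxdepth 1 -type f -name "*.$ext_a" 2>/dev/null)
    has_b=$(find "$dir" -maxdepth 1 -type f -name "*.$ext_b" 2>/dev/null)

    if [ -n "$has_a" ] && [ -n "$has_b" ]; then
        echo "both"
    elif [ -n "$has_a" ]; then
        echo "only $ext_a"
    elif [ -n "$has_b" ]; then
        echo "only $ext_b"
    else
        echo "neither"
    fi
)
```

For example, classify_dir ./sample1 spring fastq.gz (with ./sample1 standing in for a real folder) prints one of both, only spring, only fastq.gz, or neither.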

Hope it helps, Regards!
