I am trying to check whether a given file is of the type I expect. It can have one of three extensions: .fa, .fasta or .fasta.gz. Looking at other questions, I think this should be quite trivial; however, when I try the suggestions, they do not work for me.
This is what I have tried; none of these attempts match:
#!/bin/bash
test1="abcdef.fa"
test2="ghijkl.fasta"
test3="mnopqr.fasta.gz"
echo "test1: $test1"
echo "test2: $test2"
echo "test3: $test3"
# Attempt 1
if [[ $test1 =~ *.fa|*.fasta|*.fasta.gz ]] &> /dev/null; then printf "Attempt1: Match with $test1\n"; fi
if [[ $test2 =~ *.fa|*.fasta|*.fasta.gz ]] &> /dev/null; then printf "Attempt1: Match with $test2\n"; fi
if [[ $test3 =~ *.fa|*.fasta|*.fasta.gz ]] &> /dev/null; then printf "Attempt1: Match with $test3\n"; fi
# Attempt 2 - do I need to quote the string?
if [[ "$test1" =~ *.fa|*.fasta|*.fasta.gz ]] &> /dev/null; then printf "Attempt2: Match with $test1\n"; fi
if [[ "$test2" =~ *.fa|*.fasta|*.fasta.gz ]] &> /dev/null; then printf "Attempt2: Match with $test2\n"; fi
if [[ "$test3" =~ *.fa|*.fasta|*.fasta.gz ]] &> /dev/null; then printf "Attempt2: Match with $test3\n"; fi
# Attempt 3 - alternative regex
if [[ $test1 =~ .\*.(fa|fasta|fasta.gz) ]] &> /dev/null; then printf "Attempt3: Match with $test1\n"; fi
if [[ $test2 =~ .\*.(fa|fasta|fasta.gz) ]] &> /dev/null; then printf "Attempt3: Match with $test2\n"; fi
if [[ $test3 =~ .\*.(fa|fasta|fasta.gz) ]] &> /dev/null; then printf "Attempt3: Match with $test3\n"; fi
# Attempt 4 - again with the quoted string
if [[ "$test1" =~ .\*.(fa|fasta|fasta.gz) ]] &> /dev/null; then printf "Attempt4: Match with $test1\n"; fi
if [[ "$test2" =~ .\*.(fa|fasta|fasta.gz) ]] &> /dev/null; then printf "Attempt4: Match with $test2\n"; fi
if [[ "$test3" =~ .\*.(fa|fasta|fasta.gz) ]] &> /dev/null; then printf "Attempt4: Match with $test3\n"; fi
# Attempt 5 - put $ on end of regex
if [[ $test1 =~ .\*.(fa|fasta|fasta.gz)$ ]] &> /dev/null; then printf "Attempt5: Match with $test1\n"; fi
if [[ $test2 =~ .\*.(fa|fasta|fasta.gz)$ ]] &> /dev/null; then printf "Attempt5: Match with $test2\n"; fi
if [[ $test3 =~ .\*.(fa|fasta|fasta.gz)$ ]] &> /dev/null; then printf "Attempt5: Match with $test3\n"; fi
# Attempt 6 - again with the quoted string
if [[ "$test1" =~ .\*.(fa|fasta|fasta.gz)$ ]] &> /dev/null; then printf "Attempt6: Match with $test1\n"; fi
if [[ "$test2" =~ .\*.(fa|fasta|fasta.gz)$ ]] &> /dev/null; then printf "Attempt6: Match with $test2\n"; fi
if [[ "$test3" =~ .\*.(fa|fasta|fasta.gz)$ ]] &> /dev/null; then printf "Attempt6: Match with $test3\n"; fi
# Attempt 7 - use double ||
if [[ $test1 =~ .\*.(fa||fasta||fasta.gz) ]] &> /dev/null; then printf "Attempt7: Match with $test1\n"; fi
if [[ $test2 =~ .\*.(fa||fasta||fasta.gz) ]] &> /dev/null; then printf "Attempt7: Match with $test2\n"; fi
if [[ $test3 =~ .\*.(fa||fasta||fasta.gz) ]] &> /dev/null; then printf "Attempt7: Match with $test3\n"; fi
I am close with this:
# Attempt 8 - escape parentheses
if [[ $test1 =~ .\*.\(fa|fasta|fasta.gz\) ]] &> /dev/null; then printf "Attempt8: Match with $test1\n"; fi
if [[ $test2 =~ .\*.\(fa|fasta|fasta.gz\) ]] &> /dev/null; then printf "Attempt8: Match with $test2\n"; fi
if [[ $test3 =~ .\*.\(fa|fasta|fasta.gz\) ]] &> /dev/null; then printf "Attempt8: Match with $test3\n"; fi
However, the first test does not match, and the output looks like this:
test1: abcdef.fa
test2: ghijkl.fasta
test3: mnopqr.fasta.gz
Attempt8: Match with ghijkl.fasta
Attempt8: Match with mnopqr.fasta.gz
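The Attempt 8 output follows from the escaped parentheses. Because \( and \) are matched literally, they no longer group anything, so the top-level | splits the whole regex into three alternatives: .\*.\(fa, fasta and fasta.gz\). The plain fasta alternative occurs inside both ghijkl.fasta and mnopqr.fasta.gz, while abcdef.fa contains none of the three. A minimal sketch (the function name show_match is hypothetical) that prints the matched text from BASH_REMATCH:

```shell
#!/usr/bin/env bash
# Hypothetical helper: reports which part of the name the Attempt 8 regex matched.
show_match() {
  # \( and \) are literal parentheses, so | splits the whole regex into
  # three alternatives:  .\*.\(fa   fasta   fasta.gz\)
  if [[ $1 =~ .\*.\(fa|fasta|fasta.gz\) ]]; then
    echo "$1 -> matched '${BASH_REMATCH[0]}'"
  else
    echo "$1 -> no match"
  fi
}

for name in abcdef.fa ghijkl.fasta mnopqr.fasta.gz; do
  show_match "$name"
done
```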
What am I missing?
Answer:
You could try a case statement, something like:
case "$test1" in
*.fa|*.fasta|*.fasta.gz) printf 'Attempt1: Match with %s\n' "$test1";;
esac
case "$test2" in
*.fa|*.fasta|*.fasta.gz) printf 'Attempt1: Match with %s\n' "$test2";;
esac
case "$test3" in
*.fa|*.fasta|*.fasta.gz) printf 'Attempt1: Match with %s\n' "$test3";;
esac
See help case and LESS='+/case word in' man bash.
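The three case statements can be folded into one reusable function. A minimal sketch, assuming a hypothetical function name is_fasta that returns exit status 0 for the accepted extensions; case patterns are glob patterns, so *.fa|*.fasta|*.fasta.gz works here exactly as written:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the name ends in .fa, .fasta or .fasta.gz.
is_fasta() {
  case "$1" in
    *.fa|*.fasta|*.fasta.gz) return 0 ;;  # glob patterns, not regex
    *) return 1 ;;
  esac
}

for f in abcdef.fa ghijkl.fasta mnopqr.fasta.gz notes.txt; do
  if is_fasta "$f"; then
    printf 'Match with %s\n' "$f"
  fi
done
```

Returning a status instead of printing keeps the check usable directly in if, && and || constructs.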
Answer:
=~ expects a regular expression (POSIX extended regex), not a glob pattern. Try \.(fa|fasta|fasta\.gz)$ instead.
You can also use extended pattern matching: [[ $test1 == *.@(fa|fasta|fasta.gz) ]]
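Both suggestions can be wrapped in small functions. A minimal sketch, assuming the hypothetical names is_fasta_re and is_fasta_glob: the regex version escapes every literal dot and anchors the end with $, and the glob version uses the @(...) extended pattern. Recent Bash treats the right-hand side of == inside [[ ]] as if extglob were enabled; the explicit shopt -s extglob is included in case an older release needs it.

```shell
#!/usr/bin/env bash
# Possibly redundant on recent Bash; older versions may need it before @( ) is parsed.
shopt -s extglob

# Hypothetical helper: regex version. Literal dots are escaped, $ anchors the end.
is_fasta_re() {
  [[ $1 =~ \.(fa|fasta|fasta\.gz)$ ]]
}

# Hypothetical helper: extended glob version; the whole name must match the pattern.
is_fasta_glob() {
  [[ $1 == *.@(fa|fasta|fasta.gz) ]]
}

for f in abcdef.fa ghijkl.fasta mnopqr.fasta.gz reads.fastq; do
  if is_fasta_re "$f"; then printf 'regex match: %s\n' "$f"; fi
  if is_fasta_glob "$f"; then printf 'glob match:  %s\n' "$f"; fi
done
```

The trailing $ matters for the regex version: without it, a name such as reads.fastq would still match because it contains .fa.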
Answer:
It's much easier to define the regex in a variable:
#!/usr/bin/env bash
test1="abcdef.fa"
test2="ghijkl.fasta"
test3="mnopqr.fasta.gz"
echo "test1: $test1"
echo "test2: $test2"
echo "test3: $test3"
pattern='\.(fa|fasta|fasta\.gz)$'
# Attempt 1
if [[ $test1 =~ $pattern ]]; then printf 'Attempt1: Match with %s\n' "$test1"; fi
if [[ $test2 =~ $pattern ]]; then printf 'Attempt1: Match with %s\n' "$test2"; fi
if [[ $test3 =~ $pattern ]]; then printf 'Attempt1: Match with %s\n' "$test3"; fi
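Keeping the regex in a variable also makes the quoting rule easy to follow: the variable must stay unquoted on the right-hand side of =~, because a quoted "$pattern" is compared as a literal string rather than interpreted as a regex. A minimal sketch (the function name fasta_ext is hypothetical, and the dot in fasta.gz is escaped) that uses BASH_REMATCH[1] to report which extension matched:

```shell
#!/usr/bin/env bash
# Regex kept in a variable; literal dots are escaped and $ anchors the end.
pattern='\.(fa|fasta|fasta\.gz)$'

# Hypothetical helper: prints the matched extension, or fails if there is none.
fasta_ext() {
  # $pattern must stay unquoted here; "$pattern" would be compared as a
  # literal string instead of being interpreted as a regex.
  [[ $1 =~ $pattern ]] || return 1
  printf '%s\n' "${BASH_REMATCH[1]}"   # text captured by the (...) group
}

for f in abcdef.fa ghijkl.fasta mnopqr.fasta.gz reads.fastq; do
  if ext=$(fasta_ext "$f"); then
    printf 'Match with %s (extension %s)\n' "$f" "$ext"
  fi
done
```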