compare two filenames and extract differences

Time:03-12

I have two files with almost identical filenames:

/home/104800-001-001/H27VNDSX3_104800-001-001_GCCTATCA-CGACCATT_L002_R1.extracted.fastq.gz
/home/104800-001-001/H27VNDSX3_104800-001-001_GCCTATCA-CGACCATT_L002_R3.extracted.fastq.gz

How can I extract ONLY the differing characters in bash?

Desired output:

1 3

Edit:

  • The two filenames are always the same length
  • Only differences in the _R[0-9] part should be taken into account

CodePudding user response:

Comparing Only An Interesting Subset

(Answering the question as-edited)

#!/usr/bin/env bash

s1='/home/104800-001-001/H27VNDSX3_104800-001-001_GCCTATCA-CGACCATT_L002_R1.extracted.fastq.gz'
s2='/home/104800-001-001/H27VNDSX3_104800-001-001_GCCTATCA-CGACCATT_L002_R3.extracted.fastq.gz'

revision_re='_R([[:digit:]]+)[._]'

rev1=; rev2=;
[[ $s1 =~ $revision_re ]] && rev1=${BASH_REMATCH[1]}
[[ $s2 =~ $revision_re ]] && rev2=${BASH_REMATCH[1]}

if [[ $rev1 && $rev2 ]] && [[ $rev1 != "$rev2" ]]; then
  printf '%s %s\n' "$rev1" "$rev2"
fi
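
If the same check has to run over many file pairs, the regex match above could be wrapped in functions. The following is only a sketch: the names `r_field` and `diff_r_fields` are hypothetical, and it assumes each filename contains exactly one `_R<digits>` part followed by `.` or `_`.

```bash
#!/usr/bin/env bash

# Hypothetical helper: print the digits that follow "_R" in a filename,
# or return non-zero if the name has no such part.
r_field() {
  local re='_R([[:digit:]]+)[._]'
  [[ $1 =~ $re ]] || return 1
  printf '%s\n' "${BASH_REMATCH[1]}"
}

# Hypothetical helper: print both values, but only when they differ.
diff_r_fields() {
  local a b
  a=$(r_field "$1") || return 1
  b=$(r_field "$2") || return 1
  [[ $a != "$b" ]] && printf '%s %s\n' "$a" "$b"
}

diff_r_fields \
  '/home/104800-001-001/H27VNDSX3_104800-001-001_GCCTATCA-CGACCATT_L002_R1.extracted.fastq.gz' \
  '/home/104800-001-001/H27VNDSX3_104800-001-001_GCCTATCA-CGACCATT_L002_R3.extracted.fastq.gz'
# prints: 1 3
```

`diff_r_fields` returns non-zero when either name lacks the `_R` part or when both values are equal, so it can be used directly in an `if` condition.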

Comparing The Whole String

(Answering the question as originally asked)

#!/usr/bin/env bash

s1='/home/104800-001-001/H27VNDSX3_104800-001-001_GCCTATCA-CGACCATT_L002_R1.extracted.fastq.gz'
s2='/home/104800-001-001/H27VNDSX3_104800-001-001_GCCTATCA-CGACCATT_L002_R3.extracted.fastq.gz'
 
max_len=$(( ${#s1} > ${#s2} ? ${#s1} : ${#s2} ))
for (( idx=0; idx<max_len; idx++ )); do
  if [[ ${s1:idx:1} != "${s2:idx:1}" ]]; then
    printf '%s ' "${s1:idx:1}" "${s2:idx:1}"
  fi
done
printf '\n'
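
When checking where the strings diverge, it may help to see the position of each difference as well. A minimal variation on the loop above, assuming a hypothetical function name `diff_positions` and an `index:char1:char2` output format chosen only for illustration:

```bash
#!/usr/bin/env bash

# Hypothetical helper: print "index:char1:char2" (0-based index) for every
# position where the two strings differ. A missing character in the shorter
# string shows up as an empty field.
diff_positions() {
  local a=$1 b=$2 idx
  local max_len=$(( ${#a} > ${#b} ? ${#a} : ${#b} ))
  for (( idx = 0; idx < max_len; idx++ )); do
    if [[ ${a:idx:1} != "${b:idx:1}" ]]; then
      printf '%d:%s:%s\n' "$idx" "${a:idx:1}" "${b:idx:1}"
    fi
  done
}

diff_positions 'L002_R1.extracted' 'L002_R3.extracted'
# prints: 6:1:3
```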