I have the following files:
A-111.txt
A-311.txt
B-111.txt
B-311.txt
C-111.txt
C-312.txt
D-112.txt
D-311.txt
I want to merge lines of files with the same basename (the same letter before the dash) if there is a match in column 4. I have many files, so I want to do it in a loop.
So far I have this:
for f1 in *-1**.txt; do f2="${f1/-1/-3}"; awk -F"\t" 'BEGIN { OFS=FS } NR==FNR { a[$4,$4]=$0 ; next } ($4,$4) in a { print a[$4,$4],$0 }' "$f1" "$f2" > "${f1}_merged.txt"; done
It works as intended for files A and B, but not for files C and D. Can someone help me improve the code, please?
EDIT - here's the above code formatted legibly:
for f1 in *-1**.txt; do
    f2="${f1/-1/-3}"
    awk -F"\t" '
        BEGIN {
            OFS = FS
        }
        NR == FNR {
            a[$4, $4] = $0
            next
        }
        ($4, $4) in a {
            print a[$4, $4], $0
        }
    ' "$f1" "$f2" > "${f1}_merged.txt"
done
EDIT - after Ed Morton kindly formatted my code, the error is:
awk: cmd. line:7: fatal: cannot open file 'C-311.txt' for reading (No such file or directory)
awk: cmd. line:7: fatal: cannot open file 'D-312.txt' for reading (No such file or directory)
CodePudding user response:
Would you please try the following:
#!/bin/bash

prefix="ref_" # prefix to declare array variable names
declare -A bases # array to count files for the basename
for f in *-[0-9]*.txt; do # loop over the target files
    base=${f%%-*} # extract the basename
    declare -n ref="$prefix$base" # indirect reference to an array named "$prefix$base"
    ref+=("$f") # append the filename to the list for the basename
    (( bases[$base]++ )) # count the number of files for the basename
done

for base in "${!bases[@]}"; do # loop over the basenames
    if (( ${bases[$base]} == 2 )); then # check that the number of files is two
        declare -n ref="$prefix$base" # indirect reference
        IFS=$'\t' read -ra a < "${ref[0]}" # read the first line of the 1st file into array a, split on tabs
        IFS=$'\t' read -ra b < "${ref[1]}" # read the first line of the 2nd file into array b, split on tabs
        if [[ ${a[3]} = "${b[3]}" ]]; then # compare the 4th columns
            paste "${ref[@]}" > "${base}_merged.txt"
        fi
    fi
done
- First extract the basenames such as "A", "B", and so on, then create a list of associated filenames for each. For instance, the array "ref_A" (the prefix plus the basename) will be assigned ('A-111.txt' 'A-311.txt'). At the same time, the array "bases" counts the files for each basename.
- Then loop over the basenames, make sure the number of associated files is two, and compare the 4th columns of the first lines of the two files. If they match, concatenate the files to generate a new file.
- paste "${ref[@]}" concatenates the lines of the two files side by side, delimited by a tab.
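
For reference, the loop in the question fails because "${f1/-1/-3}" turns C-111.txt into C-311.txt, while the file that actually exists is C-312.txt. If what you need is the line-by-line merge on column 4 from the question, rather than a comparison of the first lines only, the partner file can be found with a glob on the basename and then passed to the original awk command. This is only a minimal sketch under assumptions of mine: each letter has exactly one "-1" file and exactly one "-3" file, the function name merge_pairs is hypothetical, and the output is written to <base>_merged.txt.

```shell
# merge_pairs: hypothetical helper name, not taken from the question or answer.
# Assumes one "<base>-1*.txt" and one "<base>-3*.txt" file per basename.
merge_pairs() {
    for f1 in *-1*.txt; do
        [ -e "$f1" ] || continue          # glob matched nothing and stayed literal
        base=${f1%%-*}                    # letter before the first dash
        set -- "$base"-3*.txt             # candidate partner files for this basename
        if [ "$#" -ne 1 ] || [ ! -e "$1" ]; then
            echo "skipping $f1: expected exactly one ${base}-3*.txt" >&2
            continue
        fi
        # Same idea as the awk in the question: remember the lines of the first
        # file by column 4, then print both lines joined when column 4 matches.
        awk -F'\t' '
            BEGIN { OFS = FS }
            NR == FNR { a[$4] = $0; next }
            $4 in a { print a[$4], $0 }
        ' "$f1" "$1" > "${base}_merged.txt"
    done
}
```

Call merge_pairs in the directory that holds the files. Basenames without a single matching partner are reported on stderr and skipped instead of making awk abort.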