The idea is that you have three text files, call them A, B and C, each containing a single column of strings (the content doesn't matter for this example). You want to join every pair of them: one join for A - B, another for B - C and a last one for A - C, that is, every combination of two files.
Here is a concrete example. The command for a single pair would be
join -1 1 -2 1 A.txt B.txt > AB.txt
and likewise for the other two pairs.
Imagine A has
100
101
102
104
B has
101
103
104
105
C has
100
103
104
105
So the A - B comparison (AB.txt) would be:
101
104
A - C comparison (AC.txt):
100
104
B - C comparison (BC.txt):
103
104
105
And you'll have three output files named after the comparisons: AB.txt, AC.txt and BC.txt
CodePudding user response:
A solution might look like this:
#!/usr/bin/env bash
# Read positional parameters into array
list=("$@")
# Loop over all but the last element
for ((i = 0; i < ${#list[@]} - 1; i++)); do
# Loop over the elements starting with the first after the one i points to
for ((j = i + 1; j < ${#list[@]}; j++)); do
# Run the join command and redirect to constructed filename
join "${list[i]}" "${list[j]}" > "${list[i]%.txt}${list[j]%.txt}".txt
done
done
Notice that -1 1 -2 1 is the default behaviour for join and can be skipped.
The script has to be called with the filenames as the parameters:
./script A.txt B.txt C.txt
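To try this on the sample data from the question, a minimal sketch could look like the following. It wraps the same nested loop in a function (join_pairs is a hypothetical name, not part of the answer above) and recreates A.txt, B.txt and C.txt in a scratch directory, assuming the inputs are already sorted as join requires.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run join on every unordered pair of the given files.
join_pairs() {
    local list=("$@") i j
    for ((i = 0; i < ${#list[@]} - 1; i++)); do
        for ((j = i + 1; j < ${#list[@]}; j++)); do
            # Output name is built from both inputs with .txt stripped
            join "${list[i]}" "${list[j]}" > "${list[i]%.txt}${list[j]%.txt}.txt"
        done
    done
}

# Recreate the sample data from the question in a scratch directory.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
printf '%s\n' 100 101 102 104 > A.txt
printf '%s\n' 101 103 104 105 > B.txt
printf '%s\n' 100 103 104 105 > C.txt

join_pairs A.txt B.txt C.txt

# Show the three result files
head AB.txt AC.txt BC.txt
```

Keeping the loop in a function makes it easy to call with a glob such as join_pairs *.txt, as long as earlier output files are not in the same directory.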
CodePudding user response:
A function that does nothing but generate the possible combinations of two among its arguments:
#!/bin/bash
combpairs() {
local a b
until [ $# -lt 2 ]; do
a="$1"
for b in "${@:2}"; do
echo "$a - $b"
done
shift
done
}
combpairs A B C D E
A - B
A - C
A - D
A - E
B - C
B - D
B - E
C - D
C - E
D - E
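The function above only prints the pairs. One way to drive join with it is sketched below, assuming the base names contain no tabs or newlines. combpairs_tsv is a hypothetical variant that prints each pair as two tab-separated fields so a while read loop can split them reliably; the sample files are the ones from the question.

```bash
#!/usr/bin/env bash
# Hypothetical variant of combpairs: one tab-separated pair per line.
combpairs_tsv() {
    local a b
    until [ $# -lt 2 ]; do
        a=$1
        for b in "${@:2}"; do
            printf '%s\t%s\n' "$a" "$b"
        done
        shift
    done
}

# Sample data from the question, in a scratch directory.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
printf '%s\n' 100 101 102 104 > A.txt
printf '%s\n' 101 103 104 105 > B.txt
printf '%s\n' 100 103 104 105 > C.txt

# Read each pair and run join on the corresponding files.
while IFS=$'\t' read -r a b; do
    join "$a.txt" "$b.txt" > "$a$b.txt"
done < <(combpairs_tsv A B C)

ls
```

Separating pair generation from the join step means the same function can feed comm, diff or any other two-file command.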
CodePudding user response:
I would put the files in an array, and use the index like this:
files=(a.txt b.txt c.txt) # or files=(*.txt)
for ((i=0; i<${#files[@]}; i++)); do
f1=${files[i]} f2=${files[i+1]:-$files}
join -1 1 -2 1 "$f1" "$f2" > "${f1%.txt}${f2%.txt}.txt"
done
Using echo join to debug (and quoting >), this is what would be executed:
join -1 1 -2 1 a.txt b.txt > ab.txt
join -1 1 -2 1 b.txt c.txt > bc.txt
join -1 1 -2 1 c.txt a.txt > ca.txt
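Or for six files (each file is paired only with its successor, wrapping around at the end, so this is not all 15 possible pairs):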
join -1 1 -2 1 a.txt b.txt > ab.txt
join -1 1 -2 1 b.txt c.txt > bc.txt
join -1 1 -2 1 c.txt d.txt > cd.txt
join -1 1 -2 1 d.txt e.txt > de.txt
join -1 1 -2 1 e.txt f.txt > ef.txt
join -1 1 -2 1 f.txt a.txt > fa.txt
LC_ALL=C; files=(*.txt) would use all .txt files in the current directory, sorted by name, which may be relevant.
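The echo-based dry run mentioned above could be sketched like this. dry_run is a hypothetical function name, and the file names are only printed, so the files do not need to exist. The redirection operator is quoted so that echo prints it instead of the shell acting on it.

```bash
#!/usr/bin/env bash
files=(a.txt b.txt c.txt)

# Hypothetical helper: print the join commands instead of running them.
dry_run() {
    local i f1 f2
    for ((i = 0; i < ${#files[@]}; i++)); do
        # The successor of the last file wraps around to the first one
        f1=${files[i]} f2=${files[i+1]:-${files[0]}}
        echo join -1 1 -2 1 "$f1" "$f2" ">" "${f1%.txt}${f2%.txt}.txt"
    done
}

dry_run
```

Once the printed commands look right, dropping echo and the quotes around > turns the dry run into the real loop.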
CodePudding user response:
One in GNU awk:
$ gawk '{
a[ARGIND][$0] # hash all files to arrays
}
END { # after hashing
for(i in a) # form pairs
for(j in a)
if(i<j) { # avoid self and duplicate comparisons
f=ARGV[i] ARGV[j] ".txt" # form output filename
print ARGV[i],ARGV[j] > f # output pair info
for(k in a[i])
if(k in a[j])
print k > f # output matching records
}
}' a b c
Output, for example:
$ cat ab.txt
a b
101
104
All files are hashed into memory up front, so if the files are huge, you may run out of memory.
CodePudding user response:
Another variation, using comm -12 instead of join (comm also expects sorted input):
declare -A seen
for a in {A,B,C}; do
for b in {A,B,C}; do
[[ $a == $b || -v seen[$a$b] || -v seen[$b$a] ]] && continue
seen[$a$b]=1
comm -12 "$a.txt" "$b.txt" > "$a$b.txt"
done
done
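Both comm and join expect sorted input. If the files might not be sorted, a sketch along these lines sorts them on the fly with process substitution. It also replaces the seen array with an index-based inner loop, which is a swapped-in technique rather than the approach of the answer above. The sample data is the question's, deliberately shuffled.

```bash
#!/usr/bin/env bash
# Deliberately unsorted copies of the question's sample data.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
printf '%s\n' 104 100 102 101 > A.txt
printf '%s\n' 105 101 104 103 > B.txt
printf '%s\n' 103 105 100 104 > C.txt

names=(A B C)
for ((i = 0; i < ${#names[@]}; i++)); do
    # Starting j at i + 1 skips self-pairs and mirrored pairs
    for ((j = i + 1; j < ${#names[@]}; j++)); do
        a=${names[i]} b=${names[j]}
        # comm -12 keeps only the lines common to both sorted inputs
        comm -12 <(sort "$a.txt") <(sort "$b.txt") > "$a$b.txt"
    done
done

head AB.txt AC.txt BC.txt
```

Sorting inside the process substitutions leaves the original files untouched, at the cost of re-sorting each file once per pair.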