Is there a way to make permutations for file names in a for loop in linux bash?

The idea is that you have 3 text files, let's call them A, B and C, each containing a single column of unique strings (the content doesn't matter for this example). What you want is to run join on every pair of them, so you'll have one join for A - B, another for B - C and a last one for A - C, as if permuting the file names.

Here is a concrete example. The command for a single pair would be

join -1 1 -2 1 A.txt B.txt > AB.txt

and so on for the other two pairs.

Imagine A has

B has

C has

So the A - B comparison (AB.txt) would be:

101
104

A - C comparison (AC.txt):

100
104

B - C comparison (BC.txt):

103
105

And you'll have three output files named after the comparisons: AB.txt, AC.txt and BC.txt.

CodePudding user response：

A solution might look like this:

#!/usr/bin/env bash

# Read positional parameters into array
list=("$@")

# Loop over all but the last element
for ((i = 0; i < ${#list[@]} - 1; ++i)); do
    # Loop over the elements starting with the first after the one i points to
    for ((j = i + 1; j < ${#list[@]}; ++j)); do
        # Run the join command and redirect to constructed filename
        join "${list[i]}" "${list[j]}" > "${list[i]%.txt}${list[j]%.txt}".txt
    done
done

Note that -1 1 -2 1 is the default behaviour of join and can be omitted.

The script has to be called with the filenames as the parameters:

./script A.txt B.txt C.txt
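One thing to keep in mind with any of these loops: join expects both inputs to be sorted on the join field. If the files may be unsorted, something like the following could be used in place of the plain join call. This is only a sketch; the function name join_sorted is made up for illustration, and it assumes bash (for process substitution).

```shell
#!/usr/bin/env bash

# Hypothetical helper: join two single-column files that may be unsorted.
# Each input is sorted on the fly; LC_ALL=C makes sort and join agree on ordering.
join_sorted() {
    LC_ALL=C join <(LC_ALL=C sort "$1") <(LC_ALL=C sort "$2")
}
```

It would be called the same way as join, for example join_sorted A.txt B.txt > AB.txt.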

CodePudding user response：

A function that does nothing but generate the possible combinations of two among its arguments:

#!/bin/bash

combpairs() {
    local a b
    until [ $# -lt 2 ]; do
        a="$1"
        for b in "${@:2}"; do
            echo "$a - $b"
        done
        shift
    done
}

combpairs A B C D E
A - B
A - C
A - D
A - E
B - C
B - D
B - E
C - D
C - E
D - E
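To actually run join on those pairs, the output has to be parsed back into two names. A possible sketch: a variant that prints tab-separated pairs, plus a small driver loop. The names combpairs_tsv and join_all_pairs are hypothetical, and the sketch assumes the files are already sorted, live in the current directory, and have no tabs or newlines in their names.

```shell
#!/usr/bin/env bash

# Same pairing logic as combpairs, but each pair is printed as two
# tab-separated fields, which is easier to read back than "A - B".
combpairs_tsv() {
    local a b
    until [ $# -lt 2 ]; do
        a=$1
        for b in "${@:2}"; do
            printf '%s\t%s\n' "$a" "$b"
        done
        shift
    done
}

# Hypothetical driver: join every pair of the given .txt files and
# write the result to a file named after both inputs (A.txt B.txt -> AB.txt).
join_all_pairs() {
    local f1 f2
    while IFS=$'\t' read -r f1 f2; do
        join "$f1" "$f2" > "${f1%.txt}${f2%.txt}.txt"
    done < <(combpairs_tsv "$@")
}
```

Calling join_all_pairs A.txt B.txt C.txt would then produce AB.txt, AC.txt and BC.txt.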

CodePudding user response：

I would put the files in an array, and use the index like this:

files=(a.txt b.txt c.txt) # or files=(*.txt)

for ((i=0; i<${#files[@]}; i++)); do
    f1=${files[i]} f2=${files[i+1]:-$files}
    join -1 1 -2 1 "$f1" "$f2" > "${f1%.txt}${f2%.txt}.txt"
done

Using echo join to debug (and quoting >), this is what would be executed:

join -1 1 -2 1 a.txt b.txt > ab.txt
join -1 1 -2 1 b.txt c.txt > bc.txt
join -1 1 -2 1 c.txt a.txt > ca.txt

Or for six files:

join -1 1 -2 1 a.txt b.txt > ab.txt
join -1 1 -2 1 b.txt c.txt > bc.txt
join -1 1 -2 1 c.txt d.txt > cd.txt
join -1 1 -2 1 d.txt e.txt > de.txt
join -1 1 -2 1 e.txt f.txt > ef.txt
join -1 1 -2 1 f.txt a.txt > fa.txt

LC_ALL=C; files=(*.txt) would use all .txt files in the current directory, sorted by name in the C locale, which may be relevant.
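The wrap-around comes from the ${files[i+1]:-$files} fallback: past the last index the expansion is empty, so the first element is used instead ($files is shorthand for ${files[0]}). This pairs each file with its neighbour in a ring, which covers every combination for three files but not for more. A small sketch that only prints the pairs makes this visible; the function name ring_pairs is made up for illustration.

```shell
#!/usr/bin/env bash

# Print the pairs the wrap-around loop visits, without running join.
# ${files[0]} is written out explicitly here; it is what $files expands to.
ring_pairs() {
    local files=("$@") i
    for ((i = 0; i < ${#files[@]}; i++)); do
        echo "${files[i]} ${files[i+1]:-${files[0]}}"
    done
}

ring_pairs a b c d
```

With four names this prints a b, b c, c d and d a, so the pairs a c and b d are never compared.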

CodePudding user response：

One in GNU awk:

$ gawk '{
    a[ARGIND][$0]                          # hash all files to arrays
}
END {                                      # after hashing
    for(i in a)                            # form pairs
        for(j in a)
            if(i<j) {                      # avoid self and duplicate comparisons
                f=ARGV[i] ARGV[j] ".txt"   # form output filename
                print ARGV[i],ARGV[j] > f  # output pair info
                for(k in a[i])     
                    if(k in a[j])
                        print k > f        # output matching records
            }
}' a b c

Output, for example:

$ cat ab.txt
a b
101
104

All files are hashed into memory before any output is written, so if the files are huge, you may run out of memory.

CodePudding user response：

Another variation, using comm -12 (which prints only the lines common to both files) instead of join:

declare -A seen
for a in {A,B,C}; do 
    for b in {A,B,C}; do
        [[ $a == $b || -v seen[$a$b] || -v seen[$b$a] ]] && continue
        seen[$a$b]=1
        comm -12 "$a.txt" "$b.txt" > "$a$b.txt"
    done
done
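The same idea could be wrapped in a function that takes the base names as arguments instead of hard-coding A, B and C. This is a sketch only: the name common_pairs is hypothetical, it assumes bash 4.3 or later (for associative arrays and -v on array elements), and it assumes NAME.txt exists and is sorted for every argument, since comm requires sorted input.

```shell
#!/usr/bin/env bash

# Hypothetical wrapper: write the lines common to each pair of NAME.txt files
# into a file named after both (A and B -> AB.txt), visiting each pair once.
common_pairs() {
    local -A seen=()
    local a b
    for a in "$@"; do
        for b in "$@"; do
            # Skip self-pairs and pairs already handled in either order
            [[ $a == "$b" || -v seen[$a$b] || -v seen[$b$a] ]] && continue
            seen[$a$b]=1
            # -12 suppresses lines unique to either file, leaving the common ones
            comm -12 "$a.txt" "$b.txt" > "$a$b.txt"
        done
    done
}
```

Calling common_pairs A B C would write AB.txt, AC.txt and BC.txt, and no BA.txt, CA.txt or CB.txt.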