Need to combine 2 files having different word list size

Time:02-26

I need to combine two files in a bash script. Each file holds a word list, and the lists differ in length. I want to pair every line of the first file with every line of the second, as shown below.

File 1:

word1
word2
word3

File 2:

8.8.8.8
4.4.4.4
4.4.2.2
5.5.5.5

Desired Output:

word1,8.8.8.8
word1,4.4.4.4
word1,4.4.2.2
word1,5.5.5.5
word2,8.8.8.8
word2,4.4.4.4
word2,4.4.2.2
word2,5.5.5.5
word3,8.8.8.8
word3,4.4.4.4
word3,4.4.2.2
word3,5.5.5.5

Hoping to get some help from experts.

CodePudding user response:

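Pick a field number (like 100) that is higher than the number of fields on any line in your files, and (ab)use join to produce the Cartesian product. Because that field is empty on every line, every line of the first file matches every line of the second: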

join -j 100 file1.txt file2.txt
 word1 8.8.8.8
 word1 4.4.4.4
 word1 4.4.2.2
 word1 5.5.5.5
 word2 8.8.8.8
 word2 4.4.4.4
 word2 4.4.2.2
 word2 5.5.5.5
 word3 8.8.8.8
 word3 4.4.4.4
 word3 4.4.2.2
 word3 5.5.5.5

Edit: To use a comma as the column separator, specify it with the -t option. To keep the output from starting with that separator (previously a space, now a comma), list the output fields explicitly with the -o option:

join -j 100 -t, -o 1.1,2.1 file1.txt file2.txt
word1,8.8.8.8
word1,4.4.4.4
word1,4.4.2.2
word1,5.5.5.5
word2,8.8.8.8
word2,4.4.4.4
word2,4.4.2.2
word2,5.5.5.5
word3,8.8.8.8
word3,4.4.4.4
word3,4.4.2.2
word3,5.5.5.5

CodePudding user response:

You can simplify things and gain flexibility by using awk to read the values of both files into separate indexed arrays. Then, in the END rule, loop over the stored values and output them in whatever format you want, e.g.

awk '
  FNR==NR { f1[++n] = $0; next }        # save file_1 in array f1
  { f2[++m] = $0 }                      # save file_2 in array f2
  END {
    for (i=1; i<=n; i++)                # loop over all f1 values
      for (j=1; j<=m; j++)              # loop over all f2 values
        printf "%s,%s\n", f1[i], f2[j]  # output f1[],f2[]
  }
' file_1 file_2

Example Use/Output

With your data in file_1 and file_2 you would have:

$ awk '
>   FNR==NR { f1[++n] = $0; next }        # save file_1 in array f1
>   { f2[++m] = $0 }                      # save file_2 in array f2
>   END {
>     for (i=1; i<=n; i++)                # loop over all f1 values
>       for (j=1; j<=m; j++)              # loop over all f2 values
>         printf "%s,%s\n", f1[i], f2[j]  # output f1[],f2[]
>   }
> ' file_1 file_2
word1,8.8.8.8
word1,4.4.4.4
word1,4.4.2.2
word1,5.5.5.5
word2,8.8.8.8
word2,4.4.4.4
word2,4.4.2.2
word2,5.5.5.5
word3,8.8.8.8
word3,4.4.4.4
word3,4.4.2.2
word3,5.5.5.5

Using Bash

You can do exactly the same thing in a bash script by reading both files into arrays with readarray (a synonym for mapfile), e.g.

#!/bin/bash

usage() {  ## simple function to output error and usage
  [ -n "$1" ] && printf "error: %s\n" "$1"
  printf "usage: %s file_1 file_2\n" "${0##*/}"
}

## validate filenames provided in first 2 arguments exist and are non-empty
[ -s "$1" ] || { usage "file $1 not found or empty"; exit 1; }
[ -s "$2" ] || { usage "file $2 not found or empty"; exit 1; }

readarray -t f1 < "$1"    # read file_1 into array f1
readarray -t f2 < "$2"    # read file_2 into array f2

for i in "${f1[@]}"; do         ## loop over f1
  for j in "${f2[@]}"; do       ## loop over f2
    printf "%s,%s\n" "$i" "$j"  ## output combined result
  done
done

(note: awk will likely provide better performance)
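readarray was added in Bash 4.0, so the script above will not run on older shells such as the Bash 3.2 that ships with macOS. As a hedged alternative, here is a minimal sketch of the same nested loop using only while/read, which should work in Bash and in any POSIX sh. cross_lines is a hypothetical helper name, and the sketch re-reads the second file once per line of the first instead of holding it in an array.

```shell
#!/bin/sh
# Sketch only: nested while/read loops instead of readarray.
# cross_lines is a hypothetical helper name used for illustration.
# The ( ... ) body runs in a subshell, so a and b stay local to it.
cross_lines() (
  # "|| [ -n ... ]" also handles a last line with no trailing newline.
  while IFS= read -r a || [ -n "$a" ]; do
    # The inner loop re-opens file 2 for every line of file 1.
    while IFS= read -r b || [ -n "$b" ]; do
      printf '%s,%s\n' "$a" "$b"
    done < "$2"
  done < "$1"
)

# Usage (with the same files as above):
#   cross_lines file_1 file_2
```

Re-reading the second file keeps memory use flat but costs one extra file open per line of the first file, so for large inputs the awk version is still likely to be the faster choice.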

Example Use/Output

With the script saved as cmbfiles.sh you would have:

$ bash cmbfiles.sh file_1 file_2
word1,8.8.8.8
word1,4.4.4.4
word1,4.4.2.2
word1,5.5.5.5
word2,8.8.8.8
word2,4.4.4.4
word2,4.4.2.2
word2,5.5.5.5
word3,8.8.8.8
word3,4.4.4.4
word3,4.4.2.2
word3,5.5.5.5

CodePudding user response:

Would you please try the following:

awk -v OFS="," -v ORS="\r\n" '                  # set comma as output field separator, CRLF as output record separator
    NR==FNR && NF>0 {a[++n]=$0; next}           # read file2.txt, skipping blank lines
    NF>0 {for (i=1; i<=n; i++) print $0, a[i]}  # print each line of file1.txt paired with every line of file2.txt
' file2.txt file1.txt
  • It skips blank lines in the input files.
  • It writes Windows (CRLF) line endings, on the assumption that the output will be opened in Excel.