I need to combine two word-list files in a bash script. The files have different numbers of lines, and I want to combine them as shown below.
File 1:
word1
word2
word3
File 2:
8.8.8.8
4.4.4.4
4.4.2.2
5.5.5.5
Desired Output:
word1,8.8.8.8
word1,4.4.4.4
word1,4.4.2.2
word1,5.5.5.5
word2,8.8.8.8
word2,4.4.4.4
word2,4.4.2.2
word2,5.5.5.5
word3,8.8.8.8
word3,4.4.4.4
word3,4.4.2.2
word3,5.5.5.5
Hoping to get some help from experts.
Answer:
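Pick a field number high enough (e.g. 100) that no line in either file has that many fields, and (ab)use join to produce the Cartesian product. Because that field is empty on every line, every line of file1.txt has the same join key as every line of file2.txt, so each pair is output: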
join -j 100 file1.txt file2.txt
word1 8.8.8.8
word1 4.4.4.4
word1 4.4.2.2
word1 5.5.5.5
word2 8.8.8.8
word2 4.4.4.4
word2 4.4.2.2
word2 5.5.5.5
word3 8.8.8.8
word3 4.4.4.4
word3 4.4.2.2
word3 5.5.5.5
Edit: To use a comma as the column separator, specify it with the -t option. To keep each output line from starting with that separator (previously a space, now a comma), which is where the empty join field would otherwise be printed, list the output fields explicitly with the -o option:
join -j 100 -t, -o 1.1,2.1 file1.txt file2.txt
word1,8.8.8.8
word1,4.4.4.4
word1,4.4.2.2
word1,5.5.5.5
word2,8.8.8.8
word2,4.4.4.4
word2,4.4.2.2
word2,5.5.5.5
word3,8.8.8.8
word3,4.4.4.4
word3,4.4.2.2
word3,5.5.5.5
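A self-contained way to try the join trick on the sample data is sketched below. It assumes GNU join behaviour, where a missing field is treated as empty and empty keys compare equal. `-1 100 -2 100` is the POSIX spelling of GNU's `-j 100`, and `cartesian_join` is a hypothetical helper name, not a standard command. It also assumes no input line contains a comma, because `-t,` splits the input on commas too and `-o 1.1,2.1` would keep only the first comma-separated field.

```bash
#!/bin/bash
# Sketch: Cartesian product of two line lists with join.
# Assumption: field 100 does not exist on any line, so every line's join key
# is the empty string and every line of file 1 matches every line of file 2.
# cartesian_join is a hypothetical helper name.
cartesian_join() {
    # -t, : comma as input/output separator
    # -1 100 -2 100 : join on field 100 of each file (GNU: -j 100)
    # -o 1.1,2.1 : print only field 1 of file 1 and field 1 of file 2
    join -t, -1 100 -2 100 -o 1.1,2.1 "$1" "$2"
}

tmpdir=$(mktemp -d)
printf '%s\n' word1 word2 word3 > "$tmpdir/file1.txt"
printf '%s\n' 8.8.8.8 4.4.4.4 4.4.2.2 5.5.5.5 > "$tmpdir/file2.txt"

result=$(cartesian_join "$tmpdir/file1.txt" "$tmpdir/file2.txt")
printf '%s\n' "$result"
rm -r "$tmpdir"
```

Since all join keys are identical, the input is trivially "sorted" as join requires, so no `sort` step is needed.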
Answer:
You can simplify this and gain flexibility by using awk to read the values of both files into separate indexed arrays and then, in the END rule, simply loop over the stored values and output them in the format you want, e.g.
awk '
    FNR==NR { f1[++n] = $0; next }      # save file_1 in array f1
    { f2[++m] = $0 }                    # save file_2 in array f2
    END {
        for (i=1; i<=n; i++)            # loop over all f1 values
            for (j=1; j<=m; j++)        # loop over all f2 values
                printf "%s,%s\n", f1[i], f2[j]   # output f1[],f2[]
    }
' file_1 file_2
Example Use/Output
With your data in file_1 and file_2, you would have:
$ awk '
>     FNR==NR { f1[++n] = $0; next }      # save file_1 in array f1
>     { f2[++m] = $0 }                    # save file_2 in array f2
>     END {
>         for (i=1; i<=n; i++)            # loop over all f1 values
>             for (j=1; j<=m; j++)        # loop over all f2 values
>                 printf "%s,%s\n", f1[i], f2[j]   # output f1[],f2[]
>     }
> ' file_1 file_2
word1,8.8.8.8
word1,4.4.4.4
word1,4.4.2.2
word1,5.5.5.5
word2,8.8.8.8
word2,4.4.4.4
word2,4.4.2.2
word2,5.5.5.5
word3,8.8.8.8
word3,4.4.4.4
word3,4.4.2.2
word3,5.5.5.5
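To get the flexibility mentioned above without editing the program each time, the same awk program can be wrapped in a shell function that takes the separator as an awk variable. This is a sketch; `combine_files` and its `sep` parameter are hypothetical names, and the separator defaults to a comma.

```bash
#!/bin/bash
# Sketch: the awk Cartesian-product program wrapped in a reusable function.
# combine_files and sep are hypothetical names.
combine_files() {
    local sep=${3:-,}                         # separator, defaults to a comma
    awk -v sep="$sep" '
        FNR == NR { f1[++n] = $0; next }      # first file  -> f1[1..n]
        { f2[++m] = $0 }                      # second file -> f2[1..m]
        END {
            for (i = 1; i <= n; i++)          # every line of the first file ...
                for (j = 1; j <= m; j++)      # ... paired with every line of the second
                    printf "%s%s%s\n", f1[i], sep, f2[j]
        }
    ' "$1" "$2"
}

tmpdir=$(mktemp -d)
printf '%s\n' word1 word2 word3 > "$tmpdir/file_1"
printf '%s\n' 8.8.8.8 4.4.4.4 4.4.2.2 5.5.5.5 > "$tmpdir/file_2"

result=$(combine_files "$tmpdir/file_1" "$tmpdir/file_2" ';')
printf '%s\n' "$result"
rm -r "$tmpdir"
```

Because awk processes escape sequences in `-v` assignments, a tab separator can be passed as `'\t'`.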
Using Bash
You can do exactly the same thing in a bash script, reading both files into arrays using readarray (a synonym for mapfile), e.g.
#!/bin/bash
usage() { ## simple function to output error and usage
    [ -n "$1" ] && printf "error: %s\n" "$1"
    printf "usage: %s file_1 file_2\n" "${0##*/}"
}
## check that the files named in the first 2 arguments exist and are non-empty
[ -s "$1" ] || { usage "file $1 not found or empty"; exit 1; }
[ -s "$2" ] || { usage "file $2 not found or empty"; exit 1; }
readarray -t f1 < "$1"   # read file_1 into array f1
readarray -t f2 < "$2"   # read file_2 into array f2
for i in "${f1[@]}"; do            ## loop over f1
    for j in "${f2[@]}"; do        ## loop over f2
        printf "%s,%s\n" "$i" "$j" ## output combined result
    done
done
(Note: awk will likely perform better, especially on large files.)
Example Use/Output
With the script saved as cmbfiles.sh, you would have:
$ bash cmbfiles.sh file_1 file_2
word1,8.8.8.8
word1,4.4.4.4
word1,4.4.2.2
word1,5.5.5.5
word2,8.8.8.8
word2,4.4.4.4
word2,4.4.2.2
word2,5.5.5.5
word3,8.8.8.8
word3,4.4.4.4
word3,4.4.2.2
word3,5.5.5.5
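readarray/mapfile were added in bash 4.0, so the script above will not run under older shells such as the bash 3.2 that ships as /bin/bash on macOS. A sketch of the same nested loop that fills the arrays with a `while read` loop instead, assuming one entry per line (the temporary sample files exist only to make the example self-contained):

```bash
#!/bin/bash
# Sketch: the nested-loop approach without readarray/mapfile (bash >= 3.1).
tmpdir=$(mktemp -d)
printf '%s\n' word1 word2 word3 > "$tmpdir/file_1"
printf '%s\n' 8.8.8.8 4.4.4.4 4.4.2.2 5.5.5.5 > "$tmpdir/file_2"

f1=() f2=()
# IFS= and -r keep each line verbatim; the || test keeps a final line that has no trailing newline
while IFS= read -r line || [ -n "$line" ]; do f1+=("$line"); done < "$tmpdir/file_1"
while IFS= read -r line || [ -n "$line" ]; do f2+=("$line"); done < "$tmpdir/file_2"

result=$(
    for i in "${f1[@]}"; do          # outer loop: lines of the first file
        for j in "${f2[@]}"; do      # inner loop: lines of the second file
            printf '%s,%s\n' "$i" "$j"
        done
    done
)
printf '%s\n' "$result"
rm -r "$tmpdir"
```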
Answer:
Would you please try the following:
awk -v OFS="," -v ORS="\r\n" '   # set comma as field separator, CRLF as record separator
    NR==FNR && NF>0 {a[++n]=$0; next}                # read file2.txt, skipping blank lines
    NF>0 {for (i=1; i<=n; i++) print $0, a[i]}       # print each non-blank line of file1.txt followed by each line of file2.txt
' file2.txt file1.txt
- It skips blank lines in both input files.
- It uses Windows (CRLF) line endings, on the assumption that the output will be opened in Excel.
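Both points can be checked on sample input that deliberately contains blank lines. In this sketch the sample files and the output name out.csv are hypothetical; `od -c` shows each record ending in `\r \n`, and the two counts confirm that every written record is CRLF-terminated.

```bash
#!/bin/bash
# Sketch: verify blank-line skipping and CRLF output of the awk command above.
# The blank lines in the sample input are intentional; out.csv is a hypothetical name.
tmpdir=$(mktemp -d)
printf '%s\n' word1 '' word2 word3 > "$tmpdir/file1.txt"
printf '%s\n' 8.8.8.8 4.4.4.4 '' 4.4.2.2 5.5.5.5 > "$tmpdir/file2.txt"

awk -v OFS="," -v ORS="\r\n" '
    NR==FNR && NF>0 {a[++n]=$0; next}            # store non-blank lines of file2.txt
    NF>0 {for (i=1; i<=n; i++) print $0, a[i]}   # pair each non-blank line of file1.txt with them
' "$tmpdir/file2.txt" "$tmpdir/file1.txt" > "$tmpdir/out.csv"

od -c "$tmpdir/out.csv" | head -n 2                 # raw bytes, showing the \r \n endings

total=$(awk 'END { print NR }' "$tmpdir/out.csv")   # records written (split on \n)
crlf=$(grep -c $'\r$' "$tmpdir/out.csv")            # records whose last byte before \n is \r
echo "records: $total, CRLF-terminated: $crlf"
rm -r "$tmpdir"
```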