Home > Back-end >  Compare two files and output results according to conditionals
Compare two files and output results according to conditionals

Time:02-25

I need to compare two input files (file1.csv and file2.csv) and store the results and a third file (file3.csv).

The conditions are as follows:

  1. If entry matches file1 and file2, store it on file3
  2. If entry is found on file1, store it on file3

It looks like this:

input: file1.csv

"switch10"
"switch33"
"router51"
"switch6"
"router44"
"router12"
"switch2"

input: file2.csv

"router51";"DatacenterA - cab1";"Prod - Tenant12"
"switch33";"DatacenterB - cab14";"Prod - Tenant4"
"switch2";"DatacenterA - cab3";"Dev - Tenant5"
"router44";"DatacenterC - cab2";"Test - Tenant2"

results: file3.csv

"router12"
"router44";"DatacenterC - cab2";"Test - Tenant2"
"router51";"DatacenterA - cab1";"Prod - Tenant12"
"switch2";"DatacenterA - cab3";"Dev - Tenant5"
"switch6"
"switch10"
"switch33";"DatacenterB - cab14";"Prod - Tenant4"

I have tried lots of combinations using 'awk'; however I cannot construct the conditionals.

I was wondering if someone could help me construct the conditional(s) to build this results file3.csv.

Assistance is much appreciated.

CodePudding user response:

I would use GNU AWK for this task following way, let file1.txt content be

"switch10"
"switch33"
"router51"
"switch6"
"router44"
"router12"
"switch2"

and file2.txt content be

"router51";"DatacenterA - cab1";"Prod - Tenant12"
"switch33";"DatacenterB - cab14";"Prod - Tenant4"
"switch2";"DatacenterA - cab3";"Dev - Tenant5"
"router44";"DatacenterC - cab2";"Test - Tenant2"

then

awk 'BEGIN{FS=";"}(NR==FNR){arr[$1]=$0}((NR!=FNR)&&($1 in arr)){arr[$1]=$0}END{for(i in arr){print arr[i]}}' file1.txt file2.txt

output

"switch6"
"router12"
"router44";"DatacenterC - cab2";"Test - Tenant2"
"router51";"DatacenterA - cab1";"Prod - Tenant12"
"switch33";"DatacenterB - cab14";"Prod - Tenant4"
"switch10"
"switch2";"DatacenterA - cab3";"Dev - Tenant5"

Disclaimer: I assume that you are able to accept any order of lines in output file. Explanation: I inform GNU AWK that ; is used as field separator (FS) then when processing first file (NR==FNR) I do fill array arr by putting whole current line ($0) under key which is content of first field ($1). When processing next files (NR!=FNR) if there is value corresponding to first field of line in array arr then I update its value to whole current line. After all files are processed I do print all values from array arr using for.

(tested in gawk 4.2.1)

CodePudding user response:

Similar idea as @Daweo but:

  1. use the file names provided
  2. hard-code file names in script instead of arguments as they are not interchangeable
  3. yield result in requested order

You would save the script as task, chmod 755 task then run it as ./task > file3.csv:

#!/usr/bin/env -S awk -F; -f

BEGIN {
    while ( getline <"file2.csv" ) {
        lookup[$1] = $0
    }
    while ( "sort -k1.1,1.7 -k1.8n file1.csv" | getline ) {
        print ($0 in lookup) ? lookup[$0] : $0
    }
}

Note: sort used here is fragile as it relies on fixed size prefix (router or switch) to identify the two parts of the compound key.

Using gawk's PROCINFO["sorted_in"] feature to return keys in a specify order and using a custom compare function str_num_cmp() makes it a bit more robust:

#!/usr/bin/env -S awk -F; -f

function str_num_to_a(i, a) {
    match(i, /"([^0-9]*)([0-9] )"/, a)
}

function str_num_cmp(i1, v1, i2, v2) {
    str_num_to_a(i1, a1)
    str_num_to_a(i2, a2)
    if(a1[1] < a2[1])
        return -1
    if(a1[1] == a2[1]) {
        if(a1[2] < a2[2])
            return -1
        if(a1[2] == a2[2])
            return 0
    }
    return 1
}

BEGIN {
    while ( getline <"file1.csv" ) {
        keys[$1] = ""
    }
    while ( getline <"file2.csv" ) {
        lookup[$1] = $0
    }
    PROCINFO["sorted_in"] = "str_num_cmp"
    for (k in keys) {
        print (k in lookup) ? lookup[k] : k
    }
}
  • Related