Get diff and replace string with awk

I am trying to get the differences between two files with the following awk script:

awk 'NR==FNR{
    a[$0]
    next
}
{
    if($0 in a)
        delete a[$0]
    else 
        a[$0]
}
END {
    for(i in a) {
        $0=i
        sub(/[^.]*$/,substr(tolower($1),1,length($1)-1),$3)
        print
    }
}' [ab].yaml

The a.yaml file:

NAME_VAR: {{ .Data.data.name_var }}
SOME_VALUE: {{ .Data.data.some_value }}
ONE_MORE: {{ .Data.data.one_more }}

and the b.yaml file:

NAME_VAR: {{ .Data.data.name_var }}
SOME_VALUE: {{ .Data.data. }}
ONE_MORE: {{ .Data.data. }}
ADD_THIS: {{ .Data.data. }}

The script should merge the differences and replace what is contained in the curly braces.

Something like this:

ADD_THIS: {{ .Data.data.add_this }}
SOME_VALUE: {{ .Data.data.some_value }}
ONE_MORE: {{ .Data.data.one_more }}

But it duplicates my output:

ADD_THIS: {{ .Data.data.add_this }}
SOME_VALUE: {{ .Data.data.some_value }}
ONE_MORE: {{ .Data.data.one_more }}
SOME_VALUE: {{ .Data.data.some_value }}
ONE_MORE: {{ .Data.data.one_more }}

The script should replace everything contained in the braces if there are new variables.

Answer:

Assumptions:

the data component ($3) will always be either .Data.data.<some_string> or .Data.data. (nothing after the final period)

Taking a slightly different approach:

awk '
NR==FNR { a[$1]=$3
          next
        }
$1 in a { if ($3 == a[$1]) {                   # if $1 is an index of a[] and $3 is an exact match then ...
             delete a[$1]                      # delete the a[] entry (ie, these 2 rows are identical so discard both)
          }
          else
          if (length($3) > length(a[$1]))      # if $1 is an index of a[] but $3 does not match, and $3 is longer then ...
             a[$1]=$3                          # update a[] with the new/longer entry
          next                                 # skip to next input line
        }
        { a[$1]=$3 }                           # if we get here then $1 has not been seen before so add to a[]
END     { for (i in a) {                       # loop through indices
              val=a[i]                         # make copy of value
              sub(/[^.]*$/,tolower(i),val)     # replace everything after the last period with the lowercase of our index
              sub(/:$/,"",val)                 # strip the trailing ":" (carried over from the index) off the value
              print i,"{{",val,"}}"            # print our new output
          }
        }
' [ab].yaml

This generates:

SOME_VALUE: {{ .Data.data.some_value }}
ADD_THIS: {{ .Data.data.add_this }}
ONE_MORE: {{ .Data.data.one_more }}

NOTE: for (i in a) visits the indices in an unspecified order, so if the output needs to be re-ordered then it's likely easier to pipe the results to the appropriate sort command
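A minimal sketch of that, assuming a POSIX shell with mktemp and sort available. It recreates the two sample files in a scratch directory (the tmpdir and sorted names are only for illustration), runs the same awk logic as above in condensed form, and pipes the result through sort so the line order no longer depends on how awk walks the array:

```shell
#!/bin/sh
# Recreate the sample files from the question in a scratch directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/a.yaml" <<'EOF'
NAME_VAR: {{ .Data.data.name_var }}
SOME_VALUE: {{ .Data.data.some_value }}
ONE_MORE: {{ .Data.data.one_more }}
EOF
cat > "$tmpdir/b.yaml" <<'EOF'
NAME_VAR: {{ .Data.data.name_var }}
SOME_VALUE: {{ .Data.data. }}
ONE_MORE: {{ .Data.data. }}
ADD_THIS: {{ .Data.data. }}
EOF

# Same logic as the script above, condensed; the output is piped to sort.
# LC_ALL=C is an assumption made here to get plain byte-wise ordering.
sorted=$(awk '
NR==FNR { a[$1]=$3; next }                      # first file: remember $3 per key
$1 in a { if ($3 == a[$1]) { delete a[$1] }     # identical in both files: discard
          else if (length($3) > length(a[$1])) { a[$1]=$3 }
          next
        }
        { a[$1]=$3 }                            # key only seen in the second file
END     { for (i in a) {
              val=a[i]
              sub(/[^.]*$/,tolower(i),val)      # fill in the lowercase key after the last period
              sub(/:$/,"",val)                  # drop the trailing colon
              print i,"{{",val,"}}"
          }
        }
' "$tmpdir/a.yaml" "$tmpdir/b.yaml" | LC_ALL=C sort)

printf '%s\n' "$sorted"
rm -rf "$tmpdir"
```

With the sample data this prints the ADD_THIS line first, then ONE_MORE, then SOME_VALUE. Swap in sort -r or a key-based sort if a different order is wanted.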

As for why OP's current code prints duplicate lines ...

To see what ends up in the array, modify the END{...} block of OP's script like so:

END { for (i in a) 
          print i,a[i]
    }

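This should show that the code saves the lines from both files. Because the array is indexed by the whole line ($0), the a.yaml and b.yaml versions of SOME_VALUE and ONE_MORE are stored as separate entries; no effort is made to match 'duplicates' (via a matching $1). The original END block then rewrites $3 in every entry, so both sets of lines (now modified to look identical) are printed to stdout.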
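A small sketch of that check, assuming a POSIX shell with mktemp and sort available (the tmpdir and keys names are only for illustration). It keeps the first two blocks of OP's script as they are and only swaps the END block for one that dumps the raw array indices:

```shell
#!/bin/sh
# Recreate the sample files from the question in a scratch directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/a.yaml" <<'EOF'
NAME_VAR: {{ .Data.data.name_var }}
SOME_VALUE: {{ .Data.data.some_value }}
ONE_MORE: {{ .Data.data.one_more }}
EOF
cat > "$tmpdir/b.yaml" <<'EOF'
NAME_VAR: {{ .Data.data.name_var }}
SOME_VALUE: {{ .Data.data. }}
ONE_MORE: {{ .Data.data. }}
ADD_THIS: {{ .Data.data. }}
EOF

# Logic from the question; only the END block differs.
# The sort is there just to make the listing order predictable.
keys=$(awk '
NR==FNR { a[$0]; next }              # a.yaml: remember every whole line
{
    if ($0 in a) { delete a[$0] }    # identical line in both files: drop it
    else         { a[$0] }           # anything else: store it as a new entry
}
END { for (i in a) print i }         # dump the indices without rewriting $3
' "$tmpdir/a.yaml" "$tmpdir/b.yaml" | LC_ALL=C sort)

printf '%s\n' "$keys"
rm -rf "$tmpdir"
```

With the sample data this lists five entries: ADD_THIS once, plus two each for ONE_MORE and SOME_VALUE (one ending in ".Data.data." from b.yaml and one with the full name from a.yaml). Those pairs are what collapse into the duplicated lines once $3 is rewritten.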