I am trying to get the differences between two files with the following awk script:
awk 'NR==FNR {
    a[$0]
    next
}
{
    if ($0 in a)
        delete a[$0]
    else
        a[$0]
}
END {
    for (i in a) {
        $0 = i
        sub(/[^.]*$/, substr(tolower($1), 1, length($1)-1), $3)
        print
    }
}' [ab].yaml
The a.yaml file:
NAME_VAR: {{ .Data.data.name_var }}
SOME_VALUE: {{ .Data.data.some_value }}
ONE_MORE: {{ .Data.data.one_more }}
and the b.yaml file:
NAME_VAR: {{ .Data.data.name_var }}
SOME_VALUE: {{ .Data.data. }}
ONE_MORE: {{ .Data.data. }}
ADD_THIS: {{ .Data.data. }}
The script should merge the differences and fill in whatever is missing inside the curly braces.
Something like this:
ADD_THIS: {{ .Data.data.add_this }}
SOME_VALUE: {{ .Data.data.some_value }}
ONE_MORE: {{ .Data.data.one_more }}
But it duplicates my output:
ADD_THIS: {{ .Data.data.add_this }}
SOME_VALUE: {{ .Data.data.some_value }}
ONE_MORE: {{ .Data.data.one_more }}
SOME_VALUE: {{ .Data.data.some_value }}
ONE_MORE: {{ .Data.data.one_more }}
The script should replace everything contained in the braces if there are new variables.
Answer:
Assumptions:
- the data component will always be either .Data.data.<some_string> or .Data.data.
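The assumption above can be checked before merging. The following is only a sketch: it assumes names after the last period consist of letters, digits and underscores, and the BAD_ONE line is a made-up example that is not in either of the question's files.

```shell
# Print every line whose third field is NOT ".Data.data." optionally followed by a name.
# Assumption: names are made of letters, digits and underscores only.
# BAD_ONE is a hypothetical line, included only to show what a violation looks like.
bad=$(printf '%s\n' \
  'NAME_VAR: {{ .Data.data.name_var }}' \
  'SOME_VALUE: {{ .Data.data. }}' \
  'BAD_ONE: {{ .Other.path.x }}' |
  awk '$3 !~ /^\.Data\.data\.[A-Za-z0-9_]*$/')

# Only the offending line is reported.
printf '%s\n' "$bad"
```

If nothing is printed for the real files, the assumption holds for them.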
Taking a slightly different approach:
awk '
NR==FNR { a[$1]=$3                          # 1st file: save the data component ($3) keyed on the variable name ($1)
          next
        }
$1 in a { if ($3 == a[$1]) {                # if $1 is an index of a[] and $3 is an exact match then ...
              delete a[$1]                  # delete the a[] entry (ie, these 2 rows are identical so discard both)
          }
          else
          if (length($3) > length(a[$1]))   # if $1 is an index of a[] but $3 does not match, and $3 is longer then ...
              a[$1]=$3                      # update a[] with the new/longer entry
          next                              # skip to next input line
        }
        { a[$1]=$3 }                        # if we get here then $1 has not been seen before so add to a[]
END     { for (i in a) {                    # loop through indices
              val=a[i]                      # make copy of value
              sub(/[^.]*$/,tolower(i),val)  # replace everything after the last period with the lowercase of our index
              sub(/:$/,"",val)              # strip the trailing ":" (carried over from the index) off the end of val
              print i,"{{",val,"}}"         # print our new output
          }
        }
' [ab].yaml
This generates:
SOME_VALUE: {{ .Data.data.some_value }}
ADD_THIS: {{ .Data.data.add_this }}
ONE_MORE: {{ .Data.data.one_more }}
NOTE: for (i in a) visits the indices in no guaranteed order, so if the output needs to be re-ordered then it's likely easier to pipe the results to the appropriate sort command
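As a sketch of that last step, the three generated lines are fed through sort here. Using LC_ALL=C is an assumption made to keep the ordering byte-wise and reproducible; in practice the awk command itself would be on the left of the pipe.

```shell
# Stand-in for the awk output shown above, piped through sort.
# LC_ALL=C is an assumption: it forces plain byte-order sorting regardless of locale.
sorted=$(printf '%s\n' \
  'SOME_VALUE: {{ .Data.data.some_value }}' \
  'ADD_THIS: {{ .Data.data.add_this }}' \
  'ONE_MORE: {{ .Data.data.one_more }}' |
  LC_ALL=C sort)

printf '%s\n' "$sorted"
```

The lines come back ordered by variable name: ADD_THIS, ONE_MORE, SOME_VALUE.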
As for why OP's current code prints duplicate lines ...
Modify the END{...} block like so:
END { for (i in a)
print i,a[i]
}
This should show that the code saves the input lines from both files, but since no effort is made to match 'duplicates' (via a matching $1), both sets of inputs (now modified to look identical) are printed to stdout.
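To see this on the sample data, here is a small sketch that recreates the two files from the question in a scratch directory (the mktemp location is an assumption made only for the demo), runs OP's matching logic with the diagnostic END block, and compares the number of saved lines with the number of distinct $1 keys.

```shell
# Scratch copies of the two sample files from the question.
tmp=$(mktemp -d)
cat > "$tmp/a.yaml" <<'EOF'
NAME_VAR: {{ .Data.data.name_var }}
SOME_VALUE: {{ .Data.data.some_value }}
ONE_MORE: {{ .Data.data.one_more }}
EOF
cat > "$tmp/b.yaml" <<'EOF'
NAME_VAR: {{ .Data.data.name_var }}
SOME_VALUE: {{ .Data.data. }}
ONE_MORE: {{ .Data.data. }}
ADD_THIS: {{ .Data.data. }}
EOF

# OP's matching logic (keyed on the whole line) with the diagnostic END block.
saved=$(awk 'NR==FNR { a[$0]; next }
{
    if ($0 in a)
        delete a[$0]
    else
        a[$0]
}
END { for (i in a) print i, a[i] }' "$tmp"/[ab].yaml)

printf '%s\n' "$saved"

# Compare how many lines were saved with how many distinct keys ($1) they carry.
total=$(printf '%s\n' "$saved" | awk 'END { print NR }')
keys=$(printf '%s\n' "$saved" | awk '!seen[$1]++ { n++ } END { print n }')
echo "saved lines: $total, distinct keys: $keys"

rm -r "$tmp"
```

Only the identical NAME_VAR line is discarded. SOME_VALUE and ONE_MORE are each saved twice (once per file, because the whole lines differ), which is where the duplicates come from once the END block rewrites them to look the same.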