I want to merge the headers of the matrix:
12 12 12 13
bb 2
cc 1
aa 5
ee 6
like this:
12 13
bb 2
cc 1
aa 5
ee 6
I tried this and it didn't work (and it wasn't actually applicable to the larger matrix:
merged_headers=()
for i in {1..3}; do
header=$(head -1 unmerge.txt | awk -v col=$i '{print $col}')
if [ -z "$header" ]; then
header=${merged_headers[-1]}
else
merged_headers =($header)
fi
sed -i "s/^[ \t]*$/$header/g" unmerge.txt
done
CodePudding user response:
Assumptions:
- columns are consolidated in a left-to-right order
- if the column headers are
13 12 14 12 13 14
then the new column headers will be (left-to-right)13 12 14
(as opposed to a numeric or string ordering that would generate12 13 14
) - the consolidated data set will have at most one non-empty value per unique column header; otherwise we'll append them together into a single string; if multiple values are guaranteed to be numeric we could modify the code to sum the values
One awk
idea:
awk '
BEGIN { FS=OFS="\t"
newcolno=1
}
{ printf "%s", $1 # print 1st column
if (NR==1) { # if header record ...
for (old=2; old<=NF; old ) { # loop through columns ...
if (! ($old in newcol)) { # looking for new header and if found ...
printf "%s%s", OFS, $old # print to stdout and ...
newcol[$old]= newcolno # make note of the new column number to map to
}
old2new[old]= newcol[$old] # map current column number to new column number
}
}
else { # non-header rows
delete row # clear our new output array
for (old=2; old<=NF; old ) # loop through current columns ...
row[old2new[old]]=row[old2new[old]] $old # append value to new row/column
for (new=2; new<=newcolno; new ) # loop through new row/columns and ...
printf "%s%s", OFS, row[new] # print to stdout
}
print "" # terminate current line
}
' unmerge.txt
This generates:
12 13
bb 2
cc 1
aa 5
ee 6
Testing a larger file to demonstrate some of our assumptions:
$ cat unmerge2.txt
12 12 12 13 12 13
bb 2
cc 1
aa 5
ee 6
ff 17 87 # distinct headers so no problems
gg 100 -3 # both have header "13" so we'll append the strings
The awk
script generates:
12 13
bb 2
cc 1
aa 5
ee 6
ff 87 17
gg 100-3
Once OP is satisified with the results, and assuming OP still wants to update/overwrite the input file with the new results:
- if using
GNU awk
you can add-i inplace
to facilitate an inplace update of the input file:awk -i inplace 'BEGIN {FS=OFS="\t"; newcolno=1}...' unmerge.txt
- otherwise OP can direct the output to a tmp file and then overwrite the source file with the tmp file:
awk 'BEGIN {FS=OFS="\t"; newcolno=1}...' unmerge.txt > tmpfile; mv tmpfile unmerge.txt