I want to merge the headers of the matrix (FS is tab):
12 12 12 13
bb 2
cc 8 3
aa 5
ee 6
like this:
12 12 13
bb 2
cc 8 3
aa 5
ee 6
I tried this
awk 'BEGIN{FS=OFS="\t";maxcolno=1} {printf "%s",$1;if(NR==1){for(oldi=2;oldi<=NF;oldi ){if(!($oldi in newcolno)){printf "%s%s",OFS,$oldi;newcolno[$oldi]= maxcolno;}old2new[oldi]=newcolno[$oldi];}}else{delete row;for(oldi=2;oldi<=NF;oldi )row[old2new[oldi]]=row[old2new[oldi]]$oldi;for(newi=2;newi<=maxcolno;newi )printf "%s%s",OFS,row[newi];}print""}' unmerge.txt > merge.txt
but it forms the following table which is not desired:
12 13
bb 2
cc 83
aa 5
ee 6
CodePudding user response:
Assumptions:
- values are to be left-shifted within a row when there is an empty space (to the left) in a column of the same label; this means the
aa / 5
value should be shifted to the 1st12
column (as opposed to the 2nd12
column as in OP's expected output)
General design:
- populate a matrix with the input data
- as we process a data row we determine the left-most column in which to shift/place a value
- in the
END{}
block we remove empty columns and then print the remaining matrix
One awk
idea:
awk '
BEGIN { FS=OFS="\t" }
NR==1 { matrix[NR][1]=$1
for (i=2;i<=NF;i ) {
matrix[NR][i]=$i
lab2col[$i][ labcnt[$i]]=i # keep track of list of physical columns that a particular label is associated with
}
next
}
{ matrix[NR][1]=$1
delete labcnt
for (i=2;i<=NF;i ) # loop through input fields and ...
if ($i) { # if non-empty then shift to the left-most column with the same header/label
matrix[NR][lab2col[matrix[1][i]][ labcnt[matrix[1][i]]]]=$i
# ^^^^^^^^^^^^ - label at top of current field
# ^^^^^^^^^^^^^^^^^^^^^^ - number of times we have seen this label in this line
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - physical column to map this nth occurrence (of this label) to
}
}
END { # find/remove empty columns
for (j=2;j<=NF;j ) { # loop through list of data columns
valcnt=0 # initialize non-empty counter
for (i=2;i<=NR;i ) # loop through data rows
valcnt = (matrix[i][j] ? 1 : 0) # keep count of non-empty matrix values
if (valcnt==0) # if all rows in this column are empty then ...
delete matrix[1][j] # delete the column index from the header/1st row of the matrix
}
PROCINFO["sorted_in"]="@ind_num_asc" # make sure we process indices in ascending numerical order
for (i=1;i<=NR;i ) { # loop through rows
for (j in matrix[1]) { # loop through columns (that still exist in the 1st row of the matrix)
printf "%s%s", (j==1 ? "" : OFS), matrix[i][j] # print matrix entry
pfx=OFS
}
print "" # terminate current line of output
}
}
' unmerge.txt
NOTE: requires GNU awk
for:
- multi-dimensional arrays (aka array of arrays)
- the
PROCINFO["sorted_in"]
feature
This generates:
12 12 13
bb 2
cc 8 3
aa 5
ee 6
Expanding the input a bit:
$ cat unmerge2.txt
12 12 12 13 12 13
bb 2
cc 8 3
aa 5
ee 6
ff 17 87
gg 100 -3
The awk
script generates:
12 12 13 13
bb 2
cc 8 3
aa 5
ee 6
ff 87 17
gg 100 -3