Home > Enterprise >  Merge header columns in a matrix in bash
Merge header columns in a matrix in bash

Time:02-01

I want to merge the headers of the matrix:

    12  12  12  13
bb  2   
cc      1               
aa          5
ee              6

like this:

        12     13
bb      2        
cc      1        
aa      5        
ee              6

I tried this and it didn't work (and it wasn't actually applicable to the larger matrix:

merged_headers=()
for i in {1..3}; do
    header=$(head -1 unmerge.txt | awk -v col=$i '{print $col}')
    if [ -z "$header" ]; then
        header=${merged_headers[-1]}
    else
        merged_headers =($header)
    fi
    sed -i "s/^[ \t]*$/$header/g" unmerge.txt
done

CodePudding user response:

Assumptions:

  • columns are consolidated in a left-to-right order
  • if the column headers are 13 12 14 12 13 14 then the new column headers will be (left-to-right) 13 12 14 (as opposed to a numeric or string ordering that would generate 12 13 14)
  • the consolidated data set will have at most one non-empty value per unique column header; otherwise we'll append them together into a single string; if multiple values are guaranteed to be numeric we could modify the code to sum the values

One awk idea:

awk '
BEGIN { FS=OFS="\t"
        newcolno=1
      }
      { printf "%s", $1                                       # print 1st column

        if (NR==1) {                                          # if header record ...
           for (old=2; old<=NF; old  ) {                      # loop through columns ...
               if (! ($old in newcol)) {                      # looking for new header and if found ...
                  printf "%s%s", OFS, $old                    # print to stdout and ...
                  newcol[$old]=   newcolno                    # make note of the new column number to map to
               }
               old2new[old]= newcol[$old]                     # map current column number to new column number
           }
        }
        else {                                                # non-header rows
           delete row                                         # clear our new output array

           for (old=2; old<=NF; old  )                        # loop through current columns ...
               row[old2new[old]]=row[old2new[old]] $old       # append value to new row/column

           for (new=2; new<=newcolno; new  )                  # loop through new row/columns and ...
               printf "%s%s", OFS, row[new]                   # print to stdout
        }

        print ""                                              # terminate current line
      }
' unmerge.txt

This generates:

        12      13
bb      2
cc      1
aa      5
ee              6

Testing a larger file to demonstrate some of our assumptions:

$ cat unmerge2.txt
        12      12      12      13      12      13
bb      2
cc              1
aa                      5
ee                              6
ff                              17      87                # distinct headers so no problems
gg                              100             -3        # both have header "13" so we'll append the strings

The awk script generates:

        12      13
bb      2
cc      1
aa      5
ee              6
ff      87      17
gg              100-3

Once OP is satisified with the results, and assuming OP still wants to update/overwrite the input file with the new results:

  • if using GNU awk you can add -i inplace to facilitate an inplace update of the input file: awk -i inplace 'BEGIN {FS=OFS="\t"; newcolno=1}...' unmerge.txt
  • otherwise OP can direct the output to a tmp file and then overwrite the source file with the tmp file: awk 'BEGIN {FS=OFS="\t"; newcolno=1}...' unmerge.txt > tmpfile; mv tmpfile unmerge.txt
  • Related