How to sort inside a cell captured by awk

Time:01-11

I have a file with rows like the following, where the 3rd column holds multiple numeric values that I need to sort:

file: h1.csv

Class S101-T1;3343-1-25310;3344-1-25446 3345-1-25691 3348-1-27681 3347-1-28453
Class S101-T2;3343-2-25310;3344-2-25446 3345-2-25691
Class S101-T1;3343-3-25310;3345-3-25691 3343-3-25314
Class S101-T2;3343-4-25310;3345-4-25691 3343-4-25314 3344-4-25314
Class S102-T1;3343-5-25310;3344-5-25446 3345-5-25691

The expected output is:

Class S101-T1;3343-1-25310;3344-1-25446 3345-1-25691 3347-1-28453 3348-1-27681
Class S101-T2;3343-2-25310;3344-2-25446 3345-2-25691
Class S101-T1;3343-3-25310;3343-3-25314 3345-3-25691
Class S101-T2;3343-4-25310;3343-4-25314 3344-4-25314 3345-4-25691
Class S102-T1;3343-5-25310;3344-5-25446 3345-5-25691

My idea was to capture the 3rd column with awk, sort it, and then print the output, but I have only managed to capture the column. I have not succeeded in sorting it or in printing the desired output.

Here's the code I've got so far...

cat h1.csv | awk -F';' '{ gsub(" ","\n",$3); print $0 }'

I have also tried the following (and some other variants that gave errors):

cat h1.csv | awk -F';' '{ gsub(" ","\n",$3); print $3 | "sort -u" }'
cat h1.csv | awk -F';' '{ gsub(" ","\n",$3); sort -u; print $3 }'

Is it possible to do this, and if so, how? Any help is appreciated. Thanks.

CodePudding user response:

One option is to split the 3rd column on spaces and then sort the values with asort(), which is a GNU awk extension.

Then join the first 2 fields and the sorted values back into one line.

awk '
BEGIN{FS=OFS=";"}
{
  n=split($3, a, " ")
  asort(a)
  res = $1 OFS $2 OFS
  for (i = 1; i <= n; i++) {
    res = res (i > 1 ? " " : "") a[i]
  }
  print res
}' file

Output

Class S101-T1;3343-1-25310;3344-1-25446 3345-1-25691 3347-1-28453 3348-1-27681
Class S101-T2;3343-2-25310;3344-2-25446 3345-2-25691
Class S101-T1;3343-3-25310;3343-3-25314 3345-3-25691
Class S101-T2;3343-4-25310;3343-4-25314 3344-4-25314 3345-4-25691
Class S102-T1;3343-5-25310;3344-5-25446 3345-5-25691
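Because asort() is specific to GNU awk, the script above does not run under other implementations such as mawk. As a minimal sketch of the same idea for any POSIX awk, the sort can be written by hand as an insertion sort. The function name sort_cell is only an illustrative assumption, and the values are compared as strings, which is enough for the sample data shown in the question.

```shell
# Sketch: sort the space-separated values of field 3 without asort().
# sort_cell is a hypothetical helper name; it reads rows from stdin.
sort_cell() {
  awk '
  BEGIN { FS = OFS = ";" }
  {
    n = split($3, a, " ")
    # Insertion sort on a[1..n]; appending "" forces string comparison.
    for (i = 2; i <= n; i++) {
      v = a[i]
      for (j = i - 1; j >= 1 && (a[j] "") > (v ""); j--)
        a[j + 1] = a[j]
      a[j + 1] = v
    }
    # Rebuild the cell with single spaces and no leading space.
    out = ""
    for (i = 1; i <= n; i++)
      out = out (i > 1 ? " " : "") a[i]
    $3 = out
    print
  }'
}
```

It can be run as: sort_cell < h1.csv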

CodePudding user response:

With GNU awk, and with your shown samples, please try the following code.

awk '
BEGIN{
  FS=OFS=";"
  PROCINFO["sorted_in"] = "@val_num_asc"
}
{
  nf=val=""
  delete value
  num=split($NF,arr," ")
  for(i=1;i<=num;i++){
    split(arr[i],arr2,"-")
    value[arr2[1]]=arr[i]
  }
  for(i in value){
    nf=(nf?nf " ":"")value[i]
  }
  $NF=nf
}
1
'  Input_file

Explanation: a detailed, commented version of the code above.

awk '                                     ## Start of the awk program.
BEGIN{                                    ## BEGIN block, run once before any input is read.
  FS=OFS=";"                              ## Use ; as both input and output field separator.
  PROCINFO["sorted_in"] = "@val_num_asc"  ## Make for-in loops visit array elements in ascending numeric order of their values.
}
{
  nf=val=""                               ## Reset the variables for each line.
  delete value                            ## Empty the value array for each line.
  num=split($NF,arr," ")                  ## Split the last field on spaces into arr.
  for(i=1;i<=num;i++){                    ## Loop over all elements of arr.
    split(arr[i],arr2,"-")                ## Split the current element on - so that arr2[1] holds its first part, e.g. 3344 or 3345.
    value[arr2[1]]=arr[i]                 ## Store the whole element in value, keyed by that first part.
  }
  for(i in value){                        ## Traverse value in the sorted order set above.
    nf=(nf?nf " ":"")value[i]             ## Append each element to nf, separated by spaces.
  }
  $NF=nf                                  ## Replace the last field with the sorted list.
}
1                                         ## Print the (possibly modified) line.
'  Input_file                             ## Name of the input file.
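The asker's attempt with print $3 | "sort -u" does not sort per row because awk keeps one pipe to sort open for the whole input, so the values of all rows are sorted together and the other columns are lost. In the second attempt, sort -u inside the awk program is parsed as an expression (variable sort minus variable u), so it runs no command. As a rough sketch of the original "capture the column and hand it to sort" idea, the loop can be moved into the shell so that sort runs once per row. The name sort_cell_sh is hypothetical, and the sketch assumes every row ends with a newline and that the values contain no shell glob characters.

```shell
# Sketch: run sort once per row from a shell loop.
# sort_cell_sh is a hypothetical helper name; it reads rows from stdin.
sort_cell_sh() {
  while IFS=';' read -r c1 c2 c3; do
    # $c3 is left unquoted on purpose so it splits into one value per line.
    # LC_ALL=C makes sort compare bytes, independent of the locale.
    sorted=$(printf '%s\n' $c3 | LC_ALL=C sort | paste -sd ' ' -)
    printf '%s;%s;%s\n' "$c1" "$c2" "$sorted"
  done
}
```

It can be run as: sort_cell_sh < h1.csv. This starts several processes per row, so for large files one of the pure awk versions is likely the better choice.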