Home > database >  Sort bash array of one line CSVs based on first column of each CSV in array
Sort bash array of one line CSVs based on first column of each CSV in array

Time:10-03

I have a bash array of one line CSVs that I want to put in a numerically increasing order based upon the first column in the CSV.

My array variable looks like this:

136,1.1.1.1,1234567890 117,2.2.2.2,0987654321 121,3.3.3.3,6789054321 112,4.4.4.4,1234567809 104,5.5.5.5,0987654312 107,6.6.6.6,6574839201 70.3,7.7.7.7,0987123456 162,8.8.8.8,0123498765 141,9.9.9.9,5432187690 80.3,10.10.10.10,1209568743 142,11.11.11.11,1234567899 92.0,12.12.12.12,0987654322 83.4,13.13.13.13,0987651233 102,14.14.14.14,0912837455 141,15.15.15.15,0912345677

I want to be able to format the above array (or any other of the same structure) into this numerical order:

70.3,7.7.7.7,0987123456 80.3,10.10.10.10,1209568743 83.4,13.13.13.13,0987651233 92.0,12.12.12.12,0987654322 102,14.14.14.14,0912837455 104,5.5.5.5,0987654312 107,6.6.6.6,6574839201 112,4.4.4.4,1234567809 117,2.2.2.2,0987654321 121,3.3.3.3,6789054321 136,1.1.1.1,1234567890 141,9.9.9.9,5432187690 141,15.15.15.15,0912345677 142,11.11.11.11,1234567899 162,8.8.8.8,0123498765

I have tried piping the entire array into the sort command echo ${array[*]} | sort -k1 -n -t,, but that doesn't work on single line data sets.

I am aware that I could simply dump the data into a file and use sort, that was the method I was using before, but I need to avoid writing to disk entirely. My previous approach was with this code (it takes in a multi-line variable of the one line CSVs):

while IFS= read -r line
do
    ipaddr=$(echo $line | cut -d"," -f1)
    value3=$(echo $line | cut -d"," -f2)
    latency=$(ping -c1 -W5 -4n $ipaddr | grep -o "time=.*" | sed 's/time=//g' | sed 's/ ms//g')
    echo $latency,$ipaddr,$value3 >> /tmp/.sort
done < <(printf '%s\n' "$input_data")
sort -k1 -n -t, /tmp/.sort

I can't seem to find anything online about sorting arrays based upon values in each array item.

Does anyone have a clever way of accomplishing this?

CodePudding user response:

Your data has spaces it in. You can use that to split on spaces leaving you with multiple lines of input with the numeric value first which you can then sort with sort -n. You can use sed or tr or even printf (as mentioned by Fravadona) to split the single lines of input.

For example, take your first data and simply use tr to translate spaces to newlines and then sort, e.g.

$ tr ' ' '\n' <<< "136,value1,value2,value3 115,value1,value2,value3 125,value1,value2,value3 111,value1,value2,value3 103,value1,value2,value3 108,value1,value2,value3 70.5,value1,value2,value3 166,value1,value2,value3 142,value1,value2,value3" | 
sort -n
70.5,value1,value2,value3
103,value1,value2,value3
108,value1,value2,value3
111,value1,value2,value3
115,value1,value2,value3
125,value1,value2,value3
136,value1,value2,value3
142,value1,value2,value3
166,value1,value2,value3

There are multiple ways to do it, choose the one that fits your needs.

If your "value1,value2,value3" are numeric and would interfere with a numeric sort of the entire record, limit the sort to the first field separated with ',', e.g.

sort -t, -k1,1n

Which will provide a numeric sort based on the first-field only limiting the comparison from the start of field 1 to the end of field 1. Using the data in the comment, you would have:

$ tr ' ' '\n' <<< "136,1.1.1.1,1234567890 117,2.2.2.2,0987654321 121,3.3.3.3,6789054321 112,4.4.4.4,1234567809 104,5.5.5.5,0987654312 107,6.6.6.6,6574839201 70.3,7.7.7.7,0987123456 162,8.8.8.8,0123498765 141,9.9.9.9,5432187690 80.3,10.10.10.10,1209568743 142,11.11.11.11,1234567899 92.0,12.12.12.12,0987654322 83.4,13.13.13.13,0987651233 102,14.14.14.14,0912837455 141,15.15.15.15,0912345677" | 
sort -t, -k1,1n
70.3,7.7.7.7,0987123456
80.3,10.10.10.10,1209568743
83.4,13.13.13.13,0987651233
92.0,12.12.12.12,0987654322
102,14.14.14.14,0912837455
104,5.5.5.5,0987654312
107,6.6.6.6,6574839201
112,4.4.4.4,1234567809
117,2.2.2.2,0987654321
121,3.3.3.3,6789054321
136,1.1.1.1,1234567890
141,15.15.15.15,0912345677
141,9.9.9.9,5432187690
142,11.11.11.11,1234567899
162,8.8.8.8,0123498765

Let me know if that isn't the desired output you are looking for.

CodePudding user response:

Looking at your code, you don't need to use an array in the first place, just use a pipeline.

Also, your sorting method is a bit off:

  • Based on the ordering of the records starting with "141", you want a stable sort.
  • From some quick testing on my Ubuntu system in French, the "time=" number is not localized, so specify LC_ALL=C. (BTW, it's worth noting that the label is localized, e.g. "temps=".)

Here I'll use the processed data in place of $input_data, just to show how it'd work:

processed_data='136,1.1.1.1,1234567890
117,2.2.2.2,0987654321
121,3.3.3.3,6789054321
112,4.4.4.4,1234567809
104,5.5.5.5,0987654312
107,6.6.6.6,6574839201
70.3,7.7.7.7,0987123456
162,8.8.8.8,0123498765
141,9.9.9.9,5432187690
80.3,10.10.10.10,1209568743
142,11.11.11.11,1234567899
92.0,12.12.12.12,0987654322
83.4,13.13.13.13,0987651233
102,14.14.14.14,0912837455
141,15.15.15.15,0912345677'

printf '%s\n' "$processed_data" |
    while IFS=, read -r latency ipaddr value3
do
    # Actual processing goes here, i.e. `cut` and `ping`.

    echo "$latency,$ipaddr,$value3"
done |
    LC_ALL=C sort -k1 -n -t, -s

Output:

70.3,7.7.7.7,0987123456
80.3,10.10.10.10,1209568743
83.4,13.13.13.13,0987651233
92.0,12.12.12.12,0987654322
102,14.14.14.14,0912837455
104,5.5.5.5,0987654312
107,6.6.6.6,6574839201
112,4.4.4.4,1234567809
117,2.2.2.2,0987654321
121,3.3.3.3,6789054321
136,1.1.1.1,1234567890
141,9.9.9.9,5432187690
141,15.15.15.15,0912345677
142,11.11.11.11,1234567899
162,8.8.8.8,0123498765
  •  Tags:  
  • bash
  • Related