Home > Enterprise >  Display row/column data for csv with max value in another column, same row (bash)
Display row/column data for csv with max value in another column, same row (bash)

Time:06-12

I'm trying to make a script that sorts column 2 for highest value, prints said value, and prints column 3 for every row matching this value. Here's an example of unsorted csv:

Argentina,4.6,2016,some data
Argentina,4.2,2018,some data
Argentina,4.6,1998,some data
Argentina,4.5,2001,some data

Desired output would be:

4.6
2016
1998

Here's what I've got so far, but I'm feeling unsure if I'm going about it correctly:

grep  "$2*" "$1"> new.csv

sort -t, -k2,2nr new.csv > new2.csv

cut -f3 -d"," new2.csv

Wherein $2 is the name of country in first column and $1 is the filename. While it sorts the values in the 2nd column just fine, I'd like to show the years for only the rows with max value in column 2. This route just prints the years for all of the rows, and I understand why that's happening, but not sure the best course to get the intended result from there. What are some ways to go about this? Thanks in advance

CodePudding user response:

You could do something like that:

declare maxvalue_found=no
declare maxvalue=''
while read -r line; do
  IFS=',' read -r <<< "$line" country value year data

  if [[ "${maxvalue_found}" == no ]]; then
    echo "$value"
    maxvalue="${value}"
    maxvalue_found=yes
  fi
  if [[ "${value}" == "${maxvalue}" ]]; then
    echo "$year"
  fi
done < new2.csv

new2.csv is your sorted file: we simply read it line by line, then read said line by splitting using ',' (https://www.gnu.org/software/bash/manual/bash.html#Word-Splitting):

  • The first value should be the highest due to sort.
  • The next value must be tested because you want only those that matches.
  • The year are print in same order than in new2.csv

CodePudding user response:

Assumptions:

  • comma only shows up as a field delimiter (ie, comma is not part of any data)
  • no sorting requirement has been defined for the final result

One awk idea requiring 2 passes through the unsorted file:

awk -F, '                                     # set input field delimiter as comma
FNR==NR { max=($2>max ? $2 : max); next}      # 1st pass of file (all rows): keep track of max value from field #2
FNR==1  { print max }                         # 2nd pass of file (1st row ): print max
$2==max { print $3 }                          # 2nd pass of file (all rows): if field #2 matches "max" then print field #3
' unsorted.csv unsorted.csv

This generates:

4.6
2016
1998

Another GNU awk idea that requires a single pass through the unsorted file:

awk -F, '                                     # set input field delimiter as comma
    { arr[$2][$3]                             # save fields #2 and #3 as indices in array "arr[]"
      max = ( $2 > max ? $2 : max)            # keep track of max value from field #2
    }
END { print max                               # after file has been processed ... print max and then ...
      for (i in arr[max])                     # loop through indices of 2nd dimension where 1st dimension == max
          print i                             # print 2nd dimension index (ie, field #3)
    }
' unsorted.csv

This generates:

4.6
1998
2016

NOTES:

  • GNU awk required for arrays of arrays (ie, multidimensional arrays)
  • while field #3 appeara to be sorted this is not guaranteed unless we modify the code to explicitly sort the 2nd dimension of the array
  • Related