Getting last X fields from a specific line in a CSV file using bash

I'm trying to get the list of users from my CSV file into a bash variable. The problem is that the number of users varies from line to line and can be anywhere from 1 to 5.

Example CSV file:

"record1_data1","record1_data2","record1_data3","user1","user2"
"record2_data1","record2_data2","record2_data3","user1","user2","user3","user4"
"record3_data1","record3_data2","record3_data3","user1"

I would like to get something like

list_of_users=$(cat file.csv | grep "record2_data2" | <something>)
echo "$list_of_users"
user1,user2,user3,user4

I'm trying this:

cat file.csv | grep "record2_data2" |  awk -F, -v OFS=',' '{print $4,$5,$6,$7,$8 }' | sed 's/"//g'

My result is:

user1,user2,user3,user4,

Question: How do I remove all the trailing commas from my result? Sometimes there is just one, but sometimes the result looks like user1,,,, Is there a better way to do this? The users always start after the 3rd column in my file.

CodePudding user response：

This will do what your code seems to be trying to do (print the users for a given string record2_data2 which only exists in the 2nd field):

$ awk -F',' '{gsub(/"/,"")} $2=="record2_data2"{sub(/([^,]*,){3}/,""); print}' file.csv
user1,user2,user3,user4

However, I don't see how that relates to your question's subject, "Getting last X records from CSV file using bash", so I don't know whether it's what you really want.
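
To get the result into a variable, as the question asks, the awk command can be wrapped in command substitution. This is a minimal sketch, not part of the answer above: the temporary file and the variable name csv are only there to make the example self-contained, and the pattern is spelled out three times instead of using {3} because some awk implementations may not support interval expressions.

```shell
#!/usr/bin/env bash
# Illustrative setup: recreate the sample data from the question in a temp file.
csv=$(mktemp)
cat > "$csv" <<'EOF'
"record1_data1","record1_data2","record1_data3","user1","user2"
"record2_data1","record2_data2","record2_data3","user1","user2","user3","user4"
"record3_data1","record3_data2","record3_data3","user1"
EOF

# Strip quotes, pick the row whose 2nd field matches, drop the first 3 fields.
list_of_users=$(awk -F',' '
  { gsub(/"/, "") }
  $2 == "record2_data2" { sub(/^[^,]*,[^,]*,[^,]*,/, ""); print }
' "$csv")

echo "$list_of_users"   # prints: user1,user2,user3,user4
rm -f "$csv"
```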

CodePudding user response：

Better to use a bash array, and join it into a CSV string when needed:

#!/usr/bin/env bash
readarray -t listofusers < <(cut -d, -f4- file.csv | tr -d '"' | tr ',' $'\n' | sort -u)
IFS=,
printf "%s\n" "${listofusers[*]}"

cut -d, -f4- file.csv | tr -d '"' | tr ',' $'\n' | sort -u is the important bit: it prints only the fourth and following fields of every line in the CSV input file, removes the quotes, turns commas into newlines, and then sorts the resulting usernames, removing duplicates. Note that this collects the users from the whole file, not from a single record. That output is then read into an array with the readarray builtin, and you can manipulate the array and its individual elements however you need.
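
If only one record's users are wanted, the same pipeline can be restricted to the matching line first. The following is a sketch under a few assumptions: bash 4 or later (for readarray), a temporary copy of the sample data, and the illustrative names csv, users and joined, none of which come from the answer itself.

```shell
#!/usr/bin/env bash
# Illustrative setup: recreate the sample data from the question in a temp file.
csv=$(mktemp)
cat > "$csv" <<'EOF'
"record1_data1","record1_data2","record1_data3","user1","user2"
"record2_data1","record2_data2","record2_data3","user1","user2","user3","user4"
"record3_data1","record3_data2","record3_data3","user1"
EOF

# Keep only the matching line, then fields 4 onward, one unquoted user per line.
readarray -t users < <(grep '"record2_data2"' "$csv" | cut -d, -f4- | tr -d '"' | tr ',' '\n')

echo "count: ${#users[@]}"   # prints: count: 4
echo "first: ${users[0]}"    # prints: first: user1

# Join the array back into a CSV string; the subshell keeps the IFS change local.
joined=$(IFS=,; printf '%s' "${users[*]}")
echo "$joined"               # prints: user1,user2,user3,user4
rm -f "$csv"
```

Setting IFS inside the command substitution avoids changing it for the rest of the script, which the plain IFS=, assignment would do.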

CodePudding user response：

A GNU sed solution. Let the content of file.csv be

"record1_data1","record1_data2","record1_data3","user1","user2"
"record2_data1","record2_data2","record2_data3","user1","user2","user3","user4"
"record3_data1","record3_data2","record3_data3","user1"

then

sed -n -e 's/"//g' -e '/record2_data/ s/[^,]*,[^,]*,[^,]*,// p' file.csv

gives output

user1,user2,user3,user4

Explanation: -n turns off automatic printing. The first expression globally replaces " with an empty string, i.e. deletes every double quote. The second expression applies only to lines containing record2_data: it substitutes (s) everything up to and including the 3rd comma with an empty string, i.e. deletes it, and prints (p) the changed line.

(tested in GNU sed 4.2.2)
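
For the literal question (stripping the trailing commas from the original pipeline's output), one more sed substitution may be enough. This is a sketch rather than part of the answer above; it reuses the asker's pipeline, and the temporary file and the names csv and result are only illustrative.

```shell
#!/usr/bin/env bash
# Illustrative setup: recreate the sample data from the question in a temp file.
csv=$(mktemp)
cat > "$csv" <<'EOF'
"record1_data1","record1_data2","record1_data3","user1","user2"
"record2_data1","record2_data2","record2_data3","user1","user2","user3","user4"
"record3_data1","record3_data2","record3_data3","user1"
EOF

# The asker's pipeline, with s/,*$// added to delete any run of commas
# at the end of the line (here the raw awk output would be "user1",,,,).
result=$(grep "record3_data2" "$csv" \
  | awk -F, -v OFS=',' '{print $4,$5,$6,$7,$8}' \
  | sed 's/"//g; s/,*$//')

echo "$result"   # prints: user1
rm -f "$csv"
```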

CodePudding user response：

awk -F',' '
  /record2_data2/{
     for(i=4;i<=NF;i++) o=sprintf("%s%s,",o,$i); 
     gsub(/"|,$/,"",o); 
     print o
}' file.csv

user1,user2,user3,user4
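
A variant of the loop above can avoid producing the trailing comma in the first place by emitting the separator before every field except the first one printed. This is only a sketch: the names csv, out and users are illustrative, and out is reset for each matching line in case more than one line matches.

```shell
#!/usr/bin/env bash
# Illustrative setup: recreate the sample data from the question in a temp file.
csv=$(mktemp)
cat > "$csv" <<'EOF'
"record1_data1","record1_data2","record1_data3","user1","user2"
"record2_data1","record2_data2","record2_data3","user1","user2","user3","user4"
"record3_data1","record3_data2","record3_data3","user1"
EOF

users=$(awk -F',' '
  /record2_data2/ {
    out = ""                                   # start fresh for each matching line
    for (i = 4; i <= NF; i++)
      out = out (i > 4 ? "," : "") $i          # comma goes before all but the first user
    gsub(/"/, "", out)                         # drop the double quotes
    print out
  }' "$csv")

echo "$users"   # prints: user1,user2,user3,user4
rm -f "$csv"
```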