I have a 3-4M line csv file (my_csv.csv) with two columns:
col1,col2
val11,val12
val21,val22
val31,val32
...
The csv contains only two columns, with one comma per line. The col1 and col2 values are strings only (nothing else). The output shown above comes from the command head my_csv.csv.
I would like to check whether a string test_str appears among the col2 values. For example, if test_str = val12, I would like the test to return True, because val12 is located in column 2 (as shown in the example). But if test_str = val1244, I want the code to return False.
In python it would be something like:
import pandas as pd

df = pd.read_csv('my_csv.csv')
test_str = 'val42'
if test_str in df['col2'].to_list():
    # Expected to return True
    # Do the job
But I have no clue how to do it in bash.
(I know that df['col2'].to_list() is not a good idea, but I didn't want to use a built-in pandas function, so that the code stays easy to understand.)
CodePudding user response:
awk is the best suited among the standard shell utilities for handling csv data:
awk -F, -v val='val22' '$2 == val {print "found a match:", $0}' file
found a match: val21,val22
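Since the question asks for a True/False result rather than the matching line, awk's exit status can carry the answer: exit as soon as a match is found, and make the END block set the final status. A minimal sketch (the sample file contents here are assumptions reconstructed from the question):

```shell
#!/usr/bin/env bash
# Recreate a small csv with the layout from the question (assumed data).
cat > my_csv.csv <<'EOF'
col1,col2
val11,val12
val21,val22
val31,val32
EOF

test_str='val12'

# Stop reading at the first match; exit status 0 = found, 1 = not found.
if awk -F, -v val="$test_str" \
       '$2 == val {found=1; exit} END {exit !found}' my_csv.csv; then
    echo "True"
else
    echo "False"
fi
```

The early `exit` matters on a 3-4M line file: awk stops scanning as soon as the value is seen instead of reading the whole file.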
An equivalent bash loop would look like this:
while IFS=',' read -ra arr; do
    if [[ ${arr[1]} == 'val22' ]]; then
        echo "found a match: ${arr[*]}"
    fi
done < file
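The loop above can also be wrapped into a True/False check. A sketch of a pure-bash variant (the helper name in_col2 and the sample file contents are assumptions, not from the original post); it reads the two fields directly and returns early on the first match:

```shell
#!/usr/bin/env bash
# Sample data matching the question's layout (assumed contents).
cat > my_csv.csv <<'EOF'
col1,col2
val11,val12
val21,val22
val31,val32
EOF

# Hypothetical helper: exit status 0 (True) if $1 appears in col2.
in_col2() {
    local test_str=$1 col1 col2
    while IFS=',' read -r col1 col2; do
        if [[ $col2 == "$test_str" ]]; then
            return 0    # found: stop reading the rest of the file
        fi
    done < my_csv.csv
    return 1            # reached end of file without a match
}

in_col2 'val12'   && echo "True"  || echo "False"
in_col2 'val1244' && echo "True"  || echo "False"
```

Note that a plain bash read loop will be noticeably slower than awk on a 3-4M line file, so prefer the awk version for data of that size.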