I have a 3-4M line csv file (my_csv.csv) with two columns:
col1,col2
val11,val12
val21,val22
val31,val32
...
The csv contains only two columns, with one comma per line. The col1 and col2 values are strings only (nothing else). The output shown above comes from the command head my_csv.csv.
I would like to check whether a string test_str appears among the col2 values. For example, if test_str = val12, I would like the test to return True, because val12 is located in column 2 (as shown in the example). But if test_str = val1244, I want the code to return False.
In python it would be something like:
import pandas as pd

df = pd.read_csv('my_csv.csv')
test_str = 'val42'
if test_str in df['col2'].to_list():
    # Expected to return True
    # Do the job
But I have no clue how to do it in bash.
(I know that df['col2'].to_list() is not a good idea, but I didn't want to use a built-in pandas function, so that the code stays easy to understand.)
CodePudding user response:
awk is the best suited among the standard shell utilities for handling csv data:
awk -F, -v val='val22' '$2 == val {print "found a match:", $0}' file
found a match: val21,val22
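Since the question asks for a True/False result rather than the matching line, awk's exit status can carry the answer: exit as soon as a match is found, and make the END block set the final status. A minimal sketch (the sample file contents here are assumptions reconstructed from the question):

```shell
#!/usr/bin/env bash
# Recreate a small csv with the layout from the question (assumed data).
cat > my_csv.csv <<'EOF'
col1,col2
val11,val12
val21,val22
val31,val32
EOF

test_str='val12'

# Stop reading at the first match; exit status 0 = found, 1 = not found.
if awk -F, -v val="$test_str" \
       '$2 == val {found=1; exit} END {exit !found}' my_csv.csv; then
    echo "True"
else
    echo "False"
fi
```

The early `exit` matters on a 3-4M line file: awk stops scanning as soon as the value is seen instead of reading the whole file.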
An equivalent bash loop would look like this:
while IFS=',' read -ra arr; do
    if [[ ${arr[1]} == 'val22' ]]; then
        echo "found a match: ${arr[*]}"
    fi
done < file
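The loop above can also be wrapped into a True/False check. A sketch of a pure-bash variant (the helper name in_col2 and the sample file contents are assumptions, not from the original post); it reads the two fields directly and returns early on the first match:

```shell
#!/usr/bin/env bash
# Sample data matching the question's layout (assumed contents).
cat > my_csv.csv <<'EOF'
col1,col2
val11,val12
val21,val22
val31,val32
EOF

# Hypothetical helper: exit status 0 (True) if $1 appears in col2.
in_col2() {
    local test_str=$1 col1 col2
    while IFS=',' read -r col1 col2; do
        if [[ $col2 == "$test_str" ]]; then
            return 0    # found: stop reading the rest of the file
        fi
    done < my_csv.csv
    return 1            # reached end of file without a match
}

in_col2 'val12'   && echo "True"  || echo "False"
in_col2 'val1244' && echo "True"  || echo "False"
```

Note that a plain bash read loop will be noticeably slower than awk on a 3-4M line file, so prefer the awk version for data of that size.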