The text file looks like this:
Year, Name, Date, rating, username, tweet
2009, John, 02/03/09, positive, @johnnyboy, Bob is my best friend
2010, Bob, 01/09/10, positive, @Bob, Bob is cool
I want to print all the dates that have a tweet with the word "Bob" in it (keeping in mind that the username can be @Bob, which I don't want to match).
So the output should be
02/03/09
01/09/10
So far, my attempt is:
awk -F',' '{IGNORECASE = 1} {ARGC=1} $6=="Bob" {print $3}' Data.txt
I know the obvious mistake is that == will return only the dates where the tweet is exactly Bob, but my attempts have all been futile and that is the closest I could get. Is there any other way to do this using awk?
Thank you
CodePudding user response:
As the tweet column may contain a comma (,), you cannot use $6 directly:
awk -F', *' 'BEGIN {IGNORECASE = 1} {col3=$3; $1=$2=$3=$4=$5=""; if (/Bob/) print col3}' Data.txt
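col3=$3
saves the third column in the variable col3.
$1=$2=$3=$4=$5=""
clears the first 5 columns, so the username is no longer part of the line.
(/Bob/)
compares the rest of the columns against the regular expression "Bob". Matching the whole remainder rather than only $6 matters because you could have a tweet like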
Hi, Bob is my best friend
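Note that IGNORECASE is a gawk extension, so it has no effect in other awks. If that matters, a portable variant might look like the sketch below. It assumes fields are separated by a comma followed by optional spaces and that the first line is a header. The temporary file and the two extra rows (2011 and 2012) are made up for illustration and are not from the original data.

```shell
# Create a throwaway sample file (hypothetical data for illustration).
sample=$(mktemp)
cat > "$sample" <<'EOF'
Year, Name, Date, rating, username, tweet
2009, John, 02/03/09, positive, @johnnyboy, Bob is my best friend
2010, Bob, 01/09/10, positive, @Bob, Bob is cool
2011, Bob, 05/06/11, negative, @Bob, nothing to see here
2012, Ann, 07/08/12, positive, @ann, Hi, Bob is my best friend
EOF

# Split on a comma plus optional spaces and skip the header line.
# Fields 6..NF are rejoined so a tweet containing commas is tested as a whole.
# tolower() gives a case-insensitive match without relying on gawk.
result=$(awk -F', *' 'NR > 1 {
    tweet = $6
    for (i = 7; i <= NF; i++) tweet = tweet ", " $i
    if (tolower(tweet) ~ /bob/) print $3
}' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

The 2011 row is skipped even though both the name and the username contain Bob, because only the rejoined tweet text is tested. Like the command above, this matches "Bob" as a substring, so a tweet mentioning "Bobby" would also be printed.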