I have an output with two columns, and I want to replace the values in one of the columns with strings and save the result as a .csv file.
For example, the text file:
year, user, tweet
2009, Katie, I love playing football
2010, James, I play football
2013, Bob, I play basketball
2013, James, I play Baseball
The delimiter is ',' and there are 3 tweets with the exact word 'play': 2 of them are from 2013 and 1 is from 2010. I want to replace 2013 with Development and 2010 with Early.
The output should be:
Early, 1
Development, 2
The result should then be saved to a new .csv file.
I have only been able to do this so far:
$ awk -F, '{IGNORECASE=1} {ARGC=1} /\<play\>/{a[$1]++} END {for (i in a) print i, a[i]}' Tweet.txt | sort
Output:
2010 1
2013 2
I have just started learning BASH and would greatly appreciate some help :)
WITH SED
Suppose my original output was like this:
2010 1
2013 2010
That is, for the year 2013, "play" came up 2010 times. With sed -e 's/2010/Early/' -e 's/2013/Development/' the count gets replaced as well.
The output will be:
Early 1
Development Early
Would you mind helping me out further?
CodePudding user response:
Using awk
awk -F, '$1=="2013" && /play/ {$1="Development"; play++; dev=$1 FS" " play} $1=="2010" && /play/ {$1="Early"; play1++; early=$1 FS" " play1} NR > 1 && NF == 2; END { print early"\n" dev > "twitter.csv" }' input_file
$ cat replace.awk
BEGIN {
FS="," #Set the field separator to comma
} $1=="2013" && /play/ { #If column1 is 2013 and the line matches play
$1="Development"; play++; dev=$1 FS" " play #Change column1, count play and build the output line
} $1=="2010" && /play/ { #Same as above
$1="Early"; play1++; early=$1 FS" " play1
} NR > 1 && NF == 2
END {
print early"\n" dev > "twitter.csv" #Print variables separated by new line
}
Output
$ awk -f replace.awk input_file
$ cat twitter.csv
Early, 1
Development, 2
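The script above hard-codes one counter per year. A more general variant might look like the sketch below. It is only an illustration under some assumptions: the file name Tweet.txt and the year-to-label table come from the question, the tweet text is assumed to contain no commas (so it is always field 3), and the whole-word test is written as a plain regular expression on the lower-cased tweet rather than with the gawk-only `IGNORECASE` and `\<...\>` features.

```shell
# Recreate the sample input from the question
cat > Tweet.txt <<'EOF'
year, user, tweet
2009, Katie, I love playing football
2010, James, I play football
2013, Bob, I play basketball
2013, James, I play Baseball
EOF

awk -F', *' '
BEGIN {
    # Year-to-label mapping taken from the question
    label["2010"] = "Early"
    label["2013"] = "Development"
}
# Skip the header line; count tweets containing "play" as a whole word,
# ignoring case ("playing" does not match)
NR > 1 && tolower($3) ~ /(^|[^a-z0-9_])play([^a-z0-9_]|$)/ {
    count[$1]++
}
END {
    # Walk the years in a fixed order so no external sort is needed
    n = split("2010 2013", order, " ")
    for (i = 1; i <= n; i++)
        if (order[i] in count)
            print label[order[i]] ", " count[order[i]]
}' Tweet.txt > twitter.csv

cat twitter.csv
```

Because the labels are applied to the year key inside awk, a count that happens to look like a year is never touched, and adding another year only requires one more `label[...]` entry plus the year in the `split` string.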
CodePudding user response:
Pipe the output of your attempt into this sed command:
sed -e 's/^2010/Early/' -e 's/^2013/Development/'
Full command:
awk -F, '{IGNORECASE=1} {ARGC=1} /\<play\>/{a[$1]++} END {for (i in a) print i, a[i]}' test.txt | sort | sed -e 's/^2010/Early/' -e 's/^2013/Development/'
The ^ character tells sed to match 2010 or 2013 only at the start of a line.
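To see the difference the anchor makes, here is a small sketch using the problem case from the question, where a count happens to equal a year. The file names counts.txt and labelled.txt are made up for the illustration.

```shell
# Two-column output in which the count for 2013 looks like a year
printf '2010 1\n2013 2010\n' > counts.txt

# Unanchored: 2010 is replaced wherever it first appears on a line,
# so the count in the second line is rewritten too
sed -e 's/2010/Early/' -e 's/2013/Development/' counts.txt

# Anchored: only a year at the start of the line is replaced
sed -e 's/^2010/Early/' -e 's/^2013/Development/' counts.txt > labelled.txt
cat labelled.txt
```

The unanchored command prints "Early 1" and "Development Early", while the anchored one leaves the count alone and gives "Early 1" and "Development 2010".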