Replacing a column of an output with another name and saving it as .CSV file


I have an output with two columns and I want to replace one of the columns with a string and save it as a .csv file.

For example, the text file:

year, user, tweet
2009, Katie, I love playing football
2010, James, I play football
2013, Bob, I play basketball
2013, James, I play Baseball

The delimiter is ',' and there are 3 tweets containing the exact word 'play': 2 of them are from 2013 and 1 is from 2010. I want to replace 2013 with Development and 2010 with Early.

The output should be:

Early, 1
Development, 2

Then save to a new .csv file

I have only been able to do this so far:

$ awk -F, '{IGNORECASE=1} {ARGC=1} /\<play\>/{a[$1]++} END {for (i in a) print i, a[i]}' Tweet.txt | sort

Output:

2010 1
2013 2

I have just started learning BASH and would greatly appreciate some help :)


What if my original output was like this:

2010 1
2013 2010

Meaning that for year 2013, the word "play" came up 2010 times. With sed -e 's/2010/Early/' -e 's/2013/Development/' the output will be:

Early 1
Development Early

Would you mind helping me out further?

CodePudding user response:

Using awk

awk -F, '$1=="2013" && /play/ {$1="Development"; play++; dev=$1 FS" " play} $1=="2010" && /play/ {$1="Early"; play1++; early=$1 FS" " play1} NR > 1 && NF == 2; END { print early"\n" dev > "twitter.csv" }' input_file
$ cat replace.awk

BEGIN {
    FS=","                                        #Set the field separator to comma
} $1=="2013" && /play/ {                          #If column1 is 2013 and the line matches play
    $1="Development"; play++; dev=$1 FS" " play   #Change column1, count play and create variable
} $1=="2010" && /play/ {                          #Same as above
    $1="Early"; play1++; early=$1 FS" " play1
} NR > 1 && NF == 2
END {
    print early"\n" dev  > "twitter.csv"          #Print variables separated by new line
}


$ awk -f replace.awk input_file
$ cat twitter.csv
Early, 1
Development, 2
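If more years need labels later, the same idea can be written with a lookup table instead of one rule per year. This is only a sketch under a few assumptions: the file names tweets.txt and labelled_counts.csv and the label table are placeholders, and "exact word" is taken to mean a case-insensitive whole-word match on "play". It avoids gawk-only features such as IGNORECASE and \< \>.

```shell
# Recreate the sample input from the question (tweets.txt is a placeholder name).
cat > tweets.txt <<'EOF'
year, user, tweet
2009, Katie, I love playing football
2010, James, I play football
2013, Bob, I play basketball
2013, James, I play Baseball
EOF

awk -F', *' '
BEGIN {
    # Hypothetical year-to-label table; add more years here as needed.
    label["2010"] = "Early"
    label["2013"] = "Development"
}
NR > 1 && ($1 in label) {
    # Drop the year and user fields so only the tweet text is searched.
    text = tolower($0)
    sub(/^[^,]*,[^,]*,/, "", text)
    # Split the tweet into words and look for the exact word "play".
    n = split(text, words, /[^a-z]+/)
    for (i = 1; i <= n; i++) {
        if (words[i] == "play") { count[$1]++; break }
    }
}
END {
    # The order of "for (y in count)" is unspecified, so emit the year
    # first, sort on it, and strip it again after sorting.
    for (y in count) print y ", " label[y] ", " count[y]
}' tweets.txt | sort | cut -d',' -f2- | sed 's/^ //' > labelled_counts.csv

cat labelled_counts.csv
```

Each tweet is counted at most once, even if "play" appears in it several times, which matches the per-line counting of the original attempt.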

CodePudding user response:

Pipe the output of your attempt into this sed command:

sed -e 's/^2010/Early/' -e 's/^2013/Development/'

Full line:

awk -F, '{IGNORECASE=1} {ARGC=1} /\<play\>/{a[$1]++} END {for (i in a) print i, a[i]}' test.txt | sort | sed -e 's/^2010/Early/' -e 's/^2013/Development/'

The ^ character tells sed to match 2010 or 2013 only at the start of a line, so a count in the second column that happens to equal a year is left alone.
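To see the difference the anchor makes, here is a small sketch that runs both versions of the sed command on the counts from the follow-up question (the variable names are only for illustration):

```shell
# Counts from the follow-up question: the count for 2013 is itself 2010.
counts='2010 1
2013 2010'

# Unanchored: the count in the second column gets rewritten as well.
unanchored=$(printf '%s\n' "$counts" | sed -e 's/2010/Early/' -e 's/2013/Development/')

# Anchored: only a year at the start of the line is rewritten.
anchored=$(printf '%s\n' "$counts" | sed -e 's/^2010/Early/' -e 's/^2013/Development/')

echo "unanchored:"
printf '%s\n' "$unanchored"
echo "anchored:"
printf '%s\n' "$anchored"
```

The unanchored run turns the second line into "Development Early", while the anchored run keeps it as "Development 2010".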
