I want to save data to separate files based on the values in a column. Below is the data set I'm working with:
created_at,tweet,category
7/25/2021,Great Sunny day for Cricket at London Great Score put on by England batting Olympic is to held in Japan,sports
7/25/2021,President Made a clear statement An election is to be kept next year,politics
7/25/2021,A terrorist attack have killed 10 people,crime
7/26/2021,Srilanka have lost the T20 series Australia have won the series,sports
7/26/2021,Minister have given up his role last monday President is challenging the opposite leader,politics
7/27/2021,Rainy day for Cricket at London poor Score put on by zimabwe batting next Olympic is to held in Srilanka,sports
7/27/2021,President Made a poor statement No any election is to be kept next year,politics
7/27/2021,100 of people are being killed due to terror attack,crime
7/28/2021,IPL will be happening next year Velentino Rossy is to lead the MotoGP,sports
7/28/2021,Minister have opt to take strict decisions The election nominations are not given to Mr XYS, politics
Based on the data above, I want to save the rows into .csv files by category. That is, all sports-related rows (category name sports) should go into a sports.csv file, which for the dataset above would contain:
created_at,tweet,category
7/25/2021,Great Sunny day for Cricket at London Great Score put on by England batting Olympic is to held in Japan,sports
7/26/2021,Srilanka have lost the T20 series Australia have won the series,sports
7/27/2021,Rainy day for Cricket at London poor Score put on by zimabwe batting next Olympic is to held in Srilanka,sports
7/28/2021,IPL will be happening next year Velentino Rossy is to lead the MotoGP,sports
Similarly, I want to store the politics-related rows in a politics.csv file, which for the dataset above would contain:
created_at,tweet,category
7/25/2021,President Made a clear statement An election is to be kept next year,politics
7/26/2021,Minister have given up his role last monday President is challenging the opposite leader,politics
7/27/2021,President Made a poor statement No any election is to be kept next year,politics
7/28/2021,Minister have opt to take strict decisions The election nominations are not given to Mr XYS, politics
The same applies to the other categories. It would be very helpful if someone could help with this.
CodePudding user response:
You can try something like this:
import pandas as pd
df = pd.read_csv('your_file.csv', sep=',')
cats = df['category'].unique()
for cat in cats:
    # Select the rows for this category and write them to <category>.csv
    df.loc[df['category'] == cat, :].to_csv(cat + '.csv', sep=',', index=False)
CodePudding user response:
Here you go:
import pandas as pd
df = pd.read_csv('./data.csv')
categories = df['category'].unique()
for category in categories:
    # index=False keeps the row index out of the output, matching the desired files
    df[df['category'] == category].to_csv(category + '.csv', index=False)
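A variation on the loop above, as a rough sketch rather than a definitive version: `groupby` can do the filtering in one pass. The helper name `split_by_category` and the `out_dir` parameter are made up for illustration. It also strips whitespace from the category, since the last row of the sample data has `, politics` with a leading space, which would otherwise become its own group. This assumes the category column holds strings.

```python
from pathlib import Path

import pandas as pd


def split_by_category(df: pd.DataFrame, out_dir: str = ".") -> list:
    """Write one CSV per distinct category; return the file names written.

    Hypothetical helper for illustration, not part of pandas.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Strip stray whitespace so " politics" and "politics" share one file.
    cleaned = df.assign(category=df["category"].str.strip())
    written = []
    for name, group in cleaned.groupby("category"):
        path = out / f"{name}.csv"
        group.to_csv(path, index=False)  # index=False: no extra index column
        written.append(path.name)
    return written
```

Usage would be something like `split_by_category(pd.read_csv('./data.csv'))`, which writes the files into the current directory.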
CodePudding user response:
Here is a shell approach. It is not recommended for very large files, because it appends each record to its category file one at a time, rather than accumulating records in a data structure and flushing them to the respective files.
The following is just a sample:
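# Prefix each data row with its category (the last field), then append the row to that category's file.
# The header line is skipped (NR > 1) and is not written to the output files.
awk -F"," 'NR > 1 { print $NF, $0 }' inputfile | while read -r category record
do
    echo "$record" >> "$category.csv"
done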
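For larger inputs, a streaming version in Python might look like the sketch below. It reads the file row by row with the standard `csv` module and keeps one open writer per category, so each output file is opened once and also gets the header. The function name `split_csv_streaming` and its parameters are hypothetical. It assumes a well-formed CSV (no unquoted commas inside the tweet text) and a small enough number of categories that keeping one file handle open per category is acceptable.

```python
import csv
from contextlib import ExitStack
from pathlib import Path


def split_csv_streaming(src, out_dir=".", column="category"):
    """Stream rows from src into <out_dir>/<category>.csv; return row counts.

    Hypothetical helper for illustration.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    writers = {}
    counts = {}
    # ExitStack closes every output file when the block exits.
    with ExitStack() as stack, open(src, newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        for row in reader:
            key = row[column].strip()  # tolerate values like " politics"
            row[column] = key
            if key not in writers:
                handle = stack.enter_context(
                    open(out / f"{key}.csv", "w", newline="", encoding="utf-8")
                )
                writer = csv.DictWriter(handle, fieldnames=reader.fieldnames)
                writer.writeheader()
                writers[key] = writer
                counts[key] = 0
            writers[key].writerow(row)
            counts[key] += 1
    return counts
```

Only one input row is held in memory at a time, which is the main difference from the pandas answers.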