How to save data into multiple CSV files based on column specific values

Time:03-19

I want to save data into separate files based on the values in one column. Below is the data set that I'm working on:

created_at,tweet,category
7/25/2021,Great Sunny day for Cricket at London Great Score put on by England batting Olympic is to held in Japan,sports
7/25/2021,President Made a clear statement An election is to be kept next year,politics
7/25/2021,A terrorist attack have killed 10 people,crime
7/26/2021,Srilanka have lost the T20 series Australia have won the series,sports
7/26/2021,Minister have given up his role last monday President is challenging the opposite leader,politics
7/27/2021,Rainy day for Cricket at London poor Score put on by zimabwe batting next Olympic is to held in Srilanka,sports
7/27/2021,President Made a poor statement No any election is to be kept next year,politics
7/27/2021,100 of people are being killed due to terror attack,crime
7/28/2021,IPL will be happening next year Velentino Rossy is to lead the MotoGP,sports
7/28/2021,Minister have opt to take strict decisions The election nominations are not given to Mr XYS, politics

Given the data above, I want to save it into .csv files based on the category. That is, I want to store all sports-related rows (category name sports) in a sports.csv file, which for the above data set would look like this:

created_at,tweet,category
7/25/2021,Great Sunny day for Cricket at London Great Score put on by England batting Olympic is to held in Japan,sports
7/26/2021,Srilanka have lost the T20 series Australia have won the series,sports
7/27/2021,Rainy day for Cricket at London poor Score put on by zimabwe batting next Olympic is to held in Srilanka,sports
7/28/2021,IPL will be happening next year Velentino Rossy is to lead the MotoGP,sports

Similarly, I want to store the politics-related rows in a politics.csv file, which for the above data would contain the following:

created_at,tweet,category
7/25/2021,President Made a clear statement An election is to be kept next year,politics
7/26/2021,Minister have given up his role last monday President is challenging the opposite leader,politics
7/27/2021,President Made a poor statement No any election is to be kept next year,politics
7/28/2021,Minister have opt to take strict decisions The election nominations are not given to Mr XYS, politics

And likewise for the other categories. It would be very helpful if someone could help with this.

CodePudding user response:

You can try something like this:

import pandas as pd
df = pd.read_csv('your_file.csv', sep=',')

cats = df['category'].unique()
for cat in cats:
    df.loc[df['category'] == cat, :].to_csv(cat + '.csv', sep=',', index=False)
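
As an alternative to filtering the frame once per unique value, the same split can be done in a single pass with groupby. This is a minimal sketch, assuming the input has a category column as in the question; the function name split_csv_by_column and the out_dir parameter are made up for illustration. It passes skipinitialspace=True because the last row of the sample data has a space before politics, which would otherwise be read as a separate category.

```python
from pathlib import Path

import pandas as pd


def split_csv_by_column(src, out_dir=".", column="category"):
    """Write one CSV per distinct value of `column`; return the paths written.

    Hypothetical helper for illustration, not part of pandas.
    """
    # skipinitialspace drops blanks that follow a delimiter, so ", politics"
    # is read as "politics" rather than " politics".
    df = pd.read_csv(src, skipinitialspace=True)

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    written = []
    # groupby yields (value, sub-frame) pairs, one per distinct category.
    for value, group in df.groupby(column):
        path = out_dir / f"{value}.csv"
        # index=False keeps the DataFrame row index out of the output file.
        group.to_csv(path, index=False)
        written.append(path)
    return written
```

Calling split_csv_by_column('your_file.csv') on the sample data should produce sports.csv, politics.csv and crime.csv in the current directory, each with the original header row.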

CodePudding user response:

Here you go:

import pandas as pd
df = pd.read_csv('./data.csv')
categories = df['category'].unique()

for category in categories:
  df[df['category'] == category].to_csv(category + '.csv', index=False)

CodePudding user response:

Here is a shell approach. It is not recommended for very large files, because it appends each record to its corresponding file one at a time rather than accumulating records in a data structure and flushing them to the respective file.

The following is just a sample; note that it does not write the header row to the output files:

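# Prefix each data row with its category (the last field), skipping the header
awk -F"," 'NR > 1 { print $NF, $0 }' inputfile | while read -r category record
do
    # Append the row to the file named after its category
    echo "$record" >> "$category.csv"
done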
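
The accumulate-then-flush alternative mentioned above could look roughly like the following. This is only a sketch using Python's standard csv module, assuming well-formed rows and a file that fits in memory; the function name split_csv_buffered and its parameters are hypothetical. Each output file is opened once and gets the header row.

```python
import csv
from collections import defaultdict
from pathlib import Path


def split_csv_buffered(src_path, out_dir=".", column="category"):
    """Group rows in memory by `column`, then write each group in one go.

    Returns a dict mapping each category to its row count.
    """
    groups = defaultdict(list)
    with open(src_path, newline="", encoding="utf-8") as src:
        # skipinitialspace handles values such as ", politics".
        reader = csv.DictReader(src, skipinitialspace=True)
        fieldnames = reader.fieldnames
        for row in reader:
            groups[row[column].strip()].append(row)

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for value, rows in groups.items():
        target = out_dir / f"{value}.csv"
        # One open/write per category instead of one append per record.
        with open(target, "w", newline="", encoding="utf-8") as dst:
            writer = csv.DictWriter(dst, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
    return {value: len(rows) for value, rows in groups.items()}
```

The trade-off is memory: every row is held until the input has been read completely, so for inputs larger than available memory a streaming approach is still needed.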