Pandas: group rows by date, join all the strings of a column into a single text for each day, and then save them to

Initially my data looked like this:

date                          str
2020-01-11 17:16:09+00:00     aa
2020-01-11 17:16:09+00:00     a
2020-01-11 17:16:09+00:00     a
2020-01-19 17:16:09+00:00     bb
2020-01-19 17:16:09+00:00     bb
2020-01-19 17:16:09+00:00     b

I've obtained my df by using:

df = df.groupby('date', as_index=False)['str'].apply(' '.join)

The df looks like this now:


date            str 
2020-01-11     ["aa a a"]
2020-01-19     ["bb bb b"]

I'd like to write one file per date, with the date as the filename and a single row inside containing all the text as one phrase. In the end the files should look like this:

2020-01-11.csv
aa a a

2020-01-19.csv
bb bb b

I've tried using this code:

df = df.groupby('date', as_index=False)['str'].apply(' '.join)

df = pd.DataFrame(df)

for index, row in df.iterrows():    
    date = str(row['date']).replace('-', '_')   
    filename = f'{date}.csv'
    with open(filename, 'w', encoding="utf-8-sig") as file:
        file.write(row['str'])

But I'm getting this instead:

2020-01-11.csv
aa 
a 
a

2020-01-19.csv
bb 
bb 
b

CodePudding user response：

Directly use your groupby object for the loop:

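# Iterating over a groupby yields (key, sub-DataFrame) pairs, one per date
for date, group in df.groupby('date'):
    # Assumes the dates are 'YYYY-MM-DD' strings, as in the input shown below
    filename = f"{date.replace('-', '_')}.csv"
    with open(filename, 'w', encoding="utf-8-sig") as f:
        # Join that day's strings into a single space-separated line
        f.write(' '.join(group['str'].astype(str)))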

Output files:

2020_01_11.csv:

aa a a

2020_01_19.csv:

bb bb b

Used input for df:

         date str
0  2020-01-11  aa
1  2020-01-11   a
2  2020-01-11   a
3  2020-01-19  bb
4  2020-01-19  bb
5  2020-01-19   b
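
For reference, here is the same loop as a self-contained sketch that starts from the full timestamps in the question instead of pre-truncated dates. It assumes the timestamps carry a +00:00 offset; the helper name write_daily_files and its out_dir parameter are made up for illustration and are not part of pandas:

```python
import tempfile
from pathlib import Path

import pandas as pd


def write_daily_files(df, out_dir):
    """Write one file per calendar day holding that day's strings joined by spaces."""
    out_dir = Path(out_dir)
    # Reduce each timestamp to its calendar day, formatted like 2020_01_11
    days = pd.to_datetime(df["date"]).dt.strftime("%Y_%m_%d")
    written = []
    # Grouping by the derived Series yields (day, sub-DataFrame) pairs
    for day, group in df.groupby(days):
        path = out_dir / f"{day}.csv"
        path.write_text(" ".join(group["str"].astype(str)), encoding="utf-8-sig")
        written.append(path)
    return written


# Sample data modelled on the question (offset format is an assumption)
df = pd.DataFrame({
    "date": ["2020-01-11 17:16:09+00:00"] * 3 + ["2020-01-19 17:16:09+00:00"] * 3,
    "str": ["aa", "a", "a", "bb", "bb", "b"],
})
paths = write_daily_files(df, tempfile.mkdtemp())
```

Writing into a temporary directory keeps the example from cluttering the working directory; pass any existing folder as out_dir instead.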

CodePudding user response：

You can try:

import pandas as pd

# If df['date'] already has a datetime dtype, df['date'].dt.date is enough
out = df.groupby(pd.to_datetime(df['date']).dt.date)['str'].agg(' '.join)
for idx, val in out.items():
    with open(f"{idx.strftime('%Y_%m_%d')}.csv", 'w', encoding='utf-8-sig') as fp:
        print(val, file=fp)  # print appends a trailing newline

Output 2020_01_11.csv:

aa a a

Output 2020_01_19.csv:

bb bb b

Resulting Series:

>>> out
date
2020-01-11     aa a a
2020-01-19    bb bb b
Name: str, dtype: object

Input dataframe:

>>> df
                        date str
0  2020-01-11 17:16:09+00:00  aa
1  2020-01-11 17:16:09+00:00   a
2  2020-01-11 17:16:09+00:00   a
3  2020-01-19 17:16:09+00:00  bb
4  2020-01-19 17:16:09+00:00  bb
5  2020-01-19 17:16:09+00:00   b
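
Neither answer says why the original attempt put each string on its own line. One possible cause, and this is only a guess because the raw values aren't shown, is that the strings themselves carry trailing newlines or other whitespace. Under that assumption, splitting every value on whitespace before joining collapses everything onto one line. The helper name join_clean is made up for illustration:

```python
import pandas as pd


def join_clean(values):
    """Join strings with single spaces, dropping newlines and surrounding whitespace."""
    # str.split() with no arguments splits on any run of whitespace, including "\n"
    return " ".join(word for value in values for word in str(value).split())


# Hypothetical data: the trailing "\n" and spaces are assumed, not taken from the question
df = pd.DataFrame({
    "date": ["2020-01-11"] * 3 + ["2020-01-19"] * 3,
    "str": ["aa \n", "a \n", "a", "bb \n", "bb \n", "b"],
})
out = df.groupby("date")["str"].agg(join_clean)
```

If the values turn out to be clean already, join_clean behaves the same as ' '.join, so it is safe to swap in either way.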