Initially my data looked like this:
date str
2020-01-11 17:16:09 00:00 aa
2020-01-11 17:16:09 00:00 a
2020-01-11 17:16:09 00:00 a
2020-01-19 17:16:09 00:00 bb
2020-01-19 17:16:09 00:00 bb
2020-01-19 17:16:09 00:00 b
I obtained the grouped df with:
df = df.groupby('date', as_index=False)['str'].apply(' '.join)
The df looks like this now:
date str
2020-01-11 ["aa a a"]
2020-01-19 ["bb bb b"]
I'd like to write a separate file for each date, named after the date, containing a single row with all of that date's text joined into one phrase. In the end, the files should look like this:
2020-01-11.csv
aa a a
2020-01-19.csv
bb bb b
I've tried using this code:
df = df.groupby('date', as_index=False)['str'].apply(' '.join)
df = pd.DataFrame(df)
for index, row in df.iterrows():
    date = str(row['date']).replace('-', '_')
    filename = f'{date}.csv'
    with open(filename, 'w', encoding="utf-8-sig") as file:
        file.write(row['str'])
But I'm getting this instead:
2020-01-11.csv
aa
a
a
2020-01-19.csv
bb
bb
b
Answer:
Use your groupby object directly in the loop:
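for date, group in df.groupby('date', as_index=False):
    # date is the group key; group holds all rows with that date
    filename = f"{date.replace('-', '_')}.csv"
    with open(filename, 'w', encoding="utf-8-sig") as f:
        # Join every text of this date into one space-separated line
        f.write(' '.join(group['str'].astype(str)))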
Output files:
2020_01_11.csv:
aa a a
2020_01_19.csv:
bb bb b
Input used for df:
date str
0 2020-01-11 aa
1 2020-01-11 a
2 2020-01-11 a
3 2020-01-19 bb
4 2020-01-19 bb
5 2020-01-19 b
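For readers who want to see the same group, join, and write loop without pandas, here is a minimal stdlib-only sketch. It swaps in `itertools.groupby` for pandas' groupby (which, unlike pandas, only merges consecutive equal keys, so the rows are sorted first), and `write_groups` plus the temporary output directory are hypothetical names chosen for illustration. The sample rows mirror the input table above.

```python
import os
import tempfile
from itertools import groupby
from operator import itemgetter

# Sample rows mirroring the "Input used for df" table: (date, str).
rows = [
    ("2020-01-11", "aa"),
    ("2020-01-11", "a"),
    ("2020-01-11", "a"),
    ("2020-01-19", "bb"),
    ("2020-01-19", "bb"),
    ("2020-01-19", "b"),
]


def write_groups(rows, out_dir):
    """Hypothetical helper: write one file per date with all texts joined by spaces.

    Returns a dict mapping each filename to the text written, for checking.
    """
    written = {}
    # itertools.groupby only groups *consecutive* equal keys, so sort by date
    # first; sorted() is stable, so row order within a date is preserved.
    for date, group in groupby(sorted(rows, key=itemgetter(0)), key=itemgetter(0)):
        filename = f"{date.replace('-', '_')}.csv"
        text = " ".join(s for _, s in group)
        with open(os.path.join(out_dir, filename), "w", encoding="utf-8-sig") as f:
            f.write(text)
        written[filename] = text
    return written


with tempfile.TemporaryDirectory() as tmp:
    result = write_groups(rows, tmp)

print(result)  # {'2020_01_11.csv': 'aa a a', '2020_01_19.csv': 'bb bb b'}
```

With pandas the sort step is unnecessary, because `DataFrame.groupby` collects all rows with the same key regardless of their position.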
Answer:
You can try:
import pandas as pd

# Use df['date'].dt.date directly if df['date'] already has a datetime dtype
out = df.groupby(pd.to_datetime(df['date']).dt.date)['str'].agg(' '.join)
for idx, val in out.items():
    # idx is a datetime.date, so strftime builds the filename
    with open(f"{idx.strftime('%Y_%m_%d')}.csv", 'w', encoding='utf-8-sig') as fp:
        print(val, file=fp)
Output 2020_01_11.csv:
aa a a
Output 2020_01_19.csv:
bb bb b
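The two answers write the joined text slightly differently: the first uses `f.write(...)`, which writes exactly the string, while this one uses `print(val, file=fp)`, which appends a newline by default. Both open the file with `encoding='utf-8-sig'`, which prepends a UTF-8 byte-order mark (`EF BB BF`). A small sketch comparing the raw bytes, using hypothetical file names in a temporary directory:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path_write = os.path.join(tmp, "via_write.csv")
    path_print = os.path.join(tmp, "via_print.csv")

    with open(path_write, "w", encoding="utf-8-sig") as f:
        f.write("aa a a")          # writes exactly the string, no newline
    with open(path_print, "w", encoding="utf-8-sig") as fp:
        print("aa a a", file=fp)   # print() adds "\n" at the end

    # Read back as bytes to see the BOM and the line ending
    with open(path_write, "rb") as f:
        raw_write = f.read()
    with open(path_print, "rb") as f:
        raw_print = f.read()

print(raw_write)  # b'\xef\xbb\xbfaa a a'
# raw_print is the same bytes plus a trailing b'\n'
# (b'\r\n' on Windows, where text mode translates newlines)
print(raw_print)
```

Either form gives a single line of text; pick `f.write` if the file must not end with a newline.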
Output Series:
>>> out
date
2020-01-11 aa a a
2020-01-19 bb bb b
Name: str, dtype: object
Input dataframe:
>>> df
date str
0 2020-01-11 17:16:09 00:00 aa
1 2020-01-11 17:16:09 00:00 a
2 2020-01-11 17:16:09 00:00 a
3 2020-01-19 17:16:09 00:00 bb
4 2020-01-19 17:16:09 00:00 bb
5 2020-01-19 17:16:09 00:00 b
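The key step in this answer is collapsing each full timestamp to its calendar date before grouping, so every row from the same day lands in one group. A rough stdlib-only analogue of `pd.to_datetime(...).dt.date` followed by `agg(' '.join)` is sketched below; `join_by_day` is a hypothetical helper, and it assumes every timestamp string starts with `YYYY-MM-DD HH:MM:SS`, ignoring the trailing offset (so it keeps the wall-clock date as written).

```python
from collections import defaultdict
from datetime import datetime

# Rows mirroring the "Input dataframe" table: (timestamp string, text).
rows = [
    ("2020-01-11 17:16:09 00:00", "aa"),
    ("2020-01-11 17:16:09 00:00", "a"),
    ("2020-01-11 17:16:09 00:00", "a"),
    ("2020-01-19 17:16:09 00:00", "bb"),
    ("2020-01-19 17:16:09 00:00", "bb"),
    ("2020-01-19 17:16:09 00:00", "b"),
]


def join_by_day(rows):
    """Hypothetical helper: group texts by calendar day, keeping row order."""
    grouped = defaultdict(list)
    for stamp, text in rows:
        # Assumption: the first 19 characters are "YYYY-MM-DD HH:MM:SS";
        # any timezone suffix after that is ignored.
        day = datetime.strptime(stamp[:19], "%Y-%m-%d %H:%M:%S").date()
        grouped[day].append(text)
    # Sort by day so the result is ordered like a pandas groupby result
    return {day: " ".join(texts) for day, texts in sorted(grouped.items())}


for day, text in join_by_day(rows).items():
    print(f"{day.strftime('%Y_%m_%d')}.csv -> {text}")
```

This prints `2020_01_11.csv -> aa a a` and `2020_01_19.csv -> bb bb b`, matching the filenames and contents above.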