My dataframe example (over 35k rows):
stop_id time
7909 2022-04-06T03:47:00 03:00
7909 2022-04-06T04:07:00 03:00
1009413 2022-04-06T04:10:00 03:00
1002246 2022-04-06T04:19:00 03:00
1009896 2022-04-06T04:20:00 03:00
I want to perform some operations on this dataframe and then split it by the value of stop_id. So, assuming there are 50 unique stop_id values, I want to get 50 separate CSV/Excel files, each containing the data for one unique stop_id. How can I do this?
CodePudding user response:
Use groupby:
# group by 'stop_id' column
groups = df.groupby("stop_id")
Then iterate over the groups, naming each file after the group's stop_id with an f-string:
for name, group in groups:
    # write each group's rows to a file named after its stop_id
    group.to_csv(f'{name}.csv')
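For reference, here is a minimal self-contained sketch of the same groupby loop, using the sample rows from the question. The helper name `split_to_csv`, the temporary output directory, and `index=False` are my own assumptions for illustration, not part of the answer above.

```python
from pathlib import Path
import tempfile

import pandas as pd


def split_to_csv(df, key, out_dir):
    """Write one CSV per unique value of `key`; return the paths written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, group in df.groupby(key):
        # each file is named after the group's key value, e.g. 7909.csv
        path = out_dir / f"{name}.csv"
        group.to_csv(path, index=False)  # index=False drops the row index column
        paths.append(path)
    return paths


# sample rows copied from the question
df = pd.DataFrame({
    "stop_id": [7909, 7909, 1009413, 1002246, 1009896],
    "time": ["2022-04-06T03:47:00 03:00", "2022-04-06T04:07:00 03:00",
             "2022-04-06T04:10:00 03:00", "2022-04-06T04:19:00 03:00",
             "2022-04-06T04:20:00 03:00"],
})

out_dir = tempfile.mkdtemp()  # hypothetical output location
written = split_to_csv(df, "stop_id", out_dir)
print(sorted(p.name for p in written))
```

Any operations on the dataframe can be done before calling the helper. For Excel output, group.to_excel(...) should work the same way, provided an Excel writer engine such as openpyxl is installed.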
CodePudding user response:
I used the groupby and first methods here. Note that first keeps only the first row for each stop_id.
import pandas as pd
df = pd.DataFrame({"stop_id": [7909, 7909, 1009413, 1002246, 1009896],
                   "time": ["2022-04-06T03:47:00 03:00", "2022-04-06T04:10:00 03:00",
                            "2022-04-06T04:07:00 03:00", "2022-04-06T04:19:00 03:00",
                            "2022-04-06T04:20:00 03:00"]})
# keep only the first row of each stop_id group
df = df.groupby("stop_id")
df = df.first().reset_index()
print(df)
for idx, item in enumerate(df["stop_id"]):
    # one-row frame for this stop_id, written to <stop_id>.csv
    df_inner = df.iloc[[idx]]
    df_inner.to_csv(f'{item}.csv', index=False)
Output of print(df):
   stop_id                       time
0     7909  2022-04-06T03:47:00 03:00
1  1002246  2022-04-06T04:19:00 03:00
2  1009413  2022-04-06T04:07:00 03:00
3  1009896  2022-04-06T04:20:00 03:00
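If every row for a stop_id should be kept rather than just the first, one possible sketch is to hold the groups in a dict keyed by stop_id. The names `firsts` and `parts` are made up for illustration, and the sample rows are taken from the question.

```python
import pandas as pd

# sample rows copied from the question
df = pd.DataFrame({
    "stop_id": [7909, 7909, 1009413, 1002246, 1009896],
    "time": ["2022-04-06T03:47:00 03:00", "2022-04-06T04:07:00 03:00",
             "2022-04-06T04:10:00 03:00", "2022-04-06T04:19:00 03:00",
             "2022-04-06T04:20:00 03:00"],
})

# first() collapses each group to a single row
firsts = df.groupby("stop_id").first().reset_index()

# iterating over the groups keeps every row of each stop_id
parts = {stop_id: group for stop_id, group in df.groupby("stop_id")}

print(len(firsts))       # number of unique stop_id values
print(len(parts[7909]))  # number of rows recorded for stop_id 7909
```

Each value in `parts` is an ordinary dataframe, so it can be processed further or written out with to_csv as needed.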