Create multiple csv/excel files based on column value after operation with dataframe

My dataframe example (over 35k rows):

stop_id                      time
7909    2022-04-06T03:47:00 03:00
7909    2022-04-06T04:07:00 03:00
1009413 2022-04-06T04:10:00 03:00
1002246 2022-04-06T04:19:00 03:00
1009896 2022-04-06T04:20:00 03:00

I want to perform some operations on this dataframe and then split it by the value of stop_id. So, assuming there are 50 unique stop_id values, I want 50 separate csv/excel files, each containing the rows for one stop_id. How can I do this?

CodePudding user response：

Use groupby:

# group by 'stop_id' column
groups = df.groupby("stop_id")

Then iterate over the groups, naming each file after the group's stop_id with an f-string:

for name, group in groups:
    # write this group's rows to their own file; index=False omits the row index
    group.to_csv(f'{name}.csv', index=False)
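
As a fuller illustration of the loop above, here is a minimal, self-contained sketch. The helper name split_to_csv and the temporary output directory are assumptions made for the example, not part of the original answer; the sample rows are the ones from the question.

```python
from pathlib import Path
import tempfile

import pandas as pd


def split_to_csv(df, column, out_dir):
    """Write one CSV per unique value of `column`; return the paths written.

    Hypothetical helper for illustration only.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    # groupby yields (key, sub-DataFrame) pairs, one per unique value
    for key, group in df.groupby(column):
        path = out_dir / f"{key}.csv"
        group.to_csv(path, index=False)  # index=False omits the row index
        paths.append(path)
    return paths


# Sample rows copied from the question
df = pd.DataFrame({
    "stop_id": [7909, 7909, 1009413, 1002246, 1009896],
    "time": ["2022-04-06T03:47:00 03:00", "2022-04-06T04:07:00 03:00",
             "2022-04-06T04:10:00 03:00", "2022-04-06T04:19:00 03:00",
             "2022-04-06T04:20:00 03:00"],
})

out_dir = tempfile.mkdtemp()  # throwaway directory for the demo
written = split_to_csv(df, "stop_id", out_dir)
print(sorted(p.name for p in written))
```

Any operations on the dataframe would go before the call to split_to_csv. For Excel output, group.to_excel(...) could be swapped in for to_csv, assuming an Excel engine such as openpyxl is installed.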

CodePudding user response：

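I used groupby and first here. Note that first keeps only one row per stop_id (the first non-null value of each column), so each file ends up with a single row.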

import pandas as pd

df = pd.DataFrame({"stop_id": [7909, 7909, 1009413, 1002246, 1009896],
                   "time": ["2022-04-06T03:47:00 03:00", "2022-04-06T04:10:00 03:00",
                            "2022-04-06T04:07:00 03:00", "2022-04-06T04:19:00 03:00",
                            "2022-04-06T04:20:00 03:00"]})
# keep the first row of each stop_id group and restore stop_id as a column
df = df.groupby("stop_id").first().reset_index()
print(df)
# write each remaining row to its own file, named after its stop_id
for idx, stop_id in enumerate(df["stop_id"]):
    df_inner = df.iloc[[idx]]  # one-row DataFrame for this stop_id
    df_inner.to_csv(f"{stop_id}.csv", index=False)

Output of print(df):

   stop_id                       time
0     7909  2022-04-06T03:47:00 03:00
1  1002246  2022-04-06T04:19:00 03:00
2  1009413  2022-04-06T04:07:00 03:00
3  1009896  2022-04-06T04:20:00 03:00
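
To make the effect of first concrete, the following sketch (using the same sample data as this answer; the variable names are illustrative assumptions) compares the number of rows before and after the reduction. It suggests that first is only appropriate when one row per stop_id is wanted; iterating over the groups directly keeps every row.

```python
import pandas as pd

df = pd.DataFrame({
    "stop_id": [7909, 7909, 1009413, 1002246, 1009896],
    "time": ["2022-04-06T03:47:00 03:00", "2022-04-06T04:10:00 03:00",
             "2022-04-06T04:07:00 03:00", "2022-04-06T04:19:00 03:00",
             "2022-04-06T04:20:00 03:00"],
})

grouped = df.groupby("stop_id")
first_rows = grouped.first().reset_index()  # one row per stop_id
row_counts = grouped.size()                 # rows per stop_id in the original data

print(len(df), len(first_rows))  # rows before and after the reduction
print(row_counts.loc[7909])      # how many rows stop_id 7909 had originally
```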