Struggling with Pivot table in Python - collapsing rows and performing calculations-CodePudding

I'm wondering if anyone can point me in the right direction. I'm in over my head on a pivot table. I have a Pandas dataframe. I loaded two .csv files, dropped/renamed columns, and used concat to put them together. That all went fine. It's a list of transit stops, showing their latitude and longitude and how many times a bus stops at that stop on weekdays, Saturdays, and Sundays.

Here's what it looks like now:

Current Dataframe

stop_code service_id daily_trips stop_lat   stop_lon
99994 saturday 36    40.439562 -79.995332
99994 sunday 32 40.439562 -79.995332
99994 weekday    50      40.439562 -79.995332
99994    saturday 71 40.439562 -79.995332
99994   sunday   65 40.439562 -79.995332
99994   weekday 89 40.439562 -79.995332

Etc. for 25,000 or so different stop_codes. I would like to make rows that each represent one stop code (and its latitude and longitude), along with the total number of weekday, Saturday, and Sunday trips for that stop. So, it would look like:

stop_code daily_wkdy_trips daily_sat_trips daily_sun_trips stop_lat stop_lon
99994 183 141 130 40.439562 -79.995332

A simpler illustration of what I want to do is here:
New Dataframe

So, is this something I should I use a pivot table to do? I can't figure out how to do all this at once:

Collapse the data to one row for each stop_code (and its latitude and longitude).
Add the values for each category of service_id together.
Create new columns for those values.

CodePudding user response：

You can use .groupby() followed by .agg:

x = df.groupby("stop_code").agg(
    stop_lat=("stop_lat", "first"),
    stop_lon=("stop_lon", "first"),
    daily_wkdy_trips=(
        "daily_trips",
        lambda x: x[df.loc[x.index, "service_id"].eq("weekday")].sum(),
    ),
    daily_sat_trips=(
        "daily_trips",
        lambda x: x[df.loc[x.index, "service_id"].eq("saturday")].sum(),
    ),
    daily_sun_trips=(
        "daily_trips",
        lambda x: x[df.loc[x.index, "service_id"].eq("sunday")].sum(),
    ),
)
print(x)

Prints:

            stop_lat   stop_lon  daily_wkdy_trips  daily_sat_trips  daily_sun_trips
stop_code                                                                          
99994      40.439562 -79.995332               183              141              130

CodePudding user response：

I would suggest separating 2 different tasks and then joining the results:

pivot = pd.pivot_table(
    data=df,
    values='daily_trips',
    index='stop_code',
    columns='service_id',
    aggfunc='sum'
)
grouped = df.groupby('stop_code')['stop_lat', 'stop_lon'].first()
result = pivot.join(grouped)