I have the following DataFrame:
import pandas as pd

data = [
    [1000, 1, 1], [1000, 1, 1], [1000, 1, 1], [1000, 1, 2], [1000, 1, 2],
    [1000, 1, 2], [2000, 0, 1], [2000, 0, 1], [2000, 1, 2],
    [2000, 0, 2], [2000, 1, 2]]
df = pd.DataFrame(data, columns=['route_id', 'direction_id', 'trip_id'])
Then I group my df by the columns route_id and direction_id using:
t_groups = df.groupby(['route_id','direction_id'])
I would like to store the most popular (most frequent) trip_id for each unique route_id, direction_id combination. If several trip_id values are equally frequent, I want the first one.
I have tried applying value_counts(), but I cannot get the most popular trip_id value.
My expected output is:
   route_id  direction_id  trip_id
0      1000             1        1
1      2000             0        1
2      2000             1        2
Any suggestions?
CodePudding user response:
To get the most popular trip_id for each unique route_id, direction_id combination, you can compute the most frequent trip_id within each group and then reset the index so that the group keys become regular columns again.
Here is an example of how you can do this:
import pandas as pd
# Create the dataframe
data = [[1000, 1, 1], [1000, 1, 1], [1000, 1, 1], [1000, 1, 2], [1000, 1, 2], [1000, 1, 2], [2000, 0, 1], [2000, 0, 1], [2000, 1, 2], [2000, 0, 2], [2000, 1, 2]]
df = pd.DataFrame(data, columns=['route_id', 'direction_id', 'trip_id'])
# Group the dataframe by route_id and direction_id
t_groups = df.groupby(['route_id','direction_id'])
# Get the most frequent trip_id for each group
# (on ties, the winner follows the ordering returned by value_counts)
trip_ids = t_groups['trip_id'].agg(lambda x: x.value_counts().idxmax())
# Move the group keys from the index back into columns
result = trip_ids.reset_index()
print(result)
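As an alternative, here is a minimal sketch that avoids the per-group lambda by counting every (route_id, direction_id, trip_id) combination and keeping the top row per group. The column name n and the tie-breaking rule (smallest trip_id wins) are assumptions made for illustration; the question does not define how ties should be resolved beyond "first".

```python
import pandas as pd

data = [[1000, 1, 1], [1000, 1, 1], [1000, 1, 1], [1000, 1, 2], [1000, 1, 2],
        [1000, 1, 2], [2000, 0, 1], [2000, 0, 1], [2000, 1, 2],
        [2000, 0, 2], [2000, 1, 2]]
df = pd.DataFrame(data, columns=['route_id', 'direction_id', 'trip_id'])

keys = ['route_id', 'direction_id']

# Count the rows for every (route_id, direction_id, trip_id) combination.
# 'n' is a hypothetical helper column name.
counts = df.groupby(keys + ['trip_id']).size().reset_index(name='n')

# Highest count first; ties are broken by the smallest trip_id (an assumption).
ranked = counts.sort_values(['n', 'trip_id'], ascending=[False, True])

# Keep the top-ranked trip_id per group and restore a tidy layout.
result = (
    ranked.drop_duplicates(keys)
          .sort_values(keys)
          .drop(columns='n')
          .reset_index(drop=True)
)
print(result)
```

Because the tie-break is spelled out in the sort, the result does not depend on how value_counts orders equally frequent values.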
CodePudding user response:
This produces the output you are looking for on this data, by taking the first trip_id of each group:
df = df.groupby(['route_id', 'direction_id']).first().reset_index()
The reset_index() call just moves the group keys from the index back into columns, so the result looks exactly like the output you want.
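One caveat: first() returns the first row of each group, which is not necessarily the most frequent value. The sketch below uses a small hypothetical DataFrame (demo, not the data from the question) in which the two differ, and compares first() with a mode-based aggregation. Note that Series.mode() returns the modes in sorted order, so ties resolve to the smallest trip_id here.

```python
import pandas as pd

# Hypothetical data: the first trip_id in the group (7) is not the most frequent (9).
demo = pd.DataFrame({
    'route_id': [1, 1, 1],
    'direction_id': [0, 0, 0],
    'trip_id': [7, 9, 9],
})
keys = ['route_id', 'direction_id']

# First row of each group.
first_row = demo.groupby(keys)['trip_id'].first().reset_index()

# Most frequent value of each group.
most_common = (
    demo.groupby(keys)['trip_id']
        .agg(lambda s: s.mode().iat[0])
        .reset_index()
)

print(first_row)
print(most_common)
```

On the data in the question both approaches happen to agree, because the most frequent trip_id is also the first one in every group.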