I have a dataframe like this:
vehicle_id trip
0 0 0
1 0 0
2 0 0
3 0 1
4 0 1
5 1 0
6 1 0
7 1 1
8 1 1
9 1 1
10 1 1
11 1 1
12 1 2
13 2 0
14 2 1
15 2 2
I want to add a column that counts the frequency of each trip value for each 'vehicle id' group and drop the rows where the frequency is equal to 'one'. So after adding the column the frequency will be like this:
vehicle_id trip frequency
0 0 0 3
1 0 0 3
2 0 0 3
3 0 1 2
4 0 1 2
5 1 0 2
6 1 0 2
7 1 1 5
8 1 1 5
9 1 1 5
10 1 1 5
11 1 1 5
12 1 2 1
13 2 0 1
14 2 1 1
15 2 2 1
and the final result will be like this
vehicle_id trip frequency
0 0 0 3
1 0 0 3
2 0 0 3
3 0 1 2
4 0 1 2
5 1 0 2
6 1 0 2
7 1 1 5
8 1 1 5
9 1 1 5
10 1 1 5
11 1 1 5
what is the best solution for that? Also, what should I do if I intend to directly drop rows where the frequency is equal to 1 in each group (without adding the frequency column)?
CodePudding user response:
Check the collab here : https://colab.research.google.com/drive/1AuBTuW7vWj1FbJzhPuE-QoLncoF5W_7W?usp=sharing
You can use df.groupby() :
df["frequency"] = df.groupby(["vehicle_id","trip"]).transform("count")
But of course you need to create the frequency column before_hand :
df["frequency"] = 0
If I take your dataframe as example this gives :
import pandas as pd
dict = {"vehicle_id" : [0,0,0,0,0,1,1,1,1,1,1,1],
"trip" : [0,0,0,1,1,0,0,1,1,1,1,1]}
df = pd.DataFrame.from_dict(dict)
df["frequency"] = 0
df["frequency"] = df.groupby(["vehicle_id","trip"]).transform("count")
CodePudding user response:
Try:
df["frequency"] = (
df.assign(frequency=0).groupby(["vehicle_id", "trip"]).transform("count")
)
print(df[df.frequency > 1])
Prints:
vehicle_id trip frequency
0 0 0 3
1 0 0 3
2 0 0 3
3 0 1 2
4 0 1 2
5 1 0 2
6 1 0 2
7 1 1 5
8 1 1 5
9 1 1 5
10 1 1 5
11 1 1 5