Add a column based on frequency for each group-CodePudding

I have a dataframe like this:

        vehicle_id   trip
0            0        0
1            0        0
2            0        0
3            0        1
4            0        1
5            1        0
6            1        0
7            1        1
8            1        1
9            1        1
10           1        1
11           1        1
12           1        2
13           2        0
14           2        1
15           2        2

I want to add a column that counts the frequency of each trip value for each 'vehicle id' group and drop the rows where the frequency is equal to 'one'. So after adding the column the frequency will be like this:

        vehicle_id   trip  frequency
0            0        0      3
1            0        0      3
2            0        0      3
3            0        1      2
4            0        1      2
5            1        0      2
6            1        0      2
7            1        1      5
8            1        1      5
9            1        1      5
10           1        1      5
11           1        1      5
12           1        2      1
13           2        0      1
14           2        1      1
15           2        2      1

and the final result will be like this

        vehicle_id   trip  frequency
0            0        0      3
1            0        0      3
2            0        0      3
3            0        1      2
4            0        1      2
5            1        0      2
6            1        0      2
7            1        1      5
8            1        1      5
9            1        1      5
10           1        1      5
11           1        1      5

what is the best solution for that? Also, what should I do if I intend to directly drop rows where the frequency is equal to 1 in each group (without adding the frequency column)?

CodePudding user response：

Check the collab here : https://colab.research.google.com/drive/1AuBTuW7vWj1FbJzhPuE-QoLncoF5W_7W?usp=sharing

You can use df.groupby() :

df["frequency"] = df.groupby(["vehicle_id","trip"]).transform("count")

But of course you need to create the frequency column before_hand :

df["frequency"] = 0

If I take your dataframe as example this gives :

import pandas as pd


dict = {"vehicle_id" : [0,0,0,0,0,1,1,1,1,1,1,1],
        "trip" : [0,0,0,1,1,0,0,1,1,1,1,1]}

df = pd.DataFrame.from_dict(dict)

df["frequency"] = 0

df["frequency"] = df.groupby(["vehicle_id","trip"]).transform("count")

output :

CodePudding user response：

Try:

df["frequency"] = (
    df.assign(frequency=0).groupby(["vehicle_id", "trip"]).transform("count")
)
print(df[df.frequency > 1])

Prints:

    vehicle_id  trip  frequency
0            0     0          3
1            0     0          3
2            0     0          3
3            0     1          2
4            0     1          2
5            1     0          2
6            1     0          2
7            1     1          5
8            1     1          5
9            1     1          5
10           1     1          5
11           1     1          5