I want to find, in a given column "type", the values that occur more than "n" times.
I did this:
n = 5
df = dataf["type"].value_counts() > n
print(df)
This prints something like:
Bike True
Truck True
Car False
How do I get the values "Bike" and "Truck" (the ones marked True)? I want to add them to a set.
CodePudding user response:
You can use a lambda inside loc for this:
import pandas as pd
df = pd.DataFrame({"vehicle": ["bike"] * 7 + ["truck"] * 8 + ["car"] * 4})
print(df)
print("\nUsing loc...")
print(df["vehicle"].value_counts().loc[lambda x: x > 5])
gives
vehicle
0 bike
1 bike
2 bike
3 bike
4 bike
5 bike
6 bike
7 truck
8 truck
9 truck
10 truck
11 truck
12 truck
13 truck
14 truck
15 car
16 car
17 car
18 car
Using loc...
truck 8
bike 7
Name: vehicle, dtype: int64
CodePudding user response:
Try this
aux = dataf["type"].value_counts()
greater_than_five = aux[aux > 5]
The first line gets the count of each type, and the second line keeps only the types whose count is greater than five.
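The filter above returns a Series of counts, while the question asks for a set of the values themselves. A minimal sketch of that last step, assuming made-up sample data in place of the asker's `dataf` (the variable names `counts` and `frequent` are only illustrative):

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's `dataf`.
dataf = pd.DataFrame({"type": ["Bike"] * 7 + ["Truck"] * 8 + ["Car"] * 4})
n = 5

# Count the occurrences of each distinct value in the column.
counts = dataf["type"].value_counts()

# Keep the rows whose count exceeds n; the index holds the values themselves.
frequent = set(counts[counts > n].index)

print(frequent)
```

The values live in the index of the `value_counts()` result, so taking `.index` of the filtered Series and passing it to `set()` should be enough.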
CodePudding user response:
Try this:
n = 5
df = dataf["type"].value_counts()[dataf["type"].value_counts() > n]
print(df)
CodePudding user response:
The most efficient way is the lambda approach that @user1717828 wrote. Another way:
import pandas as pd
df = pd.DataFrame({"vehicle": ["bike"] * 7 + ["truck"] * 8 + ["car"] * 4})
df2 = df["vehicle"].agg({'count':'value_counts'})
df2[df2['count'] > 5]
CodePudding user response:
You can add a new column called counter that contains 1:
df['counter'] = 1
and use groupby:
df = df.groupby(['type']).sum()
df = df[df.counter > n]
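As a possible variation on the groupby idea, `size()` counts the rows in each group directly, so the helper column may not be needed. This is only a sketch with made-up sample data; the column name "type" comes from the question, and `sizes` and `frequent` are illustrative names:

```python
import pandas as pd

# Hypothetical sample data; the column name "type" follows the question.
df = pd.DataFrame({"type": ["Bike"] * 7 + ["Truck"] * 8 + ["Car"] * 4})
n = 5

# size() returns the number of rows in each group, indexed by group label.
sizes = df.groupby("type").size()

# Filter on the counts and collect the surviving labels into a set.
frequent = set(sizes[sizes > n].index)

print(frequent)
```

This avoids summing every other numeric column in the frame, which `groupby(...).sum()` would otherwise do.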