I have a panda dataframe with 3 columns:
Brand Model car_age
PEUGEOT 207 4. 6-8
BMW 3ER REIHE 2. 1-2
FIAT FIAT DOBLO 3. 3-5
PEUGEOT 207 1. 0
BMW 3ER REIHE 2. 1-2
PEUGEOT 308 2. 1-2
BMW 520D 2. 1-2
... ... ...
And I want to group by Brand and Model and calculate the count per car_age
category:
Brand Model "1. 0" "2. 1-2" "3. 3-5" "4. 6-8"
PEUGEOT 207 1 0 0 1
PEUGEOT 308 0 1 0 0
BMW 3ER REIHE 0 2 0 0
BMW 520D 0 1 0 0
FIAT FIAT DOBLO 0 0 1 0
PS: 1. 0
means category one that corresponds to car age of zero. 2. 1-2
means category two that corresponds to car ages between 1-2. I enumerate my categories so they appear in the correct order.
I tried that:
output_count = pd.DataFrame({'Count':df.groupby('Brand','Model','car_age').size()})
but it dropped an error:
ValueError: No axis named Model for object type <class 'pandas.core.frame.DataFrame'>
Could anyone help me with this issue?
I think I provided enough information, but let me know if I can provide more.
CodePudding user response:
Use pd.crosstab
:
pd.crosstab([df['Brand'], df['Model']], df['car_age']).reset_index()
Output:
car_age Brand Model 1. 0 2. 1-2 3. 3-5 4. 6-8
0 BMW 3ER REIHE 0 2 0 0
1 BMW 520D 0 1 0 0
2 FIAT FIAT DOBLO 0 0 1 0
3 PEUGEOT 207 1 0 0 1
4 PEUGEOT 308 0 1 0 0
CodePudding user response:
The correct way to group by a data-frame with multiple columns is to use square brace around the columns name
df.groupby(['Brand','Model','car_age'])
I hope it will help you to solve your problem.
CodePudding user response:
Here is a function you can call. If you want to see how it works granularly
def group_by(df):
data_dumm = pd.get_dummies(df['car_age'])
data =df.drop(columns='car_age')
X= pd.concat([data,data_dumm], axis=1).groupby(['Brand','Model']).sum()
return X.reset_index()
group_by(df)
output:
Brand Model 1. 0 2. 1-2 3. 3-5 4. 6-8
0 BMW 3ER REIHE 0 2 0 0
1 BMW 520D 0 1 0 0
2 FIAT FIAT DOBLO 0 0 1 0
3 PEUGEOT 207 1 0 0 1
4 PEUGEOT 308 0 1 0 0