Count per category on seperate columns - Panda Python-CodePudding

I have a panda dataframe with 3 columns:

Brand     Model          car_age

PEUGEOT   207            4. 6-8
BMW       3ER REIHE      2. 1-2
FIAT      FIAT DOBLO     3. 3-5
PEUGEOT   207            1. 0
BMW       3ER REIHE      2. 1-2
PEUGEOT   308            2. 1-2
BMW       520D           2. 1-2
...       ...            ...

And I want to group by Brand and Model and calculate the count per car_age category:

Brand    Model      "1. 0" "2. 1-2" "3. 3-5" "4. 6-8"

PEUGEOT  207         1        0        0        1
PEUGEOT  308         0        1        0        0
BMW      3ER REIHE   0        2        0        0
BMW      520D        0        1        0        0
FIAT     FIAT DOBLO  0        0        1        0

PS: 1. 0 means category one that corresponds to car age of zero. 2. 1-2 means category two that corresponds to car ages between 1-2. I enumerate my categories so they appear in the correct order.

I tried that:

output_count = pd.DataFrame({'Count':df.groupby('Brand','Model','car_age').size()})

but it dropped an error:

ValueError: No axis named Model for object type <class 'pandas.core.frame.DataFrame'>

Could anyone help me with this issue?

I think I provided enough information, but let me know if I can provide more.

CodePudding user response：

Use pd.crosstab:

pd.crosstab([df['Brand'], df['Model']], df['car_age']).reset_index()

Output:

car_age    Brand       Model  1. 0  2. 1-2  3. 3-5  4. 6-8
0            BMW   3ER REIHE     0       2       0       0
1            BMW        520D     0       1       0       0
2           FIAT  FIAT DOBLO     0       0       1       0
3        PEUGEOT         207     1       0       0       1
4        PEUGEOT         308     0       1       0       0

CodePudding user response：

The correct way to group by a data-frame with multiple columns is to use square brace around the columns name

df.groupby(['Brand','Model','car_age'])

I hope it will help you to solve your problem.

CodePudding user response：

Here is a function you can call. If you want to see how it works granularly

def group_by(df):
    data_dumm = pd.get_dummies(df['car_age'])
    data =df.drop(columns='car_age')
    X= pd.concat([data,data_dumm], axis=1).groupby(['Brand','Model']).sum()
    return X.reset_index()
group_by(df)

output:

    Brand   Model       1. 0    2. 1-2  3. 3-5  4. 6-8
0   BMW     3ER REIHE   0       2       0       0
1   BMW     520D        0       1       0       0
2   FIAT    FIAT DOBLO  0       0       1       0
3   PEUGEOT 207         1       0       0       1
4   PEUGEOT 308         0       1       0       0