How to map to a dataframe with a multiindex?

Time:04-03

I have a test dataset. I can group by one column and add the result as a new column using .map, and that works fine. What I need is to group by two columns and then add the result back to the df, and that does not work. For example, the two Audi rows with an age of 5 years should both get 111000 in the last column (the sum of their mileages), and the 8-year-old one should keep its single, unchanged value. I would be glad for any help.

import pandas as pd

dff = pd.read_csv('https://raw.githubusercontent.com/codebasics/py/master/ML/5_one_hot_encoding/Exercise/carprices.csv')
dff

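# group by one column: the result is indexed by 'Car Model', so .map can look it up
group_1 = dff.groupby('Car Model')['Mileage'].sum()
dff['group_1'] = dff['Car Model'].map(group_1)
dff # this works

# group by two columns: the result is indexed by a (Car Model, Age) MultiIndex
group_2 = dff.groupby(['Car Model', 'Age(yrs)'])['Mileage'].sum()
dff['group_2'] = dff['Car Model'].map(group_2)
dff # this does not work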

CodePudding user response:

groupby() accepts a list of columns; the result is then indexed by a MultiIndex built from those columns.

The agg method lets you apply different functions to different columns within each group:

# plain sum of one column per (Car Model, Age) group
test_a = dff.groupby(['Car Model', 'Age(yrs)'])['Mileage'].sum()

# the same with agg, naming the function as a string (no numpy import needed)
test_b = dff.groupby(['Car Model', 'Age(yrs)']).agg({'Mileage': 'sum'})

# agg with several functions per column:
# the size of each group, plus some statistics about prices
test_c = dff.groupby(['Car Model', 'Age(yrs)']).agg({'Mileage': ['size', 'sum'], 'Sell Price($)': ['min', 'max', 'mean']})
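
To get the two-column group sum back onto every row, as the question asks, one option is to skip .map and use transform, which returns a result aligned with the original rows. Another is to aggregate first and then join the MultiIndex result back on both key columns. The following is only a sketch: it uses a small hand-typed sample with the same column names as the question's CSV rather than the real file, and the names keys and group_2_joined are made up for the illustration.

```python
import pandas as pd

# Small hand-made sample with the same column names as the question's CSV
# (illustrative values, not read from the real file).
dff = pd.DataFrame({
    'Car Model': ['Audi A5', 'Audi A5', 'Audi A5', 'BMW X5'],
    'Mileage': [59000, 52000, 91000, 69000],
    'Age(yrs)': [5, 5, 8, 6],
})

keys = ['Car Model', 'Age(yrs)']  # hypothetical helper name

# Option 1: transform broadcasts each group's sum back to its rows,
# so the result can be assigned directly as a new column.
dff['group_2'] = dff.groupby(keys)['Mileage'].transform('sum')

# Option 2: aggregate to a MultiIndex Series, then join it back
# on the two key columns.
group_2 = dff.groupby(keys)['Mileage'].sum().rename('group_2_joined')
dff = dff.join(group_2, on=keys)

print(dff)
```

With this sample, both new columns hold 111000 for the two 5-year-old Audi rows, while the rows that are alone in their group keep their own mileage.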