I have a test dataset. I can group by one column and then add the result as a new column using .map, and that works fine. But what I need is to group by two columns and then add the result to the df, and that is not working. For example, the two 5-year-old Audis should both get 111000 in the last column (the sum of their mileages), and the 8-year-old one should keep its own value unchanged. I'd be glad if you can help.
import pandas as pd
dff = pd.read_csv('https://raw.githubusercontent.com/codebasics/py/master/ML/5_one_hot_encoding/Exercise/carprices.csv')
dff
group_1 = dff.groupby('Car Model').sum().Mileage
dff['group_1'] = dff['Car Model'].map(group_1)
dff # it's working
group_2 = dff.groupby(['Car Model', 'Age(yrs)']).sum().Mileage
dff['group_2'] = dff['Car Model'].map(group_2)
dff # it's not working
CodePudding user response:
groupby() accepts a list of columns, and the result is then indexed by a MultiIndex.
The agg method lets you apply several functions to different columns within each group:
# sum of Mileage per (Car Model, Age(yrs)) pair
test_a = dff.groupby(['Car Model', 'Age(yrs)'])['Mileage'].sum()
# the same with agg (string names avoid the warning newer pandas raises for np.sum)
test_b = dff.groupby(['Car Model', 'Age(yrs)']).agg({'Mileage': 'sum'})
# agg with several functions per column
test_c = dff.groupby(['Car Model', 'Age(yrs)']).agg({'Mileage': ['size', 'sum'], 'Sell Price($)': ['min', 'max', 'mean']})
# test_c gives the size of each group and some statistics about the prices
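To get the grouped sum back onto every row of the original df, one option is transform, which returns a result aligned with the original index instead of one row per group. The .map attempt in the question probably fails because the grouped Series is indexed by (Car Model, Age(yrs)) pairs, while only the 'Car Model' values are passed to it. Below is a minimal sketch, not necessarily the only way: the small frame is made up for illustration (it only mimics the columns of carprices.csv, with values chosen so the two 5-year-old Audis sum to 111000), and the column name group_2_joined is hypothetical.

```python
import pandas as pd

# Illustrative data only: same column names as carprices.csv,
# but the rows are made up for this example.
dff = pd.DataFrame({
    'Car Model': ['Audi A5', 'Audi A5', 'Audi A5', 'BMW X5'],
    'Mileage': [59000, 52000, 72000, 69000],
    'Age(yrs)': [5, 5, 8, 6],
})

keys = ['Car Model', 'Age(yrs)']

# Option 1: transform('sum') gives one value per original row,
# so it can be assigned straight to a new column.
dff['group_2'] = dff.groupby(keys)['Mileage'].transform('sum')

# Option 2: keep the grouped Series (MultiIndex on both keys)
# and join it back using both key columns.
group_2 = dff.groupby(keys)['Mileage'].sum().rename('group_2_joined')
dff = dff.join(group_2, on=keys)

print(dff)
```

Both columns should hold the same values; transform is usually the shorter route when only one aggregated column is needed, while the join form is handy when the grouped result already exists, such as test_a above.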