Say I have the following df:
Location | Class | Value | Type |
---|---|---|---|
1 | A1 | 2 | WD |
1 | A2 | 4 | WD |
1 | B | 10 | WD |
1 | A1 | 4 | WE |
1 | A2 | 6 | WE |
1 | B | 20 | WE |
2 | A1 | 4 | WD |
2 | A2 | 6 | WD |
2 | B | 15 | WD |
2 | A1 | 6 | WE |
2 | A2 | 8 | WE |
2 | B | 25 | WE |
and I want to calculate the average of A1 and A2 at each location and for each type, as follows:
Location | Class | Value | Type |
---|---|---|---|
1 | A12 | 3 | WD |
1 | B | 10 | WD |
1 | A12 | 5 | WE |
1 | B | 20 | WE |
2 | A12 | 5 | WD |
2 | B | 15 | WD |
2 | A12 | 7 | WE |
2 | B | 25 | WE |
I have solved this by building two separate DataFrames and concatenating them, but I was wondering whether there is a more straightforward solution:
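import pandas as pd

a_list = ['A1', 'A2']
# average A1 and A2 per location and type, then label the result A12
df1 = df.loc[df['Class'].isin(a_list)]
df1 = df1.groupby(['Location', 'Type'], as_index=False).agg({'Value': 'mean'})
df1['Class'] = 'A12'
# keep the B rows unchanged and stack both parts
df2 = df.loc[df['Class'] == 'B']
df4 = pd.concat([df1, df2], ignore_index=True)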
CodePudding user response:
This gives you what you are looking for:
df = pd.read_clipboard()
df.loc[df["Class"].isin(['A1', 'A2']), "Class"] = 'A12'
df.groupby(["Location", "Class", "Type"]).mean().reset_index()
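For anyone who wants to run this without copying the table to the clipboard, here is a minimal sketch that rebuilds the sample data by hand. The names `merged` and `result` are my own, and working on a copy is just one choice so that the original `df` keeps its A1/A2 labels.

```python
import pandas as pd

# Rebuild the question's sample data explicitly instead of reading the clipboard.
df = pd.DataFrame({
    "Location": [1] * 6 + [2] * 6,
    "Class": ["A1", "A2", "B"] * 4,
    "Value": [2, 4, 10, 4, 6, 20, 4, 6, 15, 6, 8, 25],
    "Type": ["WD"] * 3 + ["WE"] * 3 + ["WD"] * 3 + ["WE"] * 3,
})

# Work on a copy so the original frame keeps its A1/A2 labels.
merged = df.copy()
merged.loc[merged["Class"].isin(["A1", "A2"]), "Class"] = "A12"

# Average Value within each (Location, Class, Type) group.
result = merged.groupby(["Location", "Class", "Type"], as_index=False)["Value"].mean()
print(result)
```

Selecting `["Value"]` before `.mean()` keeps the aggregation limited to the numeric column, which should matter only if the real data has additional columns.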
CodePudding user response:
Where column Class contains A, change it to 'A12', then group by and aggregate as required:
df = df.assign(Class =np.where(df['Class'].str.contains('A'), 'A12', df['Class'])).groupby(['Location','Class','Type']).agg('mean').reset_index()
   Location Class Type  Value
0         1   A12   WD      3
1         1   A12   WE      5
2         1     B   WD     10
3         1     B   WE     20
4         2   A12   WD      5
5         2   A12   WE      7
6         2     B   WD     15
7         2     B   WE     25
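One thing to watch with `str.contains('A')`: it is a substring match, so it would also relabel any other class that happens to contain an "A". If that could be a problem in your real data, a possible variant is an explicit mapping passed to `Series.replace`. This is only a sketch; `class_map` and the extra label "BA" are hypothetical names I made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Location": [1] * 6 + [2] * 6,
    "Class": ["A1", "A2", "B"] * 4,
    "Value": [2, 4, 10, 4, 6, 20, 4, 6, 15, 6, 8, 25],
    "Type": ["WD"] * 3 + ["WE"] * 3 + ["WD"] * 3 + ["WE"] * 3,
})

# "BA" is a hypothetical class, used only to compare the two matching styles.
labels = pd.Series(["A1", "A2", "B", "BA"])
substring_hits = labels.str.contains("A")   # substring match
exact_hits = labels.isin(["A1", "A2"])      # exact label match

# Explicit mapping: only the listed labels are merged, all others are kept.
class_map = {"A1": "A12", "A2": "A12"}
result = (
    df.assign(Class=df["Class"].replace(class_map))
      .groupby(["Location", "Class", "Type"], as_index=False)["Value"]
      .mean()
)
print(result)
```

Because `assign` returns a new frame, `df` itself is left untouched here.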