I have a dataframe df:
Source | Dest | Value |
---|---|---|
A | B | 10 |
A | B | 7 |
B | A | 6 |
A | C | 12 |
A | C | 5 |
D | E | 5 |
E | D | 6 |
F | G | 1 |
I want to get the 'directional sum' of the values for the (source, dest) tuple. (i.e. combine all the X->Y and Y->X entries and take the weighted sum for the value column with weights equal to 1 and -1 respectively)
df_out:
Source | Dest | Value |
---|---|---|
A | B | 11 |
A | C | 17 |
D | E | -1 |
F | G | 1 |
CodePudding user response:
We need np.sort
then can trim the value with the diff
df1 = df.copy()
l = ['Source','Dest']
df1[l] = np.sort(df1[l].values,axis=1)
df1.loc[df1[l].ne(df[l]).all(1),'Value'] *= (-1)
df1 = df1.groupby(l)['Value'].sum().reset_index()
df1
Out[83]:
Source Dest Value
0 A B 11
1 A C 17
2 D E -1
3 F G 1
CodePudding user response:
You could groupby
the sorted values, apply a -1 factor if the direction changed, and sum:
group = df[['Source', 'Dest']].apply(sorted, axis=1, result_type='expand')
(df.assign(Value=df['Value'].where(group[0].eq(df['Source']), -df['Value']))
.groupby([group[0], group[1]])
['Value'].sum()
.reset_index()
.rename(columns={0: 'Source', 1:'Value'})
)
output:
Source Value Value
0 A B 11
1 A C 17
2 D E -1
3 F G 1