I have an excel files that looks like this :
N_order | Amount |
---|---|
1137 | 258 |
1137 | -258 |
1137 | 258 |
1138 | 802 |
1139 | 983 |
1139 | 837 |
1139 | -837 |
1139 | -248 |
I want to do pair with negative and positive value, but only pair, as an example I want them to be linked like that :
N_order | Amount | Link |
---|---|---|
1137 | 258 | linked |
1137 | -258 | linked |
1137 | 258 | |
1138 | 802 | |
1139 | 983 | |
1139 | 837 | linked |
1139 | -837 | linked |
1139 | -248 |
how can I do it ? I tried this code :
df['link'] = ""
for i in range(len(df)):
commande = df.at[i, 'N_order']
montant = df.at[i, 'Amount']
if montant > 0:
match = df.loc[(df['N_order'] == commande) & (df['Amount'] == -montant)]
if not match.empty:
df.at[i, 'link'] = commande
df.at[match.index[0], 'link'] = commande
df = df[(df['N_order'] != '') & (df['link'] != '') & (df['Amount'] != '')][['N_order', 'Amount', 'link']]
df.to_excel("fileoutput.xlsx", index=False)
df
but it makes my table looks like that :
N_order | Amount | Link |
---|---|---|
1137 | 258 | linked |
1137 | -258 | linked |
1137 | 258 | linked |
1138 | 802 | |
1139 | 983 | |
1139 | 837 | linked |
1139 | -837 | linked |
1139 | -248 |
(there is 1750 rows, so I can't just hand correct it.)
How can I do it ? Thanks you !
CodePudding user response:
De-duplicate (with groupby.cumcount
) and pivot
the absolute values and sign
to identify the matching indices, then assign the comment:
import numpy as np
idx = (df
.assign(n=df.groupby(['N_order', 'Amount']).cumcount(),
sign=np.sign(df['Amount']),
abs=df['Amount'].abs()
)
.reset_index()
.pivot(index=['N_order', 'n', 'abs'], columns='sign', values='index')
.reindex([-1, 1], axis=1)
.dropna().to_numpy().ravel()
)
df.loc[idx, 'Link'] = 'linked'
Output:
N_order Amount Link
0 1137 258 linked
1 1137 -258 linked
2 1137 258 NaN
3 1138 802 NaN
4 1139 983 NaN
5 1139 837 linked
6 1139 -837 linked
7 1139 -248 NaN
Intermediate pivot of the indices:
sign -1 1
N_order n abs
1137 0 258 1.0 0.0
1 258 NaN 2.0
1138 0 802 NaN 3.0
1139 0 248 7.0 NaN
837 6.0 5.0
983 NaN 4.0
bonus: matching the paired value
With a minor modification you can also match the index of the paired values:
df2 = (df.assign(n=df.groupby(['N_order', 'Amount']).cumcount(),
sign=np.sign(df['Amount']),
abs=df['Amount'].abs()
)
.reset_index()
.pivot(index=['N_order', 'n', 'abs'], columns='sign', values='index')
.reindex([-1, 1], axis=1)
.dropna().reset_index(drop=True)
)
df['matching_index'] = pd.concat([df2.set_index(-1)[1], df2.set_index(1)[-1]])
Output:
N_order Amount matching_index
0 1137 258 1.0
1 1137 -258 0.0
2 1137 258 NaN
3 1138 802 NaN
4 1139 983 NaN
5 1139 837 6.0
6 1139 -837 5.0
7 1139 -248 NaN
CodePudding user response:
Here is another way using rolling()
to find the values that add to 0 to locate the linked pair.
a = df.groupby('N_order')['Amount'].rolling(2,min_periods = 1).sum()
s = a.droplevel(0).eq(0) & df['Amount'].lt(0)
df['Link'] = np.where(s.where(s).bfill(limit=1).fillna(False),'linked','')
Output:
N_order Amount Link
0 1137 258 linked
1 1137 -258 linked
2 1137 258
3 1138 802
4 1139 983
5 1139 837 linked
6 1139 -837 linked
7 1139 -248