How can I get pair of negative positive value from a .xlsx table using python-CodePudding

I have an excel files that looks like this :

N_order	Amount
1137	258
1137	-258
1137	258
1138	802
1139	983
1139	837
1139	-837
1139	-248

I want to do pair with negative and positive value, but only pair, as an example I want them to be linked like that :

N_order	Amount	Link
1137	258	linked
1137	-258	linked
1137	258
1138	802
1139	983
1139	837	linked
1139	-837	linked
1139	-248

how can I do it ? I tried this code :

df['link'] = ""
for i in range(len(df)):
    commande = df.at[i, 'N_order']
    montant = df.at[i, 'Amount']
    if montant > 0:
        match = df.loc[(df['N_order'] == commande) & (df['Amount'] == -montant)]
        if not match.empty:
            df.at[i, 'link'] = commande
            df.at[match.index[0], 'link'] = commande
df = df[(df['N_order'] != '') & (df['link'] != '') & (df['Amount'] != '')][['N_order', 'Amount', 'link']]
df.to_excel("fileoutput.xlsx", index=False)
df

but it makes my table looks like that :

N_order	Amount	Link
1137	258	linked
1137	-258	linked
1137	258	linked
1138	802
1139	983
1139	837	linked
1139	-837	linked
1139	-248

(there is 1750 rows, so I can't just hand correct it.)

How can I do it ? Thanks you !

CodePudding user response：

De-duplicate (with groupby.cumcount) and pivot the absolute values and sign to identify the matching indices, then assign the comment:

import numpy as np

idx = (df
   .assign(n=df.groupby(['N_order', 'Amount']).cumcount(),
           sign=np.sign(df['Amount']),
           abs=df['Amount'].abs()
          )
   .reset_index()
   .pivot(index=['N_order', 'n', 'abs'], columns='sign', values='index')
   .reindex([-1, 1], axis=1)
   .dropna().to_numpy().ravel()
)

df.loc[idx, 'Link'] = 'linked'

Output:

   N_order  Amount    Link
0     1137     258  linked
1     1137    -258  linked
2     1137     258     NaN
3     1138     802     NaN
4     1139     983     NaN
5     1139     837  linked
6     1139    -837  linked
7     1139    -248     NaN

Intermediate pivot of the indices:

sign            -1    1
N_order n abs          
1137    0 258  1.0  0.0
        1 258  NaN  2.0
1138    0 802  NaN  3.0
1139    0 248  7.0  NaN
          837  6.0  5.0
          983  NaN  4.0

bonus: matching the paired value

With a minor modification you can also match the index of the paired values:

df2 = (df.assign(n=df.groupby(['N_order', 'Amount']).cumcount(),
           sign=np.sign(df['Amount']),
           abs=df['Amount'].abs()
          )
   .reset_index()
   .pivot(index=['N_order', 'n', 'abs'], columns='sign', values='index')
   .reindex([-1, 1], axis=1)
   .dropna().reset_index(drop=True)
)

df['matching_index'] = pd.concat([df2.set_index(-1)[1], df2.set_index(1)[-1]])

Output:

   N_order  Amount  matching_index
0     1137     258             1.0
1     1137    -258             0.0
2     1137     258             NaN
3     1138     802             NaN
4     1139     983             NaN
5     1139     837             6.0
6     1139    -837             5.0
7     1139    -248             NaN

CodePudding user response：

Here is another way using rolling() to find the values that add to 0 to locate the linked pair.

a = df.groupby('N_order')['Amount'].rolling(2,min_periods = 1).sum()

s = a.droplevel(0).eq(0) & df['Amount'].lt(0)

df['Link'] = np.where(s.where(s).bfill(limit=1).fillna(False),'linked','')

Output:

   N_order  Amount    Link
0     1137     258  linked
1     1137    -258  linked
2     1137     258        
3     1138     802        
4     1139     983        
5     1139     837  linked
6     1139    -837  linked
7     1139    -248