Pandas: Merge pairs of rows within the same dataframe, while keeping both rows

Time:12-08

I have a table of executed trades in which each trade is recorded twice, once from each side, so each trade is represented by two rows. For some of the trades there are discrepancies between the two rows, so the amounts paid and received don't always match. For example, one party might report that they paid 100 and received 60, while the counterparty reports that they received 55 and paid 90. I need to combine the rows to find the differences between the reported numbers where there are discrepancies.

Counterparty A   Counterparty B   Amount payed   Amount received   Traders (alphabetical)
John             Mark             100            60                John, Mark
Mark             John             65             90                John, Mark
Philip           Isaac            110            20                Isaac, Philip
Isaac            Philip           20             100               Isaac, Philip

For each row, I would like to add the cells from the opposite side of the trade, so that the resulting table looks something like this:

Counterparty A_x   Counterparty B_x   Amount payed_x   Amount received_x   Counterparty A_y   Counterparty B_y   Amount payed_y   Amount received_y
John               Mark               100              60                  Mark               John               65               90
Mark               John               65               90                  John               Mark               100              60
Philip             Isaac              110              20                  Isaac              Philip             20               100
Isaac              Philip             20               100                 Philip             Isaac              110              20

This seems like something that should be pretty straightforward, but I somehow can't get it to work. I have tried a whole lot of groupbys, splits and merges. For example, I created one df with the rows holding the first occurrence of each shared "Traders" value and another df with the rows holding the second occurrence, and then tried to merge each onto its "opposite" row in the full dataframe. I didn't manage to get anywhere.

Is anyone able to help?

Edit: Forgot to mention the discrepancies between the reported amounts

CodePudding user response:

Just define new columns with the criteria you want. It looks like you want something like this:

df['Counterparty A_y'] = df['Counterparty B']
df['Counterparty B_y'] = df['Counterparty A']
df['Amount payed_y'] = df['Amount received']
df['Amount received_y'] = df['Amount payed']

This will keep your old columns and add the new ones. You could then use df.rename to rename your original columns so they have x's, as you have in your desired result.
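Note that the assignments above copy values from the same row, so the `_y` columns mirror what that party reported itself. If the two sides report different amounts, the counterparty's own numbers have to come from the other row. A minimal sketch of one way to do that is a self-merge with the counterparty columns swapped as keys. It assumes each ordered pair of counterparties appears only once; with repeated trades between the same two parties, some trade identifier (hypothetical here, not in the question's data) would be needed as an extra key.

```python
import pandas as pd

# Sample data taken from the question
df = pd.DataFrame({
    "Counterparty A": ["John", "Mark", "Philip", "Isaac"],
    "Counterparty B": ["Mark", "John", "Isaac", "Philip"],
    "Amount payed": [100, 65, 110, 20],
    "Amount received": [60, 90, 20, 100],
})

# Match each row (A, B) with the row reported from the other side (B, A).
# Because the key names differ between left and right, both sets of
# columns are kept and receive the _x / _y suffixes.
merged = df.merge(
    df,
    left_on=["Counterparty A", "Counterparty B"],
    right_on=["Counterparty B", "Counterparty A"],
    suffixes=("_x", "_y"),
)

# Illustrative discrepancy column (name is an assumption):
# what this side says it paid minus what the other side says it received.
merged["payed_vs_received_diff"] = (
    merged["Amount payed_x"] - merged["Amount received_y"]
)
```

With the sample data, John reports paying 100 while Mark reports receiving 90, so that row's difference is 10.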

CodePudding user response:

There must be pairs of rows for each trade. These may well be adjacent rows; if not, there must be some column with a trade ID that can be used to sort the rows into adjacent pairs. The solution is then simply a matter of producing a new DataFrame with the rows of each pair swapped and combining it side by side with the original. The code below shows the principle; you can modify it to suit, and re-arrange and rename columns to produce your required output. I am not aware of a vectorized way to do this, so the code uses a loop, which may take some time on a large data set.

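import pandas as pd

df = pd.DataFrame({'a': [1, 3, 5, 7, 9, 11], 'b': [2, 4, 6, 8, 10, 12]})

num = len(df) - 1

# Copy of df that will hold each row's partner values under new column names
df_flip = df.copy()
df_flip.columns = ['c', 'd']

# Swap each adjacent pair of rows (0 with 1, 2 with 3, ...)
for idx in range(0, num, 2):
    # .to_numpy() assigns by position, so the differing column labels don't matter
    df_flip.iloc[idx, :] = df.iloc[idx + 1, :].to_numpy()
    df_flip.iloc[idx + 1, :] = df.iloc[idx, :].to_numpy()

# Place the original and the swapped frame side by side
combi = pd.concat([df, df_flip], axis=1)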

which produces the following with the a, b columns swapped in pairs to c, d:

    a   b   c   d
0   1   2   3   4
1   3   4   1   2
2   5   6   7   8
3   7   8   5   6
4   9  10  11  12
5  11  12   9  10
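
Under the same assumption that the paired rows are adjacent, the swap can also be written without a Python loop. The sketch below is one possible variant, not part of the original answer: it builds the partner position for every row with a bitwise XOR (0 and 1 swap, 2 and 3 swap, and so on) and reorders the frame in one step. It assumes an even number of rows and a default 0..n-1 index on `df`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 3, 5, 7, 9, 11], 'b': [2, 4, 6, 8, 10, 12]})

# Position of each row's partner: XOR with 1 flips the lowest bit,
# mapping 0 -> 1, 1 -> 0, 2 -> 3, 3 -> 2, ...
partner = np.arange(len(df)) ^ 1

# Reorder rows by partner position, then reset the index so the
# side-by-side concat lines up row by row with the original frame
df_flip = df.iloc[partner].reset_index(drop=True)
df_flip.columns = ['c', 'd']

combi = pd.concat([df, df_flip], axis=1)
```

If the pairs are not adjacent to begin with, sorting on a pairing key first (for example the question's "Traders (alphabetical)" column) and resetting the index would be needed before this step.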