Pandas: Merge pairs of rows within the same dataframe, while keeping both rows

I have a table of executed trades where each trade is recorded twice, once from each side, so that each trade is represented by two rows in the table. For some of the trades there are discrepancies between the two rows, so the amounts paid and received don't always match. For example, one party might report that they paid 100 and received 60, while the counterparty reports that they received 55 and paid 90. So I need to combine the rows to find the differences between the reported numbers (where there are discrepancies).

Counterparty A	Counterparty B	Amount payed	Amount received	Traders (alphabetical)
John	Mark	100	60	John, Mark
Mark	John	65	90	John, Mark
Philip	Isaac	110	20	Isaac, Philip
Isaac	Philip	20	100	Isaac, Philip

For each row, I would like to add the cells from the opposite side of the trade, so that the resulting table looks something like this:

Counterparty A_x	Counterparty B_x	Amount payed_x	Amount received_x	Counterparty A_y	Counterparty B_y	Amount payed_y	Amount received_y
John	Mark	100	60	Mark	John	65	90
Mark	John	65	90	John	Mark	100	60
Philip	Isaac	110	20	Isaac	Philip	20	100
Isaac	Philip	20	100	Philip	Isaac	110	20

This seems like something that should be pretty straightforward, but I somehow can't get it to work. I have tried a whole lot of groupbys, splits and merges. For example, I created one df keeping the rows with the first occurrence of each shared "Traders" value and another df keeping the rows with the second occurrence, and then tried to merge each onto its "opposite" row in the full dataframe. But I didn't manage to get anywhere.

Is anyone able to help?

Edit: Forgot to mention the discrepancies between the reported amounts

CodePudding user response：

Just define new columns with the criteria you want. It looks like you want something like this:

df['Counterparty A_y'] = df['Counterparty B']
df['Counterparty B_y'] = df['Counterparty A']
df['Amount payed_y'] = df['Amount received']
df['Amount received_y'] = df['Amount payed']

This will keep your old columns and add the new ones. You could then use df.rename to rename your original columns so they have x's, as you have in your desired result.
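
The renaming step could look roughly like the sketch below, which rebuilds the question's sample table and adds the `_x` suffix to the four original columns only. The `originals` list is just a helper name used here for illustration. Note that the mirrored `_y` columns repeat each row's own figures, so they only equal the other side's report when both sides agree on the amounts.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'Counterparty A': ['John', 'Mark', 'Philip', 'Isaac'],
    'Counterparty B': ['Mark', 'John', 'Isaac', 'Philip'],
    'Amount payed': [100, 65, 110, 20],
    'Amount received': [60, 90, 20, 100],
})

# Mirrored columns, following the approach in this answer
df['Counterparty A_y'] = df['Counterparty B']
df['Counterparty B_y'] = df['Counterparty A']
df['Amount payed_y'] = df['Amount received']
df['Amount received_y'] = df['Amount payed']

# Add the _x suffix to the four original columns only
# ('originals' is a hypothetical helper name)
originals = ['Counterparty A', 'Counterparty B', 'Amount payed', 'Amount received']
df = df.rename(columns={col: col + '_x' for col in originals})
```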

CodePudding user response：

There must be pairs of rows for each trade. These may well be adjacent rows; if not, then there must be some column with a trade ID which can be used to sort the rows into adjacent pairs. Then the solution is simply a matter of producing a new data frame with each of the row pairs swapped and combining the two into one side-by-side DataFrame. The code below shows the principle - you can modify it to suit, and re-arrange and rename columns to produce your required output. I am not aware of a vectorized way to do this, so the code below uses a loop, which may take some time with a large data size.

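import pandas as pd

df = pd.DataFrame({'a': [1, 3, 5, 7, 9, 11], 'b': [2, 4, 6, 8, 10, 12]})

num = len(df) - 1

# Copy of df with renamed columns; its rows are swapped in pairs below
df_flip = df.copy()
df_flip.columns = ['c', 'd']

# Swap each adjacent pair of rows (0 and 1, 2 and 3, ...), assigning by position
for idx in range(0, num, 2):
    df_flip.iloc[idx, :] = df.iloc[idx + 1, :].to_numpy()
    df_flip.iloc[idx + 1, :] = df.iloc[idx, :].to_numpy()

# Put the original and the pair-swapped frame side by side
combi = pd.concat([df, df_flip], axis=1)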

which produces the following with the a, b columns swapped in pairs to c, d:

    a   b   c   d
0   1   2   3   4
1   3   4   1   2
2   5   6   7   8
3   7   8   5   6
4   9  10  11  12
5  11  12   9  10
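
For the table in the question, a loop-free route may also be worth trying: merge the dataframe with itself, matching each row to the row whose counterparties are reversed. This is only a sketch under an assumption that the question does not confirm, namely that each ordered pair of counterparties appears once (with repeated trades between the same two people you would need a trade ID as an extra key). The `payed_diff` and `received_diff` column names are made up here for illustration.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'Counterparty A': ['John', 'Mark', 'Philip', 'Isaac'],
    'Counterparty B': ['Mark', 'John', 'Isaac', 'Philip'],
    'Amount payed': [100, 65, 110, 20],
    'Amount received': [60, 90, 20, 100],
})

# Two suffixed views of the same table: this side (_x) and the other side (_y)
left = df.add_suffix('_x')
right = df.add_suffix('_y')

# Match each row with the row whose counterparties are reversed.
# Assumption: each ordered (A, B) pair occurs only once in the table.
combined = left.merge(
    right,
    how='left',
    left_on=['Counterparty A_x', 'Counterparty B_x'],
    right_on=['Counterparty B_y', 'Counterparty A_y'],
)

# Hypothetical discrepancy columns: what one side says it paid versus
# what the other side says it received, and the other way round
combined['payed_diff'] = combined['Amount payed_x'] - combined['Amount received_y']
combined['received_diff'] = combined['Amount received_x'] - combined['Amount payed_y']
```

Rows where both difference columns are zero are trades on which the two reports agree; the rest are the discrepancies to look into.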