Pandas DataFrame: Reorganise following .groupby()

Time:06-01

I am looking at NFT returns. I have a dataset containing repeated transactions for specific IDs:

df_new = RSR.reset_index(drop=True)
print(df_new.head())

Here is the output:
 
   Asset ID     Collection        Date  Transaction price (USD)
0  10302582           axie  29/01/2020                   3.1159
1  10302582           axie  29/01/2020                   2.4535
2  10406110  cryptokitties  07/01/2020                   1.4192
3  10406110  cryptokitties  22/01/2020                   0.8415
4  10424431           axie  02/01/2020                   1.5289
...

The number of transactions for a singular ID ranges between 2 and n.

I am trying to produce the output shown in the enclosed picture: link to ideal output.

Basically, I want one line per completed transaction so I can compute the returns.

I manage to reach a very similar output when there are only two transactions by using

c = df_new["Asset ID"]
RSR_clean = df_new.set_index([c, df_new.groupby(c).cumcount() + 1]).unstack().sort_index(axis=1, level=1)

The output is:

        Asset ID             Collection        Date Transaction price (USD)  \
                1                      1           1                       1    
Asset ID                                                                        
10302582  10302582                   axie  29/01/2020                  3.1159   
10406110  10406110          cryptokitties  07/01/2020                  1.4192   
10424431  10424431                   axie  02/01/2020                  1.5289   
1060112,  1060112,          cryptokitties  02/01/2020                 15.6885   
1092364,  1092364,                   axie  14/01/2020                165.9554   
...            ...                    ...         ...                     ...   
919066,   919066,           cryptokitties  10/01/2020                  1.3781   
9533256,  9533256,  cryptovoxel-wearables  21/01/2020                  0.8485   
971380,   971380,           cryptokitties  09/01/2020                 20.8469   
987084,   987084,           cryptokitties  03/01/2020                 16.1089   
992882,   992882,           cryptokitties  02/01/2020                 15.0981   

          Asset ID             Collection        Date Transaction price (USD)  \
                2                      2           2                       2    
Asset ID                                                                        
10302582  10302582                   axie  29/01/2020                  2.4535   
10406110  10406110          cryptokitties  22/01/2020                  0.8415   
10424431  10424431                   axie  14/01/2020                  3.1532   
1060112,  1060112,          cryptokitties  07/01/2020                 27.5083   
1092364,  1092364,                   axie  14/01/2020                165.9554   

N.B. in practice, this is a collection of columns for each asset ID.

Nevertheless, I cannot find a way to handle assets with more than 2 transactions. With my current code, the extra transactions are simply added as new columns.

What I am trying to achieve is to have the 3rd, 4th, 5th, and so on transactions to be a new row each time. In these new rows, the 3rd and 4th column should be the previous transaction information.

Do you know how I can achieve this layout? Thanks very much!

CodePudding user response:

If I'm understanding the problem, you have a DataFrame of transactions where each asset (column "Asset ID") has two or more rows of data. You would like to join consecutive rows belonging to the same asset, in chronological order, so you can compute the return on each transaction.

What I believe you need to create is a "Transaction ID" column which numbers each asset's transactions from 0 to n-1, on which you then self-join. Taking your code, something like this should work:

df = RSR.reset_index(drop=True)

# Parse the dates so sorting is chronological rather than lexicographic
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")

# Sort the data ready for processing
df.sort_values(["Asset ID", "Date"], inplace=True)

# Number each asset's transactions 0 .. n-1
df["Transaction ID"] = df.groupby("Asset ID").cumcount()

# Create a copy of the DataFrame for joining
df_sell = df.copy()

# Shift the sale's Transaction ID back by one so each sale
# lines up with the purchase that precedes it
df_sell["Transaction ID"] = df_sell["Transaction ID"] - 1

# Join the two DataFrames: each row pairs a purchase with the following sale
df = df.merge(df_sell, on=["Asset ID", "Transaction ID"], suffixes=(" Buy", " Sell"))

This should output a table similar to the one you expected, except that instead of the 1 and 2 column levels, every column carries a " Buy" or " Sell" suffix.
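To make the pairing concrete, here is a self-contained sketch on a few invented rows (the IDs, dates, and prices below are made up for illustration, not your real dataset); it applies the steps above and then computes a simple return for each buy/sell pair:

```python
import pandas as pd

# Hypothetical sample mirroring the question's layout
df = pd.DataFrame({
    "Asset ID": [10302582, 10302582, 10406110, 10406110, 10406110],
    "Date": ["29/01/2020", "29/01/2020", "07/01/2020", "22/01/2020", "25/01/2020"],
    "Transaction price (USD)": [3.1159, 2.4535, 1.4192, 0.8415, 1.0000],
})

# Parse dates and sort chronologically within each asset
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")
df = df.sort_values(["Asset ID", "Date"])

# Number transactions within each asset: 0, 1, 2, ...
df["Transaction ID"] = df.groupby("Asset ID").cumcount()

# Pair each purchase (transaction t) with the following sale (transaction t+1)
df_sell = df.copy()
df_sell["Transaction ID"] -= 1
pairs = df.merge(df_sell, on=["Asset ID", "Transaction ID"], suffixes=(" Buy", " Sell"))

# Simple return per completed buy/sell pair
pairs["Return"] = (pairs["Transaction price (USD) Sell"]
                   / pairs["Transaction price (USD) Buy"]) - 1
print(pairs[["Asset ID", "Transaction price (USD) Buy",
             "Transaction price (USD) Sell", "Return"]])
```

An asset with n transactions yields n-1 rows here (the sample's three-transaction asset produces two pairs), which matches the "new row for the 3rd, 4th, 5th transaction" layout asked for.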
