I am looking at NFT returns. I have a dataset containing repeated transactions for specific IDs:
df_new = RSR.reset_index(drop=True)
print(df_new.head())
Here is the output:
Asset ID Collection Date Transaction price (USD)
0 10302582 axie 29/01/2020 3.1159
1 10302582 axie 29/01/2020 2.4535
2 10406110 cryptokitties 07/01/2020 1.4192
3 10406110 cryptokitties 22/01/2020 0.8415
4 10424431 axie 02/01/2020 1.5289
...
The number of transactions for a singular ID ranges between 2 and n.
I am trying to have the output showed in the picture enclosed: link to ideal output.
Basically, I have a line per completed transactions so I can compute the returns.
I am managing to reach a very similar output when the there is only two transactions by using
c = df_new["Asset ID"]
RSR_clean = df_new.set_index([c, df_new.groupby(c).cumcount() 1]).unstack().sort_index(1, 1)
The output is:
Asset ID Collection Date Transaction price (USD) \
1 1 1 1
Asset ID
10302582 10302582 axie 29/01/2020 3.1159
10406110 10406110 cryptokitties 07/01/2020 1.4192
10424431 10424431 axie 02/01/2020 1.5289
1060112, 1060112, cryptokitties 02/01/2020 15.6885
1092364, 1092364, axie 14/01/2020 165.9554
... ... ... ... ...
919066, 919066, cryptokitties 10/01/2020 1.3781
9533256, 9533256, cryptovoxel-wearables 21/01/2020 0.8485
971380, 971380, cryptokitties 09/01/2020 20.8469
987084, 987084, cryptokitties 03/01/2020 16.1089
992882, 992882, cryptokitties 02/01/2020 15.0981
Asset ID Collection Date Transaction price (USD) \
2 2 2 2
Asset ID
10302582 10302582 axie 29/01/2020 2.4535
10406110 10406110 cryptokitties 22/01/2020 0.8415
10424431 10424431 axie 14/01/2020 3.1532
1060112, 1060112, cryptokitties 07/01/2020 27.5083
1092364, 1092364, axie 14/01/2020 165.9554
N.B. in practice, this is a collection of columns for each asset ID.
Nevertheless, I cannot find a way when there are more than 2 transactions for each asset. With my current code, they are simply added as new columns.
What I am trying to achieve is to have the 3rd, 4th, 5th, and so on transactions to be a new row each time. In these new rows, the 3rd and 4th column should be the previous transaction information.
Do you know how I can achieve this layout? Thanks very much!
CodePudding user response:
If I'm understanding the problem, you have a DataFrame of transactions where by each asset (column "Asset ID") has two or more rows of data. You would like to join subsequent rows of data all associated to a single asset in chronological order to find out the returns for transaction.
What I believe you need to create is a "Transaction ID" column which goes from 1 to n-1 which you will then self join to. Taking your code, something like this should work
df = RSR.reset_index(drop=True)
# Sort the data read for processing
df.sort_values(["Asset ID", "Date"], inplace=True)
# Add a Transaction ID column for the purchase
df["Transaction ID"] = df.groupby(["Asset ID"])["Transaction ID"].cumcount()
# Create a copy of the DataFrame for joining
df_sell = df.copy()
# Bump the Transaction ID for the sale
df_sell["Transaction ID"] = df_sell["Transaction ID"] 1
# Join the Two DataFrames
df = df.merge(df_sell, on=["Asset ID", "Transaction ID"], suffixes=(" Buy", " Sell"))
This should output a similar table as you expected, but it would have 1 and 2 but all the columns will have a "Buy" or "Sell" suffix.