Home > OS >  calculate percent change in pandas dataframe, subject to conditions
calculate percent change in pandas dataframe, subject to conditions

Time:12-21

I have a dataframe that looks like:

df = pd.DataFrame({'name': ['Portrait of Dr. Gachet', 'Salvator Mundi','Interchange'], 'sold_price': [1000.0, 5000.0, 2500.0, 6000.0, 8000.0, 16000.0, 20000.0, 9000.0, 40000.0], 'serialized_trx': [1, 1, 1, 2, 2, 2, 3, 3, 3]}

The data shows the sold_price of each name'd artwork each time it was auctioned. The data shows, for example, that the first time 'Portrait of Dr. Gachet' was sold, it sold for 1000.0 currency. The second time the same painting sold, it sold for 6000.0 currency.

column 'serialized_trx' is a counter for each time the item is re-sold. 'Portrait of Dr. Gachet' sold for 1000.0 currency the first time it went to auction, so 'serialized_trx' = 1. It sold for 6000 currency when 'serialized_trx' = 2.

I need to append a column that shows the percent change in value for each transaction, iteratively. For example, the percent change in value at 'Portrait of Dr. Gachet' serial 2 (vs. serial 1) is 500%. The percent change in value at 'Portrait of Dr. Gachet' serial 3 (vs. serial 2) is 233.33%.

I want to repeat this for each pair condition where 'item' = X and 'serialized_trx' start = n and finish = n - 1, such that I can track the change in value of each 'item' every time it changes hands.

The dataframe should have a new column at the end, like this (pct_change where serialized_trx = 1 are 0.0, since there is no prior transaction on which to base):

df = pd.DataFrame({'name': ['Portrait of Dr. Gachet', 'Salvator Mundi','Interchange'], 'sold_price': [1000.0, 5000.0, 2500.0, 6000.0, 8000.0, 16000.0, 20000.0, 9000.0, 40000.0], 'serialized_trx': [1, 1, 1, 2, 2, 2, 3, 3, 3], 'pct_change': [0.0, 0.0, 0.0, 500.0, 60.0, 166.67, 233.33, 12.5, 150.0]}

Can anyone help vectorize or otherwise iterate/compute this percent change where the item must equal the same item and the transaction serial number must be iterative (n, n - 1)?

CodePudding user response:

Try this:

df['pct_change'] = df.groupby('name')['sold_price'].pct_change().mul(100).fillna(0).round(2)

Output:

>>> df
                     name  sold_price  serialized_trx  pct_change
0  Portrait of Dr. Gachet      1000.0               1        0.00
1          Salvator Mundi      5000.0               1        0.00
2             Interchange      2500.0               1        0.00
3  Portrait of Dr. Gachet      6000.0               2      500.00
4          Salvator Mundi      8000.0               2       60.00
5             Interchange     16000.0               2      540.00
6  Portrait of Dr. Gachet     20000.0               3      233.33
7          Salvator Mundi      9000.0               3       12.50
8             Interchange     40000.0               3      150.00
  • Related