I have the following df:
date price MA1 MA2 MA3
date0 price0 12 10 8
date1 price1 11 11 11
date2 price2 12 21 14
date3 price3 13 12 15
date4 price4 14 14 14
date5 price5 15 17 14
date6 price6 19 16 15
date7 price7 15 12 13
date8 price8 11 10 13
date9 price9 21 12 13
date10 price10 13 11 14
date11 price11 14 14 14
date12 price12 16 16 16
date13 price13 34 32 23
date14 price14 12 12 12
I filter my df using the following mask:
df =(data
.assign(same=lambda x: (x['MA1'] == x['MA2']) & (x['MA1'] == x['MA3']))
.loc[lambda x: x.same == True]
)
I get:
date price MA1 MA2 MA3
date1 price1 11 11 11
date4 price4 14 14 14
date11 price11 14 14 14
date12 price12 16 16 16
date14 price14 12 12 12
So the dates where MA1, MA2 and M3 are matching are date1, date4, date11, date 12, date14.
I would like to create a df which matches this format
date price price_past price_fut return_past return_future
date1 price1 price0 price2 (price1-price0)/price0 (price2-price1)/price1
date4 price4 price3 price5 (price4-price3)/price3 (price5-price4)/price4
date11 price11 price10 price12 (price11-price10)/price10 (price12-price11)/price11
date12 price12 price11 price13 (price12-price11)/price11 (price13 -price12)/price12
date14 price14 price13 price15 (price14-price13)/price13 (price15-price14)/price14
I appreciate the help
CodePudding user response:
What if we sorted by date and defined an id column ?
import pandas as pd
df=pd.DataFrame(data={'date':['2022-01-26 13:48:19','2015-01-27 13:48:19','2022-01-28 13:48:19','2018-01-29 13:48:19','2017-01-30 13:48:19'],
'price':[100,200,300,400,500],
'MA1':[12,11,12,13,13],
'MA2':[10,11,21,12,14],
'MA3':[8,11,14,15,14]})
df.sort_values(by='date', inplace=True) #sort by date
df['id']=df.index #create an id column.
then apply your filter:
df1 =(df
.assign(same=lambda x: (x['MA1'] == x['MA2']) & (x['MA1'] == x['MA3']))
.loc[lambda x: x.same == True]
)
define previous and next dates' ids.
df1['price_past_id']=df['id'] - 1
df1['price_fut_id']=df['id'] 1
finally, join price_past_id and price_fut_id with the id we defined at the beginning :
#step1: join on past_id
final=df1.merge(df[['id','price']],how='left',left_on='price_past_id',right_on='id')
final=final.drop(['id_x','price_past_id','id_y'],axis=1) #drop unnecessary columns
#step2: join on fut_id
final=final.merge(df[['id','price']],how='left',left_on='price_fut_id',right_on='id')
final=final.drop(['price_fut_id','id'],axis=1)
final=final.rename(columns={'price_x':'price','price_y':'price_past','price':'price_fut'})
#formuls
final['return_past'] = (final['price'] - final['price_past']) / final['price_past']
final['return_future'] = (final['price_fut'] - final['price']) / final['price']
CodePudding user response:
A possible solution (I thank @AdrienPacifico for having suggested using pandas.query
, which is very simplifying):
# I made column price equal to column MA1
# to have numbers to play with
df['price'] = df['MA1']
df['price_past'] = df['price'].shift()
df['price_fut'] = df['price'].shift(-1)
df['return_past'] = (df['price'] - df['price_past']) / df['price_past']
df['return_future'] = (df['price_fut'] - df['price']) / df['price']
df.query("MA1==MA2==MA3").drop(['MA1', 'MA2', 'MA3'], axis=1)
Output:
date price price_past price_fut return_past return_future
1 date1 11 12.0 12.0 -0.083333 0.090909
4 date4 14 13.0 15.0 0.076923 0.071429
11 date11 14 13.0 16.0 0.076923 0.142857
12 date12 16 14.0 34.0 0.142857 1.125000
14 date14 12 34.0 NaN -0.647059 NaN