Home > Software engineering >  Python Panda: Modify Dataframe with mask and create new Dataframe
Python Panda: Modify Dataframe with mask and create new Dataframe

Time:09-28

I have the following df:

date   price   MA1  MA2 MA3
date0  price0  12   10  8
date1  price1  11   11 11 
date2  price2  12   21 14
date3  price3  13   12 15
date4  price4  14   14 14
date5  price5  15   17 14
date6  price6  19   16 15
date7  price7  15   12 13
date8  price8  11   10 13
date9  price9  21   12 13
date10 price10 13   11 14
date11 price11 14   14 14
date12 price12 16   16 16
date13 price13 34   32 23
date14 price14 12   12 12

I filter my df using the following mask:

df =(data
    .assign(same=lambda x: (x['MA1'] == x['MA2']) & (x['MA1'] == x['MA3']))
    .loc[lambda x: x.same == True]
    )

I get:

date   price   MA1  MA2 MA3
date1  price1  11   11 11 
date4  price4  14   14 14
date11 price11 14   14 14
date12 price12 16   16 16
date14 price14 12   12 12

So the dates where MA1, MA2 and M3 are matching are date1, date4, date11, date 12, date14.

I would like to create a df which matches this format

date   price  price_past price_fut      return_past                return_future            
date1  price1  price0    price2      (price1-price0)/price0     (price2-price1)/price1     
date4  price4  price3    price5      (price4-price3)/price3     (price5-price4)/price4      
date11 price11 price10   price12     (price11-price10)/price10  (price12-price11)/price11  
date12 price12 price11   price13     (price12-price11)/price11  (price13 -price12)/price12                
date14 price14 price13   price15     (price14-price13)/price13  (price15-price14)/price14   

I appreciate the help

CodePudding user response:

What if we sorted by date and defined an id column ?

import pandas as pd
df=pd.DataFrame(data={'date':['2022-01-26 13:48:19','2015-01-27 13:48:19','2022-01-28 13:48:19','2018-01-29 13:48:19','2017-01-30 13:48:19'],
                      'price':[100,200,300,400,500],
                      'MA1':[12,11,12,13,13],
                      'MA2':[10,11,21,12,14],
                      'MA3':[8,11,14,15,14]})
df.sort_values(by='date', inplace=True) #sort by date
df['id']=df.index     #create an id column.

then apply your filter:

df1 =(df
    .assign(same=lambda x: (x['MA1'] == x['MA2']) & (x['MA1'] == x['MA3']))
    .loc[lambda x: x.same == True]
    )

define previous and next dates' ids.

df1['price_past_id']=df['id'] - 1
df1['price_fut_id']=df['id']   1

finally, join price_past_id and price_fut_id with the id we defined at the beginning :

#step1: join on past_id
final=df1.merge(df[['id','price']],how='left',left_on='price_past_id',right_on='id')
final=final.drop(['id_x','price_past_id','id_y'],axis=1) #drop unnecessary columns

#step2: join on fut_id
final=final.merge(df[['id','price']],how='left',left_on='price_fut_id',right_on='id')
final=final.drop(['price_fut_id','id'],axis=1)

final=final.rename(columns={'price_x':'price','price_y':'price_past','price':'price_fut'})

#formuls
final['return_past'] = (final['price'] - final['price_past']) / final['price_past']
final['return_future'] = (final['price_fut'] - final['price']) / final['price']

CodePudding user response:

A possible solution (I thank @AdrienPacifico for having suggested using pandas.query, which is very simplifying):

# I made column price equal to column MA1 
# to have numbers to play with
df['price'] = df['MA1']

df['price_past'] = df['price'].shift()
df['price_fut'] = df['price'].shift(-1)
df['return_past'] = (df['price'] - df['price_past']) / df['price_past']
df['return_future'] = (df['price_fut'] - df['price']) / df['price']

df.query("MA1==MA2==MA3").drop(['MA1', 'MA2', 'MA3'], axis=1)

Output:

      date  price  price_past  price_fut  return_past  return_future
1    date1     11        12.0       12.0    -0.083333       0.090909
4    date4     14        13.0       15.0     0.076923       0.071429
11  date11     14        13.0       16.0     0.076923       0.142857
12  date12     16        14.0       34.0     0.142857       1.125000
14  date14     12        34.0        NaN    -0.647059            NaN
  • Related