Home > other >  How to compare and set values of DF based on conditions?
How to compare and set values of DF based on conditions?

Time:01-03

I have a DF similar to the one below:

data = {
'Trade Date': ['2022-12-27', '2022-12-27', '2022-12-27', '2022-12-27', '2022-12-27', 
               '2022-12-28', '2022-12-28', '2022-12-28', '2022-12-28', '2022-12-28', 
               '2022-12-29', '2022-12-29', '2022-12-29', '2022-12-29', '2022-12-29'],
'Ticker': ['A', 'B', 'C', 'D', 'E', 
           'A', 'B', 'C', 'D', 'E',
           'A', 'B', 'C', 'D', 'E'],
'Rate': [0.01, 2.82, 1.87, 5.00, 4.92, 
         0.50, 2.72, 0.99, 8.77, 4.22,
         1.01, 3.42, 2.87, 4.50, 25.92]}

df = pd.DataFrame(data)

It contains daily data of stocks traded in Brazilian Exchange -

df

I would like to create a new column shifting 1 day, looking to the columns "Trade Date" and "Ticker" and calculate the daily variation of "Rate". But comparing values between the same ticker and between two dates (dec/29 with dec/28; dec/28 with dec/27; etc).

The output I'd like to see for the Ticker 'A' is given below, however I'd like to do that in the original DF, taking in account all the different Tickers and Trade Dates.
For example: I need to compare the values of 'A' from 2022-12-29 with the values of 'A' from 2022-12-28; The values of 'B' from 2022-12-29 with the values of 'B' from 2022-12-28.

sample = df.loc[df['Ticker'] == 'A'].reset_index(drop=True)
sample['diff'] = sample['Rate'] - sample.shift(1)['Rate']
sample

CodePudding user response:

You can create helper DataFrame with add one day to Trade Date for shift, then add helper column with shifted values:

df['Trade Date'] = pd.to_datetime(df['Trade Date'])

df1 = df.copy()
df1['Trade Date'] = df1['Trade Date'].add(pd.Timedelta('1 day'))

df1 = df.merge(df1, on=['Trade Date','Ticker'], how='left')
df1['diff'] = df1['Rate_x'].sub(df1['Rate_y'])
print (df1)
   Trade Date Ticker  Rate_x  Rate_y   diff
0  2022-12-27      A    0.01     NaN    NaN
1  2022-12-27      B    2.82     NaN    NaN
2  2022-12-27      C    1.87     NaN    NaN
3  2022-12-27      D    5.00     NaN    NaN
4  2022-12-27      E    4.92     NaN    NaN
5  2022-12-28      A    0.50    0.01   0.49
6  2022-12-28      B    2.72    2.82  -0.10
7  2022-12-28      C    0.99    1.87  -0.88
8  2022-12-28      D    8.77    5.00   3.77
9  2022-12-28      E    4.22    4.92  -0.70
10 2022-12-29      A    1.01    0.50   0.51
11 2022-12-29      B    3.42    2.72   0.70
12 2022-12-29      C    2.87    0.99   1.88
13 2022-12-29      D    4.50    8.77  -4.27
14 2022-12-29      E   25.92    4.22  21.70

If all datetimes are consecutive and no missing per groups by tickers is possible also use:

df['diff'] = df.groupby("Ticker")["Rate"].diff()
print (df)
   Trade Date Ticker   Rate   diff
0  2022-12-27      A   0.01    NaN
1  2022-12-27      B   2.82    NaN
2  2022-12-27      C   1.87    NaN
3  2022-12-27      D   5.00    NaN
4  2022-12-27      E   4.92    NaN
5  2022-12-28      A   0.50   0.49
6  2022-12-28      B   2.72  -0.10
7  2022-12-28      C   0.99  -0.88
8  2022-12-28      D   8.77   3.77
9  2022-12-28      E   4.22  -0.70
10 2022-12-29      A   1.01   0.51
11 2022-12-29      B   3.42   0.70
12 2022-12-29      C   2.87   1.88
13 2022-12-29      D   4.50  -4.27
14 2022-12-29      E  25.92  21.70
  • Related