Home > OS >  pandas dataframe conditional population of a new column
pandas dataframe conditional population of a new column

Time:03-08

I am working on manipulation of a column(Trend) in pandas DataFrame. Below is my source DataFrame. Currently I have set it to 0.

enter image description here

The logic I want to use to populate Trend column is below

  1. if df['Close'] > df.shift(1)['Down'] then 1

  2. if df['Close'] < df.shift(1)['Up'] then -1

  3. if any one of the above condition does not meet then, df.shift(1)['Trend']. if this value is NaN then set it to 1.

Above code in plainText,

  1. if current close is greater then previous row value of Down column then 1
  2. if current close is less than previous row value of Up column then -1
  3. if any one of those conditions does not meet, then set previous row value of Trend column as long as its not NaN. if its NaN then set to 1

UPDATE

Data as text

   Close        Up      Down  Trend
   3.138       NaN       NaN      0
   3.141       NaN       NaN      0
   3.141       NaN       NaN      0
   3.130       NaN       NaN      0
   3.110       NaN       NaN      0
   3.130  3.026432  3.214568      0
   3.142  3.044721  3.214568      0
   3.140  3.047010  3.214568      0
   3.146  3.059807  3.214568      0
   3.153  3.064479  3.214568      0
   3.173  3.080040  3.214568      0
   3.145  3.080040  3.214568      0
   3.132  3.080040  3.214568      0
   3.131  3.080040  3.209850      0
   3.141  3.080040  3.209850      0
   3.098  3.080040  3.205953      0
   3.070  3.080040  3.195226      0

Expected output

enter image description here

CodePudding user response:

We could use numpy.select to select values depending on which condition is satisfied. Then pass the outcome of numpy.select to fillna to fill in missing "Trend" values with it (this is used to not lose existing "Trend" values). Then since NaN trend values must be filled with previous "Trend" value, we use ffill and fill the remaining NaN values with 1.

import numpy as np
df['Trend'] = (df['Trend'].replace(0, np.nan)
               .fillna(pd.Series(np.select([df['Close'] > df['Down'].shift(), 
                                            df['Close'] < df['Up'].shift()],
                                           [1, -1], np.nan), index=df.index))
               .ffill().fillna(1))

Output:

    Close        Up      Down  Trend
0   3.138       NaN       NaN    1.0
1   3.141       NaN       NaN    1.0
2   3.141       NaN       NaN    1.0
3   3.130       NaN       NaN    1.0
4   3.110       NaN       NaN    1.0
5   3.130  3.026432  3.214568    1.0
6   3.142  3.044721  3.214568    1.0
7   3.140  3.047010  3.214568    1.0
8   3.146  3.059807  3.214568    1.0
9   3.153  3.064479  3.214568    1.0
10  3.173  3.080040  3.214568    1.0
11  3.145  3.080040  3.214568    1.0
12  3.132  3.080040  3.214568    1.0
13  3.131  3.080040  3.209850    1.0
14  3.141  3.080040  3.209850    1.0
15  3.098  3.080040  3.205953    1.0
16  3.070  3.080040  3.195226   -1.0
  • Related