Home > Software design >  How to check if data is going up or down and add column - Python
How to check if data is going up or down and add column - Python

Time:11-11

I have this sample data


Date = [np.datetime64('2021-07-21'), np.datetime64('2021-07-21'), np.datetime64('2021-07-21'), np.datetime64('2021-07-21'),
       np.datetime64('2021-07-21'), np.datetime64('2021-07-21'), np.datetime64('2021-07-22'), np.datetime64('2021-07-22'),
       np.datetime64('2021-07-22'),np.datetime64('2021-07-22')]
val = [81.322, 81.322, 81.452, 81.552, 81.44, 81.44, 81.42, 81.43, 81.45, 81.54 ]
data_1 = pd.DataFrame({'Date': Date, 'Value': val})
data_1

    Date    Value
0   2021-07-21  81.322
1   2021-07-21  81.322
2   2021-07-21  81.452
3   2021-07-21  81.552
4   2021-07-21  81.440
5   2021-07-21  81.440
6   2021-07-22  81.420
7   2021-07-22  81.420
8   2021-07-22  81.450
9   2021-07-22  81.540

I want to see add a column that would tell me if the value is going up or down on that specific day. If the next value is the same I would look at the value after that to determine it. using .diff() < 0 does not work in this scenario for some reason. So the output wanted would be:

    Date        Value   Change           
0   2021-07-21  81.322  Up
1   2021-07-21  81.322  Up
2   2021-07-21  81.452  Up
3   2021-07-21  81.552  Up
4   2021-07-21  81.440  Down
5   2021-07-21  81.440  Down
6   2021-07-22  81.420  Up
7   2021-07-22  81.420  Up
8   2021-07-23  81.450  Up
9   2021-07-23  81.540  Up

Please let me know how I would do this.

CodePudding user response:

here's an alternative approach that considers "no change":

# set UP or DOWN, diff greater zero means UP
data_1['Change'] = np.where(data_1['Value'].diff()>0, "UP", "DOWN")

# set NO_CHANGE where diff is close to zero
m_no_cng = np.isclose(data_1['Value'].diff().fillna(0), 0)
data_1.loc[m_no_cng, 'Change'] = 'NO_CHANGE'

data_1
        Date   Value     Change
0 2021-07-21  81.322  NO_CHANGE
1 2021-07-21  81.322  NO_CHANGE
2 2021-07-21  81.452         UP
3 2021-07-21  81.552         UP
4 2021-07-21  81.440       DOWN
5 2021-07-21  81.440  NO_CHANGE
6 2021-07-22  81.420       DOWN
7 2021-07-22  81.430         UP
8 2021-07-22  81.450         UP
9 2021-07-22  81.540         UP

CodePudding user response:

Use DataFrame.drop_duplicates for remove duplicates, test difference per groups by DataFrameGroupBy.diff with Series.lt for less like 0 and last add next duplicated boolean by Series.reindex and pass to numpy.where:

m = (data_1.drop_duplicates(['Date','Value'])
           .groupby('Date')['Value']
           .diff()
           .lt(0)
           .reindex(data_1.index, method='ffill'))
data_1['Change'] = np.where(m, 'Down', 'Up')
print (data_1)
        Date   Value Change
0 2021-07-21  81.322     Up
1 2021-07-21  81.322     Up
2 2021-07-21  81.452     Up
3 2021-07-21  81.552     Up
4 2021-07-21  81.440   Down
5 2021-07-21  81.440   Down
6 2021-07-22  81.420     Up
7 2021-07-22  81.430     Up
8 2021-07-22  81.450     Up
9 2021-07-22  81.540     Up

Some alternatives:

m = (data_1.drop_duplicates(['Date','Value'])
           .groupby('Date')['Value']
           .diff()
           .lt(0)
           .reindex(data_1.index, method='ffill'))

m1 = data_1.duplicated(['Date','Value']) | ~data_1['Date'].duplicated()
m2 = data_1.duplicated(['Date','Value'])

data_1['Change1'] = np.where(m, 'Down', 'Up')

data_1['Change2'] = np.select([m1, m], ['No change','Down'], 'Up')
data_1['Change3'] = np.select([m2, m], ['No change','Down'], 'Up')

data_1['Change4'] = np.select([m2, m1, m], ['No change','Not defined','Down'], 'Up')
print (data_1)
        Date   Value Change1    Change2    Change3      Change4
0 2021-07-21  81.322      Up  No change         Up  Not defined
1 2021-07-21  81.322      Up  No change  No change    No change
2 2021-07-21  81.452      Up         Up         Up           Up
3 2021-07-21  81.552      Up         Up         Up           Up
4 2021-07-21  81.440    Down       Down       Down         Down
5 2021-07-21  81.440    Down  No change  No change    No change
6 2021-07-22  81.420      Up  No change         Up  Not defined
7 2021-07-22  81.430      Up         Up         Up           Up
8 2021-07-22  81.450      Up         Up         Up           Up
9 2021-07-22  81.540      Up         Up         Up           Up
  • Related