I have this sample data
Date = [np.datetime64('2021-07-21'), np.datetime64('2021-07-21'), np.datetime64('2021-07-21'), np.datetime64('2021-07-21'),
np.datetime64('2021-07-21'), np.datetime64('2021-07-21'), np.datetime64('2021-07-22'), np.datetime64('2021-07-22'),
np.datetime64('2021-07-22'),np.datetime64('2021-07-22')]
val = [81.322, 81.322, 81.452, 81.552, 81.44, 81.44, 81.42, 81.43, 81.45, 81.54 ]
data_1 = pd.DataFrame({'Date': Date, 'Value': val})
data_1
Date Value
0 2021-07-21 81.322
1 2021-07-21 81.322
2 2021-07-21 81.452
3 2021-07-21 81.552
4 2021-07-21 81.440
5 2021-07-21 81.440
6 2021-07-22 81.420
7 2021-07-22 81.420
8 2021-07-22 81.450
9 2021-07-22 81.540
I want to see add a column that would tell me if the value is going up or down on that specific day. If the next value is the same I would look at the value after that to determine it. using .diff() < 0 does not work in this scenario for some reason. So the output wanted would be:
Date Value Change
0 2021-07-21 81.322 Up
1 2021-07-21 81.322 Up
2 2021-07-21 81.452 Up
3 2021-07-21 81.552 Up
4 2021-07-21 81.440 Down
5 2021-07-21 81.440 Down
6 2021-07-22 81.420 Up
7 2021-07-22 81.420 Up
8 2021-07-23 81.450 Up
9 2021-07-23 81.540 Up
Please let me know how I would do this.
CodePudding user response:
here's an alternative approach that considers "no change":
# set UP or DOWN, diff greater zero means UP
data_1['Change'] = np.where(data_1['Value'].diff()>0, "UP", "DOWN")
# set NO_CHANGE where diff is close to zero
m_no_cng = np.isclose(data_1['Value'].diff().fillna(0), 0)
data_1.loc[m_no_cng, 'Change'] = 'NO_CHANGE'
data_1
Date Value Change
0 2021-07-21 81.322 NO_CHANGE
1 2021-07-21 81.322 NO_CHANGE
2 2021-07-21 81.452 UP
3 2021-07-21 81.552 UP
4 2021-07-21 81.440 DOWN
5 2021-07-21 81.440 NO_CHANGE
6 2021-07-22 81.420 DOWN
7 2021-07-22 81.430 UP
8 2021-07-22 81.450 UP
9 2021-07-22 81.540 UP
CodePudding user response:
Use DataFrame.drop_duplicates
for remove duplicates, test difference per groups by DataFrameGroupBy.diff
with Series.lt
for less like 0
and last add next duplicated boolean by Series.reindex
and pass to numpy.where
:
m = (data_1.drop_duplicates(['Date','Value'])
.groupby('Date')['Value']
.diff()
.lt(0)
.reindex(data_1.index, method='ffill'))
data_1['Change'] = np.where(m, 'Down', 'Up')
print (data_1)
Date Value Change
0 2021-07-21 81.322 Up
1 2021-07-21 81.322 Up
2 2021-07-21 81.452 Up
3 2021-07-21 81.552 Up
4 2021-07-21 81.440 Down
5 2021-07-21 81.440 Down
6 2021-07-22 81.420 Up
7 2021-07-22 81.430 Up
8 2021-07-22 81.450 Up
9 2021-07-22 81.540 Up
Some alternatives:
m = (data_1.drop_duplicates(['Date','Value'])
.groupby('Date')['Value']
.diff()
.lt(0)
.reindex(data_1.index, method='ffill'))
m1 = data_1.duplicated(['Date','Value']) | ~data_1['Date'].duplicated()
m2 = data_1.duplicated(['Date','Value'])
data_1['Change1'] = np.where(m, 'Down', 'Up')
data_1['Change2'] = np.select([m1, m], ['No change','Down'], 'Up')
data_1['Change3'] = np.select([m2, m], ['No change','Down'], 'Up')
data_1['Change4'] = np.select([m2, m1, m], ['No change','Not defined','Down'], 'Up')
print (data_1)
Date Value Change1 Change2 Change3 Change4
0 2021-07-21 81.322 Up No change Up Not defined
1 2021-07-21 81.322 Up No change No change No change
2 2021-07-21 81.452 Up Up Up Up
3 2021-07-21 81.552 Up Up Up Up
4 2021-07-21 81.440 Down Down Down Down
5 2021-07-21 81.440 Down No change No change No change
6 2021-07-22 81.420 Up No change Up Not defined
7 2021-07-22 81.430 Up Up Up Up
8 2021-07-22 81.450 Up Up Up Up
9 2021-07-22 81.540 Up Up Up Up