I am trying to check three consecutive values in a column and, if they are all positive, write a string value into a new column in the third of those rows. My index is a date index.
I want to create a new column in my data frame and, in a loop, check whether three consecutive values are all positive, in which case the value is 'increasing'. If all three are negative, the value should be 'decreasing', and if neither, 'none'. This value should go into the new column, in the row of the last of the three values that were checked.
I have tried the code below, but no variation of it works:
df['num_change'] = df.num.diff()
result = []
for i in range(len(df)):
    if np.all(df['num_change'].values[i:i+3]) < 0:
        result.loc[[i+3], 'Trend'] = 'decreasing'
    elif np.all(df['num_change'].values[i:i+3]) > 0:
        result.loc[[i+3], 'Trend'] = 'increasing'
    else:
        result.loc[[i+3], 'Trend'] = 'none'
df["new_col"] = result
Unfortunately I am not able to insert an image here; I hope someone is patient enough to help me anyway.
Answer:
You could do this as follows:
import pandas as pd
import numpy as np

df = pd.DataFrame({'col': [1, 2, 3, -4, -5, -6, 7, 8, 9]})

start = 0
end = 3
result = [None] * 2  # no trend for the first two rows: a window needs 3 values

while end <= len(df.col):
    window = df.col.iloc[start:end]  # positional slice, also safe on a date index
    if np.all(window > 0):
        result.append("Increasing")
    elif np.all(window < 0):
        result.append("Decreasing")
    else:
        result.append(None)
    start += 1  # slide the window down by one row
    end += 1

df["new_col"] = result
In this solution, the while loop runs as long as the current window still holds three values, i.e. while end is less than or equal to len(df.col). Inside the loop, the three values in the window are checked: if all of them are greater than 0, the trend "Increasing" is added to the result; if all of them are less than 0, "Decreasing" is added; otherwise, None is added.
The first two elements of result are None because no window of three values can end in the first two rows. start and end begin at 0 and 3 respectively and are each incremented by 1 after every iteration, so the window slides down one row at a time. The output is:
>>> df
col new_col
0 1 None
1 2 None
2 3 Increasing
3 -4 None
4 -5 None
5 -6 Decreasing
6 7 None
7 8 None
8 9 Increasing
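The loop above tests the raw values in col, whereas the question tests the changes in num (num_change = df.num.diff()) on a date index. A minimal sketch of the same sliding-window idea applied to that setup follows; the sample numbers and dates are hypothetical, and the column names num, num_change and Trend are taken from the question. It also sidesteps two problems in the question's attempt: np.all(...) < 0 compares the single boolean returned by np.all with 0 instead of comparing each change with 0, and result is a plain list, which has no .loc.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: a 'num' column on a daily date index
df = pd.DataFrame(
    {'num': [10, 11, 12, 13, 9, 8, 7, 7, 8]},
    index=pd.date_range('2023-01-01', periods=9, freq='D'),
)
df['num_change'] = df['num'].diff()  # first row is NaN (no previous value)

changes = df['num_change'].to_numpy()
trend = ['none'] * len(df)  # default label for every row

for i in range(2, len(df)):
    window = changes[i - 2:i + 1]  # the three changes ending at row i
    if np.all(window > 0):
        trend[i] = 'increasing'
    elif np.all(window < 0):
        trend[i] = 'decreasing'

df['Trend'] = trend  # label sits in the last row of each window
```

Because the first change is NaN (and NaN compares False with 0), the earliest row that can be labelled is the fourth one: three changes need four values.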
Another answer:
This can be achieved with a custom rolling aggregation, without an (explicit) loop.
First we define the aggregation (it has to return a numeric value):
def trend(s):
    if (s < 0).all():
        return -1
    if (s > 0).all():
        return 1
    return 0
Now apply it and map the result to a label:
df['trend'] = (df['col'].rolling(3, min_periods=1)
                        .apply(trend)
                        .map({1: 'Increasing', -1: 'Decreasing', 0: 'none'})
              )
Output:
col trend
0 1 Increasing
1 2 Increasing
2 3 Increasing
3 -4 none
4 -5 none
5 -6 Decreasing
6 7 none
7 8 none
8 9 Increasing
Note that min_periods is set to 1 here, which fills the first two rows based on the partial windows of 1 or 2 elements. If you don't want that, remove the min_periods argument; the first two rows will then be NaN.
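Since "all three values are positive" is the same as "the smallest of the three is positive" (and "all negative" is "the largest is negative"), the custom function can be swapped for the built-in rolling min and max, which avoids calling a Python function for every window. This is a substitute technique, not the one used above; the sketch below assumes the same sample col data and uses np.select to pick the label.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col': [1, 2, 3, -4, -5, -6, 7, 8, 9]})

window = df['col'].rolling(3, min_periods=1)
all_pos = window.min() > 0  # smallest value in the window is positive
all_neg = window.max() < 0  # largest value in the window is negative

# First matching condition wins; rows matching neither get 'none'
df['trend'] = np.select([all_pos, all_neg], ['Increasing', 'Decreasing'], default='none')
```

With min_periods=1 this reproduces the output shown above; dropping min_periods makes min and max NaN for the first two rows, and since NaN compares False, those rows then get 'none' rather than NaN.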