I have been experiencing several problems when trying to create a Categories column in my dataframe based on if-statements linked to one of the other columns.
My dataframe (called 'kio') looks as follows when calling the .tail() function:
High Low Open Close Volume Adj Close Daily % Change
Date
2021-12-07 48326.0 46566.0 47000.0 48245.0 300090.0 48245.0 6.168302
2021-12-08 49662.0 46662.0 49870.0 46799.0 304657.0 46799.0 -2.997202
2021-12-09 47395.0 45045.0 47001.0 45332.0 267214.0 45332.0 -3.134682
2021-12-10 46899.0 43850.0 46899.0 43997.0 221268.0 43997.0 -2.944940
2021-12-13 46285.0 44150.0 46600.0 44150.0 208391.0 44150.0 0.347751
The final column ('Daily % Change') I added myself using the following code:
for n in kio['Close']:
kio['Daily % Change'] = kio['Close'].pct_change() * 100
Not sure if the format or dtype of this column is not causing an issue?
What I am trying to do is have every element in the 'Daily % Change' column be categorised as a number between 1 and 6 (absolute values). I have attempted this through thee below code:
def mvt_cat(row = kio):
if row[(row['Daily % Change'] > 0) & (row['Daily % Change'] < 1)]:
val = 1
elif row[(row['Daily % Change'] < 0) & (row['Daily % Change'] > -1)]:
val = -1
elif row[(row['Daily % Change'] >= 1) & (row['Daily % Change'] < 2)]:
val = 2
elif row[(row['Daily % Change'] <= -1) & (row['Daily % Change'] > 2)]:
val = -2
elif row[(row['Daily % Change'] >= 2) & (row['Daily % Change'] < 3)]:
val = 3
elif row[(row['Daily % Change'] <= -2) & (row['Daily % Change'] > -3)]:
val = -3
elif row[(row['Daily % Change'] >= 3) & (row['Daily % Change'] < 4)]:
val = 4
elif row[(row['Daily % Change'] <= -3) & (row['Daily % Change'] > 4)]:
val = -4
elif row[(row['Daily % Change'] >= 4) & (row['Daily % Change'] < 5)]:
val = 5
elif row[(row['Daily % Change'] <= -4) & (row['Daily % Change'] > -5)]:
val = -5
elif row[(row['Daily % Change'] >= 5)]:
val = 6
else:
val = -6
return val
kio['Move Category'] = kio.apply(mvt_cat, axis=1)
I unfortunately get various errors - the mains ones are these:
KeyError: False ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I have tried the possible solutions found on links such as Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all() and KeyError: False in pandas dataframe
Any assistance will be greatly appreciated!
CodePudding user response:
Try this:
import numpy as np
ceil = np.ceil(df['Daily % Change'].abs())
kio['Move Category'] = kio['Daily % Change'].ge(0).mul(2).sub(1) \
.mul(ceil).clip(-6, 6).astype(int)
Output:
>>> kio[['Daily % Change', 'Move Category']]
Daily % Change Move Category
0 6.168302 6
1 -2.997202 -3
2 -3.134682 -4
3 -2.944940 -3
4 0.347751 1
Details:
>>> ceil
0 7.0 # 6.168302
1 3.0 # -2.997202
2 4.0 # -3.134682
3 3.0 # -2.944940
4 1.0 # 0.347751
Name: Daily % Change, dtype: float64
>>> kio['Daily % Change'].ge(0).mul(2).sub(1)
0 1
1 -1
2 -1
3 -1
4 1
Name: Daily % Change, dtype: int64
CodePudding user response:
Use pd.cut
foo = pd.Series([ 6.168302, -2.997202, -3.134682, -2.944940, 0.347751])
pd.cut(abs(foo), range(0,7))
Result:
0 NaN
1 (2.0, 3.0]
2 (3.0, 4.0]
3 (2.0, 3.0]
4 (0.0, 1.0]
dtype: category
Categories (6, interval[int64, right]): [(0, 1] < (1, 2] < (2, 3] < (3, 4] < (4, 5] < (5, 6]]
CodePudding user response:
I would recommend using pandas.cut
for binning rather than what you are trying to do:
groups = [n for n in range(-6,8,1)]
df['Move Category'] = pd.cut(df['Daily % Change'], groups)
print(df)
Date High Low ... Adj Close Daily % Change Move Category
0 07/12/2021 48326 46566 ... 48245 6.168302 (6, 7]
1 08/12/2021 49662 46662 ... 46799 -2.997202 (-3, -2]
2 09/12/2021 47395 45045 ... 45332 -3.134682 (-4, -3]
3 10/12/2021 46899 43850 ... 43997 -2.944940 (-3, -2]
4 13/12/2021 46285 44150 ... 44150 0.347751 (0, 1]
[5 rows x 9 columns]
CodePudding user response:
You're getting that error because for example, row[(row['Daily % Change'] >= 4) & (row['Daily % Change'] < 5)]
is a Series, it doesn't make sense to ask if it is true or not.
You can apply mvt_cat
to a single row as:
def mvt_cat(row):
if (row > 0) & (row < 1):
val = 1
elif (row < 0) & (row > -1):
val = -1
elif (row >= 1) & (row < 2):
val = 2
elif (row <= -1) & (row > 2):
val = -2
elif (row >= 2) & (row < 3):
val = 3
elif (row <= -2) & (row > -3):
val = -3
elif (row >= 3) & (row < 4):
val = 4
elif (row <= -3) & (row > 4):
val = -4
elif (row >= 4) & (row < 5):
val = 5
elif (row <= -4) & (row > -5):
val = -5
elif (row >= 5):
val = 6
else:
val = -6
return val
kio['Move Category'] = kio['Daily % Change'].apply(mvt_cat)
Note that some of the categories don't make sense. For example, in your evaluation,
elif row[(row['Daily % Change'] <= -1) & (row['Daily % Change'] > 2)]:
will always be empty, I assume it should be:
elif (row <= -1) & (row > -2):
instead.