I have a df like the below:
df:
Variable | Cutoff
0 abs | (-88.01, 2.0]
1 abs | (2.0, 6.0]
2 abs | (6.0, 18.0]
and so on..
I have another dataframe as X
X:
Pan_no | abs
XXX | 1.0
YYY | 5.0
ZZZ | 17
FFF | -88.01
I am trying to use a for loop in the below code:
Original code: (this code works, only the for loop code doesn't work)
conditions = [
    (X['abs'] >= df[df['Variable'] == 'abs']['Cutoff'].apply(lambda x: x.left)[0]) &
    (X['abs'] <= df[df['Variable'] == 'abs']['Cutoff'].apply(lambda x: x.right)[0]),
    (X['abs'] >= df[df['Variable'] == 'abs']['Cutoff'].apply(lambda x: x.left)[1]) &
    (X['abs'] <= df[df['Variable'] == 'abs']['Cutoff'].apply(lambda x: x.right)[1]),
    (X['abs'] >= df[df['Variable'] == 'abs']['Cutoff'].apply(lambda x: x.left)[2]) &
    (X['abs'] <= df[df['Variable'] == 'abs']['Cutoff'].apply(lambda x: x.right)[2])]
values = df[df['Variable'] == 'abs'].index
X['new'] = np.select(conditions, values)
For loop code: (using for loop in the original code)
for i in df[df['Variable'] == 'abs'].index:
    if ((X['abs'] >= df[df['Variable'] == 'abs']['Cutoff'].apply(lambda x: x.left)[i]) &
            (X['abs'] <= df[df['Variable'] == 'abs']['Cutoff'].apply(lambda x: x.right)[i])):
        values = i
X['new'] = np.select(conditions, values)
It throws an error: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
df.to_dict()  # example of what the data structure looks like
{'Variable': {0: 'ABS_RETURNS', 1: 'ABS_RETURNS', 2: 'ABS_RETURNS', 3: 'ABS_RETURNS', 4: 'ABS_RETURNS', 5: 'ABS_RETURNS', 6: 'ABS_RETURNS', 7: 'ABS_RETURNS', 8: 'ABS_RETURNS'}, 'Cutoff': {0: Interval(-88.001, 2.0, closed='right'), 1: Interval(2.0, 6.0, closed='right'), 2: Interval(6.0, 18.0, closed='right'), 3: Interval(18.0, 42.0, closed='right'), 4: Interval(42.0, 73.0, closed='right'), 5: Interval(73.0, 110.0, closed='right'), 6: Interval(110.0, 158.0, closed='right'), 7: Interval(158.0, 240.0, closed='right'), 8: Interval(240.0, 458.5, closed='right')}, 'N': {0: 57314, 1: 44048, 2: 48797, 3: 51138, 4: 48655, 5: 50148, 6: 49452, 7: 49583, 8: 99709}, 'Events': {0: 13130, 1: 11774, 2: 13360, 3: 13650, 4: 10365, 5: 7521, 6: 5382, 7: 4402, 8: 6271}, '% of Events': {0: 0.15293226952419778, 1: 0.1371381981247452, 2: 0.1556112049385592, 3: 0.15898899306971057, 4: 0.12072680682546154, 5: 0.08760118804961854, 6: 0.06268708869605731, 7: 0.051272494321821675, 8: 0.0730417564498282}, 'Non-Events': {0: 44184, 1: 32274, 2: 35437, 3: 37488, 4: 38290, 5: 42627, 6: 44070, 7: 45181, 8: 93438}, '% of Non-Events': {0: 0.1069859003508568, 1: 0.0781473598570422, 2: 0.08580615948608801, 3: 0.09077239345357867, 4: 0.09271433379581538, 5: 0.10321582415028004, 6: 0.10670986394310744, 7: 0.1094000082326648, 8: 0.22624815673056667}, 'WoE': {0: 0.3572980871738173, 1: 0.5623928895449077, 2: 0.595269827525508, 3: 0.5604797713235277, 4: 0.26400711185653863, 5: -0.16402761602178317, 6: -0.5319580950911826, 7: -0.7578565299201145, 8: -1.1306011517223742}, 'IV': {0: 0.016416549818216778, 1: 0.03317602799004981, 2: 0.04155283736690278, 3: 0.03823402415331825, 4: 0.007395492100516064, 5: 0.0025612315346391773, 6: 0.023418271661048054, 7: 0.04405231598535476, 8: 0.17321533260864197}}
CodePudding user response:
The problem is that the if condition in your for loop (shown below) is a pandas Series. if doesn't handle a pandas Series, so it gives you the error. You should use Series.any() or Series.all() to get a single boolean True or False value to feed into if.
(X['abs'] >= df[df['Variable'] == 'abs']['Cutoff'].apply(lambda x: x.left)[i]) &
(X['abs'] <= df[df['Variable'] == 'abs']['Cutoff'].apply(lambda x: x.right)[i])
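To make the cause of the error concrete, here is a minimal sketch. It assumes a small X frame copied from the sample in the question and uses the fixed bounds 2.0 and 6.0 in place of the Cutoff lookup; the names mask, raised, any_in_range and all_in_range are illustrative only.

```python
import pandas as pd

# Hypothetical sample data mirroring the X frame in the question.
X = pd.DataFrame({"Pan_no": ["XXX", "YYY", "ZZZ", "FFF"],
                  "abs": [1.0, 5.0, 17.0, -88.01]})

# An element-wise comparison yields a boolean Series, one value per row.
mask = (X["abs"] >= 2.0) & (X["abs"] <= 6.0)

# Using the Series directly in `if` raises ValueError (ambiguous truth value).
try:
    if mask:
        pass
    raised = False
except ValueError:
    raised = True

# Reducing the Series to a single boolean makes it usable in `if`.
any_in_range = bool(mask.any())  # True if at least one row matches
all_in_range = bool(mask.all())  # True only if every row matches
```

Keep in mind that any() and all() collapse the per-row result into one value, so they silence the error but do not by themselves assign a bin to each row.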
Besides, based on your non-loop version, I think you want to write:
conditions = []
for i in df[df['Variable'] == 'abs'].index:
    # collect one boolean Series per interval; np.select needs a list of them
    conditions.append(
        (X['abs'] >= df[df['Variable'] == 'abs']['Cutoff'].apply(lambda x: x.left)[i]) &
        (X['abs'] <= df[df['Variable'] == 'abs']['Cutoff'].apply(lambda x: x.right)[i]))
values = df[df['Variable'] == 'abs'].index
X['new'] = np.select(conditions, values)
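For reference, a self-contained sketch of building the condition list in a loop and passing it to np.select might look like the following. The two frames are assumptions reconstructed from the sample rows in the question (first three bins only), and cutoffs, interval and the default of -1 for unmatched rows are illustrative choices, not part of the original code.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data modelled on the question (first three bins only).
df = pd.DataFrame({
    "Variable": ["abs", "abs", "abs"],
    "Cutoff": [pd.Interval(-88.01, 2.0, closed="right"),
               pd.Interval(2.0, 6.0, closed="right"),
               pd.Interval(6.0, 18.0, closed="right")],
})
X = pd.DataFrame({"Pan_no": ["XXX", "YYY", "ZZZ", "FFF"],
                  "abs": [1.0, 5.0, 17.0, -88.01]})

# Select the intervals for this variable once instead of on every iteration.
cutoffs = df.loc[df["Variable"] == "abs", "Cutoff"]

# Build one boolean Series per interval; no `if` is involved.
conditions = []
for i in cutoffs.index:
    interval = cutoffs.loc[i]
    conditions.append((X["abs"] >= interval.left) & (X["abs"] <= interval.right))

# One choice per condition: the row label of the matching interval.
values = list(cutoffs.index)

# Rows matching no interval receive the (assumed) default of -1.
X["new"] = np.select(conditions, values, default=-1)
```

np.select takes the first condition that is True for each row, so a value sitting on a shared boundary such as 2.0 falls into the lower bin. Because the comparison uses >= on the left edge, -88.01 is also assigned to bin 0 even though the interval itself is open on the left.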