I have a pandas DataFrame as follows:
import numpy as np
import pandas as pd

data = {
    'ID': [0, 0, 0, 0, 0, 1],
    'DAYS': [293, 1111, 3020, 390, 210, 10],
}
df = pd.DataFrame(data, columns=['ID', 'DAYS'])
   ID  DAYS
0   0   293
1   0  1111
2   0  3020
3   0   390
4   0   210
5   1    10
What I am trying to do is apply a simple condition to each row and store the result in a boolean column:
df['bool'] = df.apply(lambda x: x['DAYS'] < 365, axis=1)
I would like to optimize this apply/lambda step. I managed to do it with NumPy:
df['bool_numpy'] = np.where(df['DAYS'] < 365, True, False)
But I am struggling to do the same thing with np.vectorize:
def copy_filter(df):
    if df['DAYS'] < 365:
        return True
    else:
        return False

a = np.vectorize(copy_filter, otypes=[bool])
df['bool_vectorize'] = a(df['DAYS'])
but this gives me an error. Any help would be appreciated. Any other optimization technique for this problem would be great as well!
CodePudding user response:
You don't need apply or vectorize for this:
df['bool'] = df['DAYS'] < 365
output:
   ID  DAYS   bool
0   0   293   True
1   0  1111  False
2   0  3020  False
3   0   390  False
4   0   210   True
5   1    10   True
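As a quick sanity check, here is a minimal sketch that rebuilds the frame from the question and confirms that the plain comparison produces the same values as the apply and np.where versions. The column names bool_apply and bool_where are made up for this illustration.

```python
import numpy as np
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    'ID': [0, 0, 0, 0, 0, 1],
    'DAYS': [293, 1111, 3020, 390, 210, 10],
})

# Direct comparison: operates on the whole column at once.
df['bool'] = df['DAYS'] < 365

# The two approaches from the question, kept for comparison only.
df['bool_apply'] = df.apply(lambda x: x['DAYS'] < 365, axis=1)
df['bool_where'] = np.where(df['DAYS'] < 365, True, False)

# Element-wise check that all three columns agree.
same_as_apply = bool((df['bool'] == df['bool_apply']).all())
same_as_where = bool((df['bool'] == df['bool_where']).all())
print(same_as_apply, same_as_where)
```

Since the comparison already returns a boolean Series, wrapping it in np.where(..., True, False) adds nothing here.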
CodePudding user response:
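np.vectorize calls your function once per element, so the argument is a single number, not a DataFrame, and indexing it with ['DAYS'] raises an error. Change your function to take the scalar value directly: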
def copy_filter(x):
    if x < 365:
        return True
    else:
        return False

a = np.vectorize(copy_filter, otypes=[bool])
df['bool_vectorize'] = a(df['DAYS'])
df
   ID  DAYS  bool_vectorize
0   0   293            True
1   0  1111           False
2   0  3020           False
3   0   390           False
4   0   210            True
5   1    10            True
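On the optimization side, note that the NumPy documentation describes np.vectorize as a convenience wrapper that is essentially a for loop, so it is unlikely to be faster than comparing the column directly. Below is a rough sketch that checks both approaches agree and times them on a larger made-up column. The frame big, its size of 100,000 rows, and the random values are assumptions for illustration only, and the timings will vary by machine.

```python
import timeit

import numpy as np
import pandas as pd


def copy_filter(x):
    # x is a single scalar value here, not a DataFrame or Series.
    return x < 365


vectorized_filter = np.vectorize(copy_filter, otypes=[bool])

# Hypothetical larger column, generated only to make timing differences visible.
rng = np.random.default_rng(0)
big = pd.DataFrame({'DAYS': rng.integers(0, 4000, size=100_000)})
days = big['DAYS'].to_numpy()

# Both approaches should give identical boolean arrays.
via_vectorize = vectorized_filter(days)
via_compare = days < 365
results_match = np.array_equal(via_vectorize, via_compare)

# Rough timings in seconds for three runs each.
t_vectorize = timeit.timeit(lambda: vectorized_filter(days), number=3)
t_compare = timeit.timeit(lambda: days < 365, number=3)
print(results_match, t_vectorize, t_compare)
```

If the two timings differ the way the documentation suggests, the direct comparison is the one to keep, and np.vectorize is best reserved for functions that cannot be expressed with array operations.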