I'm trying to apply a function to a DataFrame column. The function takes one input variable, but I need it to produce two output variables, e.g.:
```
def func(var1):
    if var1<5:
        return A=3, B=5
    elif var1<10:
        return A=3, B=10
    else:
        return A=7, B=10
```
Is there a way to do this without defining separate functions for A and B?
Thanks
Answer:
Use `numpy.select` with the boolean masks reshaped so they broadcast against the two-element `[A, B]` choices:
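```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'var1': range(3, 15)})

# Reshape each boolean mask to shape (n, 1) so it broadcasts against
# the 2-element choice rows [A, B], giving an (n, 2) result
df[['A', 'B']] = np.select([df['var1'].lt(5).to_numpy()[:, None],
                            df['var1'].lt(10).to_numpy()[:, None]],
                           [[3, 5], [3, 10]],
                           default=[7, 10])
print(df)
```

```
    var1  A   B
0      3  3   5
1      4  3   5
2      5  3  10
3      6  3  10
4      7  3  10
5      8  3  10
6      9  3  10
7     10  7  10
8     11  7  10
9     12  7  10
10    13  7  10
11    14  7  10
```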
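To see why the `[:, None]` reshape matters, the small sketch below checks the array shapes directly. It uses a plain NumPy array with the same values as `df['var1']` (the name `var1` here is just illustrative). A mask of shape `(12, 1)` broadcast against a choice row of shape `(2,)` yields a `(12, 2)` result; without the reshape, shapes `(12,)` and `(2,)` cannot broadcast and `np.select` raises a shape-mismatch error.

```python
import numpy as np

var1 = np.arange(3, 15)  # same values as df['var1'] above

# Each condition becomes a column vector of shape (n, 1)
conds = [(var1 < 5)[:, None], (var1 < 10)[:, None]]
print(conds[0].shape)  # (12, 1)

# Each choice is a row of shape (2,): one value for A, one for B.
# Broadcasting (12, 1) against (2,) gives (12, 2), so np.select
# fills both output columns in a single call.
out = np.select(conds, [[3, 5], [3, 10]], default=[7, 10])
print(out.shape)  # (12, 2)
```

The first condition that is `True` for a row wins, which reproduces the `if`/`elif`/`else` order of the original function.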
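Your own function also works with a small change: return a tuple instead of named values, then expand the resulting Series of tuples into two columns: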
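```python
import pandas as pd

df = pd.DataFrame({'var1': range(3, 15)})

def func(var1):
    # Return both outputs together as a tuple (A, B)
    if var1 < 5:
        return (3, 5)
    elif var1 < 10:
        return (3, 10)
    else:
        return (7, 10)

# apply() gives a Series of tuples; tolist() turns it into a list of
# pairs that pandas unpacks into the two new columns
df[['A', 'B']] = df['var1'].apply(func).tolist()
print(df)
```

```
    var1  A   B
0      3  3   5
1      4  3   5
2      5  3  10
3      6  3  10
4      7  3  10
5      8  3  10
6      9  3  10
7     10  7  10
8     11  7  10
9     12  7  10
10    13  7  10
11    14  7  10
```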
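A related idiom, offered here as a variation rather than part of the answer above, unpacks the Series of tuples with `zip(*...)` and assigns each column separately. The function is re-declared so the snippet runs on its own:

```python
import pandas as pd

def func(var1):
    # Return both outputs as one tuple: (A, B)
    if var1 < 5:
        return 3, 5
    elif var1 < 10:
        return 3, 10
    else:
        return 7, 10

df = pd.DataFrame({'var1': range(3, 15)})

# apply() yields a Series of (A, B) tuples; zip(*...) transposes it
# into one tuple of all A values and one tuple of all B values
df['A'], df['B'] = zip(*df['var1'].apply(func))
print(df.head(3))
```

One caveat with this form: on an empty DataFrame `zip(*...)` yields nothing, so the two-target unpacking fails, whereas the `.tolist()` assignment does not rely on unpacking.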
Answer:
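Here is a way that stores the possible (A, B) pairs in a lookup DataFrame, uses `numpy.select` to compute which row of it each value of `var1` maps to, and joins the selected rows back: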
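```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'var1': [0, 3, 5, 7, 9, 11]})

# One row per possible (A, B) pair
choices = pd.DataFrame([[3, 5], [3, 10], [7, 10]], columns=['A', 'B'])
#    A   B
# 0  3   5
# 1  3  10
# 2  7  10

# Row position in `choices` for each value of var1
a = np.select([df['var1'].lt(5), df['var1'].lt(10)], [0, 1], 2)
# array([0, 0, 1, 1, 1, 2])

# Pick the matching rows, realign them to df's index, and join
print(df.join(choices.iloc[a].set_axis(df.index)))
```

Output:

```
   var1  A   B
0     0  3   5
1     3  3   5
2     5  3  10
3     7  3  10
4     9  3  10
5    11  7  10
```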
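Because the conditions here are just increasing thresholds, the row positions could also be computed with `numpy.digitize` instead of `numpy.select`. This is a swapped-in variant, not part of the original answer, and it assumes the thresholds (5 and 10) are sorted in increasing order. With the default `right=False`, a value equal to a threshold lands in the upper bin, which matches the strict `<` comparisons:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'var1': [0, 3, 5, 7, 9, 11]})
choices = pd.DataFrame([[3, 5], [3, 10], [7, 10]], columns=['A', 'B'])

# Bins: x < 5 -> 0, 5 <= x < 10 -> 1, x >= 10 -> 2,
# the same mapping as the "< 5", "< 10", else chain
a = np.digitize(df['var1'], [5, 10])

out = df.join(choices.iloc[a].set_axis(df.index))
print(out)
```

This keeps the lookup-table idea but makes adding another threshold a one-element change to the bin list plus one extra row in `choices`.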