I have a dataframe df and I want to apply a function to one of its columns (c3) so that it returns values for two other columns (c1 and c2). This should be done only for a subset of the rows of df.
I would have expected this to work:
df.loc[0:20, ['c1','c2']] = df.loc[0:20, 'c3'].progress_apply(my_f)
where my_f is the function to be applied and returns a Series holding the two values for c1 and c2. It doesn't work, though: after this runs, c1 and c2 are still NaN in df, even though no error or warning is raised.
What is the correct way to do this? It should be trivial, but I'm struggling to find it.
For instance, with the following code:
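import pandas as pd
from tqdm import tqdm
tqdm.pandas()  # registers progress_apply on pandas objects

df_test = pd.DataFrame([{'C3': 3}, {'C3': 5}, {'C3': 8}])
def my_f(s):
    return pd.Series(['V1', 'V2'])
df_test.loc[0:1, ['C1', 'C2']] = df_test.loc[0:1, 'C3'].progress_apply(my_f)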
the result is the following:
   C3   C1   C2
0   3  NaN  NaN
1   5  NaN  NaN
2   8  NaN  NaN
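The NaN values most likely come from label alignment. Because my_f returns a Series without an explicit index, apply builds a DataFrame whose columns are the default labels 0 and 1. When that DataFrame is assigned through .loc, pandas matches it to the target columns by label, finds no 'C1' or 'C2', and silently writes NaN. A minimal check, assuming a recent pandas and using plain .apply instead of tqdm's progress_apply (which should return the same thing, just with a progress bar):

```python
import pandas as pd

df_test = pd.DataFrame([{'C3': 3}, {'C3': 5}, {'C3': 8}])

def my_f(s):
    # Unlabeled Series: pandas gives it the default index 0, 1
    return pd.Series(['V1', 'V2'])

# Plain .apply here; tqdm's progress_apply wraps the same operation
result = df_test.loc[0:1, 'C3'].apply(my_f)
print(result.columns.tolist())  # [0, 1], not ['C1', 'C2']

# .loc aligns the right-hand DataFrame on column labels: 'C1' and 'C2'
# are not among its columns, so both targets are filled with NaN
df_test.loc[0:1, ['C1', 'C2']] = result
print(df_test)
```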
while I expect the following:
   C3    C1    C2
0   3  'V1'  'V2'
1   5  'V1'  'V2'
2   8   NaN   NaN
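One way to get that output while keeping .loc and the row restriction is to have my_f label its output with the destination column names, so the alignment pandas performs lines up instead of producing NaN. A sketch, assuming my_f can be changed (plain .apply shown; progress_apply should behave the same):

```python
import pandas as pd

df_test = pd.DataFrame([{'C3': 3}, {'C3': 5}, {'C3': 8}])

def my_f(s):
    # Label the returned values with the destination column names,
    # so apply() produces a DataFrame whose columns are 'C1' and 'C2'
    return pd.Series(['V1', 'V2'], index=['C1', 'C2'])

# .loc[0:1] is label-based and inclusive: rows 0 and 1 are updated,
# and the labeled columns now align with the targets
df_test.loc[0:1, ['C1', 'C2']] = df_test.loc[0:1, 'C3'].apply(my_f)
print(df_test)
```

Row 2 is outside the selection, so its C1 and C2 stay NaN.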
Answer:
Try it without .loc, like this:
import pandas as pd

df_test = pd.DataFrame([{'C3': 3}, {'C3': 5}, {'C3': 8}])

def my_f(s):
    return pd.Series(['V1', 'V2'])

# df_test[0:2] selects rows 0 and 1 by position
df_test[['C1', 'C2']] = df_test[0:2]['C3'].apply(my_f)
print(df_test)
   C3   C1   C2
0   3   V1   V2
1   5   V1   V2
2   8  NaN  NaN
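This works because df[['C1', 'C2']] = other_df pairs the right-hand columns with the listed names by position rather than by label, and only aligns on the row index, which is why row 2 ends up NaN. Also note that df_test[0:2] is a positional slice that excludes its end, while .loc[0:1] includes both labels. One side effect worth checking: the assignment replaces the entire C1 and C2 columns, so any values they already held outside the sliced rows are reset to NaN. If that matters, converting the result with .to_numpy() drops the 0/1 labels, so .loc assigns positionally and touches only the selected rows. The sketch below compares the two, assuming (hypothetically) that C1 and C2 already contain the placeholder 'old':

```python
import pandas as pd

def my_f(s):
    return pd.Series(['V1', 'V2'])

# Hypothetical starting point: C1/C2 already hold values for every row
df_test = pd.DataFrame({'C3': [3, 5, 8], 'C1': ['old'] * 3, 'C2': ['old'] * 3})

# Whole-column assignment: row 2 is not in the slice, so it is reset to NaN
whole = df_test.copy()
whole[['C1', 'C2']] = whole[0:2]['C3'].apply(my_f)

# Positional .loc assignment: .to_numpy() drops the 0/1 column labels,
# so nothing is aligned by label and only rows 0 and 1 change
partial = df_test.copy()
partial.loc[0:1, ['C1', 'C2']] = partial.loc[0:1, 'C3'].apply(my_f).to_numpy()

print(whole)
print(partial)
```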