Pandas apply on a column returning values for two columns on a portion of df

Time:07-17

I have a dataframe df and I want to apply a function to a column of the dataframe (c3) such that it returns values for two other columns (c1 and c2), and this should be done only on a portion of the df rows.

I would have expected this to work:

df.loc[0:20, ['c1','c2']] = df.loc[0:20, 'c3'].progress_apply(my_f)

where my_f is the function to be applied, returning a Series with the two values for c1 and c2,

but it doesn't: the values for c1 and c2 remain NaN in df after executing this, even though no error or warning is raised.

What is the correct way to do this? It should be trivial, but I'm struggling to find it.

For instance with the following code:

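import pandas as pd
from tqdm import tqdm

tqdm.pandas()  # registers progress_apply on pandas objects

df_test = pd.DataFrame([{'C3': 3}, {'C3': 5}, {'C3': 8}])

def my_f(s):
    return pd.Series(['V1', 'V2'])

df_test.loc[0:1, ['C1', 'C2']] = df_test.loc[0:1, 'C3'].progress_apply(my_f)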

the result is the following:

    C3  C1  C2
0   3   NaN NaN
1   5   NaN NaN
2   8   NaN NaN

while I expect the following:

    C3  C1  C2
0   3   'V1'  'V2'
1   5   'V1'  'V2'
2   8   NaN  NaN

CodePudding user response:

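Try assigning without .loc on the left-hand side, like below:

import pandas as pd

df_test = pd.DataFrame([{'C3': 3}, {'C3': 5}, {'C3': 8}])

def my_f(s):
    return pd.Series(['V1', 'V2'])

# Apply my_f to the first two rows only; row 2 has no result and stays NaN
df_test[['C1', 'C2']] = df_test[0:2]['C3'].apply(my_f)
print(df_test)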

   C3   C1   C2
0   3   V1   V2
1   5   V1   V2
2   8  NaN  NaN
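
A likely reason the .loc version leaves NaN: the DataFrame returned by apply has the default column labels 0 and 1, and .loc assignment aligns on labels, so nothing matches 'C1' and 'C2'. The sketch below checks those labels, then shows one possible alternative. It assumes you are free to change my_f so that it labels its output. The names my_f_unlabeled and my_f_labeled are made up for this illustration, and plain apply is used instead of progress_apply to avoid the tqdm dependency.

```python
import pandas as pd

df_test = pd.DataFrame([{'C3': 3}, {'C3': 5}, {'C3': 8}])

def my_f_unlabeled(s):
    # Same as my_f in the question: the returned Series has the default index 0, 1
    return pd.Series(['V1', 'V2'])

def my_f_labeled(s):
    # Hypothetical variant: label the values with the target column names
    return pd.Series(['V1', 'V2'], index=['C1', 'C2'])

# The result columns are 0 and 1, which do not match 'C1' and 'C2'
unlabeled = df_test.loc[0:1, 'C3'].apply(my_f_unlabeled)
unlabeled_columns = list(unlabeled.columns)
print(unlabeled_columns)

# With labeled output, an index-aligned join attaches the new columns;
# rows that were not processed (row 2 here) are left as NaN
labeled = df_test.loc[0:1, 'C3'].apply(my_f_labeled)
result = df_test.join(labeled)
print(result)
```

Note that join returns a new DataFrame rather than modifying df_test in place, so assign the result back if you want to keep it.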