I have a function that receives a dataframe and returns a new dataframe, which is the same but with some added columns. Just as an example:
def arbitrary_function_that_adds_columns(df):
    # In this trivial example I am adding only 1 column, but this function may add an arbitrary number of columns.
    df['new column'] = df['A'] + df['B'] / 8 + df['A']**3
    return df
To apply this function to a whole data frame is easy:
import pandas
df = pandas.DataFrame({'A': [1,2,3,4], 'B': [2,3,4,5]})
df = arbitrary_function_that_adds_columns(df)
print(df)
How do I apply the arbitrary_function_that_adds_columns function to a subset of the rows? I am trying this:
import pandas
df = pandas.DataFrame({'A': [1,2,3,4], 'B': [2,3,4,5]})
rows = df['A'].isin({1,3})
df.loc[rows] = arbitrary_function_that_adds_columns(df.loc[rows])
print(df)
but I get the original dataframe back, without the new column. The result I'm expecting to get is
   A  B  new column
0  1  2         NaN
1  2  3      10.375
2  3  4         NaN
3  4  5      68.625
CodePudding user response:
With the example you've given:
df['A+B'] = df.loc[df['A'].isin({1,3})].sum(axis=1)
or
import numpy as np
df['A+B'] = np.nan
df.loc[df['A'].isin({1,3}),['A+B']] = sum_AB(df)
More generally:
df.loc[ [row mask], [column mask] ] = [returned df of same shape]
#optionally, use fillna/bfill/ffill as appropriate
For more complicated stuff, take a look at DataFrame.transform and DataFrame.apply; combining those with df.loc and an appropriate boolean mask will accomplish what you need.
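One way to put the mask-based pattern together end to end is sketched below. It is a minimal sketch, not the exact code from this answer: it assumes the function returns a copy that keeps the original index, and the names `add_columns`, `sub` and `new_cols` are made up for illustration. The mask picks the rows whose 'A' value is 2 or 4, which are the rows filled in the expected output.

```python
import pandas as pd

def add_columns(df):
    # Hypothetical stand-in for the question's function; works on a copy
    out = df.copy()
    out['new column'] = out['A'] + out['B'] / 8 + out['A'] ** 3
    return out

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [2, 3, 4, 5]})
mask = df['A'].isin([2, 4])                    # boolean row mask

sub = add_columns(df.loc[mask])                # run the function on the selected rows only
new_cols = sub.columns.difference(df.columns)  # columns the function added
df = df.join(sub[new_cols])                    # left join on the index; unselected rows get NaN
print(df)
```

Joining only the added columns leaves 'A' and 'B' untouched, so the function can add any number of columns without the caller having to name them.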
CodePudding user response:
Note that, according to the expected output, you want rows=[1,3], not rows = df['A'].isin({1,3}). The latter selects all the rows whose 'A' value is either 1 or 3.
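The difference between the two selections is easy to check on the question's data. A small sketch (the variable names `by_value` and `by_label` are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [2, 3, 4, 5]})

by_value = df.loc[df['A'].isin({1, 3})]  # rows whose 'A' value is 1 or 3
by_label = df.loc[[1, 3]]                # rows whose index label is 1 or 3

print(by_value.index.tolist())  # [0, 2]
print(by_label.index.tolist())  # [1, 3]
```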
import pandas as pd
def arbitrary_function_that_adds_columns(df):
    # make sure that the function doesn't mutate the original DataFrame
    # Otherwise, you will get a SettingWithCopyWarning
    df = df.copy()
    df['new column'] = df['A'] + df['B'] / 8 + df['A']**3
    return df
df = pd.DataFrame({'A': [1,2,3,4], 'B': [2,3,4,5]})
rows = [1, 3]
# the function is applied to a copy of a DataFrame slice
>>> sub_df = arbitrary_function_that_adds_columns(df.loc[rows])
>>> sub_df
   A  B  new column
1  2  3      10.375
3  4  5      68.625
# Add the new information to the original df
>>> df = df.combine_first(sub_df)
>>> df
   A  B  new column
0  1  2         NaN
1  2  3      10.375
2  3  4         NaN
3  4  5      68.625
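If this is needed in more than one place, the two steps can be wrapped in a small helper. The sketch below is only an illustration: `apply_to_rows` and `add_columns` are hypothetical names, and it assumes `rows` is anything `df.loc` accepts (a list of labels or a boolean mask). The helper copies the slice itself, so the function passed in is free to modify its argument.

```python
import pandas as pd

def apply_to_rows(df, func, rows):
    """Hypothetical helper: run func on df.loc[rows] and merge the result back."""
    sub_df = func(df.loc[rows].copy())  # copy so func cannot modify df
    return df.combine_first(sub_df)     # df's values win; sub_df fills what is missing

def add_columns(df):
    # Stand-in for the question's function; it may mutate its argument
    df['new column'] = df['A'] + df['B'] / 8 + df['A'] ** 3
    return df

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [2, 3, 4, 5]})
result = apply_to_rows(df, add_columns, [1, 3])
print(result)
```

Since combine_first builds the result from the union of both frames' columns, it is worth checking the column order and dtypes of the output on your pandas version.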