Is there a way to return a pandas dataframe with a modified column?

Say I have a dataframe df with column "age". Say "age" has some NaN values and I want to create two new dataframes, dfMean and dfMedian, which fill in the NaN values differently. This is the way I would do it:

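# Step 1: copy the original, then fill NaN in "age" with the mean
dfMean = df.copy()
dfMean["age"] = dfMean["age"].fillna(df["age"].mean())
# Step 2: copy the original, then fill NaN in "age" with the median
dfMedian = df.copy()
dfMedian["age"] = dfMedian["age"].fillna(df["age"].median())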

I'm curious whether there's a way to do each of these steps in one line instead of two, by returning the modified dataframe without needing to copy the original. But I haven't been able to find anything so far. Thanks, and let me know if I can clarify or if you have a better title in mind for the question :)

CodePudding user response：

If you write dfMean = dfMean["age"].fillna(df["age"].mean()), you get a Series, not a DataFrame.
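
A quick way to see this is to check the type of the result. This is only a minimal sketch; the sample values are made up, and only the "age" column name comes from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the "age" column name comes from the question.
df = pd.DataFrame({"age": [20.0, np.nan, 40.0, 90.0]})

# Selecting one column and calling fillna on it returns a Series.
filled = df["age"].fillna(df["age"].mean())

print(type(filled).__name__)  # Series
```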

To add two new Series (=columns) to your DataFrame, use:

df2 = df.assign(age_fill_mean=df["age"].fillna(df["age"].mean()),
                age_fill_median=df["age"].fillna(df["age"].median()),
                )
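
For illustration, here is a small self-contained sketch of the assign approach above. The sample values are invented for the demo; only the "age" column name comes from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the "age" column name comes from the question.
df = pd.DataFrame({"age": [20.0, np.nan, 40.0, 90.0]})

# assign returns a new DataFrame with the extra columns and leaves df untouched.
df2 = df.assign(
    age_fill_mean=df["age"].fillna(df["age"].mean()),
    age_fill_median=df["age"].fillna(df["age"].median()),
)

print(df2)
```

With this data the mean is 50 and the median is 40, so the missing row gets 50.0 in age_fill_mean and 40.0 in age_fill_median, while the original "age" column still contains the NaN.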

CodePudding user response：

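Alternatively, you can use pandas.DataFrame.agg() (an alias of DataFrame.aggregate()) to compute both statistics at once. Note that this returns the mean and median themselves and does not fill any NaN values.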

"Aggregate using one or more operations over the specified axis."

df.agg({'age' : ['mean', 'median']})
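
As a rough sketch of how the agg result could be used for the original goal, the snippet below computes both statistics once and then passes them to fillna. The sample values and the name stats are assumptions made for the demo, not part of the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the "age" column name comes from the question.
df = pd.DataFrame({"age": [20.0, np.nan, 40.0, 90.0]})

# agg returns a DataFrame indexed by the statistic names, with one column per key.
stats = df.agg({"age": ["mean", "median"]})
print(stats)

# Look up each statistic and use it as the fill value for "age".
dfMean = df.fillna({"age": stats.loc["mean", "age"]})
dfMedian = df.fillna({"age": stats.loc["median", "age"]})
```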

CodePudding user response：

No, you still need to define the two new DataFrames separately. Use DataFrame.fillna with a dictionary that maps each column name to the value that should replace its missing values:

dfMean = df.fillna({'age': df["age"].mean()})
dfMedian = df.fillna({'age': df["age"].median()})

As a single line:

dfMean, dfMedian = df.fillna({'age': df["age"].mean()}), df.fillna({'age': df["age"].median()})
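
To check that this does what the question asks, here is a minimal sketch with made-up data. The "height" column is hypothetical and is only there to show that columns not listed in the dictionary are left alone.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; "height" is an invented second column.
df = pd.DataFrame({
    "age": [20.0, np.nan, 40.0, 90.0],
    "height": [1.7, 1.8, np.nan, 1.6],
})

# Each call returns a new DataFrame; df itself is not modified.
dfMean = df.fillna({"age": df["age"].mean()})
dfMedian = df.fillna({"age": df["age"].median()})

print(dfMean)
print(dfMedian)
print(df)  # the original still has its NaN values
```

Because fillna returns a new DataFrame by default, no explicit copy of df is needed, and only "age" is filled in each result.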