Say I have a dataframe df with column "age". Say "age" has some NaN values and I want to create two new dataframes, dfMean and dfMedian, which fill in the NaN values differently.
This is the way I would do it:
# Step 1:
dfMean = df
dfMean["age"].fillna(df["age"].mean(), inplace=True)
# Step 2:
dfMedian = df
dfMedian["age"].fillna(df["age"].median(), inplace=True)
I'm curious whether there's a way to do each of these steps in one line instead of two, by returning the modified dataframe without needing to copy the original. But I haven't been able to find anything so far. Thanks, and let me know if I can clarify or if you have a better title in mind for the question :)
CodePudding user response:
Doing dfMean = dfMean["age"].fillna(df["age"].mean()), you create a Series, not a DataFrame.
To add two new Series (= columns) to your DataFrame, use:
df2 = df.assign(
    age_fill_mean=df["age"].fillna(df["age"].mean()),
    age_fill_median=df["age"].fillna(df["age"].median()),
)
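As a quick check of the assign approach, here is a minimal sketch on a small made-up frame. The sample values are an assumption for illustration only; just the column name "age" comes from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (not from the question): one missing age.
df = pd.DataFrame({"age": [20.0, np.nan, 40.0, 90.0]})

# assign() returns a new DataFrame with the extra columns; df is not modified.
df2 = df.assign(
    age_fill_mean=df["age"].fillna(df["age"].mean()),
    age_fill_median=df["age"].fillna(df["age"].median()),
)

print(df2)
```

With these values the mean of the non-missing ages is 50.0 and the median is 40.0, so the two new columns differ only in the row that was NaN.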
CodePudding user response:
Alternatively, you can use pandas.DataFrame.agg() to compute both statistics in one call. From the documentation:
"Aggregate using one or more operations over the specified axis."
df.agg({'age' : ['mean', 'median']})
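Note that agg() only computes the statistics; it does not fill anything by itself. One possible way to combine it with a fill step is sketched below, assuming a small made-up frame and the variable name stats, neither of which is from the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (not from the question).
df = pd.DataFrame({"age": [20.0, np.nan, 40.0, 90.0]})

# Result is a DataFrame with rows "mean" and "median" and one column "age".
stats = df.agg({"age": ["mean", "median"]})

# Use the precomputed statistics as fill values; each call returns a new DataFrame.
dfMean = df.fillna({"age": stats.loc["mean", "age"]})
dfMedian = df.fillna({"age": stats.loc["median", "age"]})

print(stats)
```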
CodePudding user response:
No, you need to define the two new DataFrames separately, each with DataFrame.fillna
and a dictionary that maps column names to the values used to replace missing data:
dfMean = df.fillna({'age': df["age"].mean()})
dfMedian = df.fillna({'age': df["age"].median()})
One line is:
dfMean, dfMedian = df.fillna({'age': df["age"].mean()}), df.fillna({'age': df["age"].median()})
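The reason this works without an explicit copy is that fillna (without inplace=True) returns a new DataFrame, whereas a plain assignment such as dfMean = df only binds a second name to the same object. A minimal sketch, assuming made-up sample data and a hypothetical extra column "name":

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (not from the question).
df = pd.DataFrame({"age": [20.0, np.nan, 40.0, 90.0],
                   "name": ["a", "b", "c", "d"]})

# Each fillna call returns a new DataFrame; only the "age" column is filled.
dfMean = df.fillna({"age": df["age"].mean()})
dfMedian = df.fillna({"age": df["age"].median()})

alias = df  # plain assignment: same object, not a copy

print(alias is df)             # True
print(dfMean is df)            # False
print(df["age"].isna().sum())  # 1, the original still has its NaN
```

Because the original df keeps its NaN, the median fill is computed from the untouched data rather than from a frame that was already filled with the mean.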