This should be easy but for some reason, I don't understand why it won't work. I have created a new dataframe by aggregating column data but I can't rename the new columns.
This is the code I used to generate columns
df10pctSum = df10pct.groupby('table_name').agg({'missing_index_impact':['sum','count']}).reset_index()
These are my two attempts to try and rename the sum column both attempts don't fail but neither of them rename the column
df10pctSum = df10pctSum.rename(columns={'(missing_index_impact, sum)':'MissingSum'})
df10pctSum = df10pctSum.rename(columns={'missing_index_impact, sum':'MissingSum'})
A df10pctSum.info()
shows this
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 (table_name, ) 37 non-null object
1 (missing_index_impact, sum) 37 non-null float64
2 (missing_index_impact, count) 37 non-null int64
dtypes: float64(1), int64(1), object(1)
memory usage: 1016.0 bytes
I would like to rename all of them but I can't even rename one. Any thoughts?
CodePudding user response:
We can pass tuples or use pd.NamedAgg
to give custom names to aggregated columns.
- From
Named Aggregation
Docs(emphasis mine):The values are tuples whose first element is the column to select and the second element is the aggregation to apply to that column. pandas provides the pandas.NamedAgg namedtuple with the fields
['column', 'aggfunc']
to make it clearer what the arguments are.
df10pct.groupby('table_name').agg(
Missingsum=("missing_index_impact", "sum"),
Missingcount=("missing_index_impact", "count"),
)
# or
df10pct.groupby('table_name').agg(
Missingsum=pd.NamedAgg("missing_index_impact", "sum"),
Missingcount=pd.NamedAgg("missing_index_impact", "count"),
)
You used
_.agg({'missing_index_impact':['sum','count']})
but this would give a MultiIndex
column. Hence, trying to rename using
_.rename(columns={'(missing_index_impact, sum)':'MissingSum'})
wouldn't work.