Home > Blockchain >  Python renaming an aggregated column
Python renaming an aggregated column

Time:11-13

This should be easy but for some reason, I don't understand why it won't work. I have created a new dataframe by aggregating column data but I can't rename the new columns.

This is the code I used to generate columns

df10pctSum = df10pct.groupby('table_name').agg({'missing_index_impact':['sum','count']}).reset_index()

These are my two attempts to try and rename the sum column both attempts don't fail but neither of them rename the column

df10pctSum = df10pctSum.rename(columns={'(missing_index_impact, sum)':'MissingSum'})
df10pctSum = df10pctSum.rename(columns={'missing_index_impact, sum':'MissingSum'})

A df10pctSum.info() shows this

 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   (table_name, )                 37 non-null     object 
 1   (missing_index_impact, sum)    37 non-null     float64
 2   (missing_index_impact, count)  37 non-null     int64  
dtypes: float64(1), int64(1), object(1)
memory usage: 1016.0  bytes 

I would like to rename all of them but I can't even rename one. Any thoughts?

CodePudding user response:

We can pass tuples or use pd.NamedAgg to give custom names to aggregated columns.

  • From Named Aggregation Docs(emphasis mine):

    The values are tuples whose first element is the column to select and the second element is the aggregation to apply to that column. pandas provides the pandas.NamedAgg namedtuple with the fields ['column', 'aggfunc'] to make it clearer what the arguments are.

df10pct.groupby('table_name').agg(
    Missingsum=("missing_index_impact", "sum"),
    Missingcount=("missing_index_impact", "count"),
)

# or

df10pct.groupby('table_name').agg(
    Missingsum=pd.NamedAgg("missing_index_impact", "sum"),
    Missingcount=pd.NamedAgg("missing_index_impact", "count"),
)

You used

_.agg({'missing_index_impact':['sum','count']})

but this would give a MultiIndex column. Hence, trying to rename using

_.rename(columns={'(missing_index_impact, sum)':'MissingSum'})

wouldn't work.

  • Related