Home > Enterprise >  Pandas multi index/groupby
Pandas multi index/groupby

Time:08-02

I have a data frame containing several registers from sold products, revenues, prices and dates

DATA        CLIENT     PRODUCT   PRICE
2020-08-28  xxxxxxx    RIOT      20.0

I am to group my information by year/month and product. I am running a group by to_period that extract the exactly information :

dfgmv = dfgift[['PRODUCT','PRICE']].groupby([dfgift.DATA.dt.to_period("M"), 'PRODUCT']).agg(['count','sum'])

This is the output :

                        PRICE
                        count   sum 
DATA       PRODUCT 
2020-08    RIOT         2       40.00

The question is that, as I export to excel, the date column is not interpreted as da date (yyyy-mm). I am trying to convert the yyyy-mm to something like yyyy-mm-dd so Excel understand it.

I´ve read several questions about multi index but my knowledge wasn't enough to use that info to apply here. I tried to change my index to datetime, but, as I run it, I lost the second index column (product).

dfgmv.index = pd.to_datetime(dfgmv.index.get_level_values(0).astype('datetime64[ns]'))

.

             VALOR  
             count  sum
DATA                
2020-08-01   2      40.00   

So, How can I change the information format without losing my index?

CodePudding user response:

Index.set_levels is designed to allow for the setting of specific index level(s).

dfgmv.index = (
    dfgmv.index.set_levels(dfgmv.index.levels[0].astype('datetime64[ns]'), level=0)
)

Result

                   PRICE      
                   count   sum
DATA       PRODUCT            
2020-08-01 RIOT        1  20.0

CodePudding user response:

You can change your groupby to include start of month for each date you have:

dfgmv = dfgift[['PRODUCT','PRICE']].groupby([dfgift.DATA.astype('datetime64[M]'), 'PRODUCT']).agg(['count','sum'])
dfgmv

                      PRICE
                      count     sum
DATA        PRODUCT         
2020-08-01  RIOT      1         20.0
  • Related