Home > Enterprise >  How to convert GroupBy object in pandas to multiindexed dataframe
How to convert GroupBy object in pandas to multiindexed dataframe

Time:06-11

I have a raw dataframe that looks like

        codcet  placa_encoded   date   time_seconds velocidade
5031 490191222  431.0      2021-03-11   70079.0      51
5032 490221211  431.0      2021-03-11   72219.0      55
7991 490361213  562.0      2021-03-11   28559.0      24
7992 490361232  562.0      2021-03-11   29102.0      29              
7993 490361221  562.0      2021-03-11   30183.0      33                 
...

Where the numbers on the far left are indexes from the original dataset. My goal is to convert this into a dataframe indexed by placa_encoded and by n, a counter within each group that then looks like

placa_encoded  n    time_seconds  velocidade            codcet    
431.0          0      70079.0           51              490191222   
431.0          1      72219.0           55              490221211   
562.0          0      28559.0           24              490361213   
562.0          1      29102.0           29              490361232
562.0          2      30183.0           33              490361221   

That is, I aim to groupby('placa_encoded') then add another column n that counts the position within each group. The row should be indexed by both placa_encoded and n. I think I can use cumcount() to do this but it's unclear to me how to add it as a column since groupby doesn't product a dataframe I can assign to. I looked at this question but it seems they use .count() to convert it to a dataframe, and I want to preserve the data instead of getting any counts. I also tried to use pd.DataFrame(gbplaca) and pd.DataFrame(gbplaca.groups) to no avail.

Thank you so much!

CodePudding user response:

I think this is what you are wanting

df['n'] = df.sort_values(['time_seconds'], ascending=True).groupby(['placa_encoded']).cumcount()
df = df.set_index(['placa_encoded', 'n'])
df

This is a multi-index with 'placa_encoded' and 'n'

CodePudding user response:

All you have to do is to set the column in df (named 'n') with the result from df.groupby('placa_encoded').cumcount():

    df.loc[:, 'n'] = df.groupby('placa_encoded').cumcount()
    print(df)
           codcet  placa_encoded        date  time_seconds  velocidade  n
     0  490191222          431.0  2021-03-11       70079.0          51  0
     1  490221211          431.0  2021-03-11       72219.0          55  1
     2  490361213          562.0  2021-03-11       28559.0          24  0
     3  490361232          562.0  2021-03-11       29102.0          29  1
     4  490361221          562.0  2021-03-11       30183.0          33  2

Then, you can set the multi-index of df to ['placa_encoded', 'n'] if you want:

    df = df.set_index(['placa_encoded', 'index'])
    print(df)
                         codcet        date  time_seconds  velocidade
     placa_encoded n                                                 
     431.0         0  490191222  2021-03-11       70079.0          51
                   1  490221211  2021-03-11       72219.0          55
     562.0         0  490361213  2021-03-11       28559.0          24
                   1  490361232  2021-03-11       29102.0          29
                   2  490361221  2021-03-11       30183.0          33
  • Related