Home > OS >  Python Pivoting dataframe that has mulitple ID columns
Python Pivoting dataframe that has mulitple ID columns

Time:04-07

from a database I get the following table into a python dataframe df:

FunctionID FunctionText FunctionModule UserGroup
1 Fct1 ModX GroupA
2 Fct2 ModX GroupA
2 Fct2 ModX GroupB
3 Fct3 ModY GroupB
3 Fct3 ModY GroupC
. ... ... ...
3000 Fct3000 ModZ GroupF

My goal is to get a pivot-like table that looks like this one:

FunctionID FunctionText FunctionModule GroupA GroupB GroupC ... GroupF
1 Fct1 ModX X ...
2 Fct2 ModX X X ...
3 Fct3 ModY X X ...
. ... ... ... ... ... ... ...
3000 Fct3000 ModZ ... X

So, the first 3 columns shall stay as is whereas the entries of UserGroup column shall be the added columns of the pivot table.

I tried some approaches like

df2 = df.pivot_table(index=df.iloc[:3],columns='UserGroup',values='UserGroup')

but without success. Maybe I have to use pivot() or stack() or unstack() ? But none of those led me to the desired target table.

CodePudding user response:

IIUC, you could add an extra column and pivot_table with aggfunc='first':

(df
 .assign(value='x') # or value=df['UserGroup'] if you want the names as value
 .pivot_table(index=list(df.columns[:3]),
              columns='UserGroup',
              values='value',
              aggfunc='first',
              fill_value='',
              )
)

output:

UserGroup                              GroupA GroupB GroupC
FunctionID FunctionText FunctionModule                     
1          Fct1         ModX                x              
2          Fct2         ModX                x      x       
3          Fct3         ModY                       x      x

NB. note that index=df.iloc[:3] will also select 'UserGroup', I used index=list(df.columns[:3]) here

  • Related