Home > Blockchain >  Restructing and Renaming the few columns in pandas dataframe
Restructing and Renaming the few columns in pandas dataframe

Time:08-30

I have a dataframe as follows:

                      col1  col2   A        B       C      D      E       F
timestamp       
2022-05-09 11:28:00   -12    -11   15       45      NaN    NaN    3.0    100
2022-05-09 11:28:01    -8    -7    5.0      20     3.0     25     NaN    NaN
2022-05-09 11:28:02    -9    -8    NaN       NaN    5.0     35     15     20
2022-05-09 11:28:03    -12   -7    NaN      NaN     7.0    30     NaN    NaN            

I want the dataframe to be restructured according to the values present in columns A,C and E. For eg:

                      col1   col2     3.0    5.0    7.0     15.0
timestamp       
2022-05-09 11:28:00    -12    -11     100    NaN    NaN      45
2022-05-09 11:28:01     -8    -7      25     20     NaN      NaN
2022-05-09 11:28:02     -9    -8     NaN     35     NaN      20
2022-05-09 11:28:03    -12   -7      NaN     NaN    30       NaN            

So I did the following which gave me the above result:

out = pd.lreshape(
        df.reset_index(), 
        {'cols': ['A', 'C', 'E'], 'values': ['B', 'D', 'F']}
    ).pivot('timestamp', 'cols', 'values')

But I would like to have the new column names like the table shown below without changing the column names of the existing columns


                      col1   col2     A_3.0    A_5.0    A_7.0     A_15.0
timestamp       
2022-05-09 11:28:00    -12    -11     100       NaN     NaN        45
2022-05-09 11:28:01     -8    -7       25       20      NaN        NaN
2022-05-09 11:28:02     -9    -8       NaN      35      NaN        20
2022-05-09 11:28:03    -12   -7       NaN       NaN      30       NaN           

Is there an easier way to do this?

CodePudding user response:

Add column col1,col2 to pivot and then add DataFrame.add_prefix:

out = (pd.lreshape(
        df.reset_index(), 
        {'cols': ['A', 'C', 'E'], 'values': ['B', 'D', 'F']}
    ).pivot(['timestamp','col1','col2'], 'cols', 'values')
     .add_prefix('A_')
     .reset_index(['col1','col2']))

print (out)
cols                 col1  col2  A_3.0  A_5.0  A_7.0  A_15.0
timestamp                                                   
2022-05-09 11:28:00   -12   -11  100.0    NaN    NaN    45.0
2022-05-09 11:28:01    -8    -7   25.0   20.0    NaN     NaN
2022-05-09 11:28:02    -9    -8    NaN   35.0    NaN    20.0
2022-05-09 11:28:03   -12    -7    NaN    NaN   30.0     NaN

CodePudding user response:

IIUC, one option is a combination of pivot_longer and pivot_wider from pyjanitor:

# pip install pyjanitor

import janitor 
import pandas as pd

(df
.pivot_longer(
    index = 'col*', 
    names_to = ['num', 'A'], 
    names_pattern = ['A|C|E', 'B|D|F'], 
    ignore_index = False)
.dropna()
.reset_index()
.pivot_wider(
    index=slice('timestamp', 'col2'), 
    names_from = 'num')
.set_index('timestamp')
)
                     col1  col2  A_3.0  A_5.0  A_7.0  A_15.0
timestamp
2022-05-09 11:28:00 -12.0   -11  100.0    NaN    NaN    45.0
2022-05-09 11:28:01  -8.0    -7   25.0   20.0    NaN     NaN
2022-05-09 11:28:02  -9.0    -8    NaN   35.0    NaN    20.0
2022-05-09 11:28:03 -12.0    -7    NaN    NaN   30.0     NaN
  • Related