I have a dataframe as follows:
col1 col2 A B C D E F
timestamp
2022-05-09 11:28:00 -12 -11 15 45 NaN NaN 3.0 100
2022-05-09 11:28:01 -8 -7 5.0 20 3.0 25 NaN NaN
2022-05-09 11:28:02 -9 -8 NaN NaN 5.0 35 15 20
2022-05-09 11:28:03 -12 -7 NaN NaN 7.0 30 NaN NaN
I want the dataframe to be restructured according to the values present in columns A,C and E. For eg:
col1 col2 3.0 5.0 7.0 15.0
timestamp
2022-05-09 11:28:00 -12 -11 100 NaN NaN 45
2022-05-09 11:28:01 -8 -7 25 20 NaN NaN
2022-05-09 11:28:02 -9 -8 NaN 35 NaN 20
2022-05-09 11:28:03 -12 -7 NaN NaN 30 NaN
So I did the following which gave me the above result:
out = pd.lreshape(
df.reset_index(),
{'cols': ['A', 'C', 'E'], 'values': ['B', 'D', 'F']}
).pivot('timestamp', 'cols', 'values')
But I would like to have the new column names like the table shown below without changing the column names of the existing columns
col1 col2 A_3.0 A_5.0 A_7.0 A_15.0
timestamp
2022-05-09 11:28:00 -12 -11 100 NaN NaN 45
2022-05-09 11:28:01 -8 -7 25 20 NaN NaN
2022-05-09 11:28:02 -9 -8 NaN 35 NaN 20
2022-05-09 11:28:03 -12 -7 NaN NaN 30 NaN
Is there an easier way to do this?
CodePudding user response:
Add column col1,col2
to pivot
and then add DataFrame.add_prefix
:
out = (pd.lreshape(
df.reset_index(),
{'cols': ['A', 'C', 'E'], 'values': ['B', 'D', 'F']}
).pivot(['timestamp','col1','col2'], 'cols', 'values')
.add_prefix('A_')
.reset_index(['col1','col2']))
print (out)
cols col1 col2 A_3.0 A_5.0 A_7.0 A_15.0
timestamp
2022-05-09 11:28:00 -12 -11 100.0 NaN NaN 45.0
2022-05-09 11:28:01 -8 -7 25.0 20.0 NaN NaN
2022-05-09 11:28:02 -9 -8 NaN 35.0 NaN 20.0
2022-05-09 11:28:03 -12 -7 NaN NaN 30.0 NaN
CodePudding user response:
IIUC, one option is a combination of pivot_longer and pivot_wider from pyjanitor:
# pip install pyjanitor
import janitor
import pandas as pd
(df
.pivot_longer(
index = 'col*',
names_to = ['num', 'A'],
names_pattern = ['A|C|E', 'B|D|F'],
ignore_index = False)
.dropna()
.reset_index()
.pivot_wider(
index=slice('timestamp', 'col2'),
names_from = 'num')
.set_index('timestamp')
)
col1 col2 A_3.0 A_5.0 A_7.0 A_15.0
timestamp
2022-05-09 11:28:00 -12.0 -11 100.0 NaN NaN 45.0
2022-05-09 11:28:01 -8.0 -7 25.0 20.0 NaN NaN
2022-05-09 11:28:02 -9.0 -8 NaN 35.0 NaN 20.0
2022-05-09 11:28:03 -12.0 -7 NaN NaN 30.0 NaN