I have a dataframe as follows:
A B C D E F
timestamp
2022-05-09 11:28:00 15 45 NaN NaN 3.0 100
2022-05-09 11:28:01 5.0 20 3.0 25 NaN NaN
2022-05-09 11:28:02 NaN NaN 5.0 35 15 20
2022-05-09 11:28:03 NaN NaN 7.0 30 NaN NaN
I want the dataframe to be restructure according to the values present in columns A,C and E. For eg:
3.0 5.0 7.0 15.0
timestamp
2022-05-09 11:28:00 100 NaN NaN 45
2022-05-09 11:28:01 25 20 NaN NaN
2022-05-09 11:28:02 NaN 35 NaN 20
2022-05-09 11:28:03 NaN NaN 30 NaN
Should I group the columns A,C and E, etract the unique values and sort the values in columns B,D and F accordingly? Or if there's an easier method, do let me know.
Thanks in advance!
CodePudding user response:
Try:
#reset_index if timestamp is the index instead of a column
df = df.reset_index()
#rename columns for compatibility with wide_to_long
df.columns = ["timestamp", "Col1", "Val1", "Col2", "Val2", "Col3", "Val3"]
#change from wide to long format
longdf = pd.wide_to_long(df, ["Col","Val"], i="timestamp", j="id").reset_index()
#pivot to get the required output
output = longdf.pivot_table("Val","timestamp","Col").rename_axis(None, axis=1)
>>> output
3.0 5.0 7.0 15.0
timestamp
2022-05-09 11:28:00 100.0 NaN NaN 45.0
2022-05-09 11:28:01 25.0 20.0 NaN NaN
2022-05-09 11:28:02 NaN 35.0 NaN 20.0
2022-05-09 11:28:03 NaN NaN 30.0 NaN
CodePudding user response:
One option is to flip from wide to long with pivot_longer from pyjanitor, and then flip to wide with pivot
:
# pip install pyjanitor
import pandas as pd
(
df
.pivot_longer(
names_to = ('headers', 'values'),
names_pattern = ('A|C|E', 'B|D|F'),
ignore_index = False)
.dropna()
.pivot(columns='headers')
['values']
.rename_axis(columns = None)
)
3.0 5.0 7.0 15.0
timestamp
2022-05-09 11:28:00 100.0 NaN NaN 45.0
2022-05-09 11:28:01 25.0 20.0 NaN NaN
2022-05-09 11:28:02 NaN 35.0 NaN 20.0
2022-05-09 11:28:03 NaN NaN 30.0 NaN
Note that for pivot
to work, the index should be unique; if you are not sure of the uniqueness of the index, you can use pivot_table
instead.
CodePudding user response:
You can use pd.lreshape
with .pivot
to reshape your data. lreshape
allows you to perform a stacking like operation to concatenate multiple columns into multiple rows.
From there, we can pivot these new columns to the desired shape.
out = pd.lreshape(
df.reset_index(),
{'cols': ['A', 'C', 'E'], 'values': ['B', 'D', 'F']}
).pivot('timestamp', 'cols', 'values')
print(out)
cols 3.0 5.0 7.0 15.0
timestamp
2022-05-09 11:28:00 100.0 NaN NaN 45.0
2022-05-09 11:28:01 25.0 20.0 NaN NaN
2022-05-09 11:28:02 NaN 35.0 NaN 20.0
2022-05-09 11:28:03 NaN NaN 30.0 NaN