Restructuring the Pandas dataframes-CodePudding

I have a dataframe as follows:

                     A          B       C      D      E       F
timestamp       
2022-05-09 11:28:00  15         45      NaN    NaN    3.0    100
2022-05-09 11:28:01  5.0        20      3.0    25     NaN    NaN
2022-05-09 11:28:02  NaN        NaN     5.0    35     15     20
2022-05-09 11:28:03  NaN        NaN     7.0    30     NaN    NaN

I want the dataframe to be restructure according to the values present in columns A,C and E. For eg:

                     3.0      5.0    7.0     15.0
timestamp       
2022-05-09 11:28:00  100      NaN    NaN      45
2022-05-09 11:28:01  25       20     NaN      NaN
2022-05-09 11:28:02  NaN      35     NaN      20
2022-05-09 11:28:03  NaN      NaN    30       NaN

Should I group the columns A,C and E, etract the unique values and sort the values in columns B,D and F accordingly? Or if there's an easier method, do let me know.

Thanks in advance!

CodePudding user response：

Try:

#reset_index if timestamp is the index instead of a column
df = df.reset_index()

#rename columns for compatibility with wide_to_long
df.columns = ["timestamp", "Col1", "Val1", "Col2", "Val2", "Col3", "Val3"]

#change from wide to long format
longdf = pd.wide_to_long(df, ["Col","Val"], i="timestamp", j="id").reset_index()

#pivot to get the required output
output = longdf.pivot_table("Val","timestamp","Col").rename_axis(None, axis=1)

>>> output
                      3.0   5.0   7.0   15.0
timestamp                                   
2022-05-09 11:28:00  100.0   NaN   NaN  45.0
2022-05-09 11:28:01   25.0  20.0   NaN   NaN
2022-05-09 11:28:02    NaN  35.0   NaN  20.0
2022-05-09 11:28:03    NaN   NaN  30.0   NaN

CodePudding user response：

One option is to flip from wide to long with pivot_longer from pyjanitor, and then flip to wide with pivot :

# pip install pyjanitor
import pandas as pd

(
df
.pivot_longer(
    names_to = ('headers', 'values'), 
    names_pattern = ('A|C|E', 'B|D|F'), 
    ignore_index = False)
.dropna()
.pivot(columns='headers')
['values']
.rename_axis(columns = None)
)

                      3.0   5.0   7.0   15.0
timestamp
2022-05-09 11:28:00  100.0   NaN   NaN  45.0
2022-05-09 11:28:01   25.0  20.0   NaN   NaN
2022-05-09 11:28:02    NaN  35.0   NaN  20.0
2022-05-09 11:28:03    NaN   NaN  30.0   NaN

Note that for pivot to work, the index should be unique; if you are not sure of the uniqueness of the index, you can use pivot_table instead.

CodePudding user response：

You can use pd.lreshape with .pivot to reshape your data. lreshape allows you to perform a stacking like operation to concatenate multiple columns into multiple rows.

From there, we can pivot these new columns to the desired shape.

out = pd.lreshape(
        df.reset_index(), 
        {'cols': ['A', 'C', 'E'], 'values': ['B', 'D', 'F']}
    ).pivot('timestamp', 'cols', 'values')

print(out)
cols                  3.0   5.0   7.0   15.0
timestamp                                   
2022-05-09 11:28:00  100.0   NaN   NaN  45.0
2022-05-09 11:28:01   25.0  20.0   NaN   NaN
2022-05-09 11:28:02    NaN  35.0   NaN  20.0
2022-05-09 11:28:03    NaN   NaN  30.0   NaN