Home > Net >  Reshape dataframe using pandas melt to get two value columns
Reshape dataframe using pandas melt to get two value columns

Time:12-14

my data is in the following format

x = pd.DataFrame([
    {'date': '2011-01-01', 'col1': '1','col2': '5', 'A_Q': '1', 'A_W': 'aa', 'B_Q': '2', 'B_W': 'zz'},
    {'date': '2011-01-02', 'col1': '1','col2': '9', 'A_Q': '-1', 'A_W': 'bb', 'B_Q': '3', 'B_W': 'rr'},
    {'date': '2011-01-03', 'col1': '3','col2': '3', 'A_Q': '0', 'A_W': 'cc', 'B_Q': '4', 'B_W': 'vv'},
    {'date': '2011-02-04', 'col1': '4','col2': '1', 'A_Q': '3', 'A_W': 'dd', 'B_Q': '5', 'B_W': 'gg'},
])
    date      col1 col2 A_Q A_W  B_Q B_W
0   2011-01-01  1    5   1   aa   2  zz
1   2011-01-02  1    9   -1  bb   3  rr
2   2011-01-03  3    3   0   cc   4  vv
3   2011-02-04  4    1   3   dd   5  gg

I would like to reshape the dataframe using melt or similar functions, with two output value columns. Any ideas on how to do this without splitting the input array?


     date     col1 col2 VAR Q   W
0   2011-01-01  1   5   A   1   aa
1   2011-01-01  1   5   B   2   zz
2   2011-01-02  1   9   A   -1  bb
3   2011-01-02  1   9   B   3   rr
4   2011-01-03  3   3   A   0   cc
5   2011-01-03  3   3   B   4   vv
6   2011-01-04  4   1   A   3   dd
7   2011-01-04  4   1   B   5   gg

CodePudding user response:

First idea is create MultiIndex by split columns with _ and reshape by DataFrame.stack:

df = x.set_index(['date','col1', 'col2'])
df.columns = df.columns.str.split('_', expand=True)
df = df.stack(0).reset_index().rename(columns={'level_3':'VAR'})
print (df)
         date col1 col2 VAR   Q   W
0  2011-01-01    1    5   A   1  aa
1  2011-01-01    1    5   B   2  zz
2  2011-01-02    1    9   A  -1  bb
3  2011-01-02    1    9   B   3  rr
4  2011-01-03    3    3   A   0  cc
5  2011-01-03    3    3   B   4  vv
6  2011-02-04    4    1   A   3  dd
7  2011-02-04    4    1   B   5  gg

Or use wide_to_long with rshape stack and Series.unstack:

df = (pd.wide_to_long(x,stubnames=["A","B"],
                       i=['date','col1', 'col2'],
                       j="new", sep="_",suffix=".*")
        .stack()
        .unstack(-2)
        .reset_index()
        .rename(columns={'level_3':'VAR'}))
print (df)
new        date col1 col2 VAR   Q   W
0    2011-01-01    1    5   A   1  aa
1    2011-01-01    1    5   B   2  zz
2    2011-01-02    1    9   A  -1  bb
3    2011-01-02    1    9   B   3  rr
4    2011-01-03    3    3   A   0  cc
5    2011-01-03    3    3   B   4  vv
6    2011-02-04    4    1   A   3  dd
7    2011-02-04    4    1   B   5  gg

Or swap values with _ between first and last to last and first, so need only wide_to_long:

df1 = x.copy()
df1.columns = df1.columns.str.replace(r'(\w )_(\w )', r'\2_\1', regex=True)

#thank you sammywemmy for alternative
df1.columns = df1.columns.str.split('_').str[::-1].str.join('_')

df1 = pd.wide_to_long(df1,stubnames=["Q","W"],
                        i=['date','col1', 'col2'],
                        j="VAR", sep="_",suffix=".*").reset_index()
print (df1)
         date col1 col2 VAR   Q   W
0  2011-01-01    1    5   A   1  aa
1  2011-01-01    1    5   B   2  zz
2  2011-01-02    1    9   A  -1  bb
3  2011-01-02    1    9   B   3  rr
4  2011-01-03    3    3   A   0  cc
5  2011-01-03    3    3   B   4  vv
6  2011-02-04    4    1   A   3  dd
7  2011-02-04    4    1   B   5  gg

CodePudding user response:

One option is with pivot_longer from pyjanitor:

# pip install pyjanitor
import pandas as pd
import janitor
x.pivot_longer(index = ['date', 'col1', 'col2'], 
               names_to = ('VAR', '.value'), 
               names_sep='_', 
               sort_by_appearance=True)
 
         date col1 col2 VAR   Q   W
0  2011-01-01    1    5   A   1  aa
1  2011-01-01    1    5   B   2  zz
2  2011-01-02    1    9   A  -1  bb
3  2011-01-02    1    9   B   3  rr
4  2011-01-03    3    3   A   0  cc
5  2011-01-03    3    3   B   4  vv
6  2011-02-04    4    1   A   3  dd
7  2011-02-04    4    1   B   5  gg
  • Related