my data is in the following format
x = pd.DataFrame([
{'date': '2011-01-01', 'col1': '1','col2': '5', 'A_Q': '1', 'A_W': 'aa', 'B_Q': '2', 'B_W': 'zz'},
{'date': '2011-01-02', 'col1': '1','col2': '9', 'A_Q': '-1', 'A_W': 'bb', 'B_Q': '3', 'B_W': 'rr'},
{'date': '2011-01-03', 'col1': '3','col2': '3', 'A_Q': '0', 'A_W': 'cc', 'B_Q': '4', 'B_W': 'vv'},
{'date': '2011-02-04', 'col1': '4','col2': '1', 'A_Q': '3', 'A_W': 'dd', 'B_Q': '5', 'B_W': 'gg'},
])
date col1 col2 A_Q A_W B_Q B_W
0 2011-01-01 1 5 1 aa 2 zz
1 2011-01-02 1 9 -1 bb 3 rr
2 2011-01-03 3 3 0 cc 4 vv
3 2011-02-04 4 1 3 dd 5 gg
I would like to reshape the dataframe using melt or similar functions, with two output value columns. Any ideas on how to do this without splitting the input array?
date col1 col2 VAR Q W
0 2011-01-01 1 5 A 1 aa
1 2011-01-01 1 5 B 2 zz
2 2011-01-02 1 9 A -1 bb
3 2011-01-02 1 9 B 3 rr
4 2011-01-03 3 3 A 0 cc
5 2011-01-03 3 3 B 4 vv
6 2011-01-04 4 1 A 3 dd
7 2011-01-04 4 1 B 5 gg
CodePudding user response:
First idea is create MultiIndex
by split columns with _
and reshape by DataFrame.stack
:
df = x.set_index(['date','col1', 'col2'])
df.columns = df.columns.str.split('_', expand=True)
df = df.stack(0).reset_index().rename(columns={'level_3':'VAR'})
print (df)
date col1 col2 VAR Q W
0 2011-01-01 1 5 A 1 aa
1 2011-01-01 1 5 B 2 zz
2 2011-01-02 1 9 A -1 bb
3 2011-01-02 1 9 B 3 rr
4 2011-01-03 3 3 A 0 cc
5 2011-01-03 3 3 B 4 vv
6 2011-02-04 4 1 A 3 dd
7 2011-02-04 4 1 B 5 gg
Or use wide_to_long
with rshape stack
and Series.unstack
:
df = (pd.wide_to_long(x,stubnames=["A","B"],
i=['date','col1', 'col2'],
j="new", sep="_",suffix=".*")
.stack()
.unstack(-2)
.reset_index()
.rename(columns={'level_3':'VAR'}))
print (df)
new date col1 col2 VAR Q W
0 2011-01-01 1 5 A 1 aa
1 2011-01-01 1 5 B 2 zz
2 2011-01-02 1 9 A -1 bb
3 2011-01-02 1 9 B 3 rr
4 2011-01-03 3 3 A 0 cc
5 2011-01-03 3 3 B 4 vv
6 2011-02-04 4 1 A 3 dd
7 2011-02-04 4 1 B 5 gg
Or swap values with _
between first and last to last and first, so need only wide_to_long
:
df1 = x.copy()
df1.columns = df1.columns.str.replace(r'(\w )_(\w )', r'\2_\1', regex=True)
#thank you sammywemmy for alternative
df1.columns = df1.columns.str.split('_').str[::-1].str.join('_')
df1 = pd.wide_to_long(df1,stubnames=["Q","W"],
i=['date','col1', 'col2'],
j="VAR", sep="_",suffix=".*").reset_index()
print (df1)
date col1 col2 VAR Q W
0 2011-01-01 1 5 A 1 aa
1 2011-01-01 1 5 B 2 zz
2 2011-01-02 1 9 A -1 bb
3 2011-01-02 1 9 B 3 rr
4 2011-01-03 3 3 A 0 cc
5 2011-01-03 3 3 B 4 vv
6 2011-02-04 4 1 A 3 dd
7 2011-02-04 4 1 B 5 gg
CodePudding user response:
One option is with pivot_longer from pyjanitor:
# pip install pyjanitor
import pandas as pd
import janitor
x.pivot_longer(index = ['date', 'col1', 'col2'],
names_to = ('VAR', '.value'),
names_sep='_',
sort_by_appearance=True)
date col1 col2 VAR Q W
0 2011-01-01 1 5 A 1 aa
1 2011-01-01 1 5 B 2 zz
2 2011-01-02 1 9 A -1 bb
3 2011-01-02 1 9 B 3 rr
4 2011-01-03 3 3 A 0 cc
5 2011-01-03 3 3 B 4 vv
6 2011-02-04 4 1 A 3 dd
7 2011-02-04 4 1 B 5 gg