I have a dataframe where I would like to pivot my data to fit a specific format, making sure the dates are consecutive.
Data
ID Q122_c_en Q122con_s Q222_c_en Q222con_s Q322_c_en Q322con_s Q422_c_en Q422con_s
AA 900 89 1000 90 1200 92 1000 90
BB 1000 10 1000 20 1100 25 1300 30
Desired
ID Date con_en con_s
AA Q122 900 89
AA Q222 1000 90
AA Q322 1200 92
AA Q422 1000 90
BB Q122 1000 10
BB Q222 1000 20
BB Q322 1100 25
BB Q422 1300 30
Doing
df.pivot(index="ID", columns="Date", values=["con_en", "con_s"])
I am using pivot, however, the format does not reflect the desired above format. Any suggestion is appreciated.
CodePudding user response:
You can create a MultiIndex and reshape:
(df
.set_index('ID')
.pipe(lambda d: d.set_axis(pd.MultiIndex.from_arrays(zip(*d.columns.str.split(
r'(?<=Q\d{3})_?', 1))), axis=1))
.stack(0)
.reset_index()
.rename(columns={'level_1': 'Date'})
)
This should give you the desired order, but in case of issues use: .sort_values(by=['ID', 'Date'])
output:
ID Date c_en con_s
0 AA Q122 900 89
1 AA Q222 1000 90
2 AA Q322 1200 92
3 AA Q422 1000 90
4 BB Q122 1000 10
5 BB Q222 1000 20
6 BB Q322 1100 25
7 BB Q422 1300 30
CodePudding user response:
Here is one way to do that
Melt tthe DF, then split the column name, and finally pivot it
df2=df.melt(id_vars='ID', var_name = 'level1')
df2[['Date', 'level3']] = df2['level1'].str.extract(r'(^.{4})(.*)', expand=True)
df2.pivot(index=['ID', 'Date'], columns=['level3'], values='value').reset_index()
level3 ID Date _c_en con_s
0 AA Q122 900 89
1 AA Q222 1000 90
2 AA Q322 1200 92
3 AA Q422 1000 90
4 BB Q122 1000 10
5 BB Q222 1000 20
6 BB Q322 1100 25
7 BB Q422 1300 30
CodePudding user response:
You can start with melt
then extract
and pivot
:
dfm = df.melt(id_vars=['ID'])
df_out = pd.concat([dfm, dfm['variable'].str.extract('^(\w\d\d\d)(.*)')], axis=1)[['ID','value',0,1]] \
.pivot(index=['ID',0], columns=[1]).reset_index()
df_out.columns = ['ID','Date','con_en','con_s']
print(df_out)
ID Date con_en con_s
0 AA Q122 900 89
1 AA Q222 1000 90
2 AA Q322 1200 92
3 AA Q422 1000 90
4 BB Q122 1000 10
5 BB Q222 1000 20
6 BB Q322 1100 25
7 BB Q422 1300 30
CodePudding user response:
one option, where you can do the transform in one step, is with pivot_longer from pyjanitor:
# pip install pyjanitor
import pandas as pd
import janitor
df.pivot_longer(
index = 'ID',
names_to = ('Date', '.value'),
names_pattern = r"(Q\d )_?(. )",
sort_by_appearance = True)
ID Date c_en con_s
0 AA Q122 900 89
1 AA Q222 1000 90
2 AA Q322 1200 92
3 AA Q422 1000 90
4 BB Q122 1000 10
5 BB Q222 1000 20
6 BB Q322 1100 25
7 BB Q422 1300 30
Any label in the columns associated with .value
stays as a header; this is determined by the groups in the regex in names_pattern
.