input df:
A Jan.S1 Jan.S2 Feb.S1 Feb.S2
x 1 2 3 4
y 6 7 8 9
output df:
A month S1 S2
x Jan 1 2
x Feb 3 4
y Jan 6 7
y Feb 8 9
How can I make input become output format?
CodePudding user response:
If you had a numeric value only after the '.' you could use pandas.wide_to_long
.
As this is not the case, you can use a manual reshaping with a MultiIndex and stack
:
out = (df
.set_index('A')
.set_axis(df.columns.drop('A').str.split('.', expand=True), axis=1)
.stack(0).rename_axis(['A', 'month'])
.reset_index()
)
output:
A month S1 S2
0 x Feb 3 4
1 x Jan 1 2
2 y Feb 8 9
3 y Jan 6 7
CodePudding user response:
One option is with pivot_longer from pyjanitor:
# pip install pyjanitor
import pandas as pd
import janitor
df.pivot_longer(index = 'A', names_to = ('month', '.value'), names_sep = '.')
A month S1 S2
0 x Jan 1 2
1 y Jan 6 7
2 x Feb 3 4
3 y Feb 8 9
In the above solution, the .value
determines which parts of the column labels remain as headers - the labels are split apart with the names_sep
.
You can pull this off with pd.wide_to_long
by renaming the columns (usually, depending on the data size, pd.wide_to_long
can be slower than pivot_longer
):
temp = df.set_index('A')
#flip the sub-labels positions
temp.columns = temp.columns.str.split('.').str[::-1].str.join('.')
(pd.wide_to_long(
temp.reset_index(),
stubnames=['S1','S2'],
i = 'A',
j = 'month',
sep = '.',
suffix = '. ')
.reset_index()
)
A month S1 S2
0 x Jan 1 2
1 y Jan 6 7
2 x Feb 3 4
3 y Feb 8 9