Home > Mobile >  make a new column from other column's string
make a new column from other column's string

Time:07-26

input df:

A Jan.S1 Jan.S2 Feb.S1 Feb.S2
x 1      2      3      4
y 6      7      8      9

output df:

A  month S1 S2 
x  Jan   1  2
x  Feb   3  4
y  Jan   6  7
y  Feb   8  9

How can I make input become output format?

CodePudding user response:

If you had a numeric value only after the '.' you could use pandas.wide_to_long.

As this is not the case, you can use a manual reshaping with a MultiIndex and stack:

out = (df
 .set_index('A')
 .set_axis(df.columns.drop('A').str.split('.', expand=True), axis=1)
 .stack(0).rename_axis(['A', 'month'])
 .reset_index()
)

output:

   A month  S1  S2
0  x   Feb   3   4
1  x   Jan   1   2
2  y   Feb   8   9
3  y   Jan   6   7

CodePudding user response:

One option is with pivot_longer from pyjanitor:

# pip install pyjanitor
import pandas as pd
import janitor

df.pivot_longer(index = 'A', names_to = ('month', '.value'), names_sep = '.')

   A month  S1  S2
0  x   Jan   1   2
1  y   Jan   6   7
2  x   Feb   3   4
3  y   Feb   8   9

In the above solution, the .value determines which parts of the column labels remain as headers - the labels are split apart with the names_sep.

You can pull this off with pd.wide_to_long by renaming the columns (usually, depending on the data size, pd.wide_to_long can be slower than pivot_longer):

temp = df.set_index('A')
#flip the sub-labels positions
temp.columns = temp.columns.str.split('.').str[::-1].str.join('.')
(pd.wide_to_long(
        temp.reset_index(), 
        stubnames=['S1','S2'], 
        i = 'A', 
        j = 'month', 
        sep = '.', 
        suffix = '. ')
.reset_index()
)

   A month  S1  S2
0  x   Jan   1   2
1  y   Jan   6   7
2  x   Feb   3   4
3  y   Feb   8   9
  • Related