How to transfer values from multiple columns to other columns using Pandas?

I have multiple columns in a pandas dataframe that I want to reshape from wide form to long form. This doubles the number of rows in my dataframe and adds a new column indicating which group each row originally came from.

I have the following dataframe df, where columns a1, b1, c1, and d1 all belong to one group and a2, b2, c2, and d2 belong to another:

name a1 b1 c1 d1 a2 b2 c2 d2
joe  x  y  x  y  z  e  e  f 
lily x  o  x  y  z  o  e  f  
john o  y  x  q  z  f  e  q

I want to transform it into the following final table, with a new column indicating where the values originated from:

name a1 b1 c1 d1 new_col
joe  x  y  x  y  group1
lily x  o  x  y  group1
john o  y  x  q  group1
joe  z  e  e  f  group2
lily z  o  e  f  group2
john z  f  e  q  group2

I've tried using melt, but I can't figure out how to apply it to multiple column pairs at once. For instance, I can do it for 2 columns but not all 8:

import pandas as pd
pd.melt(df, id_vars = 'name', var_name = 'a_var', value_vars = ['a1', 'a2'])

which results in

name a_var value
joe  a1    x  
lily a1    x   
john a1    o
joe  a2    z  
lily a2    z   
john a2    z

CodePudding user response：

Use wide_to_long, then build the new column by prefixing the extracted numeric suffix with 'group':

df = (pd.wide_to_long(df.reset_index(), 
                      stubnames=['a','b','c','d'], i=['index','name'], j='new_col')
        .droplevel(0)
        .reset_index())
df['new_col'] = 'group' + df['new_col'].astype(str)
print (df)
   name new_col  a  b  c  d
0   joe  group1  x  y  x  y
1   joe  group2  z  e  e  f
2  lily  group1  x  o  x  y
3  lily  group2  z  o  e  f
4  john  group1  o  y  x  q
5  john  group2  z  f  e  q
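
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same wide_to_long approach. The sample frame is copied from the question. The final stable sort is my own addition, not part of the approach itself; it only reorders the rows so that all group1 rows come before group2, as in the target table. The name long_df is just illustrative.

```python
import pandas as pd

# Sample frame copied from the question.
df = pd.DataFrame({
    "name": ["joe", "lily", "john"],
    "a1": ["x", "x", "o"], "b1": ["y", "o", "y"],
    "c1": ["x", "x", "x"], "d1": ["y", "y", "q"],
    "a2": ["z", "z", "z"], "b2": ["e", "o", "f"],
    "c2": ["e", "e", "e"], "d2": ["f", "f", "q"],
})

long_df = (
    pd.wide_to_long(
        df.reset_index(),              # helper 'index' column keeps the ids unique
        stubnames=["a", "b", "c", "d"],
        i=["index", "name"],
        j="new_col",                   # receives the numeric suffix (1 or 2)
    )
    .droplevel(0)                      # drop the helper 'index' level again
    .reset_index()
)
long_df["new_col"] = "group" + long_df["new_col"].astype(str)

# Assumption: you want group1 rows first, as in the question's target table.
# mergesort is stable, so the row order within each group is preserved.
long_df = long_df.sort_values("new_col", kind="mergesort").reset_index(drop=True)
print(long_df)
```

The reset_index() before reshaping matters when name can contain duplicates, because wide_to_long requires the id columns in i to uniquely identify each row.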

EDIT: If the column names have a non-numeric suffix after a separator, specify suffix and sep:

print (df)
   name var1_c var2_c var3_c var4_c var1_t var2_t var3_t var4_t
0   joe      x      y      x      y      z      e      e      f
1  lily      x      o      x      y      z      o      e      f
2  john      o      y      x      q      z      f      e      q

df = (pd.wide_to_long(df.reset_index(), 
                      stubnames=['var1','var2','var3','var4'], 
                      i=['index','name'], 
                      j='new_col', 
                      suffix=r'\w+',
                      sep='_')
        .droplevel(0)
        .reset_index())
df['new_col'] = 'group' + df['new_col'].astype(str)
print (df)
   name new_col var1 var2 var3 var4
0   joe  groupc    x    y    x    y
1   joe  groupt    z    e    e    f
2  lily  groupc    x    o    x    y
3  lily  groupt    z    o    e    f
4  john  groupc    o    y    x    q
5  john  groupt    z    f    e    q
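
To see why suffix has to be changed here, it may help to look at which column names each pattern accepts. The sketch below is only an approximation of the documented rule (a column is picked up when it reads as stub, then sep, then something matching the suffix regex); matching_columns is a hypothetical helper of mine, not a pandas function, and pandas' internal matching may differ in details.

```python
import re

columns = ["var1_c", "var2_c", "var1_t", "var2_t"]

def matching_columns(stub, sep, suffix):
    # Hypothetical helper: approximates "stub + sep + suffix", with suffix as a regex.
    pattern = re.compile(rf"^{re.escape(stub)}{re.escape(sep)}{suffix}$")
    return [c for c in columns if pattern.match(c)]

# Default suffix of wide_to_long is r'\d+' (digits only): nothing matches.
default_match = matching_columns("var1", "_", r"\d+")
# r'\w+' also accepts letters, so both var1 columns are found.
word_match = matching_columns("var1", "_", r"\w+")

print(default_match)  # []
print(word_match)     # ['var1_c', 'var1_t']
```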

Or split the column names into a MultiIndex and stack the last level:

df1 = df.set_index('name')
df1.columns = df1.columns.str.split('_', expand=True)
df1 = df1.stack().rename_axis(['name','new_col']).reset_index()
df1['new_col'] = 'group' + df1['new_col'].astype(str)
print (df1)
   name new_col var1 var2 var3 var4
0   joe  groupc    x    y    x    y
1   joe  groupt    z    e    e    f
2  lily  groupc    x    o    x    y
3  lily  groupt    z    o    e    f
4  john  groupc    o    y    x    q
5  john  groupt    z    f    e    q
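
Here is a self-contained sketch of the split-and-stack variant that also prints the intermediate column labels, since that step is what makes it work: splitting on '_' with expand=True turns the flat labels into a two-level MultiIndex, and stack() then moves the inner level (c / t) into the rows. The names wide and long_df are just illustrative. Depending on your pandas version, stack() may emit a FutureWarning about its new implementation.

```python
import pandas as pd

# Sample frame copied from the EDIT above.
df = pd.DataFrame({
    "name": ["joe", "lily", "john"],
    "var1_c": ["x", "x", "o"], "var2_c": ["y", "o", "y"],
    "var3_c": ["x", "x", "x"], "var4_c": ["y", "y", "q"],
    "var1_t": ["z", "z", "z"], "var2_t": ["e", "o", "f"],
    "var3_t": ["e", "e", "e"], "var4_t": ["f", "f", "q"],
})

wide = df.set_index("name")
# 'var1_c' -> ('var1', 'c'): the columns become a two-level MultiIndex.
wide.columns = wide.columns.str.split("_", expand=True)
print(wide.columns.tolist()[:2])  # [('var1', 'c'), ('var2', 'c')]

# stack() moves the inner column level (c / t) into the row index.
long_df = wide.stack().rename_axis(["name", "new_col"]).reset_index()
long_df["new_col"] = "group" + long_df["new_col"].astype(str)
print(long_df)
```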

CodePudding user response：

One option is pivot_longer from pyjanitor. For this particular use case, you pass a regular expression with capture groups to names_pattern; each group is paired with the corresponding entry in names_to. Here we want the first part of each column label to stay as a column header, so we use the .value placeholder for it, and the second part goes into new_col:

# pip install pyjanitor
import pandas as pd
import janitor

(df
.pivot_longer(
    index = 'name', 
    names_to = ('.value', 'new_col'), 
    names_pattern=r"(.)(.)")
.assign(new_col = lambda df: 'group' + df.new_col)
)
   name new_col  a  b  c  d
0   joe  group1  x  y  x  y
1  lily  group1  x  o  x  y
2  john  group1  o  y  x  q
3   joe  group2  z  e  e  f
4  lily  group2  z  o  e  f
5  john  group2  z  f  e  q
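
If you would rather not install an extra package, a plain pd.concat also gets you the exact layout from the question (columns a1..d1 plus new_col, with group1 rows first). Note this is a different technique from pivot_longer: it simply slices each group of columns, renames them to a shared set of labels, and stacks the pieces. The groups mapping and the variable names are assumptions of mine for illustration.

```python
import pandas as pd

# Sample frame copied from the question.
df = pd.DataFrame({
    "name": ["joe", "lily", "john"],
    "a1": ["x", "x", "o"], "b1": ["y", "o", "y"],
    "c1": ["x", "x", "x"], "d1": ["y", "y", "q"],
    "a2": ["z", "z", "z"], "b2": ["e", "o", "f"],
    "c2": ["e", "e", "e"], "d2": ["f", "f", "q"],
})

# Assumed mapping from group label to the columns that belong to it.
groups = {
    "group1": ["a1", "b1", "c1", "d1"],
    "group2": ["a2", "b2", "c2", "d2"],
}
target_cols = groups["group1"]  # shared column names for the long table

pieces = []
for label, cols in groups.items():
    piece = df[["name"] + cols].copy()
    piece.columns = ["name"] + target_cols  # rename this group's columns
    piece["new_col"] = label                # record where the rows came from
    pieces.append(piece)

result = pd.concat(pieces, ignore_index=True)
print(result)
```

This scales to more groups by adding entries to the mapping, at the cost of listing the columns by hand instead of deriving them from a naming pattern.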