Home > other >  Convert pandas df from long to wide based on one column as variable and two columns for values
Convert pandas df from long to wide based on one column as variable and two columns for values

Time:03-21

I have a dataframe

{'Author': {0: 1, 1: 1, 2: 2, 3: 2},
'Article': {0: 11, 1: 11, 2: 22, 3: 22},
'Year': {0: 2017, 1: 2018, 2: 2017, 3: 2018},
'First': {0: 1, 1: 0, 2: 0, 3: 0},
'Second': {0: 0, 1: 1, 2: 1, 3: 1}}

Want to convert from long to wide for Year, creating values columns based on First and Second.

Expected output

Author  Article Year    First   Second  First_2017  First_2018  Second_2017 Second_2018
1        11     2017      1      0         1          0             0          1
1        12     2018      0      1         1          0             0          1
2        22     2017      0      0         0          0             0          1
2        23      2018     0      1         0          0             0          1

CodePudding user response:

IUUC, you could pivot and merge:

df2 = df.pivot(index=['Author', 'Article'], columns='Year')
df2.columns = df2.columns.map(lambda x: '_'.join(map(str, x)))
df.merge(df2, left_on=['Author', 'Article'], right_index=True)

output:

   Author  Article  Year  First  Second  First_2017  First_2018  Second_2017  Second_2018
0       1       11  2017      1       0           1           0            0            1
1       1       11  2018      0       1           1           0            0            1
2       2       22  2017      0       1           0           0            1            1
3       2       22  2018      0       1           0           0            1            1

CodePudding user response:

If need test if exist at least one 1 in columns ['First','Second'] use DataFrame.pivot_table with any, flatten MultiIndex and append to original:

df1 = df.pivot_table(index='Author', 
                     columns='Year', 
                     values=['First','Second'], 
                     aggfunc='any')
df1.columns = [f'{a}_{b}' for a, b in df1.columns]

df = df.join(df1.astype(int), on='Author')
print (df)
   Author  Article  Year  First  Second  First_2017  First_2018  Second_2017  \
0       1       11  2017      1       0           1           0            0   
1       1       11  2018      0       1           1           0            0   
2       2       22  2017      0       1           0           0            1   
3       2       22  2018      0       1           0           0            1   

   Second_2018  
0            1  
1            1  
2            1  
3            1  
  • Related