Convert pandas dataframe from wide to long by two year columns

Time:02-22

There are many answers to similar questions, but I still cannot resolve this. I have a dataframe:

import pandas as pd
df = pd.DataFrame({'Author': {0: 111, 1: 222}, 'Journal17': {0: 2, 1: 4}, 'Journal18': {0: 1, 1: 7}, 'Journal19': {0: 0, 1: 3}, 'Journal20': {0: 0, 1: 0}, 'Var_one': {0: 0, 1: 2}, 'Var_two': {0: 0, 1: 2}, 'Score17': {0: 10.591, 1: 14.682}, 'Score18': {0: 4.743, 1: 79.04}, 'Score19': {0: 0.0, 1: 14.487}, 'Score20': {0: 0.0, 1: 19.078}, 'Var3': {0: 0, 1: 0}, 'Var4': {0: 0, 1: 0}, 'Var5': {0: 5, 1: 42}, 'Var6': {0: 12, 1: 44}})

I am trying to convert it from wide to long by two columns, but nothing has worked so far.

I want to gather the Journal and Score columns by year (the suffixes 17 to 20 stand for 2017 to 2020).

Expected output:

Author  Year    Journal Score   Var_one Var_two Var3    Var4    Var5    Var6
111     2017      2     10.591     0       0     0       0       5       12
111     2018      1     4.743      0       0     0       0       5       12
111     2019      0     0          0       0     0       0       5       12
111     2020      0     0          0       0     0       0       5       12
222     2017      4     14.682     2       2     0       0       42      44
222     2018      7     79.04      2       2     0       0       42      44
222     2019      3     14.487     2       2     0       0       42      44
222     2020      0     19.078     2       2     0       0       42      44

Thanks!

Answer:

Let's use pd.wide_to_long:

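import pandas as pd

pd.wide_to_long(df,
                ['Journal','Score'],   # stubnames: prefixes of the wide columns
                ['Author','Var_one', 'Var_two', 'Var3', 'Var4', 'Var5', 'Var6'],   # i: id columns
                'Year',                # j: name for the new suffix column
                sep='',                # nothing between stub and suffix (Journal17)
                suffix=r'\d+').reset_index()   # suffix is one or more digits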

Output:

   Author  Var_one  Var_two  Var3  Var4  Var5  Var6  Year  Journal   Score
0     111        0        0     0     0     5    12    17        2  10.591
1     111        0        0     0     0     5    12    18        1   4.743
2     111        0        0     0     0     5    12    19        0   0.000
3     111        0        0     0     0     5    12    20        0   0.000
4     222        2        2     0     0    42    44    17        4  14.682
5     222        2        2     0     0    42    44    18        7  79.040
6     222        2        2     0     0    42    44    19        3  14.487
7     222        2        2     0     0    42    44    20        0  19.078
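The output above keeps the two-digit suffixes (17, 18, ...) in Year, while the question's expected output shows 2017, 2018, and so on, with Year placed right after Author. A minimal end-to-end sketch, assuming the suffixes always mean years in the 2000s, that converts them and reorders the columns to match the expected output:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'Author': {0: 111, 1: 222}, 'Journal17': {0: 2, 1: 4}, 'Journal18': {0: 1, 1: 7}, 'Journal19': {0: 0, 1: 3}, 'Journal20': {0: 0, 1: 0}, 'Var_one': {0: 0, 1: 2}, 'Var_two': {0: 0, 1: 2}, 'Score17': {0: 10.591, 1: 14.682}, 'Score18': {0: 4.743, 1: 79.04}, 'Score19': {0: 0.0, 1: 14.487}, 'Score20': {0: 0.0, 1: 19.078}, 'Var3': {0: 0, 1: 0}, 'Var4': {0: 0, 1: 0}, 'Var5': {0: 5, 1: 42}, 'Var6': {0: 12, 1: 44}})

id_cols = ['Author', 'Var_one', 'Var_two', 'Var3', 'Var4', 'Var5', 'Var6']
long_df = pd.wide_to_long(df, stubnames=['Journal', 'Score'], i=id_cols, j='Year',
                          sep='', suffix=r'\d+').reset_index()

# Assumption: two-digit suffixes are years in the 2000s (17 -> 2017)
long_df['Year'] = long_df['Year'] + 2000

# Reorder the columns to match the expected output in the question
long_df = long_df[['Author', 'Year', 'Journal', 'Score',
                   'Var_one', 'Var_two', 'Var3', 'Var4', 'Var5', 'Var6']]
print(long_df)
```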

Answer:

Here's a solution that uses explode (note that it does not create a Year column):

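cols = ['Journal', 'Score']  # stub prefixes of the wide columns to gather
df = (df.assign(**{c: df.filter(like=c).to_numpy().tolist() for c in cols})  # one list column per stub
        .drop([c for col in cols for c in df.filter(like=col).columns], axis=1)  # drop the wide columns
        .explode(cols))  # one row per list element; multi-column explode needs pandas >= 1.3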

Output:

>>> df
   Author  Var_one  Var_two  Var3  Var4  Var5  Var6 Journal   Score
0     111        0        0     0     0     5    12       2  10.591
0     111        0        0     0     0     5    12       1   4.743
0     111        0        0     0     0     5    12       0     0.0
0     111        0        0     0     0     5    12       0     0.0
1     222        2        2     0     0    42    44       4  14.682
1     222        2        2     0     0    42    44       7   79.04
1     222        2        2     0     0    42    44       3  14.487
1     222        2        2     0     0    42    44       0  19.078
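The explode output above loses the year information. One way to keep it, sketched here under the assumption that the Journal and Score columns share the same year suffixes in the same order and that those suffixes mean years in the 2000s, is to explode a Year list alongside the two value lists. `ignore_index=True` gives a fresh 0..n-1 index, and the final `astype` turns the object columns that explode produces back into numeric ones:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'Author': {0: 111, 1: 222}, 'Journal17': {0: 2, 1: 4}, 'Journal18': {0: 1, 1: 7}, 'Journal19': {0: 0, 1: 3}, 'Journal20': {0: 0, 1: 0}, 'Var_one': {0: 0, 1: 2}, 'Var_two': {0: 0, 1: 2}, 'Score17': {0: 10.591, 1: 14.682}, 'Score18': {0: 4.743, 1: 79.04}, 'Score19': {0: 0.0, 1: 14.487}, 'Score20': {0: 0.0, 1: 19.078}, 'Var3': {0: 0, 1: 0}, 'Var4': {0: 0, 1: 0}, 'Var5': {0: 5, 1: 42}, 'Var6': {0: 12, 1: 44}})

stubs = ['Journal', 'Score']
# Wide columns per stub, e.g. {'Journal': ['Journal17', ...], 'Score': ['Score17', ...]}
wide_cols = {s: [c for c in df.columns if c.startswith(s)] for s in stubs}
# Assumption: suffixes are two-digit years in the 2000s ('Journal17' -> 2017)
years = [2000 + int(c[len('Journal'):]) for c in wide_cols['Journal']]

long_df = (
    df.assign(Year=[years] * len(df),  # the same list of years for every author
              **{s: df[cols].to_numpy().tolist() for s, cols in wide_cols.items()})
      .drop(columns=[c for cols in wide_cols.values() for c in cols])
      .explode(['Year'] + stubs, ignore_index=True)  # needs pandas >= 1.3
      .astype({'Year': int, 'Journal': int, 'Score': float})
)
print(long_df)
```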