There are many answers on this topic, but I still cannot resolve my problem. I have a dataframe:
{'Author': {0: 111, 1: 222}, 'Journal17': {0: 2, 1: 4}, 'Journal18': {0: 1, 1: 7}, 'Journal19': {0: 0, 1: 3}, 'Journal20': {0: 0, 1: 0}, 'Var_one': {0: 0, 1: 2}, 'Var_two': {0: 0, 1: 2}, 'Score17': {0: 10.591, 1: 14.682}, 'Score18': {0: 4.743, 1: 79.04}, 'Score19': {0: 0.0, 1: 14.487}, 'Score20': {0: 0.0, 1: 19.078}, 'Var3': {0: 0, 1: 0}, 'Var4': {0: 0, 1: 0}, 'Var5': {0: 5, 1: 42}, 'Var6': {0: 12, 1: 44}}
I am trying to convert it from wide to long on two sets of columns, but nothing has worked so far.
I want to gather Journal and Score by Year.
Expected output:
Author Year Journal Score Var_one Var_two Var3 Var4 Var5 Var6
111 2017 2 10.591 0 0 0 0 5 12
111 2018 1 4.743 0 0 0 0 5 12
111 2019 0 0 0 0 0 0 5 12
111 2020 0 0 0 0 0 0 5 12
222 2017 4 14.682 2 2 0 0 42 44
222 2018 7 79.04 2 2 0 0 42 44
222 2019 3 14.487 2 2 0 0 42 44
222 2020 0 19.078 2 2 0 0 42 44
Thanks!
CodePudding user response:
Let's use pd.wide_to_long:
pd.wide_to_long(df,
                ['Journal', 'Score'],
                ['Author', 'Var_one', 'Var_two', 'Var3', 'Var4', 'Var5', 'Var6'],
                'Year',
                sep='',
                suffix=r'\d+').reset_index()
Output:
Author Var_one Var_two Var3 Var4 Var5 Var6 Year Journal Score
0 111 0 0 0 0 5 12 17 2 10.591
1 111 0 0 0 0 5 12 18 1 4.743
2 111 0 0 0 0 5 12 19 0 0.000
3 111 0 0 0 0 5 12 20 0 0.000
4 222 2 2 0 0 42 44 17 4 14.682
5 222 2 2 0 0 42 44 18 7 79.040
6 222 2 2 0 0 42 44 19 3 14.487
7 222 2 2 0 0 42 44 20 0 19.078
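Note that Year comes out as 17..20 here, while the expected output in the question uses 2017..2020 and a different column order. A minimal sketch of the remaining steps, assuming the two-digit suffixes all refer to years 2000 and later (the names id_cols and long_df are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Author': {0: 111, 1: 222},
    'Journal17': {0: 2, 1: 4}, 'Journal18': {0: 1, 1: 7},
    'Journal19': {0: 0, 1: 3}, 'Journal20': {0: 0, 1: 0},
    'Var_one': {0: 0, 1: 2}, 'Var_two': {0: 0, 1: 2},
    'Score17': {0: 10.591, 1: 14.682}, 'Score18': {0: 4.743, 1: 79.04},
    'Score19': {0: 0.0, 1: 14.487}, 'Score20': {0: 0.0, 1: 19.078},
    'Var3': {0: 0, 1: 0}, 'Var4': {0: 0, 1: 0},
    'Var5': {0: 5, 1: 42}, 'Var6': {0: 12, 1: 44},
})

# Columns that identify a row and are repeated for every year
id_cols = ['Author', 'Var_one', 'Var_two', 'Var3', 'Var4', 'Var5', 'Var6']

long_df = pd.wide_to_long(
    df,
    stubnames=['Journal', 'Score'],
    i=id_cols,
    j='Year',
    sep='',
    suffix=r'\d+',
).reset_index()

# Assumption: suffix 17 means 2017, 18 means 2018, and so on
long_df['Year'] = long_df['Year'].astype(int) + 2000

# Sort explicitly rather than rely on the row order wide_to_long returns,
# then reorder the columns to match the expected output
long_df = long_df.sort_values(['Author', 'Year']).reset_index(drop=True)
long_df = long_df[['Author', 'Year', 'Journal', 'Score',
                   'Var_one', 'Var_two', 'Var3', 'Var4', 'Var5', 'Var6']]
print(long_df)
```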
CodePudding user response:
Here's a solution that uses explode:
cols = ['Journal', 'Score']
# Exploding several columns at once requires pandas 1.3 or later
df = (
    df.assign(**{c: df.filter(like=c).to_numpy().tolist() for c in cols})
      .drop([c for col in cols for c in df.filter(like=col).columns], axis=1)
      .explode(cols)
)
Output:
>>> df
Author Var_one Var_two Var3 Var4 Var5 Var6 Journal Score
0 111 0 0 0 0 5 12 2 10.591
0 111 0 0 0 0 5 12 1 4.743
0 111 0 0 0 0 5 12 0 0.0
0 111 0 0 0 0 5 12 0 0.0
1 222 2 2 0 0 42 44 4 14.682
1 222 2 2 0 0 42 44 7 79.04
1 222 2 2 0 0 42 44 3 14.487
1 222 2 2 0 0 42 44 0 19.078
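The explode result has no Year column, which the expected output in the question asks for. One possible way to carry it along is to explode a list of years together with the value columns. This is only a sketch: it assumes pandas 1.3 or later (multi-column explode), that every stub has the same set of year suffixes, and that the suffixes mean 20xx; the names stubs, years, wide_cols and out are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Author': {0: 111, 1: 222},
    'Journal17': {0: 2, 1: 4}, 'Journal18': {0: 1, 1: 7},
    'Journal19': {0: 0, 1: 3}, 'Journal20': {0: 0, 1: 0},
    'Var_one': {0: 0, 1: 2}, 'Var_two': {0: 0, 1: 2},
    'Score17': {0: 10.591, 1: 14.682}, 'Score18': {0: 4.743, 1: 79.04},
    'Score19': {0: 0.0, 1: 14.487}, 'Score20': {0: 0.0, 1: 19.078},
    'Var3': {0: 0, 1: 0}, 'Var4': {0: 0, 1: 0},
    'Var5': {0: 5, 1: 42}, 'Var6': {0: 12, 1: 44},
})

stubs = ['Journal', 'Score']

# Years taken from the Journal columns, e.g. 'Journal17' -> 2017 (assumed 20xx)
years = [2000 + int(c[len('Journal'):]) for c in df.columns if c.startswith('Journal')]

# All wide columns that will be collapsed into one column per stub
wide_cols = [c for c in df.columns if c.startswith(tuple(stubs))]

out = (
    df.drop(columns=wide_cols)
      .assign(
          # One identical list of years per row, aligned on the original index
          Year=pd.Series([years] * len(df), index=df.index),
          # One list of values per row for each stub
          **{s: df.filter(like=s).to_numpy().tolist() for s in stubs},
      )
      .explode(['Year'] + stubs, ignore_index=True)
      # explode leaves object dtype behind, so restore numeric dtypes
      .astype({'Year': int, 'Journal': int, 'Score': float})
)
print(out)
```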