I'm trying to arrange a pandas dataframe that looks like this:
{'Sample': {0: 'Mark', 1: 'Mark', 2: 'Mark', 3: 'Mark', 4: 'Mark', 5: 'John', 6: 'John', 7: 'John', 8: 'John', 9: 'John', 10: 'Lisa', 11: 'Lisa', 12: 'Lisa', 13: 'Lisa', 14: 'Lisa'}, 'CO': {0: 3, 1: 3, 2: 2, 3: 4, 4: 5, 5: 10, 6: 11, 7: 11, 8: 14, 9: 11, 10: 23, 11: 25, 12: 26, 13: 23, 14: 45}, 'Argon': {0: 7, 1: 8, 2: 9, 3: 5, 4: 7, 5: 45, 6: 65, 7: 45, 8: 67, 9: 43, 10: 34, 11: 78, 12: 79, 13: 87, 14: 67}, 'Carbon': {0: 35, 1: 43, 2: 78, 3: 54, 4: 34, 5: 56, 6: 58, 7: 50, 8: 53, 9: 55, 10: 87, 11: 77, 12: 88, 13: 89, 14: 85}}
Sample | CO | Argon | Carbon |
---|---|---|---|
Mark | 3 | 7 | 35 |
Mark | 3 | 8 | 43 |
Mark | 2 | 9 | 78 |
Mark | 4 | 5 | 54 |
Mark | 5 | 7 | 34 |
John | 10 | 45 | 56 |
John | 11 | 65 | 58 |
John | 11 | 45 | 50 |
John | 14 | 67 | 53 |
John | 11 | 43 | 55 |
Lisa | 23 | 34 | 87 |
Lisa | 25 | 78 | 77 |
Lisa | 26 | 79 | 88 |
Lisa | 23 | 87 | 89 |
Lisa | 45 | 67 | 85 |
To this:
{'Chemical': {0: 'Argon', 1: 'Argon', 2: 'Argon', 3: 'Argon', 4: 'Argon', 5: 'Carbon', 6: 'Carbon', 7: 'Carbon', 8: 'Carbon', 9: 'Carbon', 10: 'CO', 11: 'CO', 12: 'CO', 13: 'CO', 14: 'CO'}, 'Mark': {0: 7, 1: 8, 2: 9, 3: 5, 4: 7, 5: 35, 6: 43, 7: 78, 8: 54, 9: 34, 10: 3, 11: 3, 12: 2, 13: 4, 14: 5}, 'John': {0: 45, 1: 65, 2: 45, 3: 67, 4: 43, 5: 56, 6: 58, 7: 50, 8: 53, 9: 55, 10: 10, 11: 11, 12: 11, 13: 14, 14: 11}, 'Lisa': {0: 34, 1: 78, 2: 79, 3: 87, 4: 67, 5: 87, 6: 77, 7: 88, 8: 89, 9: 85, 10: 23, 11: 25, 12: 26, 13: 23, 14: 45}}
Chemical | Mark | John | Lisa |
---|---|---|---|
Argon | 7 | 45 | 34 |
Argon | 8 | 65 | 78 |
Argon | 9 | 45 | 79 |
Argon | 5 | 67 | 87 |
Argon | 7 | 43 | 67 |
Carbon | 35 | 56 | 87 |
Carbon | 43 | 58 | 77 |
Carbon | 78 | 50 | 88 |
Carbon | 54 | 53 | 89 |
Carbon | 34 | 55 | 85 |
CO | 3 | 10 | 23 |
CO | 3 | 11 | 25 |
CO | 2 | 11 | 26 |
CO | 4 | 14 | 23 |
CO | 5 | 11 | 45 |
The data is body measurements of three people, and each chemical was measured 5 times per person
It looks like a simple case of transposing, but not surprisingly, it does exactly what it says on the tin! I've also tried melting the dataframe and groupby, but I'm really not getting anywhere with it. Any help would be most welcome.
Thank you!
CodePudding user response:
Use melt
with pivot_table
in one-liner version:
>>> df.melt('Sample', var_name='Chemical') \
.assign(Idx=lambda x: x.groupby('Sample').cumcount()) \
.pivot_table('value', ['Idx', 'Chemical'], 'Sample') \
.droplevel(0).rename_axis(columns=None).reset_index()
Chemical John Lisa Mark
0 CO 10 23 3
1 CO 11 25 3
2 CO 11 26 2
3 CO 14 23 4
4 CO 11 45 5
5 Argon 45 34 7
6 Argon 65 78 8
7 Argon 45 79 9
8 Argon 67 87 5
9 Argon 43 67 7
10 Carbon 56 87 35
11 Carbon 58 77 43
12 Carbon 50 88 78
13 Carbon 53 89 54
14 Carbon 55 85 34
You can execute all steps independently to see the transformation:
>>> out = df.melt('Sample', var_name='Chemical')
>>> out = out.assign(Idx=lambda x: x.groupby('Sample').cumcount())
>>> out = out.pivot_table('value', ['Idx', 'Chemical'], 'Sample')
>>> out = out.droplevel(0).rename_axis(columns=None).reset_index()
CodePudding user response:
Try:
dfm = df.melt('Sample', var_name='Chemical')
dfm.set_index([dfm.groupby('Sample').cumcount(), 'Chemical', 'Sample']).unstack()['value']
Output:
Sample John Lisa Mark
Chemical
0 CO 10 23 3
1 CO 11 25 3
2 CO 11 26 2
3 CO 14 23 4
4 CO 11 45 5
5 Argon 45 34 7
6 Argon 65 78 8
7 Argon 45 79 9
8 Argon 67 87 5
9 Argon 43 67 7
10 Carbon 56 87 35
11 Carbon 58 77 43
12 Carbon 50 88 78
13 Carbon 53 89 54
14 Carbon 55 85 34
CodePudding user response:
a = a.set_index("Sample")
pd.concat([a.loc[nm].unstack().rename(nm).droplevel("Sample")
for nm in a.index.unique()], axis=1)
# Mark John Lisa
# CO 3 10 23
# CO 3 11 25
# CO 2 11 26
# CO 4 14 23
# CO 5 11 45
# Argon 7 45 34
# Argon 8 65 78
# Argon 9 45 79
# Argon 5 67 87
# Argon 7 43 67
# Carbon 35 56 87
# Carbon 43 58 77
# Carbon 78 50 88
# Carbon 54 53 89
# Carbon 34 55 85