As a rather infrequent user of Pandas, I'd like to know how best to pivot one column (representing time) so that it flows horizontally, while stacking the rest based on another column or index.
Here is what I mean:
data = [
[2018, "Alex", 172, 61], [2019, "Alex", 173, 62], [2020, "Alex", 173, 63],
[2018, "Bill", 168, 59], [2019, "Bill", 168, 59], [2020, "Bill", 169, 60],
[2018, "Cody", 193, 67], [2019, "Cody", 194, 69], [2020, "Cody", 194, 68],
]
df = pd.DataFrame(data, columns=["year", "name", "height", "weight"])
Which gives:
year name height weight
2018 Alex 172 61
2019 Alex 173 62
2020 Alex 173 63
2018 Bill 168 59
2019 Bill 168 59
2020 Bill 169 60
2018 Cody 193 67
2019 Cody 194 69
2020 Cody 194 68
I would like to pivot this DataFrame horizontally so that year flows horizontally (ascending), while essentially stacking all other columns grouped by name so that my dataframe looks like this:
Alex 2018 2019 2020
height 172 173 173
weight 61 62 63
Bill
height 168 168 169
weight 59 59 60
Cody
height 193 194 69
weight 67 194 68
In summary there are three things I am trying to accomplish here:
- Pivot horizontally by year
- Ensure that year flows in ascending order, i.e. lowest to highest
- Group and stack the remaining columns by the name column
There are a lot of resources online about pivoting and stacking separately but not usually together like I am trying to do.
CodePudding user response:
Let's do set_index
then stack
to convert height
and weight
to row labels, and unstack
year
to make the year
level into columns:
new_df = df.set_index(['year', 'name']).stack().unstack('year')
new_df
:
year 2018 2019 2020
name
Alex height 172 173 173
weight 61 62 63
Bill height 168 168 169
weight 59 59 60
Cody height 193 194 194
weight 67 69 68
*Note: stack
and unstack
are going to sort index levels when reshaping.
There are several alternatives depending on whether going wide first of long first.
An additional (but slower) long first then wide approach would be melt
pivot
:
df.melt(
id_vars=['year', 'name']
).pivot(
index=['name', 'variable'],
columns='year',
values='value'
)
year 2018 2019 2020
name variable
Alex height 172 173 173
weight 61 62 63
Bill height 168 168 169
weight 59 59 60
Cody height 193 194 194
weight 67 69 68
Then the wide first approaches:
set_index
then unstack
first to go wide and stack
after to go long:
df.set_index(['year', 'name']).unstack('year').stack(level=0)
year 2018 2019 2020
name
Alex height 172 173 173
weight 61 62 63
Bill height 168 168 169
weight 59 59 60
Cody height 193 194 194
weight 67 69 68
Or equivalent with pivot
instead of set_index
unstack
as recommended by @sammywemmy:
df.pivot(index='name', columns='year').stack(level=0)
year 2018 2019 2020
name
Alex height 172 173 173
weight 61 62 63
Bill height 168 168 169
weight 59 59 60
Cody height 193 194 194
weight 67 69 68
CodePudding user response:
try this:
df.set_index(['year', 'name']).stack().unstack(0)
>>>
year 2018 2019 2020
name
Alex height 172 173 173
weight 61 62 63
Bill height 168 168 169
weight 59 59 60
Cody height 193 194 194
weight 67 69 68