Home > Mobile >  How can I pivot AND stack a Pandas DataFrame?
How can I pivot AND stack a Pandas DataFrame?

Time:11-25

As a rather infrequent user of Pandas, I'd like to know how best to pivot one column (representing time) so that it flows horizontally, while stacking the rest based on another column or index.

Here is what I mean:

data = [
    [2018, "Alex", 172, 61], [2019, "Alex", 173, 62], [2020, "Alex", 173, 63],
    [2018, "Bill", 168, 59], [2019, "Bill", 168, 59], [2020, "Bill", 169, 60],
    [2018, "Cody", 193, 67], [2019, "Cody", 194, 69], [2020, "Cody", 194, 68],
]

df = pd.DataFrame(data, columns=["year", "name", "height", "weight"])

Which gives:

year  name  height  weight

2018  Alex  172     61
2019  Alex  173     62
2020  Alex  173     63
2018  Bill  168     59
2019  Bill  168     59
2020  Bill  169     60
2018  Cody  193     67
2019  Cody  194     69
2020  Cody  194     68

I would like to pivot this DataFrame horizontally so that year flows horizontally (ascending), while essentially stacking all other columns grouped by name so that my dataframe looks like this:

Alex           2018  2019  2020
      height    172   173   173
      weight     61    62    63
Bill
      height    168   168   169
      weight     59    59    60
Cody
      height    193    194   69
      weight     67    194   68

In summary there are three things I am trying to accomplish here:

  1. Pivot horizontally by year
  2. Ensure that year flows in ascending order, i.e. lowest to highest
  3. Group and stack the remaining columns by the name column

There are a lot of resources online about pivoting and stacking separately but not usually together like I am trying to do.

CodePudding user response:

Let's do set_index then stack to convert height and weight to row labels, and unstack year to make the year level into columns:

new_df = df.set_index(['year', 'name']).stack().unstack('year')

new_df:

year         2018  2019  2020
name                         
Alex height   172   173   173
     weight    61    62    63
Bill height   168   168   169
     weight    59    59    60
Cody height   193   194   194
     weight    67    69    68

*Note: stack and unstack are going to sort index levels when reshaping.


There are several alternatives depending on whether going wide first of long first.

An additional (but slower) long first then wide approach would be melt pivot:

df.melt(
    id_vars=['year', 'name']
).pivot(
    index=['name', 'variable'],
    columns='year',
    values='value'
)

year           2018  2019  2020
name variable                  
Alex height     172   173   173
     weight      61    62    63
Bill height     168   168   169
     weight      59    59    60
Cody height     193   194   194
     weight      67    69    68

Then the wide first approaches:

set_index then unstack first to go wide and stack after to go long:

df.set_index(['year', 'name']).unstack('year').stack(level=0)

year         2018  2019  2020
name                         
Alex height   172   173   173
     weight    61    62    63
Bill height   168   168   169
     weight    59    59    60
Cody height   193   194   194
     weight    67    69    68

Or equivalent with pivot instead of set_index unstack as recommended by @sammywemmy:

df.pivot(index='name', columns='year').stack(level=0)

year         2018  2019  2020
name                         
Alex height   172   173   173
     weight    61    62    63
Bill height   168   168   169
     weight    59    59    60
Cody height   193   194   194
     weight    67    69    68

CodePudding user response:

try this:

df.set_index(['year', 'name']).stack().unstack(0)
>>>
        year    2018    2019    2020
name                
Alex    height  172     173     173
        weight  61      62      63
Bill    height  168     168     169
        weight  59      59      60
Cody    height  193     194     194
        weight  67      69      68
  • Related