Home > Software engineering >  Python Pandas Pivot() with Dates as Column Names Need to Sort and Format
Python Pandas Pivot() with Dates as Column Names Need to Sort and Format

Time:12-23

I've seen the other comments on here about how you can correctly sort the columns when the pivot() makes the date the column name. What I haven't seen is how to convert those dates to a format I want after the sorting.

Here is my example. I have data that looks like this:

category date value1 value2
a 12/1/2018 1 25
a 1/1/2019 2 26
a 2/1/2019 3 27
b 12/1/2018 4 28
b 1/1/2019 5 29
b 2/1/2019 6 30
c 12/1/2018 7 31
c 1/1/2019 8 32
c 2/1/2019 9 33

What I want is something like this:

category value Dec-18 Jan-19 Feb-19
a value1 1 2 3
a value2 25 26 27
b value1 4 5 6
b value2 28 29 30
c value1 7 8 9
c value2 31 32 33

I have figured out how to get pivot to do one value at a time, and to sort the dates as long as I don't try to format them. But pivot is converting those dates to text so I can't change the format anymore.

This is what I have so far:

#df['date'] = pd.to_datetime(df.date).dt.strftime('%b-%y') final = df.pivot(index='category', columns='date', values='value1')

If I do the date formatting first, it sorts incorrectly. I also tried pivot_table but it won't let me set aggfunc=None.

The date formatting has to be dynamic aka every month the date range is going to be different (last 12 months), so I don't want to hard-code the date.

Thanks!

CodePudding user response:

This might not be the cleanest solution.

Here's the code I used to reproduce what you have already:

import pandas as pd

df = pd.DataFrame({
    "category":["a","a","a","b","b","b","c","c", "c"],
    "date":["12/1/2018", "1/1/2019", "2/1/2019", "12/1/2018", "1/1/2019", 
    "2/1/2019", "12/1/2018", "1/1/2019", "2/1/2019"], 
    "value1": [1,2,3,4,5,6,7,8,9], 
    "value2":[25,26,27,28,29,30,31,32,33],
    })

df["date"] = pd.to_datetime(df["date"]).dt.strftime("%b-%y")

Then, one can get close to your desired pandas.DataFrame with:

df = df.pivot(index="date", columns="category") # rows are `date` and columns are `value-category` pairs
df = df.unstack()                               # Bring back `value1` and `value2` as an index
df = df.reset_index("date")                     # `date` was an index, we bring it back as a column
df = df.pivot(columns="date")                   # Use pandas.DataFrame.pivot() once more to pivot on `date`
df = df.sort_values(by="category")              # Sort output as shown in your desired `pandas.DataFrame`

It results in this pandas.DataFrame:

                     0              
date            Dec-18 Feb-19 Jan-19
       category                     
value1 a             1      3      2
value2 a            25     27     26
value1 b             4      6      5
value2 b            28     30     29
value1 c             7      9      8
value2 c            31     33     32

CodePudding user response:

I have figured this out with a little help from a colleague.

df2= df.pivot(index="date", columns="category")
df2= df2.unstack() 
df2= df2.reset_index("date")  
df2= df2.pivot(columns="date") 
df2= df2.sort_values(by="category")
df2.columns = df2.columns.set_levels(summary2.columns.levels[1].strftime('%Y-%b-%d'), level='date')

This date format isn't exactly what I was looking for but it works and looks better than the original format.

df2= df2.droplevel(level=0, axis=1)
df2.columns = [''.join(col).strip() for col in df2.columns.values]
df2.columns = summary2.columns.get_level_values(0)

The table ends up looking like this:

category 2018-Dec-1 2019-Jan-1 2019-Feb-1
value1 a 1 2 3
value2 a 25 26 27
value1 b 4 5 6
value2 b 28 29 30
value1 c 7 8 9
value2 c 31 32 33
  • Related