Python Pandas Pivot() with Dates as Column Names Need to Sort and Format-CodePudding

I've seen the other comments on here about how you can correctly sort the columns when the pivot() makes the date the column name. What I haven't seen is how to convert those dates to a format I want after the sorting.

Here is my example. I have data that looks like this:

category	date	value1	value2
a	12/1/2018	1	25
a	1/1/2019	2	26
a	2/1/2019	3	27
b	12/1/2018	4	28
b	1/1/2019	5	29
b	2/1/2019	6	30
c	12/1/2018	7	31
c	1/1/2019	8	32
c	2/1/2019	9	33

What I want is something like this:

category	value	Dec-18	Jan-19	Feb-19
a	value1	1	2	3
a	value2	25	26	27
b	value1	4	5	6
b	value2	28	29	30
c	value1	7	8	9
c	value2	31	32	33

I have figured out how to get pivot to do one value at a time, and to sort the dates as long as I don't try to format them. But pivot is converting those dates to text so I can't change the format anymore.

This is what I have so far:

#df['date'] = pd.to_datetime(df.date).dt.strftime('%b-%y') final = df.pivot(index='category', columns='date', values='value1')

If I do the date formatting first, it sorts incorrectly. I also tried pivot_table but it won't let me set aggfunc=None.

The date formatting has to be dynamic aka every month the date range is going to be different (last 12 months), so I don't want to hard-code the date.

Thanks!

CodePudding user response：

This might not be the cleanest solution.

Here's the code I used to reproduce what you have already:

import pandas as pd

df = pd.DataFrame({
    "category":["a","a","a","b","b","b","c","c", "c"],
    "date":["12/1/2018", "1/1/2019", "2/1/2019", "12/1/2018", "1/1/2019", 
    "2/1/2019", "12/1/2018", "1/1/2019", "2/1/2019"], 
    "value1": [1,2,3,4,5,6,7,8,9], 
    "value2":[25,26,27,28,29,30,31,32,33],
    })

df["date"] = pd.to_datetime(df["date"]).dt.strftime("%b-%y")

Then, one can get close to your desired pandas.DataFrame with:

df = df.pivot(index="date", columns="category") # rows are `date` and columns are `value-category` pairs
df = df.unstack()                               # Bring back `value1` and `value2` as an index
df = df.reset_index("date")                     # `date` was an index, we bring it back as a column
df = df.pivot(columns="date")                   # Use pandas.DataFrame.pivot() once more to pivot on `date`
df = df.sort_values(by="category")              # Sort output as shown in your desired `pandas.DataFrame`

It results in this pandas.DataFrame:

                     0              
date            Dec-18 Feb-19 Jan-19
       category                     
value1 a             1      3      2
value2 a            25     27     26
value1 b             4      6      5
value2 b            28     30     29
value1 c             7      9      8
value2 c            31     33     32

CodePudding user response：

I have figured this out with a little help from a colleague.

df2= df.pivot(index="date", columns="category")
df2= df2.unstack() 
df2= df2.reset_index("date")  
df2= df2.pivot(columns="date") 
df2= df2.sort_values(by="category")
df2.columns = df2.columns.set_levels(summary2.columns.levels[1].strftime('%Y-%b-%d'), level='date')

This date format isn't exactly what I was looking for but it works and looks better than the original format.

df2= df2.droplevel(level=0, axis=1)
df2.columns = [''.join(col).strip() for col in df2.columns.values]
df2.columns = summary2.columns.get_level_values(0)

The table ends up looking like this:

	category	2018-Dec-1	2019-Jan-1	2019-Feb-1
value1	a	1	2	3
value2	a	25	26	27
value1	b	4	5	6
value2	b	28	29	30
value1	c	7	8	9
value2	c	31	32	33