I've seen the other comments on here about how you can correctly sort the columns when the pivot() makes the date the column name. What I haven't seen is how to convert those dates to a format I want after the sorting.
Here is my example. I have data that looks like this:
category | date | value1 | value2 |
---|---|---|---|
a | 12/1/2018 | 1 | 25 |
a | 1/1/2019 | 2 | 26 |
a | 2/1/2019 | 3 | 27 |
b | 12/1/2018 | 4 | 28 |
b | 1/1/2019 | 5 | 29 |
b | 2/1/2019 | 6 | 30 |
c | 12/1/2018 | 7 | 31 |
c | 1/1/2019 | 8 | 32 |
c | 2/1/2019 | 9 | 33 |
What I want is something like this:
category | value | Dec-18 | Jan-19 | Feb-19 |
---|---|---|---|---|
a | value1 | 1 | 2 | 3 |
a | value2 | 25 | 26 | 27 |
b | value1 | 4 | 5 | 6 |
b | value2 | 28 | 29 | 30 |
c | value1 | 7 | 8 | 9 |
c | value2 | 31 | 32 | 33 |
I have figured out how to get pivot to do one value at a time, and to sort the dates as long as I don't try to format them. But pivot is converting those dates to text so I can't change the format anymore.
This is what I have so far:
#df['date'] = pd.to_datetime(df.date).dt.strftime('%b-%y') final = df.pivot(index='category', columns='date', values='value1')
If I do the date formatting first, it sorts incorrectly. I also tried pivot_table but it won't let me set aggfunc=None.
The date formatting has to be dynamic aka every month the date range is going to be different (last 12 months), so I don't want to hard-code the date.
Thanks!
CodePudding user response:
This might not be the cleanest solution.
Here's the code I used to reproduce what you have already:
import pandas as pd
df = pd.DataFrame({
"category":["a","a","a","b","b","b","c","c", "c"],
"date":["12/1/2018", "1/1/2019", "2/1/2019", "12/1/2018", "1/1/2019",
"2/1/2019", "12/1/2018", "1/1/2019", "2/1/2019"],
"value1": [1,2,3,4,5,6,7,8,9],
"value2":[25,26,27,28,29,30,31,32,33],
})
df["date"] = pd.to_datetime(df["date"]).dt.strftime("%b-%y")
Then, one can get close to your desired pandas.DataFrame
with:
df = df.pivot(index="date", columns="category") # rows are `date` and columns are `value-category` pairs
df = df.unstack() # Bring back `value1` and `value2` as an index
df = df.reset_index("date") # `date` was an index, we bring it back as a column
df = df.pivot(columns="date") # Use pandas.DataFrame.pivot() once more to pivot on `date`
df = df.sort_values(by="category") # Sort output as shown in your desired `pandas.DataFrame`
It results in this pandas.DataFrame
:
0
date Dec-18 Feb-19 Jan-19
category
value1 a 1 3 2
value2 a 25 27 26
value1 b 4 6 5
value2 b 28 30 29
value1 c 7 9 8
value2 c 31 33 32
CodePudding user response:
I have figured this out with a little help from a colleague.
df2= df.pivot(index="date", columns="category")
df2= df2.unstack()
df2= df2.reset_index("date")
df2= df2.pivot(columns="date")
df2= df2.sort_values(by="category")
df2.columns = df2.columns.set_levels(summary2.columns.levels[1].strftime('%Y-%b-%d'), level='date')
This date format isn't exactly what I was looking for but it works and looks better than the original format.
df2= df2.droplevel(level=0, axis=1)
df2.columns = [''.join(col).strip() for col in df2.columns.values]
df2.columns = summary2.columns.get_level_values(0)
The table ends up looking like this:
category | 2018-Dec-1 | 2019-Jan-1 | 2019-Feb-1 | |
---|---|---|---|---|
value1 | a | 1 | 2 | 3 |
value2 | a | 25 | 26 | 27 |
value1 | b | 4 | 5 | 6 |
value2 | b | 28 | 29 | 30 |
value1 | c | 7 | 8 | 9 |
value2 | c | 31 | 32 | 33 |