Home > Software engineering >  Convert portion of Pandas dataframe into alternate layout
Convert portion of Pandas dataframe into alternate layout

Time:10-29

I have a dataframe in the following format:

Category Date Count Comment
Blue 10/24 7 None
Red 10/25 10 None
Green 10/23 5 None
Red 10/24 2 None
Blue 10/23 3 None
Red 10/26 11 None
Green 10/26 3 None

I want to take the data out of that dataframe and convert it to a dataframe like this:

Category 10/23 10/24 10/25 10/26
Blue 3 7
Red 2 10 11
Green 5 3

It's not just a matter of transposing my data. I think I have to convert some of my columns back into dicts/lists and then take those arrays and put them back into a dataframe. I'm thinking something like making a list of objects formatted like this

{Date : {category : count } }

I'm just not sure if that's the most efficient and I'm also not sure what the best way to turn that back into a dataframe would be.

Looking for some advice. I'm trying to display these tables in a Flask app so it's also possible I dont convert it back into a dataframe but display it using HTML table constructions, but I haven't been able to figure out the correct Jinja syntax.

My last option would be to take my data collection functions and output them into both table layouts so I dont have to do any transformations at all, but storing that data in two formats seems like it would be inefficient.

Any advice would be appreciated.

CodePudding user response:

You want a pivot table:

df = pd.DataFrame(
    {
        'Category': ['blue', 'red', 'green', 'red', 'blue', 'red', 'green'],
        'Date': ['10/24', '10/25', '10/23', '10/24', '10/23', '10/26', '10/26'],
        'Count': [7,10,5,2,3,11,3]
    }
)

df.pivot(index='Category', values='Count', columns='Date')

Date    10/23   10/24   10/25   10/26 
Category                 
blue    3.0     7.0     NaN     NaN
green   5.0     NaN     NaN     3.0 
red     NaN     2.0     10.0    11.0

CodePudding user response:

As an alternaive you can use groupby and unstack:

df_2 = df.groupby([
    'Category',
    'Date'
]).Count.max().unstack().fillna("")
print(df2_)
'''
Category    10-23   10-24   10-25   10-26
Blue        3.0     7.0     
Green       5.0                     3.0
Red                 2.0     10.0    11.0

'''
  • Related