Home > Back-end >  Pandas Dataframe display total
Pandas Dataframe display total

Time:04-16

Here is an example dataset found from google search close to my datasets in my environment

enter image description here

I'm trying to get output like this

enter image description here

import pandas as pd
import numpy as np

data = {'Product':['Box','Bottles','Pen','Markers','Bottles','Pen','Markers','Bottles','Box','Markers','Markers','Pen'],
        'State':['Alaska','California','Texas','North Carolina','California','Texas','Alaska','Texas','North Carolina','Alaska','California','Texas'],
        'Sales':[14,24,31,12,13,7,9,31,18,16,18,14]}

df=pd.DataFrame(data, columns=['Product','State','Sales'])
df1=df.sort_values('State')
#df1['Total']=df1.groupby('State').count()   
df1['line']=df1.groupby('State').cumcount() 1
print(df1,index=False)

 
 Commented out line throws this error 
 ValueError: Columns must be same length as key
 
Tried with size() it gives NaN for all rows

Hope someone points me to right direction Thanks in advance

CodePudding user response:

I think this should work for 'Total':

df1['Total']=df1.groupby('State')['Product'].transform(lambda x: x.count())  

CodePudding user response:

Try this:

df = pd.DataFrame(data).sort_values("State")

grp = df.groupby("State")
df["Total"] = grp["State"].transform("size")
df["line"] = grp.cumcount()   1
  • Related