sample data:
type1 = ['bunny','cow','pig','chicken','elephant']
type2 = ['flower','tree','grass']
data_type1 = np.random.choice(a = type1, p = [0.05,0.15,0.25,0.05,0.5], size=1000)
data_type2 = np.random.choice(a = type2, p = [0.25,0.25,0.50], size=1000)
data_value = stats.poisson.rvs(loc = 18, mu = 30, size = 1000)
df = pd.DataFrame({'type1':data_type1,'type2':data_type2,'data':data_value})
grp1 = pd.unique(df.type1.values)
grp1.sort()
grp2 = pd.unique(df.type2.values)
grp2.sort()
m_df = pd.DataFrame(index = grp1,columns=grp2)
given a df structured as 'df', how would I populate 'm_df' with the count of each type combo. I want to accomplish df.groupby(['type1','type2'])['data'].count()
but I'm not sure how to write it into a df so it looks nice and is more useful.
Edit: below is a simpler df that also works as an example. The expected output is the number of values for each type combination -- the exact output shown when the input is df.groupby(['type1','type2'])['data'].count()
. The question is how to represent that output as a dataframe resembling m_df.
df = pd.DataFrame({'type1': ['bunny','cow','pig','chicken','elephant','cow','pig'],
'data': [32,23,45,35,20,28,55],
'type2':['female', 'male','male','male','male','female','female']})
CodePudding user response:
IIUC, use value_counts
(same as groupby_count
) then unstack
the second index level:
>>> (df.value_counts(['type1', 'type2'])
.unstack('type2', fill_value=0)
.rename_axis(index=None, columns=None))
female male
bunny 1 0
chicken 0 1
cow 1 1
elephant 0 1
pig 1 1
You can also use pd.crosstab
or pivot_table
:
>>> (pd.crosstab(df['type1'], df['type2'], 1, aggfunc='count')
.fillna(0).astype(int).rename_axis(index=None, columns=None))
>>> (df.assign(val=1).pivot_table('val', 'type1', 'type2', aggfunc=sum, fill_value=0)
.rename_axis(index=None, columns=None))
CodePudding user response:
So I have used the input what you provided.
I don't know, what you are trying to achieve here but from what I feel you required is.
The code???
df_grp = df.groupby(['type1','type2'], as_index=False)['data'].count()
The output looks like this