I have the following example dataframe:
data = {'ref':['1', '2', '3', '4', '5'],
'checked':[True, True, True, False, True],
'rag':['r', 'r', 'g', 'a', 'r'],
'group':['high', 'low', 'high', 'medium', 'high']}
dataframe = pd.DataFrame(data)
I want to group on group
and do some conditional counts where certain conditions are met so I get the following:
I can group by group
and do n
by the following:
df = dataframe.groupby(['group']).agg(
n=('ref', 'count')
).reset_index()
But I am struggling to also count the number of times for each group that:
- checked = True
- rag = g
- rag = a
- rag = r
Any help would be much appreciated!
edit: changed True/False strings to Boolean
CodePudding user response:
You have a few challenges.
For instance, your True/False are strings, so you should either initialize them as booleans and use sum
or convert to boolean during aggregation.
To count the rag, it's easier to use pandas.crosstab
and join it while you still have the groups as index.
df = (dataframe
.groupby(['group'])
.agg(**{'n': ('ref', 'count'),
'checked=True': ('checked', lambda s: s.eq('True').sum()),
})
.join(pd.crosstab(dataframe['group'], dataframe['rag'])
.add_prefix('rag=')
)
.reset_index()
)
output:
group n checked=True rag=a rag=g rag=r
0 high 3 3 0 1 2
1 low 1 1 0 0 1
2 medium 1 0 1 0 0
CodePudding user response:
You can try pivot_table
separately on your checked
and rag
columns
n = df.groupby(['group']).agg(n=('ref', 'count'))
dfs = []
for column in ['checked', 'rag']:
df_ = (df.pivot_table(index='group', columns=[column], values='ref',
aggfunc='count', fill_value=0)
.rename(columns=lambda col: f'{column}={col}')
.rename_axis(None, axis=1))
dfs.append(df_)
df = pd.concat(dfs, axis=1).drop('checked=False', axis=1)
print(n.join(df))
n checked=True rag=a rag=g rag=r
group
high 3 3 0 1 2
low 1 1 0 0 1
medium 1 0 1 0 0
CodePudding user response:
There is an issue with your data example:
data = {'ref':['1', '2', '3', '4', '5'],
'checked':[True, True, True, False, True],
'rag':['r', 'r', 'g', 'a', 'r'],
'group':['high', 'low', 'high', 'medium', 'high']}
df = pd.DataFrame(data)
for checked
column, you should enter value as True/False
without in quotation. Otherwise, python will interpret True'/'Fasle'
as string
.
The idea here is two steps: (1) you use iteration
with groupby
. (2) then you merge
/concat
them into 1 table:
# Create empty table
table = pd.DataFrame()
# List of column you want to iterate:
col_iter = ['checked', 'rag']
# Iterate:
for col in col_iter:
# Obtain unique values in each col used
uni = df[col].unique()
# Iterate for each unique value in col.
# Set tem var
# Concat tem to table
for val in uni:
tem = df.groupby('group').apply(lambda g: (g[col]==val).sum())
table = pd.concat([table, tem], axis=1).rename(columns={0:f'{col}={val}'})