I have a pandas dataframe that looks like this:
Type | Status |
---|---|
typeA | New |
typeA | Working |
typeA | Working |
typeA | Closed |
typeA | Closed |
typeA | Closed |
typeB | New |
typeB | Working |
typeC | Closed |
typeC | Closed |
typeC | Closed |
I'd like to group the dataframe by the 'Type' field and get the count of each status as a column, like so:
Type | New | Working | Closed |
---|---|---|---|
typeA | 1 | 2 | 3 |
typeB | 1 | 1 | 0 |
typeC | 0 | 0 | 3 |
I'd also like columns for statuses that could exist (I have a list all possibilities), but may not be represented in the input dataframe, so the final result would be something like this:
Type | New | Working | Closed | Escalate |
---|---|---|---|---|
typeA | 1 | 2 | 3 | 0 |
typeB | 1 | 1 | 0 | 0 |
typeC | 0 | 0 | 3 | 0 |
I'm able to get the counts per status by using:
closureCodeCounts = closureCodes.groupby(['type','status'],as_index=False).size()
I've also tried
closureCodeCounts = closureCodeCounts.groupby('type').value_counts()
closureCodeCounts = closureCodeCounts.unstack()
But nothing seems to come out right.
I'm pretty lost. What's the best way to do this?
CodePudding user response:
Try as follows:
- Use
pd.crosstab
to reach the first stage of your desired output. - For the second stage, I am assuming that the
list
you mention indeed contains all possible values. If so, we can applydf.reindex
toaxis=1
to add the missing possibilities ascolumns
. - Since the extra columns will be added with
NaN
values, we can usedf.fillna
to get zeros.
possible_statuses = ['New','Working','Closed','Escalate']
res = (pd.crosstab(closureCodes.Type, closureCodes.Status)
.reindex(possible_statuses, axis=1)
.fillna(0))
print(res)
Status New Working Closed Escalate
Type
typeA 1 2 3 0.0
typeB 1 1 0 0.0
typeC 0 0 3 0.0
"Cosmetic" additions:
res.columns.name = None # to get rid of "Status" as `columns.name`
res = res.astype(int) # to turn `0.0` (floats) for added cols into `0` (integers)
An alternative approach to reach the first stage could be as follows:
- Use
df.groupby
withvalue_counts
and chaindf.unstack
:
res = (closureCodes.groupby('Type')
.value_counts()
.unstack()
.reindex(possible_statuses, axis=1)
.fillna(0))
print(res)
Status New Working Closed Escalate
Type
typeA 1.0 2.0 3.0 0.0
typeB 1.0 1.0 0.0 0.0
typeC 0.0 0.0 3.0 0.0
This is, of course, pretty close to what you were trying to do in the first place (but you don't need the intermediate closureCodeCounts
).
CodePudding user response:
You can make use of the pivot table to transpose your grouped Dataframe -
closureCodeCounts = pd.pivot_table(closureCodeCounts, values = 'size', index=['type'], columns = 'status').fillna(0)
And then similar to @ouroboros1 answer, reindex your Dataframe to add the missing columns.
possible_statuses = ['New','Working','Closed','Escalate']
result = closureCodeCounts.reindex(columns=possible_statuses, fill_value=0)