Pandas - Get column value counts as new columns in dataframe-CodePudding

I have a pandas dataframe that looks like this:

Type	Status
typeA	New
typeA	Working
typeA	Working
typeA	Closed
typeA	Closed
typeA	Closed
typeB	New
typeB	Working
typeC	Closed
typeC	Closed
typeC	Closed

I'd like to group the dataframe by the 'Type' field and get the count of each status as a column, like so:

Type	New	Working	Closed
typeA	1	2	3
typeB	1	1	0
typeC	0	0	3

I'd also like columns for statuses that could exist (I have a list all possibilities), but may not be represented in the input dataframe, so the final result would be something like this:

Type	New	Working	Closed
typeA	1	2	3
typeB	1	1	0
typeC	0	0	3

I'm able to get the counts per status by using:

closureCodeCounts = closureCodes.groupby(['type','status'],as_index=False).size()

I've also tried

closureCodeCounts = closureCodeCounts.groupby('type').value_counts()
closureCodeCounts = closureCodeCounts.unstack()

But nothing seems to come out right.

I'm pretty lost. What's the best way to do this?

CodePudding user response：

Try as follows:

Use pd.crosstab to reach the first stage of your desired output.
For the second stage, I am assuming that the list you mention indeed contains all possible values. If so, we can apply df.reindex to axis=1 to add the missing possibilities as columns.
Since the extra columns will be added with NaN values, we can use df.fillna to get zeros.

possible_statuses = ['New','Working','Closed','Escalate']

res = (pd.crosstab(closureCodes.Type, closureCodes.Status)
       .reindex(possible_statuses, axis=1)
       .fillna(0))

print(res)

Status  New  Working  Closed  Escalate
Type                                  
typeA     1        2       3       0.0
typeB     1        1       0       0.0
typeC     0        0       3       0.0

"Cosmetic" additions:

res.columns.name = None # to get rid of "Status" as `columns.name`
res = res.astype(int) # to turn `0.0` (floats) for added cols into `0` (integers)

An alternative approach to reach the first stage could be as follows:

Use df.groupby with value_counts and chain df.unstack:

res = (closureCodes.groupby('Type')
       .value_counts()
       .unstack()
       .reindex(possible_statuses, axis=1)
       .fillna(0))

print(res)

Status  New  Working  Closed  Escalate
Type                                  
typeA   1.0      2.0     3.0       0.0
typeB   1.0      1.0     0.0       0.0
typeC   0.0      0.0     3.0       0.0

This is, of course, pretty close to what you were trying to do in the first place (but you don't need the intermediate closureCodeCounts).

CodePudding user response：

You can make use of the pivot table to transpose your grouped Dataframe -

closureCodeCounts = pd.pivot_table(closureCodeCounts, values = 'size', index=['type'], columns = 'status').fillna(0)

And then similar to @ouroboros1 answer, reindex your Dataframe to add the missing columns.

possible_statuses = ['New','Working','Closed','Escalate']
result = closureCodeCounts.reindex(columns=possible_statuses, fill_value=0)