Home > Enterprise >  Pandas - Get column value counts as new columns in dataframe
Pandas - Get column value counts as new columns in dataframe

Time:11-01

I have a pandas dataframe that looks like this:

Type Status
typeA New
typeA Working
typeA Working
typeA Closed
typeA Closed
typeA Closed
typeB New
typeB Working
typeC Closed
typeC Closed
typeC Closed

I'd like to group the dataframe by the 'Type' field and get the count of each status as a column, like so:

Type New Working Closed
typeA 1 2 3
typeB 1 1 0
typeC 0 0 3

I'd also like columns for statuses that could exist (I have a list all possibilities), but may not be represented in the input dataframe, so the final result would be something like this:

Type New Working Closed Escalate
typeA 1 2 3 0
typeB 1 1 0 0
typeC 0 0 3 0

I'm able to get the counts per status by using:

closureCodeCounts = closureCodes.groupby(['type','status'],as_index=False).size()

I've also tried

closureCodeCounts = closureCodeCounts.groupby('type').value_counts()
closureCodeCounts = closureCodeCounts.unstack()

But nothing seems to come out right.

I'm pretty lost. What's the best way to do this?

CodePudding user response:

Try as follows:

  • Use pd.crosstab to reach the first stage of your desired output.
  • For the second stage, I am assuming that the list you mention indeed contains all possible values. If so, we can apply df.reindex to axis=1 to add the missing possibilities as columns.
  • Since the extra columns will be added with NaN values, we can use df.fillna to get zeros.
possible_statuses = ['New','Working','Closed','Escalate']

res = (pd.crosstab(closureCodes.Type, closureCodes.Status)
       .reindex(possible_statuses, axis=1)
       .fillna(0))

print(res)

Status  New  Working  Closed  Escalate
Type                                  
typeA     1        2       3       0.0
typeB     1        1       0       0.0
typeC     0        0       3       0.0

"Cosmetic" additions:

res.columns.name = None # to get rid of "Status" as `columns.name`
res = res.astype(int) # to turn `0.0` (floats) for added cols into `0` (integers)

An alternative approach to reach the first stage could be as follows:

res = (closureCodes.groupby('Type')
       .value_counts()
       .unstack()
       .reindex(possible_statuses, axis=1)
       .fillna(0))

print(res)

Status  New  Working  Closed  Escalate
Type                                  
typeA   1.0      2.0     3.0       0.0
typeB   1.0      1.0     0.0       0.0
typeC   0.0      0.0     3.0       0.0

This is, of course, pretty close to what you were trying to do in the first place (but you don't need the intermediate closureCodeCounts).

CodePudding user response:

You can make use of the pivot table to transpose your grouped Dataframe -

closureCodeCounts = pd.pivot_table(closureCodeCounts, values = 'size', index=['type'], columns = 'status').fillna(0)

And then similar to @ouroboros1 answer, reindex your Dataframe to add the missing columns.

possible_statuses = ['New','Working','Closed','Escalate']
result = closureCodeCounts.reindex(columns=possible_statuses, fill_value=0)
  • Related