Home > Back-end >  How to do a groupby with a regex over a dataframe column in python?
How to do a groupby with a regex over a dataframe column in python?

Time:10-27

I have a table dataframe called clientesv with column called COBERTURA whose values are:

clientesv.groupby('COBERTURA').size()

COBERTURA
          
A        9174
A3          1
B        1148
B0        179
B1       3922
B2          3
B3       1971
C        1511
C1       1065
C3        359
C4        145
C5         22
C6         87
C7        493
C8        174
D1         12
D2      16016
E          62
E1          5
M B      2751
M P      3080
M10     10281
M15      5187
M5       3765
dtype: int64

I would like to summarise using a Regular Expression in the Groupby so the result is:

A  (total amount of A   A0)
B  (total amount of B0, B1, B2, B3)
C  etc
D
E 
M

I tried to do something like this:

clientesv.groupby(clientesv.COBERTURA.str.contains(r'\A', regex=True)).size()

But this expression only summarise values for A so I don't know how to follow...

CodePudding user response:

Try ^(\D) to get all the non-digit characters at the beginning of the strings. Also value_counts is faster than groupby().size().

clientcsv.COBERTURA.str.extract('^(\D) ', expand=False).value_counts()

CodePudding user response:

You can create an aux column, with only the first letter.

clientesv['new_label'] = clientesv.COBERTURA.str[0]
group_clients = clientesv.groupby('new_label')['COBERTURA'].agg('sum').reset_index()
  • Related