I have a table dataframe called clientesv with column called COBERTURA whose values are:
clientesv.groupby('COBERTURA').size()
COBERTURA
A 9174
A3 1
B 1148
B0 179
B1 3922
B2 3
B3 1971
C 1511
C1 1065
C3 359
C4 145
C5 22
C6 87
C7 493
C8 174
D1 12
D2 16016
E 62
E1 5
M B 2751
M P 3080
M10 10281
M15 5187
M5 3765
dtype: int64
I would like to summarise using a Regular Expression in the Groupby so the result is:
A (total amount of A A0)
B (total amount of B0, B1, B2, B3)
C etc
D
E
M
I tried to do something like this:
clientesv.groupby(clientesv.COBERTURA.str.contains(r'\A', regex=True)).size()
But this expression only summarise values for A so I don't know how to follow...
CodePudding user response:
Try ^(\D)
to get all the non-digit characters at the beginning of the strings. Also value_counts
is faster than groupby().size()
.
clientcsv.COBERTURA.str.extract('^(\D) ', expand=False).value_counts()
CodePudding user response:
You can create an aux column, with only the first letter.
clientesv['new_label'] = clientesv.COBERTURA.str[0]
group_clients = clientesv.groupby('new_label')['COBERTURA'].agg('sum').reset_index()