I have a Series like:
import pandas as pd
ser = pd.Series([
    'the quick brown fox',
    'the  quick pink fox',
    'a quick brown   fox',
    'the jumpy    brown fox ',
    'the quick  brown animal',
])
I would like to count the maximum number of consecutive spaces in each element. So my expected output is:
0 1
1 2
2 3
3 4
4 2
dtype: int64
because the first row contains only single spaces, the second one contains two consecutive spaces (between "the" and "quick"), the third row contains three consecutive spaces (between "brown" and "fox"), and so on...
I know of ser.str.count(' '), but that'll give me the total number of spaces, even if they're not consecutive.
CodePudding user response:
You can extract all runs of consecutive spaces with a regex (using str.extractall), then get their lengths with str.len and find the maximum length per original row with GroupBy.max:
(ser
 .str.extractall(r'(\s+)')[0]   # one row per run of whitespace
 .str.len()                     # length of each run
 .groupby(level=0).max()        # longest run per original row
 .reindex(ser.index, fill_value=0)  # optional (see below)
)
NB. If some strings might contain no space at all and you would like to get 0 for them, you need the reindex step; without it, those rows are missing from the result.
output:
0 1
1 2
2 3
3 4
4 2
Name: 0, dtype: int64
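To illustrate the note about reindex, here is a minimal sketch on a made-up Series (ser2 and its contents are hypothetical, not from the question) in which one element has no whitespace. It shows that the row disappears after the groupby and comes back as 0 once reindexed.

```python
import pandas as pd

# Hypothetical sample: the second element contains no whitespace at all.
ser2 = pd.Series(['a  b', 'nospace', 'c d'])

# One row per whitespace run, indexed by (original row, match number).
runs = ser2.str.extractall(r'(\s+)')[0]

# Longest run per original row; rows without any match are absent here.
longest = runs.str.len().groupby(level=0).max()

without_reindex = longest
with_reindex = longest.reindex(ser2.index, fill_value=0)

print(without_reindex.index.tolist())  # row 1 is missing
print(with_reindex.tolist())           # row 1 is restored with 0
```

Note that \s matches any whitespace character (tabs and newlines too); use ' +' instead if only literal spaces should count.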
CodePudding user response:
str.findall gets you a list of the runs of spaces in each string; just take the length of the longest run per list:
ser.str.findall(' +').apply(lambda s: max(map(len, s)) if s else 0)
Result:
0 1
1 2
2 3
3 4
4 2
dtype: int64
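For reuse, the findall approach can be wrapped in a small helper. This is only a sketch: the name max_space_run and the sample strings are made up for illustration, and it assumes the Series holds no missing values (str.findall returns NaN for those, which the lambda would not handle).

```python
import pandas as pd


def max_space_run(ser: pd.Series) -> pd.Series:
    """Length of the longest run of consecutive spaces per element (0 if none)."""
    # ' +' matches one or more literal spaces, so each element becomes
    # a list of its space runs, e.g. 'a  b c' -> ['  ', ' '].
    runs = ser.str.findall(' +')
    # default=0 covers elements whose list of runs is empty.
    return runs.apply(lambda found: max(map(len, found), default=0))


# Hypothetical sample, including a string without spaces and one with
# leading and trailing spaces.
sample = pd.Series(['a b', 'a  b   c', 'abc', ' x    '])
result = max_space_run(sample)
print(result.tolist())
```

Using max(..., default=0) replaces the explicit "if s else 0" check; both behave the same for an empty list.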