Count number of consecutive spaces in Series

Time:06-03

I have a Series like:

import pandas as pd

ser = pd.Series([
    'the quick brown fox',
    'the  quick pink fox',
    'a quick brown   fox',
    'the jumpy  brown fox    ',
    'the quick  brown animal',
])

I would like to find the length of the longest run of consecutive spaces in each element. So my expected output is:

0    1
1    2
2    3
3    4
4    2
dtype: int64

because the first row contains only single spaces (longest run is 1), the second row contains a run of two consecutive spaces (between "the" and "quick"), the third row contains a run of three consecutive spaces (between "brown" and "fox"), and so on.

I know of ser.str.count(' '), but that gives me the total number of spaces in each string, whether or not they are consecutive.
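To illustrate the difference, here is a small sketch that runs str.count on the same data as above. The variable name total_spaces is only for illustration:

```python
import pandas as pd

ser = pd.Series([
    'the quick brown fox',
    'the  quick pink fox',
    'a quick brown   fox',
    'the jumpy  brown fox    ',
    'the quick  brown animal',
])

# str.count(' ') counts every single space character in each string,
# so it returns totals rather than the longest consecutive run.
total_spaces = ser.str.count(' ')
print(total_spaces.tolist())  # [3, 4, 5, 8, 4]
```

These totals differ from the expected output (1, 2, 3, 4, 2), which is why str.count alone is not enough here.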

CodePudding user response:

You can extract all consecutive spaces with a regex (using str.extractall), then get the lengths with str.len and find the maximum length per initial row with GroupBy.max:

(ser
 .str.extractall(r'(\s+)')[0]
 .str.len()
 .groupby(level=0).max()
 .reindex(ser.index, fill_value=0) # optional (see below)
)

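NB. If some strings may contain no whitespace at all and you want 0 for them, you need the reindex step; without it, those rows are simply missing from the result. Also note that \s matches any whitespace (tabs and newlines included); use ' +' instead if you only want literal spaces.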

output:

0    1
1    2
2    3
3    4
4    2
Name: 0, dtype: int64
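As a rough sketch of why the reindex matters, here is the same pipeline on a made-up Series (demo, not from the question) in which one string has no spaces. The pattern ' +' is used here on the assumption that only literal spaces should be counted:

```python
import pandas as pd

# Hypothetical data: the middle element contains no spaces at all.
demo = pd.Series(['a  b', 'nospace', 'c d'])

# extractall returns one row per match, so index 1 yields no rows.
longest = (demo
 .str.extractall(r'( +)')[0]
 .str.len()
 .groupby(level=0).max()
)

# Without reindex, index 1 is absent; with it, the gap is filled with 0.
without_reindex = longest.index.tolist()
with_reindex = longest.reindex(demo.index, fill_value=0)

print(without_reindex)        # [0, 2]
print(with_reindex.tolist())  # [2, 0, 1]
```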

CodePudding user response:

findall gives you a list of the runs of spaces in each string; just take the length of the longest run per list:

ser.str.findall(' +').apply(lambda s: max(map(len, s)) if s else 0)

Result:

0    1
1    2
2    3
3    4
4    2
dtype: int64
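If you need this in more than one place, the findall approach could be wrapped in a small helper. This is only a sketch; the function name longest_space_run is made up for illustration, and it uses max(..., default=0) in place of the conditional so that strings without spaces give 0:

```python
import pandas as pd

def longest_space_run(ser: pd.Series) -> pd.Series:
    """Return the length of the longest run of literal spaces per element."""
    # findall(' +') yields, for each string, a list of its space runs.
    runs = ser.str.findall(' +')
    # An empty list (no spaces found) falls back to 0 via default=0.
    return runs.map(lambda found: max(map(len, found), default=0))

ser = pd.Series([
    'the quick brown fox',
    'the  quick pink fox',
    'a quick brown   fox',
    'the jumpy  brown fox    ',
    'the quick  brown animal',
    'nospace',  # extra row, not in the question, to show the 0 case
])

result = longest_space_run(ser)
print(result.tolist())  # [1, 2, 3, 4, 2, 0]
```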