Counting number of vertical bars | in pandas string has strange behaviour?

Time:04-07

I want to count the number of vertical bars "|" in each row of a particular column in a pandas DataFrame. But using str.count("|") gives some strange behaviour:

import pandas as pd
df = pd.DataFrame(['some text', None, 'a|few|vertical|bars', 'one|'])
df[0].str.count("|")

outputs

0    10.0
1     NaN
2    20.0
3     5.0
Name: 0, dtype: float64

What's going on here? If I use apply instead, I get the expected answer:

df.apply(lambda x: str(x[0]).count("|"),axis=1)

yields

0    0
1    0
2    3
3    1
dtype: int64

CodePudding user response:

Try this. The pat argument is a regex string, and | is the regex OR operator, so escape it with a backslash (written here as a raw string so Python passes the backslash through unchanged):

df[0].str.count(r'\|')

Output:

0    0.0
1    NaN
2    3.0
3    1.0
Name: 0, dtype: float64

Note: Python's built-in str.count is different from pd.Series.str.count. The former counts a literal substring, while the pandas method treats its pattern as a regular expression, per the pandas documentation.
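
To see where the odd numbers in the question come from, here is a minimal sketch using the standard re module. An unescaped "|" means "empty string OR empty string", so it matches once at every position in the text, which gives len(text) + 1 matches. The variable names are only illustrative, and re.escape is shown as one possible way to build the escaped pattern rather than typing the backslash by hand.

```python
import re

import pandas as pd

s = pd.Series(['some text', None, 'a|few|vertical|bars', 'one|'])

# The bare pattern "|" matches the empty string at every position,
# so a 9-character string produces 9 + 1 matches.
empty_matches = len(re.findall("|", "some text"))

# re.escape("|") returns the escaped pattern, so the bar is counted literally.
literal_counts = s.str.count(re.escape("|"))

print(empty_matches)
print(literal_counts)
```

This accounts for the values 10, 20 and 5 in the question: the three strings are 9, 19 and 4 characters long.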

CodePudding user response:

It looks like str(x[0]) converts None into the actual string 'None' of type str, so it is no longer a NoneType. When count() is given a string it simply does a count; when it is given a NoneType it returns NaN.

df[0].apply(lambda x: type(x))

0         <class 'str'>
1    <class 'NoneType'>
2         <class 'str'>
3         <class 'str'>
Name: 0, dtype: object

df.apply(lambda x: type(str(x[0])), axis=1)

0    <class 'str'>
1    <class 'str'>
2    <class 'str'>
3    <class 'str'>
dtype: object
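
If integer counts with 0 for missing rows are wanted (as in the apply version), one possible sketch is to keep the vectorised str.count and fill the NaN afterwards. This assumes missing values should count as 0, which is a choice rather than something the question states. It also avoids a side effect of str(None): the text 'None' would be searched like any other string.

```python
import pandas as pd

s = pd.Series(['some text', None, 'a|few|vertical|bars', 'one|'])

# Count literal bars, then treat missing rows as 0 and cast to int.
bar_counts = s.str.count(r'\|').fillna(0).astype(int)

# Pitfall of str(None): the missing row becomes the text 'None',
# so counting the letter "N" reports one match for it.
n_in_missing = str(None).count("N")

print(bar_counts.tolist())
print(n_in_missing)
```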