I want to count the number of vertical bars "|" in each row of a particular column in a pandas dataframe. But using str.count("|")
yields some strange behaviour:
import pandas as pd
df = pd.DataFrame(['some text', None, 'a|few|vertical|bars', 'one|'])
df[0].str.count("|")
outputs
0 10.0
1 NaN
2 20.0
3 5.0
Name: 0, dtype: float64
What's going on here? If I use apply instead, I get the expected answer:
df.apply(lambda x: str(x[0]).count("|"),axis=1)
yields
0 0
1 0
2 3
3 1
dtype: int64
CodePudding user response:
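Try this: pat is a regex string, and | is the regex alternation (OR) operator. An unescaped "|" therefore matches the empty string at every position, which is why each count above is the string length plus one. Escape it with a backslash, '\':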
df[0].str.count(r'\|')
Output:
0 0.0
1 NaN
2 3.0
3 1.0
Name: 0, dtype: float64
Note: str.count in the standard library is different from pd.Series.str.count. The former doesn't use regex, but the pandas method does, per the pandas docs.
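To see where the 10, 20 and 5 come from, here is a minimal sketch using only the standard library re module. It assumes that counting regex matches with re.findall is a fair stand-in for what the pandas method does. The names texts, unescaped_counts, escaped_counts and builtin_counts are just for illustration.

```python
import re

texts = ['some text', 'a|few|vertical|bars', 'one|']

# Unescaped, "|" means "empty pattern OR empty pattern", which matches
# the empty string at every position: len(text) + 1 matches.
unescaped_counts = [len(re.findall("|", t)) for t in texts]

# Escaped, the bar is matched as a literal character.
escaped_counts = [len(re.findall(re.escape("|"), t)) for t in texts]

# The built-in str.count never treats its argument as a regex.
builtin_counts = [t.count("|") for t in texts]

print(unescaped_counts)  # [10, 20, 5]
print(escaped_counts)    # [0, 3, 1]
print(builtin_counts)    # [0, 3, 1]
```

re.escape is handy when the character to count comes from a variable and you don't want to work out by hand whether it needs escaping.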
CodePudding user response:
Looks like str(x[0]) converts the None into the actual string 'None' of type str, losing its NoneType. So when count() encounters a string, it simply does a count; when it encounters a NoneType, it returns NaN.
df[0].apply(lambda x: type(x))
0 <class 'str'>
1 <class 'NoneType'>
2 <class 'str'>
3 <class 'str'>
Name: 0, dtype: object
df.apply(lambda x: type(str(x[0])), axis=1)
0 <class 'str'>
1 <class 'str'>
2 <class 'str'>
3 <class 'str'>
dtype: object
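If integer counts like the apply version are wanted without turning None into the string 'None', one option is to count first and fill the missing values afterwards. This is only a sketch: it assumes a missing value should count as zero bars, and the names s, counts and int_counts are made up for the example.

```python
import pandas as pd

s = pd.Series(['some text', None, 'a|few|vertical|bars', 'one|'])

# Count literal bars; the missing value stays missing.
counts = s.str.count(r'\|')

# Assumption: a missing value should count as zero bars.
int_counts = counts.fillna(0).astype(int)

print(int_counts.tolist())  # [0, 0, 3, 1]
```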