I have a data frame with rows that look like the following:
Section  Title                           ...
==========================================
4.1.1    4.1.1 Requirements allocation.  ...
4.1.2    4.1.2 Safety.                   ...
4.1.3    4.1.3 Warnings.                 ...
I am trying to count the number of periods (.) in the Section column, so I wrote this line:
df['Subsections'] = df.Section.str.count(".")
However, the Subsections column returns 5 rather than the 2 I would expect for the first record, since "4.1.1" contains two periods (.). Is there some nuance I am missing here?
Answer:
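By design, Series.str.count(pat, flags=0) interprets the pat parameter as a regular expression pattern (see the source code). In a regular expression, an unescaped . matches any character (except a newline), which is why "4.1.1" yields 5, one match per character. To match a literal period, explicitly escape the . character with \: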
>>> df.Section.str.count(r"\.")
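To make the difference concrete, here is a minimal sketch comparing the unescaped and escaped patterns. It assumes a small stand-in data frame built from the three Section values shown in the question; the variable names are only illustrative. It also uses re.escape from the standard library as one possible way to avoid writing the backslash by hand.

```python
import re

import pandas as pd

# Stand-in data mirroring the Section column from the question (assumed).
df = pd.DataFrame({"Section": ["4.1.1", "4.1.2", "4.1.3"]})

# Unescaped "." is a regex wildcard, so every character counts as a match.
wildcard_counts = df["Section"].str.count(".")

# Escaped "." in a raw string matches only literal periods.
escaped_counts = df["Section"].str.count(r"\.")

# re.escape produces the escaped pattern without a hand-written backslash.
auto_escaped_counts = df["Section"].str.count(re.escape("."))

print(wildcard_counts.tolist())      # [5, 5, 5]
print(escaped_counts.tolist())       # [2, 2, 2]
print(auto_escaped_counts.tolist())  # [2, 2, 2]
```

The raw string (r"\.") keeps Python from treating \. as a string escape sequence, so the backslash reaches the regex engine unchanged.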