Counting Character Occurrences for Each Pandas Dataframe Record

Time:09-27

I have a data frame whose rows look like the following:

Section Title                          ...
==========================================
4.1.1   4.1.1 Requirements allocation. ...
4.1.2   4.1.2 Safety.                  ...
4.1.3   4.1.3 Warnings.                ...

I am trying to count the number of periods (.) in the Section column, so I wrote this line:

df['Subsections'] = df.Section.str.count(".")

However, the Subsections column contains 5 rather than the 2 I would expect for the first record, since "4.1.1" contains two periods (.). Is there some nuance I am missing here?

CodePudding user response:

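By design, Series.str.count(pat, flags=0) interprets the pat parameter as a regular expression pattern (see the source code). In a regular expression, an unescaped . matches any character, which is why "4.1.1" gives 5: one match per character. To match a literal period, escape it with a backslash, and use a raw string so that Python does not treat \. as an invalid string escape sequence:

>>> df.Section.str.count(r"\.")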
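To see the difference side by side, here is a minimal sketch. It assumes a small data frame rebuilt from the three rows shown in the question (the Title values are copied from there, and the variable names are only illustrative). It also shows re.escape as an alternative way to build the escaped pattern.

```python
import re

import pandas as pd

# Illustrative frame rebuilt from the three rows in the question.
df = pd.DataFrame({
    "Section": ["4.1.1", "4.1.2", "4.1.3"],
    "Title": [
        "4.1.1 Requirements allocation.",
        "4.1.2 Safety.",
        "4.1.3 Warnings.",
    ],
})

# Unescaped "." is a regex wildcard, so every character is counted.
unescaped = df["Section"].str.count(".")

# Escaped "\." matches only a literal period; the raw string keeps the backslash.
escaped = df["Section"].str.count(r"\.")

# re.escape builds the escaped pattern for you, handy for arbitrary characters.
via_re_escape = df["Section"].str.count(re.escape("."))

df["Subsections"] = escaped
print(df[["Section", "Subsections"]])
```

With this data, the unescaped pattern counts all five characters of each Section value, while both escaped variants count only the two periods.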