Home > Software design >  Counting exact (case insensitive) strings in a column of data, excluding cells with string matches p
Counting exact (case insensitive) strings in a column of data, excluding cells with string matches p

Time:10-03

I want to count all the cells in a column of a CSV file that contain an exact string, ideally using pandas if possible and using Python. I do not want to include cells that contain the string but also contain other characters, and I need it to be case insensitive.

I'm currently using df['columnName'].str.contains('someString', case=False).sum() but it counts cells that includes other text as well. I feel like this should be a commonly used function, but I can't seem to find anything that fits what I'm looking for on the web, so any suggestions would be much appreciated.

Example:

If my CSV file had something like this:

Row Fruit
1 Apple
2 Apples
3 Apple and Orange
4 aPpLe
5 Banana

And I wanted to count the number of times "Apple" is in the Fruit column, I want it to output 2, since row 1 and row 4 satisfy my condition that only "Apple" is in the cell and it is case insensitive. I do not want it to count row 3 because the row is not exactly just "Apple", and obviously row 2 and 5 do not match the string "Apple" and should also not be counted.

CodePudding user response:

Try str.lower and value_counts:

df['Fruit'].str.lower().value_counts()

Output:

apple               2
apples              1
banana              1
apple and orange    1
Name: Fruit, dtype: int64

Update:

To get a specific value use:

df['Fruit'].str.lower().value_counts()['apple']

Or:

>>> df['Fruit'].str.contains('^apple$', case=False).sum()
2
>>> 
  • Related