I want to count all the cells in a column of a CSV file that contain an exact string, ideally using pandas if possible and using Python. I do not want to include cells that contain the string but also contain other characters, and I need it to be case insensitive.
I'm currently using df['columnName'].str.contains('someString', case=False).sum()
but it counts cells that includes other text as well. I feel like this should be a commonly used function, but I can't seem to find anything that fits what I'm looking for on the web, so any suggestions would be much appreciated.
Example:
If my CSV file had something like this:
Row | Fruit |
---|---|
1 | Apple |
2 | Apples |
3 | Apple and Orange |
4 | aPpLe |
5 | Banana |
And I wanted to count the number of times "Apple" is in the Fruit column, I want it to output 2, since row 1 and row 4 satisfy my condition that only "Apple" is in the cell and it is case insensitive. I do not want it to count row 3 because the row is not exactly just "Apple", and obviously row 2 and 5 do not match the string "Apple" and should also not be counted.
CodePudding user response:
Try str.lower
and value_counts
:
df['Fruit'].str.lower().value_counts()
Output:
apple 2
apples 1
banana 1
apple and orange 1
Name: Fruit, dtype: int64
Update:
To get a specific value use:
df['Fruit'].str.lower().value_counts()['apple']
Or:
>>> df['Fruit'].str.contains('^apple$', case=False).sum()
2
>>>