Get a count of occurrence of string in each row and column of pandas dataframe

Time:02-23

import pandas as pd
  
# list of paragraphs from judicial opinions
# rows are opinions
# columns are paragraphs from the opinion
opinion1 = ['sentenced to life','sentenced to death. The sentence ...','', 'sentencing Appellant for a term of life imprisonment']
opinion2 = ['Justice Smith','This concerns a sentencing hearing.', 'The third sentence read ...', 'Defendant rested.']
opinion3 = ['sentence sentencing sentenced','New matters ...', 'The clear weight of the evidence', 'A death sentence']
data = [opinion1, opinion2, opinion3]
df = pd.DataFrame(data, columns = ['p1','p2','p3','p4'])

# This works for one column, but the real data set has 300 columns.
df['p2'].str.contains('sentenc')

How do I determine whether 'sentenc' is in columns 'p1' through 'p4'?

Desired output would be something like:

True True False True
False True True False
True False False True

How do I retrieve a count of the number of times that 'sentenc' appears in each cell?

Desired output would be a count for each cell of the number of times 'sentenc' appears:

1 2 0 1
0 1 1 0
3 0 0 1

Thank you!

CodePudding user response:

Use pd.Series.str.count, applied to every column (the pattern is interpreted as a regular expression):

counts = df.apply(lambda col: col.str.count('sentenc'))

Output:

>>> counts
   p1  p2  p3  p4
0   1   2   0   1
1   0   1   1   0
2   3   0   0   1
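
Since the title also asks about counts per row and per column, the per-cell counts can be summed along either axis. A minimal sketch, assuming the same df as in the question (rebuilt here so the snippet runs by itself); the names row_totals and col_totals are illustrative and not part of the original answer:

```python
import pandas as pd

# Same data as in the question.
opinion1 = ['sentenced to life', 'sentenced to death. The sentence ...', '',
            'sentencing Appellant for a term of life imprisonment']
opinion2 = ['Justice Smith', 'This concerns a sentencing hearing.',
            'The third sentence read ...', 'Defendant rested.']
opinion3 = ['sentence sentencing sentenced', 'New matters ...',
            'The clear weight of the evidence', 'A death sentence']
df = pd.DataFrame([opinion1, opinion2, opinion3],
                  columns=['p1', 'p2', 'p3', 'p4'])

# Per-cell counts, as in the answer.
counts = df.apply(lambda col: col.str.count('sentenc'))

# Hypothetical names: totals per opinion (row) and per paragraph slot (column).
row_totals = counts.sum(axis=1)
col_totals = counts.sum(axis=0)

print(row_totals.tolist())  # [4, 2, 4]
print(col_totals.tolist())  # [4, 3, 1, 2]
```

Summing the boolean frame instead would give the number of cells that contain the string at least once, rather than the total number of occurrences.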

To get booleans instead, use .str.contains, or call .astype(bool) on the counts:

bools = df.apply(lambda col: col.str.contains('sentenc'))

or

bools = df.apply(lambda col: col.str.count('sentenc')).astype(bool)

Both give the same result for the data above. They are only guaranteed to agree when no cell is missing (NaN/None).
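
With 300 columns in a real data set, some cells may well be missing (None/NaN) rather than empty strings. A minimal sketch, assuming a small hypothetical frame that is not from the question, of filling such cells with empty strings before counting so that the count form and the boolean form line up:

```python
import pandas as pd

# Hypothetical data with one missing paragraph (None); not from the question.
df = pd.DataFrame(
    [['sentenced to life', None],
     ['Justice Smith', 'This concerns a sentencing hearing.']],
    columns=['p1', 'p2'],
)

# Replace missing cells with empty strings so every cell is a real string.
filled = df.fillna('')

counts = filled.apply(lambda col: col.str.count('sentenc'))
bools = filled.apply(lambda col: col.str.contains('sentenc'))

print(counts.values.tolist())  # [[1, 0], [0, 1]]
print(bools.values.tolist())   # [[True, False], [False, True]]
```

Whether a missing paragraph should count as zero occurrences is a judgment call for the data at hand; the fill step above simply assumes it should.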
