I need to create one new column for each word in a given key_words list
and count how many times that word occurs in each row's description.
key_words = ['apple', 'animal', 'everyone']
Input data frame:
id | description | xx |
---|---|---|
1 | Apple is a healthy fruit. Everyone should eat it. | .. |
2 | Lion is a dangerous animal. | .. |
3 | Everyone likes him. | .. |
What I want to get:
id | description | xx | apple | animal | everyone |
---|---|---|---|---|---|
1 | Apple is a healthy fruit. Everyone should eat it. | .. | 1 | 0 | 1 |
2 | Lion is a dangerous animal. | .. | 0 | 1 | 0 |
3 | Everyone likes him. | .. | 0 | 0 | 1 |
Any help is much appreciated.
CodePudding user response:
This should work for you:
key_words = ['apple', 'animal', 'everyone']
for key in key_words:
    # Lowercase each description, then count occurrences of the key in it
    df[key] = df['description'].str.lower().str.count(key)
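For reference, here is a self-contained version of the loop above, run against the sample data from the question. It is only a sketch: the DataFrame construction is assumed from the tables in the question (the xx column is left out), and `flags=re.IGNORECASE` together with `re.escape` is swapped in for `.str.lower()`. Keep in mind that `str.count` treats its argument as a regular expression and matches substrings, so 'apple' would also be counted inside 'pineapple'.

```python
import re

import pandas as pd

# Sample data reconstructed from the question (the 'xx' column is omitted).
df = pd.DataFrame({
    'id': [1, 2, 3],
    'description': [
        'Apple is a healthy fruit. Everyone should eat it.',
        'Lion is a dangerous animal.',
        'Everyone likes him.',
    ],
})

key_words = ['apple', 'animal', 'everyone']
for key in key_words:
    # re.escape keeps keys containing special characters from being read as regex syntax.
    df[key] = df['description'].str.count(re.escape(key), flags=re.IGNORECASE)

print(df[key_words])
```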
CodePudding user response:
import re

keys = ['apple', 'animal', 'everyone']
# Count case-insensitive matches of every key in each row, then transpose rows into columns
df['apple'], df['animal'], df['everyone'] = zip(
    *([len(re.findall(f'(?i){k}', r)) for k in keys] for r in df['description'])
)
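If only whole words should count, the same `re.findall` idea can be wrapped in a small helper that adds `\b` word boundaries. This is a sketch under assumptions: the helper name `count_keywords` is hypothetical (it does not come from the answers above), and it works on plain strings using only the standard library.

```python
import re


def count_keywords(text, keys):
    """Return {key: number of whole-word, case-insensitive matches in text}."""
    return {
        k: len(re.findall(rf'\b{re.escape(k)}\b', text, flags=re.IGNORECASE))
        for k in keys
    }


keys = ['apple', 'animal', 'everyone']
descriptions = [
    'Apple is a healthy fruit. Everyone should eat it.',
    'Lion is a dangerous animal.',
    'Everyone likes him.',
]

# One dict of counts per description, in the same order as the input.
rows = [count_keywords(d, keys) for d in descriptions]
print(rows)
```

With pandas, something like `df.join(pd.DataFrame(rows, index=df.index))` should attach these counts as new columns without hardcoding the column names.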