I have data in the format below.
data = {"policy": {"1": {"ID": "ML_0", "URL": "www.a.com", "Text": "my name is Martin and here is my code"} "2": {"ID": "ML_1", "URL": "www.b.com", "Plain_Text" my name is Mikal and here is my code"}}}
keywords = ['is', 'my']
Here are a few things I want to do with my data in Python.
First, I want to iterate over my dictionary and count how often each of the keywords above occurs in the value of "Text" in both "1" and "2". Then I want to update the current dictionary with those keyword counts (the number of times each keyword appears in "1" and "2"), like below.
{"policy": {"1": {"ID": "ML_0", "URL": "www.a.com", "Text": "my name is Martin and here is my code", "is": "2", "my": "2"} "2": {"ID": "ML_1", "URL": "www.b.com", "Plain_Text: "my name is Mikal and here is my code", "is": "2", "my": "2"}}}
If anyone can help me, I would be thankful.
CodePudding user response:
You could use collections.Counter:
from collections import Counter
import json  # Only for pretty printing `data` dictionary.


def get_keyword_counts(text: str, keywords: list[str]) -> dict[str, int]:
    # Count every whitespace-separated word, then keep only the keywords.
    return {
        word: count for word, count in Counter(text.split()).items()
        if word in set(keywords)
    }


def main() -> None:
    data = {
        "policy": {
            "1": {
                "ID": "ML_0",
                "URL": "www.a.com",
                "Text": "my name is Martin and here is my code"
            },
            "2": {
                "ID": "ML_1",
                "URL": "www.b.com",
                "Text": "my name is Mikal and here is my code"
            }
        }
    }
    keywords = ['is', 'my']
    # Merge the keyword counts into each policy entry in place.
    for policy in data['policy'].values():
        policy |= get_keyword_counts(policy['Text'], keywords)
    print(json.dumps(data, indent=4))


if __name__ == '__main__':
    main()
Output:
{
    "policy": {
        "1": {
            "ID": "ML_0",
            "URL": "www.a.com",
            "Text": "my name is Martin and here is my code",
            "my": 2,
            "is": 2
        },
        "2": {
            "ID": "ML_1",
            "URL": "www.b.com",
            "Text": "my name is Mikal and here is my code",
            "my": 2,
            "is": 2
        }
    }
}
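One caveat: text.split() only matches exact whitespace-separated tokens, so "My" or "my," would not be counted as "my". If that matters for your real data, a variant along these lines might help. It is only a sketch: the helper name count_keywords and the choice to lowercase everything and treat any run of word characters as a token are my assumptions, not something the question specifies.

```python
import re
from collections import Counter


def count_keywords(text, keywords):
    """Count keywords case-insensitively, ignoring surrounding punctuation.

    Hypothetical helper: the tokenization rule (lowercase, runs of word
    characters) is an assumption about what should count as a match.
    """
    # re.findall with \w+ drops punctuation such as commas and periods.
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tokens)
    # A Counter returns 0 for missing keys, so absent keywords show up as 0.
    return {word: counts[word.lower()] for word in keywords}


sample_counts = count_keywords("My code, my rules. Is it?", ["is", "my"])
print(sample_counts)
```

Unlike the version above, this one also reports keywords that never occur (with a count of 0), which may or may not be what you want.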
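Note: Using |= to merge dicts requires Python 3.9 or later (PEP 584), as do the list[str] and dict[str, int] annotations. On an older version, policy.update(get_keyword_counts(policy['Text'], keywords)) performs the same in-place merge.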
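If your Python version does not support |= on dicts, something like the following should behave the same way. Treat it as a sketch rather than a drop-in replacement: it uses dict.update and the typing module instead of the newer syntax, and it builds the result from the keyword list, so (unlike the code above) a keyword that never occurs is stored with a count of 0.

```python
from collections import Counter
from typing import Dict, List


def get_keyword_counts(text: str, keywords: List[str]) -> Dict[str, int]:
    # Count every whitespace-separated word once.
    counts = Counter(text.split())
    # Look up each keyword; a Counter yields 0 for words it has not seen.
    return {word: counts[word] for word in keywords}


data = {
    "policy": {
        "1": {
            "ID": "ML_0",
            "URL": "www.a.com",
            "Text": "my name is Martin and here is my code",
        },
        "2": {
            "ID": "ML_1",
            "URL": "www.b.com",
            "Text": "my name is Mikal and here is my code",
        },
    }
}
keywords = ["is", "my"]

for policy in data["policy"].values():
    # dict.update merges in place, like |= does on newer versions.
    policy.update(get_keyword_counts(policy["Text"], keywords))

print(data["policy"]["1"])
```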
CodePudding user response:
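(I don't have enough reputation to comment, so I'm posting this as an answer, sorry.) First of all, I think your dict is not valid to begin with. The syntax does not seem correct: there is no comma between the "1" and "2" entries, and the "Plain_Text" key is missing its colon and the opening quote of its value. Also note that the second entry uses "Plain_Text" while the first uses "Text".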
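Here is my guess at what the dict was meant to look like, together with one way to add the counts. This is a sketch under assumptions: I am assuming the second entry really uses "Plain_Text" as its key, the helper find_text is something I made up to cope with both key names, and I store the counts as strings only because your expected output shows "2" in quotes.

```python
from collections import Counter

# Assumed corrected version of the dict from the question.
data = {
    "policy": {
        "1": {
            "ID": "ML_0",
            "URL": "www.a.com",
            "Text": "my name is Martin and here is my code",
        },
        "2": {
            "ID": "ML_1",
            "URL": "www.b.com",
            "Plain_Text": "my name is Mikal and here is my code",
        },
    }
}
keywords = ["is", "my"]


def find_text(entry: dict) -> str:
    # Hypothetical helper: return the text under whichever key is present.
    for key in ("Text", "Plain_Text"):
        if key in entry:
            return entry[key]
    return ""


for entry in data["policy"].values():
    counts = Counter(find_text(entry).split())
    for word in keywords:
        # Stored as a string to mirror the expected output in the question.
        entry[word] = str(counts[word])

print(data)
```

If you would rather keep the counts as integers, drop the str() call.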