Sample input:
[
"hi #weekend",
"good morning #madrid #fun",
"spend my #weekend in #madrid",
"#madrid <3"
]
Expected output:
{'weekend': 2, 'madrid': 3, 'fun': 1}
Rules:
The program shouldn't count an empty hashtag ("#") as a hashtag.
Hashtags not starting with a letter shouldn't be counted as hashtags.
Lowercase and uppercase hashtags should be treated as different hashtags.
This is what I have so far. My goal is to include the rules in the program:
from collections import Counter

def analyze(posts):
    counter = Counter(
        x[1:] for x in ' '.join(posts).split() if x.startswith('#')
    )
    return dict(counter)

posts = [
    "hi #weekend",
    "good morning #madrid #fun",
    "spend my #weekend in #madrid",
    "#madrid <3"]
print(analyze(posts))
CodePudding user response:
Try this:
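a = [
    "hi #weekend",
    "good morning #madrid #fun",
    "spend my #weekend in #madrid",
    "#madrid <3"
]
my_dict = {}
for j in a:
    for i in j.split():
        # len(i) > 1 skips a bare "#", which would otherwise raise IndexError on i[1]
        if len(i) > 1 and i.startswith("#") and i[1].isalpha():
            if i[1:] in my_dict:
                my_dict[i[1:]] += 1  # already seen: increment the count
            else:
                my_dict[i[1:]] = 1   # first occurrence
print(my_dict)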
Output:
{'weekend': 2, 'madrid': 3, 'fun': 1}
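The sample input doesn't exercise any of the three rules, so it's worth checking against inputs that do. Here is a minimal sketch of the same loop wrapped in a function; the name count_hashtags and the edge-case posts are made up for illustration and are not from the question:

```python
def count_hashtags(posts):
    """Count hashtags that start with a letter (case-sensitive)."""
    counts = {}
    for post in posts:
        for word in post.split():
            # len(word) > 1 rejects a bare "#" before word[1] is read
            if len(word) > 1 and word.startswith("#") and word[1].isalpha():
                tag = word[1:]
                counts[tag] = counts.get(tag, 0) + 1
    return counts

# Hypothetical edge cases: empty hashtag, digit-first hashtag, mixed case
edge_cases = ["# alone", "#1st #first", "#Madrid #madrid"]
print(count_hashtags(edge_cases))
# {'first': 1, 'Madrid': 1, 'madrid': 1}
```

Note that str.isalpha() also accepts non-ASCII letters, so "#ñandú" would count as a hashtag here. Whether that is wanted depends on how "letter" is meant in the rules.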
CodePudding user response:
Given the condition that a hashtag should start with a letter, I suggest using a regex to extract all hashtags starting with a letter:
import re
from collections import Counter

def analyze(posts):
    # '#' followed by one letter, then any number of letters or digits
    hits = re.findall('#[A-Za-z][A-Za-z0-9]*', ' '.join(posts))
    # drop the leading '#' before counting
    return Counter([i[1:] for i in hits])
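One thing to watch with a regex: unlike the split-based approaches, a pattern that starts directly at '#' also matches a '#' in the middle of a word (for example the "#b" in "a#b"). If that isn't wanted, a lookbehind can require that the '#' sits at the start of the text or right after whitespace. A rough sketch, where the function name analyze_strict and the test posts are my own and not from the question:

```python
import re
from collections import Counter

# (?<!\S) : the '#' must not be preceded by a non-whitespace character
# (...)   : capture the tag itself, so findall returns it without the '#'
HASHTAG = re.compile(r'(?<!\S)#([A-Za-z][A-Za-z0-9]*)')

def analyze_strict(posts):
    return dict(Counter(HASHTAG.findall(' '.join(posts))))

print(analyze_strict(["email a#b", "#madrid! #fun", "# #9lives"]))
# {'madrid': 1, 'fun': 1}
```

The character class [A-Za-z] only covers ASCII letters, and trailing punctuation such as the "!" above is dropped from the tag, so the results can differ from the str.split() versions on such inputs.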
CodePudding user response:
You can use collections.Counter and a simple comprehension:
l = [
    "hi #weekend",
    "good morning #madrid #fun",
    "spend my #weekend in #madrid",
    "#madrid <3"
]
from collections import Counter
counts = Counter(w[1:] for w in ' '.join(l).split()
                 if len(w)>1 and w.startswith('#') and w[1].isalpha())
Output:
>>> counts
Counter({'madrid': 3, 'weekend': 2, 'fun': 1})
# as dictionary
>>> dict(counts)
{'weekend': 2, 'madrid': 3, 'fun': 1}