how to write a program that analyzes strings, identifies the hashtags, counts them and adds them to

Time:11-01

sample input

[
    "hi #weekend",
    "good morning #madrid #fun",
    "spend my #weekend in #madrid",
    "#madrid <3"
]

expected output

{'weekend': 2, 'madrid': 3, 'fun': 1}

Rules:

The program shouldn't count an empty hashtag ("#") as a hashtag.

Hashtags that don't start with a letter shouldn't be counted as hashtags.

Lowercase and uppercase hashtags should be treated as different hashtags.

This is what I have so far. My goal is to add these rules to the program.

from collections import Counter


def analyze(posts):
    counter = Counter(
        x[1:] for x in ' '.join(posts).split() if x.startswith('#')
    )
    return dict(counter)


posts = [
    "hi #weekend",
    "good morning #madrid #fun",
    "spend my #weekend in #madrid",
    "#madrid <3"]


print(analyze(posts))




CodePudding user response:

Try this:

a = [
    "hi #weekend",
    "good morning #madrid #fun",
    "spend my #weekend in #madrid",
    "#madrid <3"
]

my_dict = {}

for j in a:
    for i in j.split():
        # len(i) > 1 skips a bare "#", which would otherwise raise IndexError on i[1]
        if i.startswith("#") and len(i) > 1 and i[1].isalpha():
            if i[1:] in my_dict:
                my_dict[i[1:]] += 1
            else:
                my_dict[i[1:]] = 1
print(my_dict)

Output:

{'weekend': 2, 'madrid': 3, 'fun': 1}

CodePudding user response:

Since a hashtag must start with a letter, I suggest using a regex to extract all hashtags that start with a letter:

import re
from collections import Counter

def analyze(posts):
    hits = re.findall('#[A-Za-z]+[A-Za-z0-9]*', ' '.join(posts))
    return Counter([i[1:] for i in hits])
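
A plain '#[A-Za-z]...' pattern also matches a '#' in the middle of a word (for example "mid#word"). If that is not wanted, one possible variant is sketched below. It assumes a hashtag must begin a whitespace-delimited token, and the name HASHTAG and the choice to allow underscores are illustrative assumptions, not part of the question:

```python
import re
from collections import Counter

# Hypothetical pattern: '#' at the start of a whitespace-delimited token,
# then one letter, then any run of ASCII letters, digits or underscores.
# (?<!\S) means "not preceded by a non-whitespace character".
HASHTAG = re.compile(r'(?<!\S)#([A-Za-z][A-Za-z0-9_]*)')


def analyze(posts):
    counts = Counter()
    for post in posts:
        # findall returns only the captured group, i.e. the tag without '#'
        counts.update(HASHTAG.findall(post))
    return dict(counts)


posts = ["hi #weekend", "# alone", "#1day #Fun #fun", "mid#word"]
print(analyze(posts))  # {'weekend': 1, 'Fun': 1, 'fun': 1}
```

Unlike the split-based answers, this drops trailing punctuation, so "#madrid," is counted as "madrid" rather than "madrid,".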

CodePudding user response:

You can use collections.Counter and a simple comprehension:

l = [
    "hi #weekend",
    "good morning #madrid #fun",
    "spend my #weekend in #madrid",
    "#madrid <3"
]

from collections import Counter
counts = Counter(w[1:] for w in ' '.join(l).split()
                 if len(w)>1 and w.startswith('#') and w[1].isalpha())

output:

>>> counts
Counter({'madrid': 3, 'weekend': 2, 'fun': 1})

# as dictionary
>>> dict(counts)
{'weekend': 2, 'madrid': 3, 'fun': 1}
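
To check the three rules from the question against edge cases, the same idea can be wrapped in a function. This is only a sketch: the name count_hashtags and the edge-case strings are made up for illustration. Note that str.isalpha() also accepts non-ASCII letters such as "é", which may or may not be what "starts with a letter" is meant to cover.

```python
from collections import Counter


def count_hashtags(posts):
    """Count case-sensitive hashtags that start with a letter."""
    counts = Counter()
    for post in posts:
        for word in post.split():
            # word[1:2] is '' for a bare '#', and ''.isalpha() is False,
            # so a separate length check is not needed.
            if word.startswith('#') and word[1:2].isalpha():
                counts[word[1:]] += 1
    return dict(counts)


# Hypothetical inputs, one per rule: empty hashtag, non-letter start, case.
edge_cases = ["# alone", "#1day is not a tag", "#Fun and #fun differ"]
print(count_hashtags(edge_cases))  # {'Fun': 1, 'fun': 1}
```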