I'm trying a sentiment-analysis approach on YouTube comments, but the comments often contain words like mrbeast, tiger/'s, lion/'s, pewdiepie, james, etc., which do not add any feeling to the sentence. I tried NLTK's averaged_perceptron_tagger, but it didn't work well; it gave the results below.
My input:
"mrbeast james lion tigers bad sad clickbait fight nice good"
The words that I need in my sentence:
"bad sad clickbait fight nice good"
What I got using averaged_perceptron_tagger:
[('mrbeast', 'NN'),
('james', 'NNS'),
('lion', 'JJ'),
('tigers', 'NNS'),
('bad', 'JJ'),
('sad', 'JJ'),
('clickbait', 'NN'),
('fight', 'NN'),
('nice', 'RB'),
('good', 'JJ')]
So, as you can see, if I remove mrbeast (i.e. the NN tag), words like clickbait and fight will also be removed, which then ultimately strips the expressions from that sentence.
CodePudding user response:
Okay, this is what I do for companies that report on the LSE. You can do something similar with your words.
# define what you consider to be positive, negative or neutral keywords
posKeyWords = ['profit', 'increase', 'pleased', 'excellent', 'good', 'solid financial', 'robust', 'significantly improved', 'improve']
negKeyWords = ['loss', 'decrease', 'disappoint', 'poor', 'bad', 'decline', 'negative', 'weather', 'covid']
neutralKeyWords = ['financial']
keyWords = posKeyWords + neutralKeyWords + negKeyWords
Next, get your data as text (from whatever source you choose) and keep it in a single string, since re.findall searches a string.
dataText = resp.text  # or whatever source you are reading from
Mine is a response from a web query, but yours could be from a text file or another source.
Next, create an empty dictionary to hold the keyword counts (hash lookups are fast).
keyWordSummary = {} # dictionary of keywords & values
Finally, loop through the keywords and put them into the dict.
import re

# look for some keywords
for kw in keyWords:
    # find every occurrence of the keyword in the text
    kwVal = re.findall(kw, dataText)
    #print('keyword count:', kw, len(kwVal))
    # put the count into the dict
    keyWordSummary[kw] = len(kwVal)
You now have a dictionary of word frequencies, which you could analyse in a dataframe, for example (which is outside the scope of this particular question).
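To tie this back to the YouTube comment in the question, here is a minimal sketch of the same keyword idea. The word lists (pos_words, neg_words) and the helper names keep_sentiment_words and count_keywords are made up for illustration and are not part of the code above, and treating "clickbait" and "fight" as negative is only an assumption. It keeps the words found in the lists, counts each keyword as a whole word, and subtracts the negative count from the positive count.

```python
import re

# Hypothetical word lists for YouTube comments; extend them for your own data.
pos_words = {"nice", "good"}
neg_words = {"bad", "sad", "clickbait", "fight"}
sentiment_words = pos_words | neg_words


def keep_sentiment_words(comment):
    """Return only the tokens of `comment` that appear in the sentiment lists."""
    tokens = re.findall(r"[a-z']+", comment.lower())
    return [tok for tok in tokens if tok in sentiment_words]


def count_keywords(comment, keywords):
    """Count whole-word occurrences of each keyword in `comment`."""
    text = comment.lower()
    return {
        kw: len(re.findall(r"\b" + re.escape(kw) + r"\b", text))
        for kw in keywords
    }


comment = "mrbeast james lion tigers bad sad clickbait fight nice good"
filtered = keep_sentiment_words(comment)
counts = count_keywords(comment, sentiment_words)
# Simple net score: positive hits minus negative hits.
score = sum(counts[w] for w in pos_words) - sum(counts[w] for w in neg_words)

print(" ".join(filtered))  # bad sad clickbait fight nice good
print(score)               # -2
```

The \b word boundaries stop "bad" from also matching inside "badly", and re.escape keeps a keyword containing regex characters from being read as a pattern. Names such as mrbeast or james drop out simply because they are not in either list, so no POS tagging is needed.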