I'm fairly new to coding, and I'm completely new to regex.
How do I use regex to read through a text file and set each unique word as a key and have the value be how many times that word (not case sensitive) shows up in the file? It also needs to be able to find words that are mixed in with characters, for example: "364Hey@ 8friend99%^" would output 'hey' and 'friend' as the keys.
For example, if my txt file is:
Hello, my name is 423Jeff. My 6name34 @@is 4 letters long.
My desired outcome would be something like:
{'hello': 1, 'my': 2, 'name': 2, 'is': 2, 'jeff': 1, 'letters': 1, 'long': 1}
CodePudding user response:
You don't need regex. This solves the problem:
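d = {}
# "with" closes the file automatically when the block ends.
with open("demofile.txt", "r") as f:
    for word in f.read().split():
        # Lowercase the token and keep only its letters.
        word = ''.join(e for e in word.lower() if e.isalpha())
        if word:  # skip tokens that had no letters, such as "4"
            if word not in d:
                d[word] = 1
            else:
                d[word] += 1
print(d)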
CodePudding user response:
Good luck with your programming classes! You did a good job stating the problem and the desired output. In the future, consider posting your code as well so that the community can help you with your technique. Otherwise, it looks like you are just getting folks to do your homework.
Anyway, it looked like a fun problem so here you go:
#!/usr/bin/env python3
import re
from collections import defaultdict

text = "Hello, my name is 423Jeff. My 6name34 @@is 4 letters long. "
count = defaultdict(int)
# Split on runs of non-letters so that only alphabetic pieces remain.
words = re.split("[^A-Za-z]+", text)
for word in words:
    if len(word) > 0:  # re.split yields empty strings at the edges
        count[word.lower()] += 1
print(dict(count))
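Since the question asks about reading from a file, here is a minimal sketch that combines the regex idea with file input. It is one possible approach, not the only one: it swaps re.split for re.findall and uses collections.Counter for the tallying. The helper names count_words and count_words_in_file and the path argument are made up for illustration, and "a word" is assumed to mean a run of ASCII letters.

```python
import re
from collections import Counter


def count_words(text):
    """Return a dict mapping each lowercase word to its number of occurrences."""
    # [a-z]+ matches each maximal run of letters, so digits and symbols
    # act as separators even when there is no whitespace between words.
    return dict(Counter(re.findall(r"[a-z]+", text.lower())))


def count_words_in_file(path):
    """Read a whole text file and count its words (path is a hypothetical argument)."""
    with open(path, encoding="utf-8") as f:
        return count_words(f.read())


print(count_words("Hello, my name is 423Jeff. My 6name34 @@is 4 letters long."))
```

For the sample sentence this should give the same counts as the desired output in the question. Because findall collects the letter runs directly, there are no empty strings to filter out, and a token such as "hey@friend" is counted as two words rather than being merged into one.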