Fairly new to Python. I'm trying to make some code more elegant by writing a for loop nested within another for loop as compactly as possible. The goal is to build a dictionary whose keys are the words in a file and whose values are the frequencies of those words. I believe I figured out how to do the inner for loop using a dictionary comprehension, but I am having trouble with the syntax for the outer for loop. I am guessing the outer for loop would be set up as a list comprehension. For now I am not going to worry about what kind of token counts as a word (symbol, number, letter), and I am trying to avoid importing any additional libraries. Could you show me some examples, or point me to a resource where I could read more about nested/advanced comprehensions?
The "brute force" fundamental method I originally developed looks along the lines of this:
word_cache = {}
# Some code here
with open('myfile.txt') as lines:
    for line in lines:
        for word in line.split():
            word_cache[word] = word_cache.get(word, 0) + 1
'''
Below is what I have so far as a dictionary comprehension.
The "for line in lines" part is what I am having difficulty nesting; I believe it would supply the "line" used in the dictionary comprehension. Part of the issue, as I see it, is that lines is a file object.
'''
word_cache.update({word: word_cache.get(word, 0) + 1 for word in line.split()})
# Tried the below, but it did not work because (line for line in lines) is a generator expression, which has no .split() method
word_cache.update({word: word_cache.get(word, 0) + 1 for word in (line for line in lines).split()})
Could someone help me understand the correct syntax for nested comprehensions over file objects (assuming the file object comes from a .txt file)?
CodePudding user response:
Just put the for clauses one after another, outermost loop first:
{word: word_cache.get(word, 0) + 1 for line in lines for word in line.split()}
See the last example of PEP 274
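As a rough sketch of how the for clauses line up with ordinary nested loops: the clauses in a comprehension are written in the same order as the equivalent for statements, outermost first. The `lines` list below is hypothetical sample data standing in for the lines of a file.

```python
# Hypothetical sample data standing in for the lines of a file.
lines = ["the cat sat", "on the mat"]

# Ordinary nested loops: outer loop over lines, inner loop over words.
words_loop = []
for line in lines:
    for word in line.split():
        words_loop.append(word)

# Equivalent comprehension: the for clauses appear in the same order.
words_comp = [word for line in lines for word in line.split()]

print(words_comp == words_loop)
```

This only illustrates the clause order; it flattens the words into a list and does not count them.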
CodePudding user response:
A comprehension won't work in this case because you are relying on the container to reference itself. You will get a NameError, as word_cache won't have been defined yet.
Your original code is something like this
# initialising the dict
word_cache = {}
with open('myfile.txt') as lines:
    for line in lines:
        for word in line.split():
            # referencing the dict that has been initialised
            word_cache[word] = word_cache.get(word, 0) + 1
What you might want to try is something like this
with open('myfile.txt') as lines:
    word_cache = {word: word_cache.get(word, 0) + 1 for line in lines for word in line.split()}
This won't work because the comprehension is evaluated in full first, and only then is the resulting object assigned to the name. Therefore, when you call word_cache.get inside the comprehension, Python has no idea what you're referring to, as word_cache hasn't been created yet!
e.g.
In [1]: a = [a[0] + i for i in range(3)]
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-1-3186711e5c1b> in <module>
----> 1 a = [a[0] + i for i in range(3)]
<ipython-input-1-3186711e5c1b> in <listcomp>(.0)
----> 1 a = [a[0] + i for i in range(3)]
NameError: name 'a' is not defined
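A related pitfall, sketched here under the assumption that word_cache is initialised to an empty dict before the comprehension runs: there is no NameError in that case, but the counts still come out wrong, because the dict being read is never updated while the comprehension is evaluated. The `lines` list is hypothetical sample data.

```python
# Hypothetical sample data standing in for the lines of a file.
lines = ["a b a", "b a"]

word_cache = {}  # defined up front, so no NameError this time

# word_cache stays empty during evaluation, so .get(word, 0) is always 0.
result = {word: word_cache.get(word, 0) + 1
          for line in lines for word in line.split()}

print(result)      # every word maps to 1, not its real frequency
print(word_cache)  # still empty
```

Repeated keys simply overwrite each other with the same value, so the comprehension cannot accumulate a running total this way.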
Consider using a Counter from collections.
In [1]: from collections import Counter
In [2]: with open('/path/to.file') as f:
   ...:     words = Counter(f.read().split())
   ...:
It's important to use the right tools for the job. In this case, it's a Counter.
More importantly, who is saying that your initial solution is not elegant or straightforward? A comprehension doesn't make a solution more elegant or readable.
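If a nested comprehension is still wanted, one possible sketch is to pass a generator expression with two for clauses to Counter, which reads the input line by line rather than all at once. Here `io.StringIO` is only a stand-in for an open text file (an assumption for illustration); iterating over it yields lines just as a real file object would.

```python
from collections import Counter
import io

# io.StringIO stands in for an open text file; the contents are made up.
lines = io.StringIO("the cat sat\non the mat\n")

# Generator expression: outer clause over lines, inner clause over words.
word_cache = Counter(word for line in lines for word in line.split())

print(word_cache["the"])
print(dict(word_cache))
```

With a real file, the same expression would sit inside a `with open(...) as lines:` block. Counter is part of the standard library, though it does require an import.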