Python nested dictionary comprehension with file objects

Time:02-14

Fairly new to Python. I'm trying to make some code more elegant by rewriting a for loop nested inside another for loop as a comprehension. The goal is a dictionary whose keys are the words in a file and whose values are the frequencies of those words. I believe I figured out the inner for loop using a dictionary comprehension, but I am having trouble with the syntax for the outer for loop. I am guessing the outer loop would be set up as a list comprehension. For now I am not worried about which characters count as part of a word (symbols, digits, letters), and I am trying to avoid importing any additional libraries. Could you show me some examples, or point me to a resource where I could read more about nested/advanced comprehensions?

The "brute force" version I originally wrote looks something like this:

word_cache = {}
# Some code here

with open('myfile.txt') as lines:
    for line in lines:
        for word in line.split():
            word_cache[word] = word_cache.get(word, 0) + 1


    '''
    Below is what I have so far as a dictionary comprehension.
    What I am having difficulty with is nesting the "for line in lines" loop, which I believe would replace the "line" in the dictionary comprehension. Part of the issue, as I see it, is that lines is a file object.
    '''
    word_cache.update({word: word_cache.get(word, 0) + 1 for word in line.split()})

    # Tried the below, but it did not work because (line for line in lines) is a generator expression
    word_cache.update({word: word_cache.get(word, 0) + 1 for word in (line for line in lines).split()})

Could someone help me understand the correct syntax for nested comprehensions over file objects (assuming the file object comes from a txt file)?

CodePudding user response:

Just put the for clauses one after another, in the same order as the nested loops (outer loop first):

{word: word_cache.get(word, 0) + 1 for line in lines for word in line.split()}

See the last example of PEP 274
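
As a minimal sketch of the clause ordering, the snippet below flattens the words of a file with nested loops and then with the equivalent comprehension. It assumes an `io.StringIO` object as a stand-in for the open file, and the sample text is made up for illustration.

```python
import io

# io.StringIO stands in for an open text file (hypothetical sample data).
lines = io.StringIO("a b\nc a\n")

# Nested loops: the outer loop walks the lines, the inner loop walks the words.
words_loop = []
for line in lines:
    for word in line.split():
        words_loop.append(word)

lines.seek(0)  # rewind so the "file" can be iterated a second time

# Equivalent comprehension: the same for clauses, in the same order.
words_comp = [word for line in lines for word in line.split()]
```

Reversing the two clauses would make the comprehension look up `line` before the `for line in lines` clause has bound it.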

CodePudding user response:

A comprehension won't work in this case as you are relying on the container to reference itself. You will get a NameError as word_cache won't have been defined yet.

Your original code is something like this

# initialising the dict
word_cache = {}

with open('myfile.txt') as lines:
    for line in lines:
        for word in line.split():
            # referencing the dict that has been initialised
            word_cache[word] = word_cache.get(word, 0) + 1

What you might want to try is something like this

with open('myfile.txt') as lines:
    word_cache = {word: word_cache.get(word, 0) + 1 for line in lines for word in line.split()}

This won't work because the comprehension is evaluated in full first, and only then is the resulting object assigned to the name. Therefore, when you call word_cache.get inside the comprehension, Python has no idea what you're referring to, as word_cache hasn't been created yet!

e.g.

In [1]: a = [a[0] + i for i in range(3)]
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-1-3186711e5c1b> in <module>
----> 1 a = [a[0] + i for i in range(3)]

<ipython-input-1-3186711e5c1b> in <listcomp>(.0)
----> 1 a = [a[0] + i for i in range(3)]

NameError: name 'a' is not defined
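
A related pitfall, sketched below with made-up sample text and `io.StringIO` standing in for the file: if word_cache already exists as an empty dict, the comprehension runs without error, but the dict is never updated while the comprehension is being evaluated, so every lookup returns the default. The second half shows one possible import-free alternative based on `list.count`. It is not part of the original answer, and it rescans the word list once per distinct word, so it is only reasonable for small inputs.

```python
import io

# io.StringIO stands in for an open text file (hypothetical sample data).
lines = io.StringIO("a b\nc a\n")

word_cache = {}  # defined up front, so there is no NameError this time
# word_cache stays empty during evaluation, so .get() always returns 0.
all_ones = {word: word_cache.get(word, 0) + 1
            for line in lines for word in line.split()}

lines.seek(0)  # rewind the stand-in file for a second pass

# Import-free alternative: collect the words first, then count each distinct one.
words = [word for line in lines for word in line.split()]
counts = {word: words.count(word) for word in set(words)}
```

Here `all_ones` maps every word to 1, even though "a" appears twice, while `counts` holds the real frequencies.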

Consider using a Counter from collections.

In [1]: from collections import Counter

In [2]: with open('/path/to.file') as f:
   ...:     words = Counter(f.read().split())
   ...:

It's important to use the right tools for the job. In this case, it's a Counter.
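
If the nested for clauses are still wanted, a Counter can also be fed a generator expression that walks the file line by line instead of reading it all at once. This is a minimal sketch, assuming `io.StringIO` as a stand-in for the open file and invented sample text.

```python
import io
from collections import Counter

# io.StringIO stands in for an open text file (hypothetical sample data).
sample = io.StringIO("the cat\nthe dog\n")

# The generator expression yields one word at a time; Counter tallies them.
word_counts = Counter(word for line in sample for word in line.split())
```

Counter is a dict subclass, so `word_counts['the']` and `word_counts.items()` behave the same way as on the plain dictionary in the original loop.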

More importantly, who is saying that your initial solution is not elegant or straightforward? A comprehension doesn't make a solution more elegant or readable.
