I have some lists of text and I need to pull them together into a dictionary, but I need to use one list to 'filter' the other. I can do this in a series of nested for loops but I can not make it work with a dict comprehension.
a = ['Complete Yes', 'Title Mr', 'Forename John', 'Initial A', 'Surname Smith', 'Date of birth 01 01 1901']
b = ['Forename', 'Surname', 'Date of birth']
If I try to make a dict of the needed details with nested for loops, it works fine:
details = {}
for x in b:
    for l in a:
        if x in l:
            details[x] = l
details
I get
{'Forename': 'Forename John',
'Surname': 'Surname Smith',
'Date of birth': 'Date of birth 01 01 1901'}
which needs cleaning up but I can do that later.
When I try it with a dict comprehension
d_tails = {x:l for x,l in zip(b, [l for l in a if x in l]) }
I get
{'Forename': 'Date of birth 01 01 1901'}
I'm sure this is because of how I'm ordering the dict comprehension but I can't figure out how to order it so that it replaces the for loop.
For context, I'm trying to clean really messy data from the terrible PDFs that a comes from. Any help on this would be appreciated.
CodePudding user response:
Let's consider a simpler example with two lists:
a = [1,2,3]
b = ['a', 'b', 'c']
for x in a:
    for y in b:
        print(x, y)
This produces 9 lines of output, one for every possible combination of a value from a and a value from b.
for x, y in zip(a, b):
    print(x, y)
This produces only 3 lines of output: one for every corresponding pair of values, taking one from a and one from b.
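The same pairing behaviour probably explains the single entry in the question's result. Here is a small sketch of what likely happened, assuming x still held its last value ('Date of birth') from the earlier for loop in the same session. The inner list comprehension is evaluated once, before zip starts pairing, and zip stops as soon as its shortest input runs out:

```python
a = ['Complete Yes', 'Title Mr', 'Forename John', 'Initial A',
     'Surname Smith', 'Date of birth 01 01 1901']
b = ['Forename', 'Surname', 'Date of birth']

# Assumption: x is left over from the earlier nested loop over b.
x = 'Date of birth'

# Built once, using that leftover x, before zip pairs anything.
matches = [l for l in a if x in l]

# zip stops at the shortest input, so only the first key of b gets paired.
pairs = list(zip(b, matches))

print(matches)  # ['Date of birth 01 01 1901']
print(pairs)    # [('Forename', 'Date of birth 01 01 1901')]
```

In a fresh session where x was never defined, the same expression would raise a NameError instead.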
If you want to convert your nested loop into a single dict comprehension, you need two for clauses, not a single for clause iterating over a zip object.
details = {x: l for x in b for l in a if x in l}
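To check that the two forms agree, here is a quick sketch using the lists from the question (with the date entry spelled 'Date of birth' so that it matches the key in b). The for clauses in the comprehension appear in the same order as the nested loops:

```python
a = ['Complete Yes', 'Title Mr', 'Forename John', 'Initial A',
     'Surname Smith', 'Date of birth 01 01 1901']
b = ['Forename', 'Surname', 'Date of birth']

# Nested-loop version from the question.
loop_details = {}
for x in b:
    for l in a:
        if x in l:
            loop_details[x] = l

# Comprehension version: outer loop first, inner loop second, condition last.
comp_details = {x: l for x in b for l in a if x in l}

print(comp_details == loop_details)  # True
```

Both versions keep only the last match for a key if several entries in a contain it, and both leave out keys that match nothing.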
CodePudding user response:
If I want to convert such a nested loop into a comprehension, I typically start from the outside.
You already know how the details dict should look in the end, so I start with that structure and insert a placeholder '' value:
details = {x: '' for x in b}
With list b out of the way, I only need to look at a given x (say Forename) and list a. In principle, several entries in that list could match, so the natural result is a list of matching entries. That corresponds to a filtered list comprehension [l for l in a if x in l]. Combined:
details = {x: [l for l in a if x in l] for x in b}
But you wanted a string, and the most common case is just one match. For that, I recommend using ', '.join(...) to convert the list of matches back to a string. At the same time, the list comprehension becomes a generator expression:
details = {x: ', '.join(l for l in a if x in l) for x in b}
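A runnable sketch of this approach on the question's data, with the filter written as x in l so that each key collects only the entries containing it. The second list, a_multi, and the key 'Postcode' are hypothetical additions, only there to show what happens with several matches or none:

```python
a = ['Complete Yes', 'Title Mr', 'Forename John', 'Initial A',
     'Surname Smith', 'Date of birth 01 01 1901']
b = ['Forename', 'Surname', 'Date of birth']

# One string per key; with a single match, join returns that match unchanged.
details = {x: ', '.join(l for l in a if x in l) for x in b}
print(details)

# Hypothetical data: a second 'Forename' entry and a key with no match.
a_multi = a + ['Forename Jane']
b_extra = b + ['Postcode']
details_multi = {x: ', '.join(l for l in a_multi if x in l) for x in b_extra}

print(details_multi['Forename'])  # Forename John, Forename Jane
print(repr(details_multi['Postcode']))  # ''
```

Unlike the comprehension with two for clauses, this version keeps every key from b: several matches are joined into one string, and a key with no match ends up with an empty string rather than being left out.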