I'm trying to develop a little script in Python but I've never used this language.
I have a structure that is something like this:
dict = [
    {
        "id": 1,
        "people": [
            {"name": "Sarah", "surename": "something"},
            {"name": "Luke", "surename": "something"},
            {"name": "Chris", "surename": "something"},
        ],
    },
    {
        "id": 2,
        "people": [
            {"name": "Jhon", "surename": "something"},
            {"name": "Luke", "surename": "something"},
            {"name": "Ronald", "surename": "something"},
        ],
    },
]
and I have another list of values, such as name_list = ["Sarah", "Luke"].
I need to find all the IDs in the structure for which every name in name_list is present in the people list of dictionaries.
I've tried something like this but this does not work.
for person in dict:
    if all(name_list in p["name"] for p in person["people"]):
        # Do something with person["id"]
It is important to me to find all the IDs of the entries whose people list contains all the names in name_list.
CodePudding user response:
# d is the list from the question
ids = [
    item['id']
    for item in d
    if all(name in [person['name'] for person in item['people']] for name in name_list)
]
print(ids)  # [1]
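For comparison with the attempt in the question, here is a minimal sketch of the same check written as an explicit loop. It assumes the list is stored under the name data (a rename made here for illustration) and drops the surename keys for brevity. The original condition tests name_list in p["name"], which asks whether a whole list is contained in a string and raises a TypeError; the membership test has to run the other way around, once per name.

```python
# Trimmed copy of the question's structure; `data` is an assumed name.
data = [
    {"id": 1, "people": [{"name": "Sarah"}, {"name": "Luke"}, {"name": "Chris"}]},
    {"id": 2, "people": [{"name": "Jhon"}, {"name": "Luke"}, {"name": "Ronald"}]},
]
name_list = ["Sarah", "Luke"]

# The condition from the question: a list on the left of `in`, a str on the right.
error = None
try:
    all(name_list in p["name"] for p in data[0]["people"])
except TypeError as exc:
    error = exc  # "in <string>" requires a string as the left operand

# Reversed test: every wanted name must appear among this entry's names.
ids = []
for item in data:
    names = [p["name"] for p in item["people"]]
    if all(name in names for name in name_list):
        ids.append(item["id"])

print(ids)  # [1]
```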
CodePudding user response:
First of all, dict is a built-in type, and shadowing it could break something for you later (e.g. isinstance(obj, dict) won't work).
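A small sketch of what that breakage can look like. The function name shadowing_demo is made up for this illustration, and the shadowing is kept local to the function so it does not leak into the rest of the program.

```python
import builtins


def shadowing_demo():
    """Return True if isinstance() fails once `dict` is rebound to a list."""
    dict = [{"id": 1}]  # local name now hides the built-in type
    try:
        isinstance({}, dict)  # second argument is a list, not a type
    except TypeError:
        return True
    return False


print(shadowing_demo())  # True

# The real type is still reachable through the builtins module.
print(isinstance({}, builtins.dict))  # True
```

Picking a different name, such as data, avoids the problem altogether.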
Assuming your input is stored in a data variable:
result = [
    d["id"]
    for d in data
    if not set(name_list) - {p['name'] for p in d['people']}
]
Here {p['name'] for p in d['people']} is a set comprehension. It produces the set of unique names in the people list, e.g. {"Sarah", "Luke", "Chris"} for the first entry.
Then we use the set difference to find which names from name_list are missing from that set: set(name_list) - {"Sarah", "Luke", "Chris"} == set().
The difference is an empty set exactly when all names from name_list are in the people list. An empty set is falsy, so the not in the condition selects those entries.
So our goal is to find the entries for which this difference is an empty set.
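To make the set difference concrete, the sketch below prints the missing names for each entry of a trimmed copy of the question's data (surename keys omitted; the names wanted and missing_by_id are introduced here for illustration). It also shows the subset operator <= as an equivalent way to write the same test; that spelling is an alternative, not part of the answer above.

```python
data = [
    {"id": 1, "people": [{"name": "Sarah"}, {"name": "Luke"}, {"name": "Chris"}]},
    {"id": 2, "people": [{"name": "Jhon"}, {"name": "Luke"}, {"name": "Ronald"}]},
]
name_list = ["Sarah", "Luke"]
wanted = set(name_list)

# Names from name_list that are absent from each entry's people list.
missing_by_id = {}
for d in data:
    names = {p["name"] for p in d["people"]}
    missing_by_id[d["id"]] = wanted - names
    print(d["id"], missing_by_id[d["id"]])
# 1 set()
# 2 {'Sarah'}

# Same filter expressed as a subset test: wanted <= names.
result = [d["id"] for d in data if wanted <= {p["name"] for p in d["people"]}]
print(result)  # [1]
```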
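Set arithmetic also tends to be faster than using the in operator with a list, because a membership test on a set takes constant time on average while the same test on a list scans its elements. On this small input: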
In [1]: %timeit [item['id'] for item in d if all(name in [person['name'] for person in item['people']] for name in name_list)]
1.43 µs ± 17.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [2]: %timeit [d["id"] for d in data if not set(name_list) - {p['name'] for p in d['people']}]
869 ns ± 2.64 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)