I have a list (with dictionaries inside) and I want to know how many different domains are inside it.
I have something like this:
list = [
{'url': 'https://stackoverflow.com/questions', 'number': 10},
{'url': 'https://stackoverflow.com/users', 'number': 40},
{'url': 'https://stackexchange.com/tour', 'number': 40},
{'url': 'https://stackexchange.com/whatever/whatever', 'number': 25}
]
The desired result would look like this:
unique_domains = [
{'url': 'https://stackoverflow.com'},
{'url': 'https://stackexchange.com'}
]
Or maybe just:
unique_domains = ['stackoverflow.com', 'stackexchange.com']
Both would be OK, so whatever is easier or faster I guess.
I think I could use a regex for this, but maybe there is a more Pythonic and/or efficient way to do it?
Thanks!
CodePudding user response:
You can use urllib.parse.urlparse (from the standard library) together with a set comprehension (to avoid duplicates):
from urllib.parse import urlparse
unique_domains = {urlparse(item['url']).netloc for item in given_list}
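If you need a list, you can convert the set via list(unique_domains). This is more reliable than a regex solution, because urlparse splits the URL into its components instead of pattern-matching on the raw string.
(Please don't name a variable list: it shadows the useful builtin, which is why the code above uses given_list instead.)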
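For completeness, here is a minimal runnable sketch of the approach above, using the sample data from the question. The names given_list, domain_list, parsed and unique_urls are only illustrative; the second half is one possible way to build the dict-based format the question asked for, not the only one.

```python
from urllib.parse import urlparse

# Sample data from the question, renamed so it doesn't shadow the builtin `list`.
given_list = [
    {'url': 'https://stackoverflow.com/questions', 'number': 10},
    {'url': 'https://stackoverflow.com/users', 'number': 40},
    {'url': 'https://stackexchange.com/tour', 'number': 40},
    {'url': 'https://stackexchange.com/whatever/whatever', 'number': 25},
]

# Set comprehension: duplicates collapse automatically, order is not guaranteed.
unique_domains = {urlparse(item['url']).netloc for item in given_list}

# Sorted list, in case a stable order is needed.
domain_list = sorted(unique_domains)

# The first format from the question: dicts holding scheme + domain.
# dict.fromkeys drops duplicates while keeping first-seen order.
parsed = [urlparse(item['url']) for item in given_list]
unique_urls = [
    {'url': origin}
    for origin in dict.fromkeys(f'{p.scheme}://{p.netloc}' for p in parsed)
]

print(len(unique_domains))  # number of different domains
print(domain_list)
print(unique_urls)
```

One caveat: urlparse only fills in netloc when the URL has a scheme (or starts with //). For a bare string such as 'stackoverflow.com/questions', netloc comes back empty, so inputs like that would need a scheme added first.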