I am writing code that takes the URLs from the only URL column of a df (list_links) and saves the main domains in a list (clinks), converting them from "https://www.theguardian.com/us-news/etc" to "www.theguardian.com" for further analysis. But I can't make the iteration work to convert the next link in list_links.
from urllib.parse import urlparse

f = 0
clinks = []
if f <= len(list_links):
    for l in list_links:
        domain = urlparse(list_links[f]).netloc
        clinks.append(domain)
    f = f + 1
clinks
It gets stuck on list_links[0].
['www.theguardian.com', 'www.theguardian.com', 'www.theguardian.com', 'www.theguardian.com', 'www.theguardian.com', 'www.theguardian.com', 'www.theguardian.com', ...
How can I make the iteration work? Help!
CodePudding user response:
Your loop is a little odd and can be simplified:
clinks = []
for list_link in list_links:
    domain = urlparse(list_link).netloc
    clinks.append(domain)
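For reference, here is a self-contained sketch of the same idea written as a list comprehension. The sample URLs are made up for illustration and stand in for the real list_links; only urlparse from the standard library is used.

```python
from urllib.parse import urlparse

# Hypothetical sample data standing in for the df's URL column.
list_links = [
    "https://www.theguardian.com/us-news/etc",
    "https://www.bbc.co.uk/news/world",
    "https://example.com/path?query=1",
]

# urlparse(...).netloc is the host part of each URL.
clinks = [urlparse(link).netloc for link in list_links]
print(clinks)
```

If list_links is a pandas Series rather than a plain list, the same comprehension should still work, since iterating a Series yields its values.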
CodePudding user response:
Solution 1: You are incrementing f (f = f + 1) outside the for loop, so f stays 0 for every iteration and list_links[f] is always the first link. Move the increment inside the loop:
for l in list_links:
    domain = urlparse(list_links[f]).netloc
    clinks.append(domain)
    f = f + 1
Solution 2: f is unnecessary as an index, because in the for loop l already holds each element of list_links in turn. Use it directly:
for l in list_links:
    domain = urlparse(l).netloc
    clinks.append(domain)
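If the position of each link is actually needed alongside the domain, one option is enumerate, which avoids keeping a manual counter in sync. This is only a sketch: the sample URLs are invented, and storing (index, domain) tuples is an assumption made for illustration, not something the question asks for.

```python
from urllib.parse import urlparse

# Hypothetical sample data standing in for list_links.
list_links = [
    "https://www.theguardian.com/us-news/etc",
    "https://www.bbc.co.uk/news/world",
]

clinks = []
# enumerate yields (index, element) pairs, so no separate counter is needed.
for i, link in enumerate(list_links):
    domain = urlparse(link).netloc
    clinks.append((i, domain))

print(clinks)
```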