I have a CSV file containing many links, and my goal is to scrape all of them. I'm trying to open them in a for loop, but the list I get from the CSV file looks like this: [['www.example.com'], ['www.google.com']].
I think this nesting is what causes the error: AttributeError: 'list' object has no attribute 'timeout'
When I tried a flat list instead, data = ["https://www.google.com/", "https://www.bbc.co.uk/"], it worked.
import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
from selenium.webdriver.support.ui import WebDriverWait
import pandas as pd
import csv
import urllib.request
import numpy as np
with open('neduplikuotas.csv', newline='') as csvfile:
    data = list(csv.reader(csvfile))
#data = ["https://www.google.com/", "https://www.bbc.co.uk/"]
#print(data)
for element in data:
    html = urllib.request.urlopen(element)
    htmlParse = BeautifulSoup(html, 'html.parser')
    for paragraph in htmlParse.find_all("p"):
        print(paragraph.get_text())
CodePudding user response:
If your code works with data = ["https://www.google.com/", "https://www.bbc.co.uk/"]
but you want it to work with data = [['www.example.com'], ['www.google.com']],
you can simply select the first element of each inner list:
for element in data:
    html = urllib.request.urlopen(element[0])
    htmlParse = BeautifulSoup(html, 'html.parser')
    for paragraph in htmlParse.find_all("p"):
        print(paragraph.get_text())
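To see where the nesting comes from, here is a small sketch that runs without touching the network. csv.reader returns one list per row, even when the row has a single column, so each URL arrives wrapped in its own list. The io.StringIO object and its two URLs are only a stand-in for the real CSV file and are an assumption for illustration.

```python
import csv
import io

# Stand-in for the CSV file: one URL per line (hypothetical sample content).
sample_csv = io.StringIO("https://www.google.com/\nhttps://www.bbc.co.uk/\n")

# csv.reader yields one list of column values per row.
rows = list(csv.reader(sample_csv))
print(rows)

# The URL is the first (and only) column of each non-empty row.
urls = [row[0] for row in rows if row]
print(urls)
```

Passing one of the inner lists to urlopen is what triggers the AttributeError, because urlopen expects a string (or a Request object), not a list.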
CodePudding user response:
Yes, that works. I also just found out how to get rid of the inner brackets, so I can do it both ways now. Thank you.
data = [''.join(ele) for ele in data1]
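One thing to watch for after flattening: entries such as 'www.example.com' have no scheme, and urllib.request.urlopen raises a ValueError for URLs without one. Below is a rough sketch of a helper that flattens the rows and adds a scheme where it is missing. The name normalize_urls, the default of "https", and the sample rows are my own assumptions, not something from the original code.

```python
from urllib.parse import urlparse


def normalize_urls(rows, default_scheme="https"):
    """Flatten csv.reader rows into URL strings, adding a scheme if missing.

    Hypothetical helper for illustration only.
    """
    urls = []
    for row in rows:
        url = "".join(row).strip()
        if not url:
            continue  # skip blank rows
        if not urlparse(url).scheme:
            url = f"{default_scheme}://{url}"
        urls.append(url)
    return urls


# Sample rows shaped like the output of csv.reader (assumed data).
data1 = [["www.example.com"], ["https://www.google.com/"], []]
data = normalize_urls(data1)
print(data)
```

The resulting data list can then be used in the original loop with urlopen(element) unchanged.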