I am currently working with Selenium in Python, scraping a bunch of URLs from a list. My problem comes when I want to separate the results into lists depending on which URL they were scraped from. Currently the output is just one flat list of all results, as follows:
BIOGEN_SCOTLAND_ESRI.zip
Format: ESRI Shapefile, (50.7 kB)
BIOSPH_SCOTLAND_ESRI.zip
Format: ESRI Shapefile, (1.4 MB)
COUNEUR_SCOTLAND_ESRI.zip
Format: ESRI Shapefile, (35.8 kB)
CNTRYPK_SCOTLAND_ESRI.zip
Format: ESRI Shapefile, (183.2 kB)
GCR_SCOTLAND_ESRI.zip
Format: ESRI Shapefile, (3.8 MB)
LNR_SCOTLAND_ESRI.zip
Format: ESRI Shapefile, (243.5 kB)
Here is the snippet of code for the first x results that produces the mentioned result:
for dataset in dataset_index[:3]:
    driver.get(dataset['dataset_link'])
    time.sleep(2)
    filelist = driver.find_elements(By.XPATH, '//*[@id="filelist"]')
    for files in filelist:
        file_name = files.find_element(By.CLASS_NAME, 'c-download-item').text
        print(file_name)
My expected response would be something like:
[BIOGEN_SCOTLAND_ESRI.zip
Format: ESRI Shapefile, (50.7 kB)
BIOSPH_SCOTLAND_ESRI.zip
Format: ESRI Shapefile, (1.4 MB)],
[COUNEUR_SCOTLAND_ESRI.zip
Format: ESRI Shapefile, (35.8 kB)
CNTRYPK_SCOTLAND_ESRI.zip
Format: ESRI Shapefile, (183.2 kB)]
Your assistance would be highly appreciated.
CodePudding user response:
You can use a list comprehension to build the list for each page:
filelist = [x.find_element(By.CLASS_NAME, 'c-download-item').text for x in driver.find_elements(By.XPATH, '//*[@id="filelist"]')]
print(filelist)
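To get one inner list per URL, as in the expected output, the comprehension probably needs to run once per page, with each result collected into an outer list. Here is a minimal sketch of that idea, assuming `driver` and `dataset_index` are set up as in the question. The helper names `scrape_file_names` and `group_by_dataset` are made up for illustration and are not part of Selenium.

```python
import time


def scrape_file_names(driver, url, wait_seconds=2):
    """Return the download-item texts found on one dataset page."""
    # Imported here so the grouping helper below can be tried without a browser.
    from selenium.webdriver.common.by import By

    driver.get(url)
    time.sleep(wait_seconds)  # same fixed wait as in the question
    return [
        x.find_element(By.CLASS_NAME, 'c-download-item').text
        for x in driver.find_elements(By.XPATH, '//*[@id="filelist"]')
    ]


def group_by_dataset(dataset_index, scrape):
    """Build one inner list per dataset, in the order the datasets are given.

    `scrape` is any callable that takes a URL and returns a list of strings.
    """
    return [scrape(dataset['dataset_link']) for dataset in dataset_index]
```

With a live driver this could be called as `results = group_by_dataset(dataset_index[:3], lambda url: scrape_file_names(driver, url))`, after which `results[0]` would hold whatever was found on the first URL, `results[1]` on the second, and so on. If you would rather look results up by URL, a dict comprehension keyed on `dataset['dataset_link']` should work the same way.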