I would like to automatically download different datasets from the Climate World Bank (
This is an example of a dataset to download.
However, I have two major problems:
- I am not able to select values from the drop-down menus once I switch to the timeseries tab.
- I do not know how to select a value for the sub-national unit, since that drop-down exists only after "Sub-national units" is chosen as the area type.
This is the code that I have written so far:
from multiprocessing import Value
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import os
wd = webdriver.Chrome('C:/Users/alber/OneDrive/Desktop/UniTn/WebD/chromedriver_win32/chromedriver.exe')
url = "https://climateknowledgeportal.worldbank.org/download-data"
wd.get(url)
download_b = wd.find_element(By.ID,'ncfile')
tab = wd.find_element(by = By.XPATH , value = '//*[@id="data-download-form-container"]/div/ul/li[3]/a')
tab.click()
#WebDriverWait(wd, 15).until(EC.presence_of_element_located((By.ID, "variable")))
select = Select(wd.find_element(by = By.ID, value = "variable"))
select.select_by_visible_text("Mean-Temperature")
select = Select(wd.find_element(by = By.ID, value = "aggregation"))
select.select_by_visible_text("Monthly")
select = Select(wd.find_element(by = By.ID, value = "type"))
select.select_by_visible_text("Sub-national units")
select = Select(wd.find_element(by = By.ID, value = "country"))
select.select_by_visible_text("Italy")
select = Select(wd.find_element(by = By.ID, value = "timeperiod"))
select.select_by_visible_text("1901 - 2021")
download_b.click()
Thank you for your help.
CodePudding user response:
Given your ultimate goal (to obtain the actual data as a csv and analyse it), you may want to reconsider your strategy. You are filling in a form with some variables; the form posts those variables to a URL (which you can observe in Dev tools, Network tab) and gets a result back. Why not use requests for this, and avoid the overhead of selenium? The following is one way of obtaining such data, using only requests (and pandas for displaying the data):
import requests
import pandas as pd

# Browser-like User-Agent sent with every request in this session
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36'
}

s = requests.Session()
s.headers.update(headers)

# Visit the form page first so the session picks up its cookies
url = 'https://climateknowledgeportal.worldbank.org/download-data'
r = s.get(url)

# Mark the following requests as AJAX (XMLHttpRequest) calls
s.headers.update({'X-Requested-With':'XMLHttpRequest'})

# Form fields to post; empty strings are fields left unset
payload = {
    'collection':"cru",
    'variable':"tas",
    'aggregation':"annual",
    'type':"country",
    'country':"DZA",
    'subnational':"",
    'latitude':"",
    'longitude':"",
    'watershed':"",
    'calculation':"",
    'timeperiod':"historical",
    'percentile':"",
    'scenario':"",
    'model':"all",
    'tab':"timeseries"
}

# Post the form data; the JSON reply holds the URL of the generated file
r = s.post('https://climateknowledgeportal.worldbank.org/download_climateportal_data', data=payload)
print(r.json()['success'])

# Fetch that URL and save the content to disk
r = s.get(r.json()['success'])
with open('test_climate.csv', "wb") as f:
    f.write(r.content)

df = pd.read_csv('test_climate.csv')
display(df)  # display() is available in Jupyter/IPython; use print(df) in a plain script
The code above first visits the original page (to get the cookies), then posts some data (see the payload object) to the API used by the form on the page. That API returns a URL pointing to the actual data; we then fetch that URL and write the data to a csv file, which we read with pandas. The result (also saved in the csv file) is:
Variable: tas
NaN Algeria Adrar Ain-Defla Ain-Temouchent Alger Annaba Batna Bechar Bejaia Biskra Blida Bordj Bou Arrer Bouira Boumerdes Chlef Constantine Djelfa El Bayadh El Oued El-Tarf Ghardaia Guelma Illizi Jijel Khenchela Laghouat Mascara Medea Mila Mostaganem M'Sila Naama Oran Ouargla Oum El Bouaghi Relizane Saida Setif Sidi Bel Abbes Skikda Souk-Ahras Tamanrasset Tebessa Tindouf Tiaret Tipaza Tissemsilt Tizi Ouzou Tlemcen
1901.0 22.84 25.83 16.38 16.89 16.78 16.36 14.89 22.65 14.94 19.28 15.91 14.58 14.59 16.16 17.22 14.28 16.46 18.19 21.18 16.36 20.93 14.78 22.65 15.38 15.88 16.83 16.63 14.98 14.46 17.46 15.93 16.16 17.14 22.04 13.94 17.37 15.17 14.26 15.34 15.90 14.51 25.02 15.48 24.38 15.24 17.06 15.32 15.34 15.25
1902.0 22.84 25.77 16.64 16.98 17.06 16.60 15.13 22.44 15.24 19.50 16.18 14.87 14.87 16.46 17.47 14.53 16.71 18.28 21.35 16.61 21.06 15.00 22.63 15.68 16.07 17.06 16.78 15.24 14.71 17.66 16.18 16.17 17.30 22.17 14.16 17.61 15.32 14.53 15.43 16.17 14.73 25.05 15.68 23.98 15.47 17.32 15.57 15.65 15.27
1903.0 22.75 25.83 16.34 16.83 16.75 16.08 14.70 22.36 14.85 19.08 15.86 14.46 14.48 16.12 17.21 14.11 16.33 18.00 20.93 16.11 20.76 14.50 22.54 15.23 15.63 16.70 16.53 14.90 14.31 17.43 15.78 15.99 17.13 21.84 13.71 17.34 15.06 14.12 15.22 15.73 14.22 25.09 15.19 23.92 15.17 17.03 15.28 15.29 15.15
1904.0 22.89 25.89 16.82 17.25 17.22 16.53 15.11 22.61 15.34 19.49 16.35 14.92 14.99 16.62 17.69 14.50 16.77 18.41 21.27 16.50 21.09 14.95 22.62 15.70 15.97 17.13 16.97 15.40 14.71 17.89 16.24 16.39 17.60 22.12 14.09 17.80 15.49 14.55 15.66 16.18 14.65 25.09 15.58 23.95 15.63 17.52 15.76 15.80 15.55
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2017.0 23.87 26.65 18.27 18.83 18.69 18.13 16.73 23.94 16.88 21.07 17.77 16.49 16.44 18.06 19.15 16.12 18.21 19.87 22.93 18.06 22.31 16.51 23.44 17.19 17.62 18.60 18.46 16.78 16.31 19.36 17.75 17.98 19.04 23.58 15.72 19.27 17.01 16.14 17.24 17.73 16.23 25.52 17.23 25.41 17.09 18.99 17.19 17.25 17.15
2018.0 23.50 26.13 17.66 17.96 18.19 18.02 16.37 22.80 16.32 20.63 17.25 15.93 15.84 17.49 18.58 15.98 17.65 19.00 22.77 18.02 21.79 16.41 23.56 16.84 17.52 17.95 17.82 16.20 15.94 18.80 17.20 16.95 18.44 23.30 15.60 18.71 16.27 15.63 16.35 17.57 16.16 25.63 17.14 24.11 16.42 18.43 16.58 16.67 16.22
2019.0 23.66 26.47 17.83 18.36 18.36 18.06 16.51 23.39 16.53 20.79 17.43 16.15 16.07 17.70 18.76 16.03 17.82 19.39 22.81 18.02 21.99 16.44 23.43 16.98 17.54 18.16 18.05 16.38 16.09 18.97 17.41 17.45 18.67 23.41 15.63 18.88 16.55 15.82 16.76 17.63 16.18 25.51 17.15 24.71 16.63 18.62 16.73 16.90 16.67
2020.0 23.79 26.57 18.08 18.65 18.63 18.19 16.79 23.66 16.91 21.13 17.68 16.51 16.41 18.00 19.00 16.19 18.13 19.69 23.02 18.15 22.30 16.59 23.49 17.23 17.72 18.40 18.33 16.65 16.36 19.27 17.77 17.80 18.97 23.66 15.80 19.11 16.82 16.17 17.07 17.79 16.32 25.48 17.31 24.92 16.86 18.87 16.98 17.25 16.97
2021.0 23.93 26.62 18.29 18.62 18.87 18.59 17.16 23.62 17.22 21.46 17.92 16.81 16.70 18.32 19.11 16.59 18.41 19.78 23.40 18.61 22.46 16.99 23.74 17.59 18.11 18.67 18.41 16.92 16.73 19.31 18.08 17.78 18.97 23.98 16.20 19.22 16.92 16.50 17.08 18.17 16.74 25.63 17.71 24.86 17.07 19.05 17.20 17.53 16.94
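If you plan to run this more than once, the post / read URL / fetch file steps could be wrapped in a small function. What follows is only a sketch: the name download_csv is made up for illustration, and it assumes that a reply without a 'success' key means the request failed (I have not checked what the site actually returns on errors). The session argument is expected to be a requests.Session that has already visited the form page and set the X-Requested-With header, as in the code further up.

```python
from pathlib import Path

# Endpoint taken from the code above
POST_URL = "https://climateknowledgeportal.worldbank.org/download_climateportal_data"


def download_csv(session, payload, out_path):
    """Post the form payload, follow the returned link, and save the file.

    `session` is assumed to be a prepared requests.Session (cookies and
    headers already set). The function name and the error handling are
    illustrative assumptions, not part of the site's documented behaviour.
    """
    response = session.post(POST_URL, data=payload)
    response.raise_for_status()

    # The JSON reply is expected to carry the file URL under 'success'
    file_url = response.json().get("success")
    if not file_url:
        # Assumption: a missing 'success' key signals a failed request
        raise RuntimeError(f"No download URL in reply: {response.text[:200]}")

    data = session.get(file_url)
    data.raise_for_status()

    out_path = Path(out_path)
    out_path.write_bytes(data.content)
    return out_path
```

With the session s and the payload defined earlier, download_csv(s, payload, 'test_climate.csv') would save the same file as before and return its path.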
You can modify the parameters of the payload object to get all the results you want.
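Since you want several datasets, one option is to generate a payload per combination instead of editing the dictionary by hand. This is a rough sketch, not something I have run against the site: build_payloads is a made-up helper, and the values "ITA" (Italy) and "pr" (precipitation) are guesses at what the form accepts. Check the real option values in Dev tools (Network tab) before relying on them.

```python
from itertools import product

# Same fields as the payload posted above
BASE_PAYLOAD = {
    "collection": "cru",
    "variable": "tas",
    "aggregation": "annual",
    "type": "country",
    "country": "DZA",
    "subnational": "",
    "latitude": "",
    "longitude": "",
    "watershed": "",
    "calculation": "",
    "timeperiod": "historical",
    "percentile": "",
    "scenario": "",
    "model": "all",
    "tab": "timeseries",
}


def build_payloads(countries, variables, base=BASE_PAYLOAD):
    """Yield one payload per (country, variable) pair.

    Each payload is a copy of `base`, so the template is never modified.
    """
    for country, variable in product(countries, variables):
        payload = dict(base)
        payload["country"] = country
        payload["variable"] = variable
        yield payload


# "ITA" and "pr" are assumed codes, used here only for illustration
payloads = list(build_payloads(["DZA", "ITA"], ["tas", "pr"]))
print(len(payloads))  # 4
```

Each generated payload can then be posted exactly like the single payload in the code further up, writing every result to its own file name.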
Requests docs: https://requests.readthedocs.io/en/latest/
And also pandas: https://pandas.pydata.org/pandas-docs/stable/index.html