Scraping a Table from a Website using Python

Time:07-20

I have tried several approaches that work for other websites but not for this URL:

https://www.wunderground.com/hourly/es/barcelona/IBARCE215/date/2022-07-25 (the date in the URL, e.g. 2022-07-25, should be in the future)

I tried:

import requests
import lxml.html as lh
import pandas as pd
url = 'https://www.wunderground.com/hourly/es/barcelona/IBARCE215/date/2022-07-25'
page = requests.get(url)
doc = lh.fromstring(page.content)
tr_elements = doc.xpath('//tr')

But tr_elements is empty. It works with:

url = 'https://www.wunderground.com/dashboard/pws/ISANSA11/table/2021-11-30/2021-11-30/daily'
url = 'http://pokemondb.net/pokedex/all'

But not with:

url = 'https://www.wunderground.com/hourly/es/barcelona/IBARCE215/date/2022-07-25'

I also tried:

import requests
from bs4 import BeautifulSoup
import pandas as pd
url = 'https://www.wunderground.com/hourly/es/barcelona/IBARCE215/date/2022-07-20'
page = requests.get(url)
soup = BeautifulSoup(page.text, 'lxml')
table1 = soup.find('table', id='hourly-forecast-table')

But the table is not found. It works with:

url = 'https://www.worldometers.info/coronavirus/'
table1 = soup.find('table', id='main_table_countries_today')

In Chrome I used "Ctrl U" (view page source) and "Ctrl Shift I" (developer tools) to inspect the HTML. For url = 'https://www.wunderground.com/hourly/es/barcelona/IBARCE215/date/2022-07-25' I can see id='hourly-forecast-table' with "Ctrl Shift I" but not with "Ctrl U", and I cannot see it in the soup variable either. For url = 'https://www.worldometers.info/coronavirus/' I can see id='main_table_countries_today' with "Ctrl U" as well. I guess there is something different about this website.
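One way to check that guess without a browser is to look for <table> tags in the raw HTML text, which is what requests receives and what "Ctrl U" shows. The sketch below uses only the standard library. The helper name table_ids_in and the two sample strings are made up for illustration; they are stand-ins for a server-rendered page and a JavaScript-rendered page, not real responses from either site.

```python
from html.parser import HTMLParser


class TableIdCollector(HTMLParser):
    """Collect the id attribute of every <table> tag in an HTML string."""

    def __init__(self):
        super().__init__()
        self.table_ids = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased from HTMLParser.
        if tag == "table":
            self.table_ids.append(dict(attrs).get("id"))


def table_ids_in(html):
    """Return the ids of all <table> tags found in the given HTML text."""
    collector = TableIdCollector()
    collector.feed(html)
    return collector.table_ids


# Hypothetical stand-ins for the two kinds of response described above.
static_page = (
    '<html><body><table id="main_table_countries_today">'
    "<tr><td>1</td></tr></table></body></html>"
)
js_page = '<html><body><app-root></app-root><script src="main.js"></script></body></html>'

print(table_ids_in(static_page))  # ['main_table_countries_today']
print(table_ids_in(js_page))      # []
```

To run the same check against a live page, pass requests.get(url).text to table_ids_in. An empty list for a page that shows a table in the developer tools suggests the table is built by JavaScript after the page loads.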

Thank you very much,

CodePudding user response:

Have you tried using Selenium together with Beautiful Soup? Install Selenium and ChromeDriver, and you can use Selenium's send_keys method to replicate keystrokes such as "Ctrl U".

CodePudding user response:

If you are happy to use Selenium with pandas, the following example is for you. I use Selenium with pandas to grab the table data because the table is loaded by JavaScript.

from io import StringIO

import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager

url = 'https://www.wunderground.com/hourly/es/barcelona/IBARCE215/date/2022-07-20'

options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
# Keep the browser window open after the script finishes
options.add_experimental_option("detach", True)

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
# driver.get() returns None, so its result is not assigned
driver.get(url)

# Wait up to 20 seconds for JavaScript to render the first table, then take its HTML
table_html = WebDriverWait(driver, 20).until(
    EC.visibility_of_element_located((By.XPATH, '(//table)[1]'))
).get_attribute("outerHTML")

# Recent pandas versions expect a file-like object rather than a raw HTML string
df = pd.read_html(StringIO(table_html))[0]
# Print every row except the last one
print(df.iloc[:-1])

Output:

     Time    Conditions  Temp. Feels Like  ... Dew Point Humidity         Wind   Pressure
0   12:00 am         Clear  78 °F      78 °F  ...     73 °F    87 °%   3 °mph NNE  30.02 °in  
1    1:00 am         Clear  77 °F      77 °F  ...     72 °F    85 °%   3 °mph NNW  30.02 °in  
2    2:00 am         Clear  77 °F      81 °F  ...     71 °F    81 °%   3 °mph NNW  30.02 °in  
3    3:00 am         Clear  77 °F      81 °F  ...     70 °F    79 °%     3 °mph N  30.02 °in  
4    4:00 am         Clear  76 °F      80 °F  ...     69 °F    78 °%     3 °mph N  30.01 °in  
5    5:00 am         Clear  76 °F      79 °F  ...     67 °F    76 °%   4 °mph NNW  30.01 °in  
6    6:00 am         Clear  75 °F      77 °F  ...     66 °F    74 °%     5 °mph N  30.02 °in  
7    7:00 am         Sunny  75 °F      76 °F  ...     67 °F    76 °%     4 °mph N  30.03 °in  
8    8:00 am         Sunny  77 °F      81 °F  ...     68 °F    73 °%   6 °mph NNE  30.05 °in  
9    9:00 am         Sunny  80 °F      84 °F  ...     69 °F    69 °%    7 °mph NE  30.06 °in  
10  10:00 am         Sunny  81 °F      87 °F  ...     71 °F    71 °%   9 °mph ENE  30.08 °in  
11  11:00 am         Sunny  82 °F      88 °F  ...     72 °F    72 °%    11 °mph E  30.09 °in  
12  12:00 pm         Sunny  82 °F      88 °F  ...     72 °F    72 °%    12 °mph E  30.10 °in  
13   1:00 pm         Sunny  82 °F      88 °F  ...     71 °F    70 °%  12 °mph ESE  30.10 °in  
14   2:00 pm         Sunny  83 °F      88 °F  ...     71 °F    68 °%  12 °mph ESE  30.10 °in  
15   3:00 pm         Sunny  82 °F      88 °F  ...     71 °F    68 °%  12 °mph ESE  30.09 °in  
16   4:00 pm  Mostly Sunny  83 °F      88 °F  ...     71 °F    68 °%  12 °mph ESE  30.09 °in  
17   5:00 pm  Mostly Sunny  82 °F      88 °F  ...     71 °F    70 °%  11 °mph ESE  30.08 °in  
18   6:00 pm         Sunny  82 °F      87 °F  ...     71 °F    70 °%  10 °mph ESE  30.07 °in  
19   7:00 pm  Mostly Sunny  81 °F      87 °F  ...     71 °F    72 °%   9 °mph ESE  30.07 °in  
20   8:00 pm         Sunny  80 °F      85 °F  ...     71 °F    73 °%   8 °mph ESE  30.07 °in  
21   9:00 pm         Sunny  80 °F      84 °F  ...     71 °F    76 °%     7 °mph E  30.08 °in  
22  10:00 pm         Clear  79 °F      83 °F  ...     71 °F    77 °%     5 °mph E  30.09 °in  
23  11:00 pm         Clear  78 °F      82 °F  ...     71 °F    79 °%   3 °mph ENE  30.10 °in  

[24 rows x 11 columns]
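
The cells in the output above carry their units as text (for example "78 °F" or "3 °mph NNE"), so they need splitting before any numeric work. Here is a minimal sketch using only the re module. The helper name split_reading is hypothetical, and the sample strings are copied from the output above.

```python
import re

# A leading number (integer or decimal), then whatever unit text follows it.
READING = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*(.*?)\s*$")


def split_reading(cell):
    """Split a cell such as '78 °F' into (78.0, '°F').

    Returns (None, cell) when the cell does not start with a number.
    """
    match = READING.match(cell)
    if match is None:
        return None, cell
    return float(match.group(1)), match.group(2)


for cell in ["78 °F", "87 °%", "3 °mph NNE", "30.02 °in", "Clear"]:
    print(split_reading(cell))
# (78.0, '°F')
# (87.0, '°%')
# (3.0, '°mph NNE')
# (30.02, '°in')
# (None, 'Clear')
```

With the DataFrame from the answer, something like df["Temp."].map(split_reading) would apply this to one column. It is meant for the measurement columns only; a Time value such as "12:00 am" also starts with digits and would be split incorrectly.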