There are some tables I am trying to get data from. I have been able to do this before when the tags have only tags, however in this specific table they are mixed on several rows:
I was able to extract and tags separately using below, but then it's difficult to put the data back together.
url = 'https://www.pge.com/pipeline/operations/cgt_pipeline_status.page#flows'
res = requests.get(url)
file = BeautifulSoup(res.text, 'lxml')
##################################################################
find_table = file.find('table', class_='supply_demand_table')
rows = find_table.find_all('tr')
lps_td =[]
for i in rows:
table_data = i.find_all('td')
data = [j.text for j in table_data]
lps_td.append(data)
df_td = pd.DataFrame(lps_td)
lps_th =[]
for i in rows:
table_data = i.find_all('th')
data = [j.text for j in table_data]
lps_th.append(data)
df_th = pd.DataFrame(lps_th)
lps_th =[]
Any help on pulling the entire table would be really appreciated. Thanks!
CodePudding user response:
read_html returns a list of dataframes index 5 is the table you want.
import pandas as pd
url = "https://www.pge.com/pipeline/operations/cgt_pipeline_status.page#flows"
df = pd.read_html(url)[5].rename(columns={"Unnamed: 0": ""}).set_index("")