Is there a way to combine a column and a row to make a new column where a value will be stored? I have a dataframe below that I would like to format to achieve the output at the bottom. Any help would be appreciated. The following table is extracted using beautifulsoup, pandas and selenium. I'll use the sixth df as an example.
Current code
s = Service(r'C:\Users\user34\Documents\Python\chromedriver.exe')
driver = webdriver.Chrome(service=s)
url = "http://ets.aeso.ca/ets_web/ip/Market/Reports/CSDReportServlet"
driver.maximize_window()
driver.get(url)
time.sleep(5)
soup = BeautifulSoup(driver.page_source, 'lxml')
df = pd.read_html(str(soup))
df[6] #output below
GENERATION GENERATION.1 GENERATION.2 GENERATION.3
0 GROUP MC TNG DCR
1 GAS 10884 7638 59
2 HYDRO 894 103 229
3 ENERGY STORAGE 70 0 56
4 SOLAR 1088 558 0
5 WIND 2780 170 0
6 OTHER 444 279 0
7 DUAL FUEL 0 0 0
8 COAL 1286 819 0
9 TOTAL 17446 9567 344
I would like to drop the column headers and the group value. The desired output is:
GAS MC GAS TNG GAS DCR HYDRO MC HYDRO TNG HYDRO DCR
10884 7638 59 894 103 229
Then extended for the rest of the data, I couldn't fit the whole desired output. Help with this would be greatly appreciated.
CodePudding user response:
Probably something like this (without the overheads of selenium) would be what you're after?
import requests
import pandas as pd
from bs4 import BeautifulSoup as bs
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)
headers = {
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36'
}
url = 'http://ets.aeso.ca/ets_web/ip/Market/Reports/CSDReportServlet'
r = requests.get(url, headers=headers)
soup = bs(r.text, 'html.parser')
table = soup.select('table[border="1"]')[1]
df = pd.read_html(str(table), skiprows=1, index_col=['GROUP'])[0]
s = df.stack()
result_df = pd.DataFrame([s.values], columns=[f'{i} {j}' for i, j in s.index])
print(result_df)
Result in terminal:
GAS MC | GAS TNG | GAS DCR | HYDRO MC | HYDRO TNG | HYDRO DCR | ENERGY STORAGE MC | ENERGY STORAGE TNG | ENERGY STORAGE DCR | SOLAR MC | SOLAR TNG | SOLAR DCR | WIND MC | WIND TNG | WIND DCR | OTHER MC | OTHER TNG | OTHER DCR | DUAL FUEL MC | DUAL FUEL TNG | DUAL FUEL DCR | COAL MC | COAL TNG | COAL DCR | TOTAL MC | TOTAL TNG | TOTAL DCR | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 10884 | 7677 | 59 | 894 | 174 | 229 | 70 | 0 | 56 | 1088 | 509 | 0 | 2780 | 203 | 0 | 444 | 276 | 0 | 0 | 0 | 0 | 1286 | 809 | 0 | 17446 | 9648 | 344 |
Pandas documentation: https://pandas.pydata.org/docs/
Also requests: https://requests.readthedocs.io/en/latest/
And for BeautifulSoup: https://beautiful-soup-4.readthedocs.io/en/latest/