I am trying to scrape multiple tables from a single webpage, but I am unable to save them all to a .csv file. Only the last table gets saved. My code is below; please suggest a fix.
import time
from selenium import webdriver
import pandas as pd
base_url = 'https://uk.insight.com/en_GB/shop/product/2W1F2EA#ABU/HEWLETT-PACKARD-(HP-INC)/2W1F2EA#ABU/HP-ProBook-440-G8--14"--Core-i7-1165G7--16-GB-RAM--1-TB-SSD--UK/'
print('Opening Chrome Browser Automatically in 5 secs')
time.sleep(5)
options = webdriver.ChromeOptions()
options.add_experimental_option("detach", True)
driver = webdriver.Chrome(options=options)
driver.get(base_url)
df = pd.read_html(driver.page_source)
df2 = df[4:]
for table in df2:
    df = pd.DataFrame(table)
    df.to_csv('table.csv',index=False)
I don't know how to save all the DataFrames into a single .csv file. With the code above, only the last DataFrame is saved.
Answer:
According to the pandas .to_csv() documentation, the mode parameter lets you
append data instead of overwriting the file. It defaults to 'w' (write).
To append data, switch the mode to 'a':
df.to_csv('table.csv', mode='a', index=False)
One thing to note is that the column names will also be appended unless you set header=False.
Here is a quick reproducible example.
import uuid
import pandas as pd
dataframe = pd.DataFrame({
    "person_id": [str(uuid.uuid4())[:7] for _ in range(6)],
    "hours_worked": [38.5, 41.25, "35.0", 27.75, 22.25, -20.5],
    "wage_per_hour": [15.1, 15, 21.30, 17.5, 19.50, 25.50],
})
dataframe2 = pd.DataFrame({
    "person_id2": [str(uuid.uuid4())[:7] for _ in range(6)],
    "hours_worked2": [38.5, 41.25, "35.0", 27.75, 22.25, -20.5],
    "wage_per_hour2": [15.1, 15, 21.30, 17.5, 19.50, 25.50],
})
dataframe.to_csv('TEST.csv', mode='w', index=False)
dataframe2.to_csv('TEST.csv', mode='a', index = False, header=False)
print(pd.read_csv('TEST.csv'))
OUTPUT
    person_id  hours_worked  wage_per_hour
0     1aa66bc         38.50           15.1
1     b7abe05         41.25           15.0
2     15e1779         35.00           21.3
3     3c117d7         27.75           17.5
4     2e6494e         22.25           19.5
5     2a25e45        -20.50           25.5
6     b17d084         38.50           15.1
7     6ca361e         41.25           15.0
8     2cd18e4         35.00           21.3
9     9d120ff         27.75           17.5
10    a0b20d9         22.25           19.5
11    bf9a98d        -20.50           25.5
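Applied to the loop in the question, the same idea could look roughly like the sketch below. The helper name save_tables_to_csv and its repeat_header flag are hypothetical (they are not part of pandas), and the two small DataFrames only stand in for the list that pd.read_html(driver.page_source)[4:] would return. The first table is written with mode='w' so a stale file is replaced, and every later table is appended with mode='a'.

```python
import os
import tempfile

import pandas as pd


def save_tables_to_csv(tables, path, repeat_header=False):
    """Write every DataFrame in `tables` to a single CSV file at `path`.

    Hypothetical helper for illustration: the first table overwrites the
    file, and the remaining tables are appended below it.
    """
    for position, table in enumerate(tables):
        first = position == 0
        table.to_csv(
            path,
            mode="w" if first else "a",      # overwrite once, then append
            header=first or repeat_header,   # header row only once by default
            index=False,
        )


# Stand-in data (assumption): in the question this list would come from
# pd.read_html(driver.page_source)[4:].
tables = [
    pd.DataFrame({"spec": ["CPU", "RAM"], "value": ["Core i7", "16 GB"]}),
    pd.DataFrame({"spec": ["Storage"], "value": ["1 TB SSD"]}),
]

csv_path = os.path.join(tempfile.mkdtemp(), "table.csv")
save_tables_to_csv(tables, csv_path)

combined = pd.read_csv(csv_path)
print(combined)
```

If the scraped tables do not share the same columns, passing repeat_header=True keeps each table's own header row in the file, at the cost of a CSV that no longer reads back as one clean table.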