Unable to save multiple DataFrames into a single CSV; only the last df is getting saved

Time:12-08

I am trying to scrape multiple tables from a single webpage, but I am unable to save them all to one .csv file. Only the last table is getting saved. Below is the code; please suggest a fix.

import time
from selenium import webdriver
import pandas as pd

base_url = 'https://uk.insight.com/en_GB/shop/product/2W1F2EA#ABU/HEWLETT-PACKARD-(HP-INC)/2W1F2EA#ABU/HP-ProBook-440-G8--14"--Core-i7-1165G7--16-GB-RAM--1-TB-SSD--UK/'
print('Opening Chrome Browser Automatically in 5 secs')
time.sleep(5)
options = webdriver.ChromeOptions()
options.add_experimental_option("detach", True)
driver = webdriver.Chrome(options=options)
driver.get(base_url)
df = pd.read_html(driver.page_source)
df2 = df[4:]
for table in df2:
    df = pd.DataFrame(table)
    df.to_csv('table.csv',index=False)

I don't know how to save all the DataFrames into a single .csv file. With the code above, only the last df is saved.

CodePudding user response:

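According to the pandas .to_csv() documentation, the mode parameter controls whether the file is overwritten or appended to. It defaults to 'w' (write), so every call in your loop replaces the file written by the previous one, and only the last table survives.

To append data instead, switch the mode to 'a':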

df.to_csv('table.csv', mode='a', index=False)

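One thing to note is that the column names will also be appended on every call unless you set header=False. With header=False the rows are appended by position, so the appended frame should have its columns in the same order as the data already in the file.

Here is a quick reproducible example.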
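Applied to a loop like the one in the question, the idea could be sketched as follows. This is only a sketch: the tables list holds made-up stand-in data rather than the scraped tables, and it assumes every table has the same columns in the same order. The first table is written with mode='w' and a header, and the rest are appended without one.

```python
import pandas as pd

# Hypothetical stand-ins for the scraped tables (assumed to share the same columns).
tables = [
    pd.DataFrame({"spec": ["CPU", "RAM"], "value": ["Core i7", "16 GB"]}),
    pd.DataFrame({"spec": ["Storage"], "value": ["1 TB SSD"]}),
]

for i, table in enumerate(tables):
    first = i == 0
    table.to_csv(
        "table.csv",
        mode="w" if first else "a",  # overwrite once, then append
        header=first,                # write the column names only once
        index=False,
    )

# Read the file back to check that all rows were kept.
combined = pd.read_csv("table.csv")
print(combined)
```

Writing the first table with mode='w' also means that re-running the script starts from a fresh file instead of appending to the output of an earlier run.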

import uuid
import pandas as pd

dataframe = pd.DataFrame({
    "person_id": [str(uuid.uuid4())[:7] for _ in range(6)],
    "hours_worked": [38.5, 41.25, "35.0", 27.75, 22.25, -20.5],
    "wage_per_hour": [15.1, 15, 21.30, 17.5, 19.50, 25.50],
})


dataframe2 = pd.DataFrame({
    "person_id2": [str(uuid.uuid4())[:7] for _ in range(6)],
    "hours_worked2": [38.5, 41.25, "35.0", 27.75, 22.25, -20.5],
    "wage_per_hour2": [15.1, 15, 21.30, 17.5, 19.50, 25.50],
})

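# Write the first frame with its header, overwriting any existing file.
dataframe.to_csv('TEST.csv', mode='w', index=False)
# Append the second frame without a header row; its own column names are dropped.
dataframe2.to_csv('TEST.csv', mode='a', index=False, header=False)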

print(pd.read_csv('TEST.csv'))

OUTPUT

   person_id  hours_worked  wage_per_hour
0    1aa66bc         38.50           15.1
1    b7abe05         41.25           15.0
2    15e1779         35.00           21.3
3    3c117d7         27.75           17.5
4    2e6494e         22.25           19.5
5    2a25e45        -20.50           25.5
6    b17d084         38.50           15.1
7    6ca361e         41.25           15.0
8    2cd18e4         35.00           21.3
9    9d120ff         27.75           17.5
10   a0b20d9         22.25           19.5
11   bf9a98d        -20.50           25.5
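An alternative to appending, not part of the answer above, is to combine the frames in memory with pd.concat and write the file once. The sketch below uses made-up frames and a hypothetical file name, combined.csv. Unlike positional appending, concat aligns on column names, so a column missing from one frame is left empty for that frame's rows.

```python
import pandas as pd

# Hypothetical frames with partly different columns.
tables = [
    pd.DataFrame({"person_id": ["a1", "b2"], "hours_worked": [38.5, 41.25]}),
    pd.DataFrame({"person_id": ["c3"], "wage_per_hour": [15.1]}),
]

# Stack the frames row-wise and renumber the index from 0.
combined = pd.concat(tables, ignore_index=True)

# A single write, so no append mode or header handling is needed.
combined.to_csv("combined.csv", index=False)

result = pd.read_csv("combined.csv")
print(result)
```

This may suit scraped tables whose columns differ, at the cost of holding every table in memory before writing.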