I am trying to save the data in a DataFrame as a CSV file, but I always get an error. I need one last piece of code to write the data I can already read in Python to a CSV file.
Code is here:
from selenium import webdriver
import time
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd
browser=webdriver.Chrome()
browser.get("https://archive.doingbusiness.org/en/scores")
countries= WebDriverWait(browser, 20).until(EC.visibility_of_element_located((By.XPATH, "//*[@id='dftFrontiner']/div[3]/table"))).get_attribute("outerHTML")
df = pd.read_html(countries)
**df.to_csv('output'.csv, index=False)**
print(df)
time.sleep(2)
browser.quit()
Without the line written in bold, I can get the following output:
[ 0 1 2 3
0 Region Region Region Region
1 NaN East Asia & Pacific 62.7 63.3
2 NaN Europe & Central Asia 71.8 73.1
3 NaN Latin America & Caribbean 58.8 59.1
4 NaN Middle East & North Africa 58.4 60.2
.. ... ... ... ...
217 NaN Vietnam 68.6 69.8
218 NaN West Bank and Gaza 59.7 60
219 NaN Yemen, Rep. 30.7 31.8
220 NaN Zambia 65.7 66.9
221 NaN Zimbabwe 50.5 54.5
When I add the bold line (df.to_csv('output'.csv, index=False)), the file is not saved. However, I need this data in CSV format. Please show me how to write the code.
Thanks.
CodePudding user response:
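That's because pandas.read_html returns a list of DataFrames, and a list has no to_csv method. So you need to select a DataFrame from the list by index before saving the .csv. Also note that the file name must be a single string, 'output.csv', not 'output'.csv.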
Replace:
df.to_csv('output.csv', index=False)
With this:
df[0].to_csv('output.csv', index=False)
# Output :
print(df[0])
0 1 2 3
0 Region Region Region Region
1 NaN East Asia & Pacific 62.7 63.3
2 NaN Europe & Central Asia 71.8 73.1
3 NaN Latin America & Caribbean 58.8 59.1
4 NaN Middle East & North Africa 58.4 60.2
.. ... ... ... ...
217 NaN Vietnam 68.6 69.8
218 NaN West Bank and Gaza 59.7 60
219 NaN Yemen, Rep. 30.7 31.8
220 NaN Zambia 65.7 66.9
221 NaN Zimbabwe 50.5 54.5
[222 rows x 4 columns]
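To see the list behaviour without launching a browser, here is a minimal sketch. The html string is a made-up stand-in for the outerHTML that Selenium returns, and the temporary output folder is only an assumption for the demo. Wrapping the string in io.StringIO is optional on older pandas, but recent versions warn when a literal HTML string is passed directly.

```python
import io
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the outerHTML string returned by Selenium.
html = """
<table>
  <tr><td>Region</td><td>Region</td><td>Region</td></tr>
  <tr><td>East Asia &amp; Pacific</td><td>62.7</td><td>63.3</td></tr>
  <tr><td>Vietnam</td><td>68.6</td><td>69.8</td></tr>
</table>
"""

# read_html returns a list with one DataFrame per <table> it finds.
tables = pd.read_html(io.StringIO(html))
print(type(tables), len(tables))

# Pick the DataFrame out of the list, then save it.
first_table = tables[0]
out_path = os.path.join(tempfile.mkdtemp(), "output.csv")
first_table.to_csv(out_path, index=False)
print(out_path)
```

In the original script the same idea is just df[0].to_csv('output.csv', index=False), which writes the file to the current working directory.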
If you need a separate .csv for each group (Region & Economy), use this:
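# Forward-fill column 0 so every row carries its group label, then split by label
for name, group in df[0].ffill().groupby(0, sort=False):
    # Drop the repeated label row (first row) and the label column (first column)
    sub_df = group.reset_index(drop=True).iloc[1:, 1:]
    sub_df.to_csv(f"{name}.csv", index=False)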
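If you want to check how the split behaves before running it on the real page, here is a small sketch on a hand-made frame. The frame df0, the "Economy" label row, and the temporary output folder are assumptions made for the demo, modelled on the printed output above rather than taken from the site.

```python
import os
import tempfile

import pandas as pd

# Hypothetical miniature of the scraped table: column 0 holds the group
# label only on the first row of each group, the rest are missing.
df0 = pd.DataFrame(
    [
        ["Region", "Region", "Region", "Region"],
        [None, "East Asia & Pacific", "62.7", "63.3"],
        ["Economy", "Economy", "Economy", "Economy"],
        [None, "Vietnam", "68.6", "69.8"],
        [None, "Zambia", "65.7", "66.9"],
    ]
)

out_dir = tempfile.mkdtemp()
written = {}
# ffill copies each label down to the rows below it, so groupby(0) can
# collect all rows that belong to the same group.
for name, group in df0.ffill().groupby(0, sort=False):
    # Skip the label row and the label column, keep only the data cells.
    sub_df = group.reset_index(drop=True).iloc[1:, 1:]
    sub_df.to_csv(os.path.join(out_dir, f"{name}.csv"), index=False)
    written[name] = len(sub_df)

print(written)
print(sorted(os.listdir(out_dir)))
```

Each file is named after its label, so a label containing characters that are not valid in a file name would need cleaning first.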