Can someone help me finish this code by saving the readable DataFrame as a CSV file?

I am trying to save the data in the DataFrame as a CSV file, but I always get an error. I need a final line of code that saves the data I can already read in Python to a CSV file.

Code is here:

from selenium import webdriver
import time
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd


browser=webdriver.Chrome()

browser.get("https://archive.doingbusiness.org/en/scores")

countries= WebDriverWait(browser, 20).until(EC.visibility_of_element_located((By.XPATH, "//*[@id='dftFrontiner']/div[3]/table"))).get_attribute("outerHTML")

df = pd.read_html(countries)

**df.to_csv('output'.csv, index=False)**

print(df)

time.sleep(2)

browser.quit()

Without the line written in bold, I can get the following output:

[          0                           1       2       3
0    Region                      Region  Region  Region
1       NaN         East Asia & Pacific    62.7    63.3
2       NaN       Europe & Central Asia    71.8    73.1
3       NaN   Latin America & Caribbean    58.8    59.1
4       NaN  Middle East & North Africa    58.4    60.2
..      ...                         ...     ...     ...
217     NaN                     Vietnam    68.6    69.8
218     NaN          West Bank and Gaza    59.7      60
219     NaN                 Yemen, Rep.    30.7    31.8
220     NaN                      Zambia    65.7    66.9
221     NaN                    Zimbabwe    50.5    54.5

When I add the bold line (df.to_csv('output'.csv, index=False)), the file is not saved. However, I need this data in CSV format. Please show me how to write the code.

Thanks.

CodePudding user response:

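That's because pandas.read_html returns a list of DataFrames, not a single DataFrame, so you need to select the table you want by index before saving the .csv. Also note that the closing quote is misplaced in 'output'.csv; the file name should be written as 'output.csv'.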

Replace:

df.to_csv('output.csv', index=False)

With this:

df[0].to_csv('output.csv', index=False)

Output:

print(df[0])

          0                           1       2       3
0    Region                      Region  Region  Region
1       NaN         East Asia & Pacific    62.7    63.3
2       NaN       Europe & Central Asia    71.8    73.1
3       NaN   Latin America & Caribbean    58.8    59.1
4       NaN  Middle East & North Africa    58.4    60.2
..      ...                         ...     ...     ...
217     NaN                     Vietnam    68.6    69.8
218     NaN          West Bank and Gaza    59.7      60
219     NaN                 Yemen, Rep.    30.7    31.8
220     NaN                      Zambia    65.7    66.9
221     NaN                    Zimbabwe    50.5    54.5

[222 rows x 4 columns]
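
To see why the original call fails, here is a minimal sketch that needs no browser. The list `tables` is only a stand-in for what pd.read_html returns, and the column names "Economy" and "Score" are made up for illustration; the two rows reuse values from the output above.

```python
import io

import pandas as pd

# Stand-in for the return value of pd.read_html(...): a list holding one DataFrame.
# The column names "Economy" and "Score" are hypothetical.
tables = [pd.DataFrame({"Economy": ["Vietnam", "Zambia"], "Score": [69.8, 66.9]})]

# A list has no to_csv method, so calling it on the list raises AttributeError.
try:
    tables.to_csv("output.csv", index=False)
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)

# Selecting the first DataFrame from the list works. An in-memory buffer is used
# here instead of a file on disk so the sketch leaves nothing behind.
buffer = io.StringIO()
tables[0].to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

If the page contained more than one table, the list would have more than one element, and you would pick the one you need by its index.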

If you need a separate .csv file for each group (Region and Economy), use this:

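# Forward-fill missing values so column 0 carries the group label on every row,
# then write one file per group.
for name, group in df[0].ffill().groupby(0, sort=False):
    # Drop each group's repeated header row and the label column.
    sub_df = group.reset_index(drop=True).iloc[1:, 1:]
    sub_df.to_csv(f"{name}.csv", index=False)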
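
The loop above can be tried on a small hand-built frame, as in the sketch below. It assumes the scraped table has an "Economy" header row shaped like the "Region" one (the printed output elides that part, so this is a guess), and it writes into a temporary directory rather than the working directory. The names `sample`, `out_dir` and `written` are only for illustration.

```python
import os
import tempfile

import pandas as pd

# Hand-built stand-in for df[0]: each group starts with a header row that repeats
# the group name, and column 0 is empty on the data rows.
# The "Economy" header row is an assumption about the real table.
sample = pd.DataFrame({
    0: ["Region", None, None, "Economy", None, None],
    1: ["Region", "East Asia & Pacific", "Europe & Central Asia",
        "Economy", "Vietnam", "Zambia"],
    2: ["Region", "62.7", "71.8", "Economy", "68.6", "65.7"],
    3: ["Region", "63.3", "73.1", "Economy", "69.8", "66.9"],
})

out_dir = tempfile.mkdtemp()
written = {}

# After ffill, column 0 holds the group name on every row, so it can be the key.
for name, group in sample.ffill().groupby(0, sort=False):
    # Skip the group's header row (first row) and the label column (first column).
    sub_df = group.reset_index(drop=True).iloc[1:, 1:]
    path = os.path.join(out_dir, f"{name}.csv")
    sub_df.to_csv(path, index=False)
    written[name] = path

print(sorted(os.listdir(out_dir)))
```

Each file keeps the integer column labels 1, 2 and 3 as its header line; pass header=False to to_csv if you do not want them.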