Issue with read html in pandas

Time:04-24

I want to read a table from Wikipedia:

import pandas as pd
caption="Edit section: 2019 inequality-adjusted HDI (IHDI) (2020 report)"
df = pd.read_html('https://en.wikipedia.org/wiki/List_of_countries_by_inequality-adjusted_Human_Development_Index',match=caption)
df

But I got this error: "ValueError: No tables found matching pattern 'Edit section: 2019 inequality-adjusted HDI (IHDI) (2020 report)'"

This method worked for a table like the one below:

caption = "Average daily maximum and minimum temperatures for selected cities in Minnesota"

df = pd.read_html('https://en.wikipedia.org/wiki/Minnesota', match=caption)
df

But this one confuses me. How can I solve this problem?

CodePudding user response:

You have multiple problems here.

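Depending on the pandas version and parser, read_html may not be able to fetch https URLs directly (the lxml parser only accepts http, ftp and file URLs), and no table on that page contains the caption you're looking for. The "Edit section: ..." text looks like the tooltip of the section's edit link, not a table caption.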

Try this:

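import io

import pandas as pd
import requests

url = "https://en.wikipedia.org/wiki/List_of_countries_by_inequality-adjusted_Human_Development_Index"
caption = "Table of countries by IHDI"

# Download the page with requests, then hand the HTML to pandas.
# Wrapping it in StringIO avoids the literal-HTML deprecation in pandas 2.1+.
html = requests.get(url).text
df = pd.read_html(io.StringIO(html), match=caption)  # returns a list of DataFrames
print(df[0].head())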

Output:

  Rank      Country  ... 2019 estimates (2020 report)[4][5][6]                  
  Rank      Country  ...                      Overall loss (%) Growth since 2010
0    1       Norway  ...                                   6.1             0.021
1    2      Iceland  ...                                   5.8             0.055
2    3  Switzerland  ...                                   6.9             0.015
3    4      Finland  ...                                   5.3             0.040
4    5      Ireland  ...                                   7.3             0.066

[5 rows x 6 columns]
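
One more detail worth noting: read_html treats match as a regular expression, so parentheses in a caption such as "(IHDI)" are read as groups unless they are escaped. The snippet below is a minimal sketch using only the re module; table_text is a made-up stand-in for text inside a table, not something scraped from the page.

```python
import re

# Illustrative stand-in for text found inside a table (an assumption, not scraped).
table_text = "2019 inequality-adjusted HDI (IHDI) (2020 report)"
caption = "HDI (IHDI) (2020 report)"

# Unescaped: "(IHDI)" is a regex group, so the pattern looks for
# "HDI IHDI 2020 report" without literal parentheses and finds nothing.
raw_hit = re.search(caption, table_text)

# Escaped: the parentheses are matched literally.
escaped_hit = re.search(re.escape(caption), table_text)

print(raw_hit is not None)      # False
print(escaped_hit is not None)  # True
```

When a caption does contain parentheses, passing match=re.escape(caption) to read_html should make pandas look for the literal text. "Table of countries by IHDI" has no special characters, so it works unescaped.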

CodePudding user response:

import pandas as pd

df = pd.read_html('https://en.wikipedia.org/wiki/List_of_countries_by_inequality-adjusted_Human_Development_Index')

df[2]

Or, if you wish to use the match argument:

import pandas as pd

caption="Table of countries by IHDI"
df = pd.read_html('https://en.wikipedia.org/wiki/List_of_countries_by_inequality-adjusted_Human_Development_Index',match=caption)

df[0]
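
A hard-coded index like df[2] can shift whenever the page is edited, so it may help to list what read_html found before picking one. The following is a rough sketch on a small inline HTML string (the two tables are invented for illustration, and it assumes an HTML parser such as lxml, or bs4 with html5lib, is installed):

```python
from io import StringIO

import pandas as pd

# Two tiny made-up tables standing in for a real page.
html = """
<table><caption>First</caption>
<tr><th>a</th><th>b</th></tr>
<tr><td>1</td><td>2</td></tr></table>
<table><caption>Table of countries by IHDI</caption>
<tr><th>Rank</th><th>Country</th></tr>
<tr><td>1</td><td>Norway</td></tr>
<tr><td>2</td><td>Iceland</td></tr></table>
"""

# Without match, every table on the page is returned, in page order.
tables = pd.read_html(StringIO(html))
for i, t in enumerate(tables):
    print(i, t.shape, list(t.columns))

# With match, only tables containing the given text are returned.
matched = pd.read_html(StringIO(html), match="Table of countries by IHDI")
print(matched[0])
```

Printing the index, shape and column names of each table makes it easier to see which entry of the list is the one you want.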
