I am working on a project and want to get a table from a website, either through an API or by web scraping.
I wrote the following code:
import requests
import pandas as pd
import numpy as np
url = 'https://worldpopulationreview.com/state-rankings/circumcision-rates-by-state'
resp = requests.get(url)
tables = pd.read_html(resp.text)
all_df = pd.concat(tables)
data= pd.DataFrame(all_df)
But I got the error message "No tables found". I want the table shown on the page, which can also be downloaded as CSV.
Does anyone know what the problem is?
CodePudding user response:
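read_html most likely finds no tables because the HTML returned by requests contains no table element; the page builds its table in the browser from JSON embedded in a script tag. Here is one way of getting that data as a dataframe: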
import requests
from bs4 import BeautifulSoup as bs
import pandas as pd
import json
url = 'https://worldpopulationreview.com/state-rankings/circumcision-rates-by-state'
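headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36'
}
# Fetch the page with a browser-like User-Agent and parse the HTML
soup = bs(requests.get(url, headers=headers).text, 'html.parser')
# The table data is embedded as JSON in the __NEXT_DATA__ script tag
script_w_data = soup.select_one('script[id="__NEXT_DATA__"]').text
# Flatten the list of per-state records into a DataFrame
df = pd.json_normalize(json.loads(script_w_data)['props']['pageProps']['listing'])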
print(df)
Result in terminal:
fips state densityMi pop2023 pop2022 pop2020 pop2019 pop2010 growthRate growth growthSince2010 circumcisionRate
0 54 West Virginia 73.88019 1775932 1781860 1793716 1799642 1852994 -0.00333 -5928 -0.04159 0.87
1 26 Michigan 179.26454 10135438 10116069 10077331 10057961 9883640 0.00191 19369 0.02548 0.86
2 21 Kentucky 115.37702 4555777 4539130 4505836 4489190 4339367 0.00367 16647 0.04987 0.85
3 31 Nebraska 26.06024 2002052 1988536 1961504 1947985 1826341 0.00680 13516 0.09621 0.84
4 39 Ohio 290.70091 11878330 11852036 11799448 11773150 11536504 0.00222 26294 0.02963 0.84
5 18 Indiana 191.92896 6876047 6845874 6785528 6755359 6483802 0.00441 30173 0.06050 0.83
6 19 Iowa 57.89018 3233572 3219171 3190369 3175964 3046355 0.00447 14401 0.06146 0.82
7 55 Wisconsin 109.96966 5955737 5935064 5893718 5873043 5686986 0.00348 20673 0.04726 0.82
8 45 South Carolina 175.18855 5266343 5217037 5118425 5069118 4625364 0.00945 49306 0.13858 0.81
9 42 Pennsylvania 292.62222 13092796 13062764 13002700 12972667 12702379 0.00230 30032 0.03074 0.79
10 56 Wyoming 5.98207 580817 579495 576851 575524 563626 0.00228 1322 0.03050 0.79
11 15 Hawaii 231.00763 1483762 1474265 1455271 1445774 1360301 0.00644 9497 0.09076 0.78
12 20 Kansas 36.24443 2963308 2954832 2937880 2929402 2853118 0.00287 8476 0.03862 0.77
13 38 North Dakota 11.75409 811044 800394 779094 768441 672591 0.01331 10650 0.20585 0.77
14 40 Oklahoma 58.63041 4021753 4000953 3959353 3938551 3751351 0.00520 20800 0.07208 0.77
15 46 South Dakota 11.98261 908414 901165 886667 879421 814180 0.00804 7249 0.11574 0.77
16 29 Missouri 90.26083 6204710 6188111 6154913 6138318 5988927 0.00268 16599 0.03603 0.76
17 33 New Hampshire 155.90830 1395847 1389741 1377529 1371424 1316470 0.00439 6106 0.06030 0.76
18 44 Rhode Island 1074.29594 1110822 1106341 1097379 1092896 1052567 0.00405 4481 0.05535 0.76
19 47 Tennessee 171.70515 7080262 7023788 6910840 6854371 6346105 0.00804 56474 0.11569 0.76
20 51 Virginia 223.36045 8820504 8757467 8631393 8568357 8001024 0.00720 63037 0.10242 0.74
21 13 Georgia 191.59470 11019186 10916760 10711908 10609487 9687653 0.00938 102426 0.13745 0.72
22 24 Maryland 648.84362 6298325 6257958 6177224 6136855 5773552 0.00645 40367 0.09089 0.72
23 9 Connecticut 746.69537 3615499 3612314 3605944 3602762 3574097 0.00088 3185 0.01158 0.71
24 23 Maine 44.50148 1372559 1369159 1362359 1358961 1328361 0.00248 3400 0.03327 0.67
25 5 Arkansas 58.42619 3040207 3030646 3011524 3001967 2915918 0.00315 9561 0.04262 0.66
26 8 Colorado 57.86332 5997070 5922618 5773714 5699264 5029196 0.01257 74452 0.19245 0.66
27 25 Massachusetts 919.82103 7174604 7126375 7029917 6981690 6547629 0.00677 48229 0.09576 0.66
28 34 New Jersey 1283.40005 9438124 9388414 9288994 9239284 8791894 0.00529 49710 0.07350 0.66
29 50 Vermont 70.33514 648279 646545 643077 641347 625741 0.00268 1734 0.03602 0.64
30 17 Illinois 230.67908 12807072 12808884 12812508 12814324 12830632 -0.00014 -1812 -0.00184 0.63
31 27 Minnesota 73.18202 5827265 5787008 5706494 5666238 5303925 0.00696 40257 0.09867 0.63
32 36 New York 433.90472 20448194 20365879 20201249 20118937 19378102 0.00404 82315 0.05522 0.59
33 37 North Carolina 220.30026 10710558 10620168 10439388 10348993 9535483 0.00851 90390 0.12323 0.52
34 30 Montana 7.64479 1112668 1103187 1084225 1074744 989415 0.00859 9481 0.12457 0.50
35 48 Texas 116.16298 30345487 29945493 29145505 28745507 25145561 0.01336 399994 0.20679 0.50
36 35 New Mexico 17.60148 2135024 2129190 2117522 2111685 2059179 0.00274 5834 0.03683 0.49
37 22 Louisiana 108.67214 4695071 4682633 4657757 4645314 4533372 0.00266 12438 0.03567 0.45
38 49 Utah 41.66892 3423935 3373162 3271616 3220842 2763885 0.01505 50773 0.23881 0.42
39 12 Florida 416.95573 22359251 22085563 21538187 21264502 18801310 0.01239 273688 0.18924 0.35
40 41 Oregon 45.41307 4359110 4318492 4237256 4196636 3831074 0.00941 40618 0.13783 0.24
41 6 California 258.20877 40223504 39995077 39538223 39309799 37253956 0.00571 228427 0.07971 0.22
42 1 Alabama 100.65438 5097641 5073187 5024279 4999822 4779736 0.00482 24454 0.06651 0.20
43 53 Washington 120.37292 7999503 7901429 7705281 7607206 6724540 0.01241 98074 0.18960 0.15
44 32 Nevada 29.38425 3225832 3185426 3104614 3064205 2700551 0.01268 40406 0.19451 0.12
45 2 Alaska 1.29738 740339 738023 733391 731075 710231 0.00314 2316 0.04239 NaN
46 4 Arizona 64.96246 7379346 7303398 7151502 7075549 6392017 0.01040 75948 0.15446 NaN
47 10 Delaware 522.08876 1017551 1008350 989948 980743 897934 0.00912 9201 0.13321 NaN
48 16 Idaho 23.23926 1920562 1893410 1839106 1811950 1567582 0.01434 27152 0.22517 NaN
49 28 Mississippi 63.07084 2959473 2960075 2961279 2961879 2967297 -0.00020 -602 -0.00264 NaN
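Since the question also asks for a CSV, here is a minimal sketch of trimming that dataframe to the two columns shown on the site and saving it. To keep it runnable without a network call, it uses three rows copied from the output above as a stand-in for the scraped df; the output file name is an arbitrary choice, not something the site prescribes.

```python
import pandas as pd

# Stand-in for the scraped df: three rows copied from the output above.
df = pd.DataFrame({
    "state": ["West Virginia", "Michigan", "Alaska"],
    "pop2023": [1775932, 10135438, 740339],
    "circumcisionRate": [0.87, 0.86, None],
})

# Keep only the state and rate columns, and drop states with no rate (NaN).
rates = df[["state", "circumcisionRate"]].dropna(subset=["circumcisionRate"])

# Write the result to disk; the file name is hypothetical.
rates.to_csv("circumcision_rates.csv", index=False)
```

With the real df, the same three lines after the stand-in apply unchanged. Whether to drop the NaN rows is a judgment call; remove the dropna call to keep all 50 states.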
CodePudding user response:
With some help from selenium to render the page before calling read_html:
# https://selenium-python.readthedocs.io/installation.html
from selenium.webdriver.chrome.service import Service
from selenium import webdriver
import pandas as pd
s = Service("./chromedriver.exe")
url = 'https://worldpopulationreview.com/state-rankings/circumcision-rates-by-state'
with webdriver.Chrome(service=s) as driver:
    driver.get(url)
    # Parse the rendered HTML while the browser session is still open
    df = pd.concat(pd.read_html(driver.page_source))
Output:
print(df)
State Circumcision Rate
0 West Virginia 87%
1 Michigan 86%
2 Kentucky 85%
3 Nebraska 84%
4 Ohio 84%
.. ... ...
45 Alaska 0%
46 Arizona 0%
47 Delaware 0%
48 Idaho 0%
49 Mississippi 0%
[50 rows x 2 columns]
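The rates come back as strings such as "87%", so they need converting before any numeric work. A minimal sketch, assuming the column names shown in the output above and using three of its rows as a stand-in for the Selenium result (the new column name rate is an arbitrary choice):

```python
import pandas as pd

# Stand-in for the Selenium result: rows copied from the output above.
df = pd.DataFrame({
    "State": ["West Virginia", "Michigan", "Alaska"],
    "Circumcision Rate": ["87%", "86%", "0%"],
})

# Strip the percent sign and scale to a 0-1 fraction.
df["rate"] = df["Circumcision Rate"].str.rstrip("%").astype(float) / 100

# Assumption: 0% means "no data" here, so replace zeros with NaN.
df["rate"] = df["rate"].mask(df["rate"] == 0)
```

The five states shown as 0% in this output are the same five that have NaN in the JSON-based output of the other answer, which is why the sketch treats zero as missing. Skip the last line if a literal zero is what you want.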