Web scraping with an API in Python from a URL

Time:02-05

I am working on a project and want to get a table from a website, either through an API or by web scraping.

I have the following code:

import requests
import pandas as pd

url = 'https://worldpopulationreview.com/state-rankings/circumcision-rates-by-state'
resp = requests.get(url)
tables = pd.read_html(resp.text)  # fails here with ValueError: No tables found
all_df = pd.concat(tables)
data = pd.DataFrame(all_df)

But I get the error message "No tables found". I want the same table that the page also offers as a CSV download.

Does anyone know what the problem is?

Answer:

Here is one way of getting that data as a dataframe:

import json

import pandas as pd
import requests
from bs4 import BeautifulSoup as bs

url = 'https://worldpopulationreview.com/state-rankings/circumcision-rates-by-state'

# Send a browser-like User-Agent header with the request
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36'
}

soup = bs(requests.get(url, headers=headers).text, 'html.parser')
# Read the JSON that the page embeds in its __NEXT_DATA__ script tag
script_w_data = soup.select_one('script[id="__NEXT_DATA__"]').text
# The table rows are the list of records under props -> pageProps -> listing
df = pd.json_normalize(json.loads(script_w_data)['props']['pageProps']['listing'])
print(df)

Result in terminal:

fips    state   densityMi   pop2023 pop2022 pop2020 pop2019 pop2010 growthRate  growth  growthSince2010 circumcisionRate
0   54  West Virginia   73.88019    1775932 1781860 1793716 1799642 1852994 -0.00333    -5928   -0.04159    0.87
1   26  Michigan    179.26454   10135438    10116069    10077331    10057961    9883640 0.00191 19369   0.02548 0.86
2   21  Kentucky    115.37702   4555777 4539130 4505836 4489190 4339367 0.00367 16647   0.04987 0.85
3   31  Nebraska    26.06024    2002052 1988536 1961504 1947985 1826341 0.00680 13516   0.09621 0.84
4   39  Ohio    290.70091   11878330    11852036    11799448    11773150    11536504    0.00222 26294   0.02963 0.84
5   18  Indiana 191.92896   6876047 6845874 6785528 6755359 6483802 0.00441 30173   0.06050 0.83
6   19  Iowa    57.89018    3233572 3219171 3190369 3175964 3046355 0.00447 14401   0.06146 0.82
7   55  Wisconsin   109.96966   5955737 5935064 5893718 5873043 5686986 0.00348 20673   0.04726 0.82
8   45  South Carolina  175.18855   5266343 5217037 5118425 5069118 4625364 0.00945 49306   0.13858 0.81
9   42  Pennsylvania    292.62222   13092796    13062764    13002700    12972667    12702379    0.00230 30032   0.03074 0.79
10  56  Wyoming 5.98207 580817  579495  576851  575524  563626  0.00228 1322    0.03050 0.79
11  15  Hawaii  231.00763   1483762 1474265 1455271 1445774 1360301 0.00644 9497    0.09076 0.78
12  20  Kansas  36.24443    2963308 2954832 2937880 2929402 2853118 0.00287 8476    0.03862 0.77
13  38  North Dakota    11.75409    811044  800394  779094  768441  672591  0.01331 10650   0.20585 0.77
14  40  Oklahoma    58.63041    4021753 4000953 3959353 3938551 3751351 0.00520 20800   0.07208 0.77
15  46  South Dakota    11.98261    908414  901165  886667  879421  814180  0.00804 7249    0.11574 0.77
16  29  Missouri    90.26083    6204710 6188111 6154913 6138318 5988927 0.00268 16599   0.03603 0.76
17  33  New Hampshire   155.90830   1395847 1389741 1377529 1371424 1316470 0.00439 6106    0.06030 0.76
18  44  Rhode Island    1074.29594  1110822 1106341 1097379 1092896 1052567 0.00405 4481    0.05535 0.76
19  47  Tennessee   171.70515   7080262 7023788 6910840 6854371 6346105 0.00804 56474   0.11569 0.76
20  51  Virginia    223.36045   8820504 8757467 8631393 8568357 8001024 0.00720 63037   0.10242 0.74
21  13  Georgia 191.59470   11019186    10916760    10711908    10609487    9687653 0.00938 102426  0.13745 0.72
22  24  Maryland    648.84362   6298325 6257958 6177224 6136855 5773552 0.00645 40367   0.09089 0.72
23  9   Connecticut 746.69537   3615499 3612314 3605944 3602762 3574097 0.00088 3185    0.01158 0.71
24  23  Maine   44.50148    1372559 1369159 1362359 1358961 1328361 0.00248 3400    0.03327 0.67
25  5   Arkansas    58.42619    3040207 3030646 3011524 3001967 2915918 0.00315 9561    0.04262 0.66
26  8   Colorado    57.86332    5997070 5922618 5773714 5699264 5029196 0.01257 74452   0.19245 0.66
27  25  Massachusetts   919.82103   7174604 7126375 7029917 6981690 6547629 0.00677 48229   0.09576 0.66
28  34  New Jersey  1283.40005  9438124 9388414 9288994 9239284 8791894 0.00529 49710   0.07350 0.66
29  50  Vermont 70.33514    648279  646545  643077  641347  625741  0.00268 1734    0.03602 0.64
30  17  Illinois    230.67908   12807072    12808884    12812508    12814324    12830632    -0.00014    -1812   -0.00184    0.63
31  27  Minnesota   73.18202    5827265 5787008 5706494 5666238 5303925 0.00696 40257   0.09867 0.63
32  36  New York    433.90472   20448194    20365879    20201249    20118937    19378102    0.00404 82315   0.05522 0.59
33  37  North Carolina  220.30026   10710558    10620168    10439388    10348993    9535483 0.00851 90390   0.12323 0.52
34  30  Montana 7.64479 1112668 1103187 1084225 1074744 989415  0.00859 9481    0.12457 0.50
35  48  Texas   116.16298   30345487    29945493    29145505    28745507    25145561    0.01336 399994  0.20679 0.50
36  35  New Mexico  17.60148    2135024 2129190 2117522 2111685 2059179 0.00274 5834    0.03683 0.49
37  22  Louisiana   108.67214   4695071 4682633 4657757 4645314 4533372 0.00266 12438   0.03567 0.45
38  49  Utah    41.66892    3423935 3373162 3271616 3220842 2763885 0.01505 50773   0.23881 0.42
39  12  Florida 416.95573   22359251    22085563    21538187    21264502    18801310    0.01239 273688  0.18924 0.35
40  41  Oregon  45.41307    4359110 4318492 4237256 4196636 3831074 0.00941 40618   0.13783 0.24
41  6   California  258.20877   40223504    39995077    39538223    39309799    37253956    0.00571 228427  0.07971 0.22
42  1   Alabama 100.65438   5097641 5073187 5024279 4999822 4779736 0.00482 24454   0.06651 0.20
43  53  Washington  120.37292   7999503 7901429 7705281 7607206 6724540 0.01241 98074   0.18960 0.15
44  32  Nevada  29.38425    3225832 3185426 3104614 3064205 2700551 0.01268 40406   0.19451 0.12
45  2   Alaska  1.29738 740339  738023  733391  731075  710231  0.00314 2316    0.04239 NaN
46  4   Arizona 64.96246    7379346 7303398 7151502 7075549 6392017 0.01040 75948   0.15446 NaN
47  10  Delaware    522.08876   1017551 1008350 989948  980743  897934  0.00912 9201    0.13321 NaN
48  16  Idaho   23.23926    1920562 1893410 1839106 1811950 1567582 0.01434 27152   0.22517 NaN
49  28  Mississippi 63.07084    2959473 2960075 2961279 2961879 2967297 -0.00020    -602    -0.00264    NaN
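Why this works where read_html failed: judging from the code above, the page keeps its rows as JSON inside a <script id="__NEXT_DATA__"> element (the usual Next.js pattern), so the raw HTML returned by requests apparently contains no <table> for read_html to find. The sketch below illustrates that idea offline and also writes CSV, since the question asked for one. It is a minimal sketch under stated assumptions: it swaps BeautifulSoup for the standard-library html.parser, the helper names NextDataParser, extract_listing and rows_to_csv are hypothetical, and the HTML string is hand-made, using the key path props -> pageProps -> listing from the answer and two rows copied from the output above.

```python
import csv
import io
import json
from html.parser import HTMLParser


class NextDataParser(HTMLParser):
    """Hypothetical helper: collect the text of <script id="__NEXT_DATA__">."""

    def __init__(self):
        super().__init__()
        self._in_target = False
        self.payload = ""

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "script" and dict(attrs).get("id") == "__NEXT_DATA__":
            self._in_target = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_target = False

    def handle_data(self, data):
        # Script text may arrive in several chunks, so accumulate it
        if self._in_target:
            self.payload += data


def extract_listing(html):
    """Return the row dicts stored under props -> pageProps -> listing."""
    parser = NextDataParser()
    parser.feed(html)
    parser.close()
    return json.loads(parser.payload)["props"]["pageProps"]["listing"]


def rows_to_csv(rows, fieldnames):
    """Serialize the chosen fields of each row dict to CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hand-made page: no <table>, only the embedded JSON (assumed structure)
sample_html = """
<html><body>
<div id="__next">no table here</div>
<script id="__NEXT_DATA__" type="application/json">
{"props": {"pageProps": {"listing": [
  {"state": "West Virginia", "circumcisionRate": 0.87},
  {"state": "Michigan", "circumcisionRate": 0.86}
]}}}
</script>
</body></html>
"""

rows = extract_listing(sample_html)
csv_text = rows_to_csv(rows, ["state", "circumcisionRate"])
print(csv_text)
```

Against the live site you would pass requests.get(url, headers=headers).text to extract_listing instead of sample_html; that assumes the page still embeds its data the same way.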

Answer:

With some help from Selenium, which renders the page in a real browser before calling read_html:

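# Selenium installation guide: https://selenium-python.readthedocs.io/installation.html
from io import StringIO

import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Path to the ChromeDriver executable (adjust for your system)
s = Service("./chromedriver.exe")

url = 'https://worldpopulationreview.com/state-rankings/circumcision-rates-by-state'

# Load the page in Chrome, then parse the browser-rendered HTML
with webdriver.Chrome(service=s) as driver:
    driver.get(url)
    # StringIO avoids the pandas >= 2.1 deprecation warning for passing literal HTML
    df = pd.concat(pd.read_html(StringIO(driver.page_source)))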

Output:

print(df)

         State Circumcision Rate
0   West Virginia               87%
1        Michigan               86%
2        Kentucky               85%
3        Nebraska               84%
4            Ohio               84%
..            ...               ...
45         Alaska                0%
46        Arizona                0%
47       Delaware                0%
48          Idaho                0%
49    Mississippi                0%

[50 rows x 2 columns]
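In this output the rates are strings such as "87%". If the goal is the CSV file mentioned in the question, a minimal pandas sketch can convert the column to fractions and serialize the table. As an assumption for illustration, it builds a two-row DataFrame by hand (values copied from the output above) instead of running the scrape, and the name csv_text is hypothetical. DataFrame.to_csv returns a string when called without a path; pass a file name to write a file instead.

```python
import pandas as pd

# Two rows in the shape read_html produced above (values copied from that output)
df = pd.DataFrame(
    {"State": ["West Virginia", "Michigan"], "Circumcision Rate": ["87%", "86%"]}
)

# Turn "87%" into the float 0.87 so the column can be sorted or averaged
df["Circumcision Rate"] = df["Circumcision Rate"].str.rstrip("%").astype(float) / 100

# Without a path, to_csv returns the CSV text; pass e.g. "rates.csv" to write a file
csv_text = df.to_csv(index=False)
print(csv_text)
```

Note that this rendered table shows 0% for the last five states, where the JSON in the first answer has NaN, so those rows may be better treated as missing than as zero.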