How to webscrape data from only specific cells in python?

Time:03-08

I am trying to scrape some data from https://il.water.usgs.gov/gmaps/precip/. I only want specific cells from the row called "RAIN GAGE AT PING TOM PARK AT CHICAGO, IL": only the cells containing the 1-, 3-, and 12-hour predictions for rain. What should I fix?

    import pandas as pd

    url = "https://il.water.usgs.gov/gmaps/precip/"
    df = pd.read_html(url, flavor="bs4")[0]
    print(df.loc[df[0] == "RAIN GAGE AT PING TOM PARK AT CHICAGO, IL"])

CodePudding user response:

The table data is retrieved dynamically from a separate endpoint that returns JSON, so it is not present in the page HTML that pd.read_html parses. You could write a function that calls that endpoint and pass in the location and the desired hours:

    import requests

    def get_precipitation(location: str, hrs: list):
        # JSON endpoint that the page loads its table data from
        url = "https://il.water.usgs.gov/gmaps/precip/data/rainfall_outIL_WSr2.json"
        r = requests.get(url).json()
        # Take the first item whose title matches the requested gage
        data = [i for i in r['value']['items'] if i['title'] == location][0]

        # Print only the requested precipitation fields
        for k, v in data.items():
            if k in hrs:
                print(f'{k}={v}')


    if __name__ == "__main__":
        location = "RAIN GAGE AT PING TOM PARK AT CHICAGO, IL"
        hrs = ['precip1hrvalue', 'precip3hrvalue', 'precip12hrvalue']

        get_precipitation(location, hrs)
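
If you would rather get the values back than print them, the filtering step can be split from the network call. The following is only a sketch under assumptions: `select_precipitation` is a hypothetical helper name, the `value` / `items` / `title` layout is taken from the code in this answer rather than from any documented schema, and the sample payload uses made-up gage names and values so that it runs offline.

```python
def select_precipitation(payload: dict, location: str, keys: list) -> dict:
    """Return the requested fields of the first item whose title equals `location`."""
    # Assumed layout (from the answer above): payload["value"]["items"] is a list of dicts
    items = payload.get("value", {}).get("items", [])
    match = next((item for item in items if item.get("title") == location), None)
    if match is None:
        # Clearer than the IndexError raised by indexing an empty list with [0]
        raise ValueError(f"No item titled {location!r} in payload")
    # Keep only the requested keys that are actually present
    return {k: match[k] for k in keys if k in match}


# Hypothetical payload with made-up values, for illustration only
sample = {
    "value": {
        "items": [
            {"title": "EXAMPLE GAGE A", "precip1hrvalue": "0.00",
             "precip3hrvalue": "0.10", "precip12hrvalue": "0.25"},
            {"title": "EXAMPLE GAGE B", "precip1hrvalue": "0.05",
             "precip3hrvalue": "0.20", "precip12hrvalue": "0.40"},
        ]
    }
}

result = select_precipitation(sample, "EXAMPLE GAGE A",
                              ["precip1hrvalue", "precip12hrvalue"])
print(result)
```

With live data you would pass the parsed response, e.g. `requests.get(url).json()`, as `payload` in place of `sample`. Returning a dict also makes it easy to build a pandas DataFrame from the result afterwards.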