How to scrape the text from this data?

Time:10-08

 <script type="text/javascript">
/**
 * Define SVG path for target icon
 */
var targetSVG = "M9,0C4.029,0,0,4.029,0,9s4.029,9,9,9s9-4.029,9-9S13.971,0,9,0z M9,15.93 c-3.83,0-6.93-3.1-6.93-6.93S5.17,2.07,9,2.07s6.93,3.1,6.93,6.93S12.83,15.93,9,15.93 M12.5,9c0,1.933-1.567,3.5-3.5,3.5S5.5,10.933,5.5,9S7.067,5.5,9,5.5 S12.5,7.067,12.5,9z";

/**
 * Create the map
 */
var i=1;


var countrydataprovider = {
 "map": "indiaLow",
"getAreasFromMap": true,
  "theme": "none",
 
 "imagesSettings": {
    "rollOverColor": "#089282",
    "rollOverScale": 3,
"labelPosition": "middle",
    "labelFontSize": 8,
 "labelColor": "#fff",
    "selectedScale": 3,
    "selectedColor": "#089282",
    "color": "#13564e"
  },
"images": [
    {
        "imageURL": "nowcast_marker\/map-marker-icon-png-green.png",
        "width": 20,
        "height": 20,
        "description": "<p>No Warning <\/br><\/br> Time of issue: 2022-10-07<\/br>1005 Hrs<\/br> Valid upto: 1305 Hrs <\/p>",
        "zoomLevel": 5,
        "scale": 0.5,
        "title": "Bapatla",
        "latitude": "15.905897",
        "longitude": "80.471587"
    },

I want to extract the data in the "images" section of this script. This is the code I have written so far, but I could not get any further. Could anybody please help?

import requests # This is a request to the website
from bs4 import BeautifulSoup # This is a parser

url = "https://mausam.imd.gov.in/imd_latest/contents/stationwise-nowcast-warning.php"
html = requests.get(url).content # raw bytes of the response body
soup = BeautifulSoup(html, 'html.parser') # parse the HTML into a searchable tree
a = soup.find('script', attrs={'type': 'text/javascript'}) # first matching <script> tag

CodePudding user response:

You are on the right track; you just need to dissect the contents of that tag further to get what you need. Here is one way of obtaining that data: cut the text of the "images" array out of the script, parse it as JSON, and load it into a DataFrame.

import requests
import pandas as pd
from bs4 import BeautifulSoup as bs
import json

url = 'https://mausam.imd.gov.in/imd_latest/contents/stationwise-nowcast-warning.php'
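# Text of the first <script type="text/javascript"> tag on the page
script_text = bs(requests.get(url).text, 'html.parser').select_one('script[type="text/javascript"]').text
# Keep only what sits between '"images": [' and the next ']'
script_w_data = script_text.split('"images": [')[1].split(']')[0]
# Put the brackets back so the fragment is a valid JSON array
obj = json.loads('[' + script_w_data + ']')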
df = pd.json_normalize(obj)
print(df)

Result in terminal:

      imageURL                                       width  height  description                                        zoomLevel  scale  title       latitude   longitude
0     nowcast_marker/map-marker-icon-png-green.png   20     20      <p>No Warning </br></br> Time of issue: 2022-1...  5          0.5    Bapatla     15.905897  80.471587
1     nowcast_marker/map-marker-icon-png-green.png   20     20      <p>No Warning </br></br> Time of issue: 2022-1...  5          0.5    Eluru       16.71066   81.09524
2     nowcast_marker/map-marker-icon-png-yellow.png  20     20      <p>Light rain: < 5 mm/hr</br> Light Thundersto...  5          0.5    Gannavaram  16.540171  80.801249
3     nowcast_marker/map-marker-icon-png-green.png   20     20      <p>No Warning </br></br> Time of issue: 2022-1...  5          0.5    Guntur      16.306652  80.43654
4     nowcast_marker/map-marker-icon-png-green.png   20     20      <p>No Warning </br></br> Time of issue: 2022-1...  5          0.5    Kakinada    16.945181  82.238647
...   ...                                            ...    ...     ...                                                ...        ...    ...         ...        ...
1115  nowcast_marker/map-marker-icon-png-green.png   20     20      <p>No Warning </br></br> Time of issue: 2022-1...  5          0.5    Namrup      27.12      95.18
1116  nowcast_marker/map-marker-icon-png-green.png   20     20      <p>No Warning </br></br> Time of issue: 2022-1...  5          0.5    Nazira      26.54      94.44
1117  nowcast_marker/map-marker-icon-png-green.png   20     20      <p>No Warning </br></br> Time of issue: 2022-1...  5          0.5    Moreh       24.2475    94.3045
1118  nowcast_marker/map-marker-icon-png-green.png   20     20      <p>No Warning </br></br> Time of issue: 2022-1...  5          0.5    Moirang     24.5028    93.7768
1119  nowcast_marker/map-marker-icon-png-green.png   20     20      <p>No Warning </br></br> Time of issue: 2022-1...  5          0.5    Jhandutta   31.3702    76.6369

1120 rows × 9 columns
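
Splitting on ']' works as long as no bracket ever appears inside one of the strings. If you want something a little less fragile, a possible variation is to let the JSON decoder find the end of the array itself. This is only a sketch: extract_images is a hypothetical helper name, the sample text is an abbreviated copy of the script from the question, and it still assumes the array is valid JSON (no trailing commas or JavaScript comments).

```python
import json
import re


def extract_images(script_text):
    """Return the list stored under the "images" key of the embedded object."""
    # Locate the opening bracket of the "images" array.
    match = re.search(r'"images"\s*:\s*\[', script_text)
    if match is None:
        raise ValueError('no "images" array found in the script text')
    # raw_decode parses exactly one JSON value starting at the given index and
    # ignores whatever follows it, so a "]" inside a string cannot cut it short.
    images, _end = json.JSONDecoder().raw_decode(script_text, match.end() - 1)
    return images


# Abbreviated sample modelled on the script shown in the question.
sample = r'''
var countrydataprovider = {
  "map": "indiaLow",
  "images": [
    {
      "imageURL": "nowcast_marker\/map-marker-icon-png-green.png",
      "description": "<p>No Warning <\/br><\/br> Time of issue: 2022-10-07<\/br>1005 Hrs<\/br> Valid upto: 1305 Hrs <\/p>",
      "title": "Bapatla",
      "latitude": "15.905897",
      "longitude": "80.471587"
    }
  ]
};
'''

images = extract_images(sample)
print(images[0]["title"], images[0]["latitude"])  # Bapatla 15.905897
```

With the live page you would pass in the text of the script tag instead of sample; the returned list can go straight into pd.json_normalize.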

See pandas documentation at https://pandas.pydata.org/docs/

Also BeautifulSoup docs: https://beautiful-soup-4.readthedocs.io/en/latest/
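
The description column still holds raw HTML. If you also need the warning text and the times as separate values, a rough sketch along these lines might help. It uses only the standard library; parse_description and the returned key names are made up for illustration, and the "Time of issue" / "Valid upto" layout is assumed from the single full description shown in the question, so other rows may need adjustments.

```python
import re


def parse_description(description):
    """Split one marker description into warning text, issue time and validity."""
    # Replace real tags such as <p> or </br> with spaces. The pattern requires a
    # letter right after "<" or "</", so text like "< 5 mm/hr" is left alone.
    text = re.sub(r"</?[A-Za-z][^>]*>", " ", description)
    text = " ".join(text.split())  # collapse runs of whitespace

    issued = re.search(r"Time of issue:\s*(\d{4}-\d{2}-\d{2})\s+(\d{4})\s*Hrs", text)
    valid = re.search(r"Valid upto:\s*(\d{4})\s*Hrs", text)
    return {
        # Everything before "Time of issue:" is treated as the warning text.
        "warning": text.split("Time of issue:")[0].strip(),
        "issued_date": issued.group(1) if issued else None,
        "issued_time": issued.group(2) if issued else None,
        "valid_until": valid.group(1) if valid else None,
    }


# Description of the first station, as shown in the question (after JSON decoding).
description = "<p>No Warning </br></br> Time of issue: 2022-10-07</br>1005 Hrs</br> Valid upto: 1305 Hrs </p>"
parsed = parse_description(description)
print(parsed)
```

On the DataFrame from the answer, df['description'].apply(parse_description) should give one such dict per row, which you can then expand into columns.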
