How can I use Python to read open data (XML)?

Time:08-28

I am new to Python and am currently working on a project. I would like to use Python to read the XML data from this URL:

URL: https://thbapp.thb.gov.tw/opendata/vd/one/VDLiveList.xml

Field description: https://thbapp.thb.gov.tw/opendata/vd1.aspx

  1. Display the name of each VDID
  2. Read the XML every minute and sum the traffic flow over 15 consecutive reads (I need to analyze the data in 15-minute intervals)
  3. When the 15-minute total exceeds 225, send a reminder through a LINE bot.
  4. Find a visualization library to display the data so it is easy to read
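Items 2 and 3 together describe a sliding window: keep the last 15 one-minute totals for each VDID and raise an alert when their sum exceeds 225. Here is a minimal sketch of that idea. It assumes you already have one volume total per VDID per read. VolumeWindow and record are hypothetical names, and the "alert" is only a return value you would hand on to your LINE bot code.

```python
from collections import defaultdict, deque

WINDOW_SIZE = 15   # number of one-minute reads to sum (item 2)
THRESHOLD = 225    # alert when the windowed total exceeds this (item 3)


class VolumeWindow:
    """Hypothetical helper: keeps the last `size` volume totals per VDID."""

    def __init__(self, size=WINDOW_SIZE, threshold=THRESHOLD):
        self.size = size
        self.threshold = threshold
        # deque(maxlen=...) silently drops the oldest reading once it is full
        self.readings = defaultdict(lambda: deque(maxlen=self.size))

    def record(self, vdid, volume):
        """Add one reading; return the window total if it should alert, else None."""
        window = self.readings[vdid]
        window.append(volume)
        total = sum(window)
        # Only alert once a full 15-reading window has been collected
        if len(window) == self.size and total > self.threshold:
            return total
        return None
```

You would call record(vdid, total) once per VDID on every one-minute poll, for example inside a loop that uses time.sleep(60). A non-None result is the point where you send the LINE message. Because this checks a sliding window every minute, it can alert on overlapping periods. If you want non-overlapping 15-minute blocks instead, clear the deque after each full window.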

Below is my code. Any suggestions are welcome, thank you!

import xml.etree.ElementTree as ET
import requests

url = "https://thbapp.thb.gov.tw/opendata/vd/one/VDLiveList.xml"

response = requests.get(url)
tree = ET.fromstring(response.text)

for vdid in tree.findall('VDLive'):
    x = vdid.find('VDLive').text
    print(x) 
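The loop above finds nothing because findall('VDLive') only matches un-namespaced direct children of the root. In this feed, the VDLive elements are namespaced and nested deeper (see the answer below). Here is a minimal corrected sketch for item 1. It assumes the namespace URI used in the answer and that requests is installed. list_vdids and fetch_vdids are hypothetical helper names. The sketch passes response.content (bytes) rather than response.text, so the XML parser, not requests, decides the character encoding.

```python
import xml.etree.ElementTree as ET

URL = "https://thbapp.thb.gov.tw/opendata/vd/one/VDLiveList.xml"
# Default namespace of the feed (as used in the answer's example)
NS = {"t": "http://traffic.transportdata.tw/standard/traffic/schema/"}


def list_vdids(xml_bytes):
    """Return the VDID text of every VDLive element in the document."""
    root = ET.fromstring(xml_bytes)
    # ".//" searches all descendants, so the exact nesting depth does not matter
    return [
        vdlive.findtext("t:VDID", namespaces=NS)
        for vdlive in root.iterfind(".//t:VDLive", NS)
    ]


def fetch_vdids(url=URL):
    """Download the live feed and return its VDIDs (needs the requests package)."""
    import requests  # imported here so list_vdids works without requests installed

    response = requests.get(url, timeout=30)
    response.raise_for_status()
    # Pass bytes, not response.text, so the parser honours the XML encoding declaration
    return list_vdids(response.content)
```

Usage: for vdid in fetch_vdids(): print(vdid)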

Answer:

  • Since the tags in the document are namespaced (xmlns), you'll need to query by namespace too.
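  • Additionally, a bare tag name in findall only matches direct children of the element you call it on. To reach elements nested deeper in the tree, you'll need a path prefix such as */ (grandchildren) or *// (any depth below a child).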
  • In the example below, I've downloaded the document locally, but you can just as well get it from the API.
  • Naturally, instead of printing things out, you'd want to save the tuples somewhere, e.g. an SQL database, where you can run your calculations.
import xml.etree.ElementTree as ET

# Every element in the feed is in this default namespace; map it to the prefix "t"
NS = {"t": "http://traffic.transportdata.tw/standard/traffic/schema/"}

# ET.parse() opens the path in binary mode, so the parser honours the XML encoding declaration
root = ET.parse("VDLiveList.xml").getroot()

for vdlive in root.findall("*/t:VDLive", NS):
    vdid = vdlive.find("t:VDID", NS).text
    data_collect_time = vdlive.find("t:DataCollectTime", NS).text
    for lane in vdlive.findall("*//t:Lane", NS):
        lane_id = lane.find("t:LaneID", NS).text
        for vehicle in lane.findall("*//t:Vehicle", NS):
            veh_vol = int(vehicle.find("t:Volume", NS).text)
            if veh_vol <= 0:  # Invalid or uninteresting value
                continue
            veh_type = vehicle.find("t:VehicleType", NS).text
            print((vdid, lane_id, data_collect_time, veh_type, veh_vol))

This prints, for example:

('VD-11-0020-000-01', '1', '2022-08-26T21:32:00+08:00', 'S', 14)
('VD-11-0020-000-01', '2', '2022-08-26T21:32:00+08:00', 'S', 7)
('VD-11-0020-000-01', '0', '2022-08-26T21:32:00+08:00', 'S', 1)
('VD-11-0020-000-01', '1', '2022-08-26T21:32:00+08:00', 'S', 3)
('VD-11-0020-000-01', '2', '2022-08-26T21:32:00+08:00', 'S', 5)
('VD-11-0020-008-01', '1', '2022-08-26T21:32:00+08:00', 'S', 4)
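Following the suggestion to store these tuples in an SQL database, here is a minimal sketch that uses Python's built-in sqlite3 module. The table name, column names, and helper functions are hypothetical. DataCollectTime is stored as the ISO text from the feed. This assumes every timestamp carries the same UTC offset (Taiwan, +08:00), so plain string comparison orders the readings correctly. With that in place, the 15-minute total per VDID becomes a single query.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical schema: one row per (VDID, lane, collection time, vehicle type)
SCHEMA = """
CREATE TABLE IF NOT EXISTS vd_volume (
    vdid TEXT NOT NULL,
    lane_id TEXT NOT NULL,
    collect_time TEXT NOT NULL,  -- ISO 8601 string from DataCollectTime
    veh_type TEXT NOT NULL,
    volume INTEGER NOT NULL
)
"""


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def store(conn, rows):
    """Insert (vdid, lane_id, collect_time, veh_type, volume) tuples."""
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT INTO vd_volume VALUES (?, ?, ?, ?, ?)", rows)


def totals_last_15_min(conn, end_time):
    """Sum volume per VDID for collect_time in (end_time - 15 min, end_time].

    Assumes all timestamps share one UTC offset, so ISO strings sort chronologically.
    """
    end = datetime.fromisoformat(end_time)
    start = (end - timedelta(minutes=15)).isoformat()
    cur = conn.execute(
        "SELECT vdid, SUM(volume) FROM vd_volume "
        "WHERE collect_time > ? AND collect_time <= ? "
        "GROUP BY vdid ORDER BY vdid",
        (start, end.isoformat()),
    )
    return dict(cur.fetchall())
```

To apply the 225 threshold from the question, filter the result, e.g. {vdid: t for vdid, t in totals.items() if t > 225}. Pushing the arithmetic into SQL keeps the history queryable later, e.g. for plotting.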