I am new to Python and am currently working on a project. I would like to use Python to read the XML data at this URL:
URL: https://thbapp.thb.gov.tw/opendata/vd/one/VDLiveList.xml
Field description: https://thbapp.thb.gov.tw/opendata/vd1.aspx
- Display the name of each VDID
- Read the XML every minute and sum the traffic flow over 15 readings (I need to analyze the data in 15-minute windows)
- When the 15-minute total exceeds 225, send a reminder (via a LINE bot)
- Find a visualization package to plot the data so it is easy to read
My code is below. Any suggestions would be appreciated, thank you!
import xml.etree.ElementTree as ET
import requests
url = "https://thbapp.thb.gov.tw/opendata/vd/one/VDLiveList.xml"
response = requests.get(url)
tree = ET.fromstring(response.text)
for vdid in tree.findall('VDLive'):
    x = vdid.find('VDLive').text
    print(x)
CodePudding user response:
- Since the tags in the document are namespaced (xmlns), you'll need to query by namespace too.
- Additionally, you'll need */ or *// for your findall calls to recurse into the tree instead of looking at the root level only.
- In the example below, I've downloaded the document locally, but you can just as well get it from the API.
- Naturally, instead of printing things out, you'd want to save the tuples into, say, an SQL database where you can do your calculations.
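To make the first two points concrete, here is a minimal sketch on a tiny inline document. The sample XML, its nesting (a `VDLives` wrapper) and the `VD-TEST-*` IDs are made up for illustration and only assumed to resemble the real feed; the prefix `t` is an arbitrary choice.

```python
import xml.etree.ElementTree as ET

NS = "http://traffic.transportdata.tw/standard/traffic/schema/"

# Hypothetical miniature document: the nesting and values are assumptions,
# not a copy of the real feed.
SAMPLE = f"""
<VDLiveList xmlns="{NS}">
  <VDLives>
    <VDLive><VDID>VD-TEST-01</VDID></VDLive>
    <VDLive><VDID>VD-TEST-02</VDID></VDLive>
  </VDLives>
</VDLiveList>
"""

root = ET.fromstring(SAMPLE)

# No namespace, direct children of the root only: matches nothing.
plain = root.findall("VDLive")

# Namespace-qualified, but still direct children only: matches nothing.
direct = root.findall(f"{{{NS}}}VDLive")

# Namespace-qualified and one level down ('*/'): finds both elements.
nested = root.findall(f"*/{{{NS}}}VDLive")

# Same query with a prefix map instead of the long '{uri}' form.
prefixed = root.findall("*/t:VDLive", {"t": NS})

vdids = [el.find(f"{{{NS}}}VDID").text for el in nested]
print(len(plain), len(direct), len(nested), len(prefixed), vdids)
```

The prefix-map form is optional; it just keeps the query strings shorter. The full example, which reads the downloaded file, follows.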
import xml.etree.ElementTree as ET

with open("VDLiveList.xml") as f:
    tree = ET.parse(f).getroot()

for vdlive in tree.findall('*/{http://traffic.transportdata.tw/standard/traffic/schema/}VDLive'):
    vdid = vdlive.find('{http://traffic.transportdata.tw/standard/traffic/schema/}VDID').text
    data_collect_time = vdlive.find('{http://traffic.transportdata.tw/standard/traffic/schema/}DataCollectTime').text
    for lane in vdlive.findall('*//{http://traffic.transportdata.tw/standard/traffic/schema/}Lane'):
        lane_id = lane.find('{http://traffic.transportdata.tw/standard/traffic/schema/}LaneID').text
        for vehicle in lane.findall('*//{http://traffic.transportdata.tw/standard/traffic/schema/}Vehicle'):
            veh_vol = int(vehicle.find('{http://traffic.transportdata.tw/standard/traffic/schema/}Volume').text)
            if veh_vol <= 0:  # Invalid or uninteresting value
                continue
            veh_type = vehicle.find('{http://traffic.transportdata.tw/standard/traffic/schema/}VehicleType').text
            print((vdid, lane_id, data_collect_time, veh_type, veh_vol))
This prints out (e.g.)
('VD-11-0020-000-01', '1', '2022-08-26T21:32:00+08:00', 'S', 14)
('VD-11-0020-000-01', '2', '2022-08-26T21:32:00+08:00', 'S', 7)
('VD-11-0020-000-01', '0', '2022-08-26T21:32:00+08:00', 'S', 1)
('VD-11-0020-000-01', '1', '2022-08-26T21:32:00+08:00', 'S', 3)
('VD-11-0020-000-01', '2', '2022-08-26T21:32:00+08:00', 'S', 5)
('VD-11-0020-008-01', '1', '2022-08-26T21:32:00+08:00', 'S', 4)
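As a rough sketch of the "save the tuples into an SQL database" idea, the following stores tuples of that shape in SQLite and sums the volume per VDID over the last 15 minutes, flagging totals above 225. The table name `volume`, the helpers `store` and `busy_detectors`, and the demo rows (`VD-A`, `VD-B`) are hypothetical. It also assumes every timestamp carries the same UTC offset, so ISO 8601 strings can be compared as text. Fetching the feed every minute and sending the LINE notification are left out.

```python
import sqlite3
from datetime import datetime, timedelta

THRESHOLD = 225                  # limit mentioned in the question
WINDOW = timedelta(minutes=15)   # analysis window mentioned in the question

conn = sqlite3.connect(":memory:")  # use a file path to keep data between runs
conn.execute(
    "CREATE TABLE volume (vdid TEXT, lane_id TEXT, collected_at TEXT, "
    "veh_type TEXT, veh_vol INTEGER)"
)


def store(rows):
    """Insert (vdid, lane_id, data_collect_time, veh_type, veh_vol) tuples."""
    conn.executemany("INSERT INTO volume VALUES (?, ?, ?, ?, ?)", rows)


def busy_detectors(now):
    """Return {vdid: total} for detectors whose windowed sum exceeds THRESHOLD."""
    # Text comparison is only valid because all timestamps share one offset.
    cutoff = (now - WINDOW).isoformat()
    cur = conn.execute(
        "SELECT vdid, SUM(veh_vol) FROM volume WHERE collected_at > ? "
        "GROUP BY vdid HAVING SUM(veh_vol) > ?",
        (cutoff, THRESHOLD),
    )
    return dict(cur.fetchall())


# Made-up demo data: one reading per minute for two hypothetical detectors.
start = datetime.fromisoformat("2022-08-26T21:32:00+08:00")
rows = []
for minute in range(15):
    ts = (start + timedelta(minutes=minute)).isoformat()
    rows.append(("VD-A", "1", ts, "S", 16))  # 15 * 16 = 240, above the limit
    rows.append(("VD-B", "1", ts, "S", 5))   # 15 * 5 = 75, below the limit
# An old reading outside the window, which the query should ignore.
old_ts = (start - timedelta(minutes=20)).isoformat()
rows.append(("VD-B", "1", old_ts, "S", 1000))
store(rows)

alerts = busy_detectors(start + timedelta(minutes=14))
print(alerts)  # this is where a notification would be sent instead
```

Keeping the raw per-minute rows and aggregating in the query means the same table can later feed a plotting library without changing how the data is collected.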