I have four KML files, each containing multiple polygons. I would like to parse the files, extract the data, and store it in my database. After some research, I concluded that the best way to parse a KML file is with pyKML.
One of my KML files looks like:
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2" xmlns:kml="http://www.opengis.net/kml/2.2" xmlns:atom="http://www.w3.org/2005/Atom">
<Document>
<name>RecAreaPolygons.TAB</name>
<Schema name="RecAreaPolygons" id="S_RecAreaPolygons_SSSS">
<SimpleField type="string" name="RecAreaName"><displayName><b>RecAreaName</b></displayName>
</SimpleField>
<SimpleField type="string" name="RecAreaCategory"><displayName><b>RecAreaCategory</b></displayName>
</SimpleField>
<SimpleField type="string" name="Province"><displayName><b>Province</b></displayName>
</SimpleField>
<SimpleField type="string" name="Comments"><displayName><b>Comments</b></displayName>
</SimpleField>
</Schema>
<Style id="style1">
<BalloonStyle>
<text><![CDATA[<table border="0">
<tr><td><b>RecAreaName</b></td><td>$[RecAreaPolygons/RecAreaName]</td></tr>
<tr><td><b>RecAreaCategory</b></td><td>$[RecAreaPolygons/RecAreaCategory]</td></tr>
<tr><td><b>Province</b></td><td>$[RecAreaPolygons/Province]</td></tr>
<tr><td><b>Comments</b></td><td>$[RecAreaPolygons/Comments]</td></tr>
</table>
]]></text>
</BalloonStyle>
<PolyStyle>
<color>ff00ff00</color>
</PolyStyle>
</Style>
<Style id="falseColor">
<BalloonStyle>
<text><![CDATA[<table border="0">
<tr><td><b>RecAreaName</b></td><td>$[RecAreaPolygons/RecAreaName]</td></tr>
<tr><td><b>RecAreaCategory</b></td><td>$[RecAreaPolygons/RecAreaCategory]</td></tr>
<tr><td><b>Province</b></td><td>$[RecAreaPolygons/Province]</td></tr>
<tr><td><b>Comments</b></td><td>$[RecAreaPolygons/Comments]</td></tr>
</table>
]]></text>
</BalloonStyle>
<PolyStyle>
<colorMode>random</colorMode>
</PolyStyle>
</Style>
<Folder id="layer 0">
<name>RecAreaPolygons</name>
<Placemark>
<name>Whistler</name>
<styleUrl>#falseColor</styleUrl>
<Style id="inline">
<IconStyle>
<color>ff0000ff</color>
<colorMode>normal</colorMode>
</IconStyle>
<LineStyle>
<color>ff0000ff</color>
<colorMode>normal</colorMode>
</LineStyle>
<PolyStyle>
<color>ff0000ff</color>
<colorMode>normal</colorMode>
</PolyStyle>
</Style>
<ExtendedData>
<SchemaData schemaUrl="#S_RecAreaPolygons_SSSS">
<SimpleData name="RecAreaName">Whistler</SimpleData>
<SimpleData name="RecAreaCategory">World Class</SimpleData>
<SimpleData name="Province">BC</SimpleData>
<SimpleData name="Comments"></SimpleData>
</SchemaData>
</ExtendedData>
<Polygon>
<outerBoundaryIs>
<LinearRing>
<coordinates>
-123.052382,50.094969,0 -123.050613,50.07531199999999,0 -123.029976,50.05263099999998,0 -122.955094,50.045827,0 -122.909104,50.05565599999998,0 -122.869599,50.07871399999998,0 -122.835991,50.10895600000001,0 -122.826557,50.152805,0 -122.78496,50.26872300000001,0 -122.923014,50.26576299999998,0 -122.939174,50.18569200000002,0 -122.979858,50.17057199999998,0 -123.012877,50.151293,0 -123.050613,50.12483200000001,0 -123.053561,50.104419,0 -123.052382,50.094969,0
</coordinates>
</LinearRing>
</outerBoundaryIs>
</Polygon>
</Placemark>
//MULTIPLE OTHER PLACEMARKS
As mentioned, I installed pyKML and then ran the following code, intending to store the data in a DataFrame:
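from pykml import parser

# Read the file as bytes so the parser can honor the XML encoding declaration
with open('RecAreaPolygons.kml', 'rb') as f:
    s = f.read()
root = parser.fromstring(s)
# Attribute access returns only the first matching child at each level
print(root.Document.Folder.Placemark.Polygon.outerBoundaryIs.LinearRing.coordinates)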
I'm able to print the first Placemark's coordinates, but how do I get the rest and add them iteratively to a DataFrame?
Preferably, I'd want my output to look like:
RecAreaName RecAreaCategory Province Comments Coordinates
0 Whistler World Class BC -123.052382,50.094969,0 -123.050613,50.07531199999999,0 -123.029976,50.05263099999998,0 -122.955094,50.045827,0 -122.909104,50.05565599999998,0 -122.869599,50.07871399999998,0 -122.835991,50.10895600000001,0 -122.826557,50.152805,0 -122.78496,50.26872300000001,0 -122.923014,50.26576299999998,0 -122.939174,50.18569200000002,0 -122.979858,50.17057199999998,0 -123.012877,50.151293,0 -123.050613,50.12483200000001,0 -123.053561,50.104419,0 -123.052382,50.094969,0
1 The rest of the entries
2
Answer:
You can iterate over the placemarks, collecting each one's attributes and geometry in a list, and then create a DataFrame from that list.
If the KML has multiple folders, you will need to iterate over the folders and then over the placemarks in each folder.
from pykml import parser
import pandas as pd

# Open in binary mode so lxml reads the encoding from the XML declaration
with open('p.kml', 'rb') as f:
    root = parser.parse(f).getroot()

places = []
# Iterating over an objectify element yields it and its same-named siblings
for place in root.Document.Folder.Placemark:
    # Each entry is "lon,lat,alt"; split() also drops the surrounding newlines
    coords = place.Polygon.outerBoundaryIs.LinearRing.coordinates.text.split()
    # Map each SimpleData name to its text (None when the element is empty)
    data = {item.get("name"): item.text
            for item in place.ExtendedData.SchemaData.SimpleData}
    places.append({"RecAreaName": data.get('RecAreaName'),
                   "RecAreaCategory": data.get('RecAreaCategory'),
                   "Province": data.get('Province'),
                   "Comments": data.get('Comments'),
                   "Coordinates": coords})

df = pd.DataFrame(places)
print(df)
Output:
RecAreaName RecAreaCategory Province Comments Coordinates
0 Whistler World Class BC None [-123.052382,50.094969,0, -123.050613,50.07531...
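For the multiple-folder case mentioned above, here is a minimal sketch that collects every Placemark regardless of which Folder it sits in. Note that it swaps pyKML for the standard library's xml.etree.ElementTree, purely so it runs without extra installs. The function name placemark_rows and the inline sample are hypothetical: the sample is a cut-down version of the question's file, and the second folder and its "Placeholder" placemark are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the <kml> root element in the question's file
KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}


def placemark_rows(kml_text):
    """Return one dict per Placemark, whichever Folder it is nested in."""
    root = ET.fromstring(kml_text)
    rows = []
    # ".//kml:Placemark" matches at any depth, so all folders are covered
    for place in root.iterfind(".//kml:Placemark", KML_NS):
        # Map each SimpleData name to its text (None when the element is empty)
        row = {item.get("name"): item.text
               for item in place.iterfind(".//kml:SimpleData", KML_NS)}
        # Takes the first <coordinates> element only; a polygon with inner
        # rings would need extra handling (assumption: outer ring comes first)
        coords = place.findtext(".//kml:coordinates", default="",
                                namespaces=KML_NS)
        row["Coordinates"] = coords.split()
        rows.append(row)
    return rows


# Hypothetical two-folder sample; only "Whistler" comes from the question
sample = """<kml xmlns="http://www.opengis.net/kml/2.2">
<Document>
<Folder id="layer 0">
<Placemark>
<ExtendedData><SchemaData schemaUrl="#S_RecAreaPolygons_SSSS">
<SimpleData name="RecAreaName">Whistler</SimpleData>
<SimpleData name="Comments"></SimpleData>
</SchemaData></ExtendedData>
<Polygon><outerBoundaryIs><LinearRing><coordinates>
-123.052382,50.094969,0 -123.050613,50.07531199999999,0
</coordinates></LinearRing></outerBoundaryIs></Polygon>
</Placemark>
</Folder>
<Folder id="layer 1">
<Placemark>
<ExtendedData><SchemaData schemaUrl="#S_RecAreaPolygons_SSSS">
<SimpleData name="RecAreaName">Placeholder</SimpleData>
</SchemaData></ExtendedData>
<Polygon><outerBoundaryIs><LinearRing><coordinates>
0,0,0 1,1,0
</coordinates></LinearRing></outerBoundaryIs></Polygon>
</Placemark>
</Folder>
</Document>
</kml>"""

rows = placemark_rows(sample)
for row in rows:
    print(row["RecAreaName"], len(row["Coordinates"]))
```

The returned list of dicts can be passed to pd.DataFrame in the same way as the places list above. With pyKML itself, a nested loop over root.Document.Folder and then each folder's Placemark children should achieve the same result.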