Parsing a KML File and storing in a database with Python

Time:02-15

I have four KML files, each containing multiple polygons. I would like to parse them, extract the data, and store it in my database. After some research, I concluded that the best way to parse a KML file is with the pykml library.

One of my KML files looks like:

<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2" xmlns:kml="http://www.opengis.net/kml/2.2" xmlns:atom="http://www.w3.org/2005/Atom">
<Document>
    <name>RecAreaPolygons.TAB</name>
    <Schema name="RecAreaPolygons" id="S_RecAreaPolygons_SSSS">
        <SimpleField type="string" name="RecAreaName"><displayName>&lt;b&gt;RecAreaName&lt;/b&gt;</displayName>
</SimpleField>
        <SimpleField type="string" name="RecAreaCategory"><displayName>&lt;b&gt;RecAreaCategory&lt;/b&gt;</displayName>
</SimpleField>
        <SimpleField type="string" name="Province"><displayName>&lt;b&gt;Province&lt;/b&gt;</displayName>
</SimpleField>
        <SimpleField type="string" name="Comments"><displayName>&lt;b&gt;Comments&lt;/b&gt;</displayName>
</SimpleField>
    </Schema>
    <Style id="style1">
        <BalloonStyle>
            <text><![CDATA[<table border="0">
  <tr><td><b>RecAreaName</b></td><td>$[RecAreaPolygons/RecAreaName]</td></tr>
  <tr><td><b>RecAreaCategory</b></td><td>$[RecAreaPolygons/RecAreaCategory]</td></tr>
  <tr><td><b>Province</b></td><td>$[RecAreaPolygons/Province]</td></tr>
  <tr><td><b>Comments</b></td><td>$[RecAreaPolygons/Comments]</td></tr>
</table>
]]></text>
        </BalloonStyle>
        <PolyStyle>
            <color>ff00ff00</color>
        </PolyStyle>
    </Style>
    <Style id="falseColor">
        <BalloonStyle>
            <text><![CDATA[<table border="0">
  <tr><td><b>RecAreaName</b></td><td>$[RecAreaPolygons/RecAreaName]</td></tr>
  <tr><td><b>RecAreaCategory</b></td><td>$[RecAreaPolygons/RecAreaCategory]</td></tr>
  <tr><td><b>Province</b></td><td>$[RecAreaPolygons/Province]</td></tr>
  <tr><td><b>Comments</b></td><td>$[RecAreaPolygons/Comments]</td></tr>
</table>
]]></text>
        </BalloonStyle>
        <PolyStyle>
            <colorMode>random</colorMode>
        </PolyStyle>
    </Style>
    <Folder id="layer 0">
        <name>RecAreaPolygons</name>
        <Placemark>
            <name>Whistler</name>
            <styleUrl>#falseColor</styleUrl>
            <Style id="inline">
                <IconStyle>
                    <color>ff0000ff</color>
                    <colorMode>normal</colorMode>
                </IconStyle>
                <LineStyle>
                    <color>ff0000ff</color>
                    <colorMode>normal</colorMode>
                </LineStyle>
                <PolyStyle>
                    <color>ff0000ff</color>
                    <colorMode>normal</colorMode>
                </PolyStyle>
            </Style>
            <ExtendedData>
                <SchemaData schemaUrl="#S_RecAreaPolygons_SSSS">
                    <SimpleData name="RecAreaName">Whistler</SimpleData>
                    <SimpleData name="RecAreaCategory">World Class</SimpleData>
                    <SimpleData name="Province">BC</SimpleData>
                    <SimpleData name="Comments"></SimpleData>
                </SchemaData>
            </ExtendedData>
            <Polygon>
                <outerBoundaryIs>
                    <LinearRing>
                        <coordinates>
                            -123.052382,50.094969,0 -123.050613,50.07531199999999,0 -123.029976,50.05263099999998,0 -122.955094,50.045827,0 -122.909104,50.05565599999998,0 -122.869599,50.07871399999998,0 -122.835991,50.10895600000001,0 -122.826557,50.152805,0 -122.78496,50.26872300000001,0 -122.923014,50.26576299999998,0 -122.939174,50.18569200000002,0 -122.979858,50.17057199999998,0 -123.012877,50.151293,0 -123.050613,50.12483200000001,0 -123.053561,50.104419,0 -123.052382,50.094969,0 
                        </coordinates>
                    </LinearRing>
                </outerBoundaryIs>
            </Polygon>
        </Placemark>
//MULTIPLE OTHER PLACEMARKS


As mentioned, I installed pykml. As a first step toward loading the data into a DataFrame, I ran the following code:

from pykml import parser

with open('RecAreaPolygons.kml', 'rb') as f:
    s = f.read()

root = parser.fromstring(s)
# Attribute access returns only the first matching child at each level
print(root.Document.Folder.Placemark.Polygon.outerBoundaryIs.LinearRing.coordinates)

I'm able to print the first placemark's coordinates, but how do I get the rest and iteratively add them to a DataFrame?


Ideally, my output would look like this:

          RecAreaName  RecAreaCategory  Province  Comments  Coordinates  
0            Whistler      World Class        BC            -123.052382,50.094969,0 -123.050613,50.07531199999999,0 -123.029976,50.05263099999998,0 -122.955094,50.045827,0 -122.909104,50.05565599999998,0 -122.869599,50.07871399999998,0 -122.835991,50.10895600000001,0 -122.826557,50.152805,0 -122.78496,50.26872300000001,0 -122.923014,50.26576299999998,0 -122.939174,50.18569200000002,0 -122.979858,50.17057199999998,0 -123.012877,50.151293,0 -123.050613,50.12483200000001,0 -123.053561,50.104419,0 -123.052382,50.094969,0 
1                       The rest of the entries
2            

CodePudding user response:

You can iterate over the placemarks, collecting each one's attributes and geometry in a list, and then create a DataFrame from that list.

If the KML has multiple folders, you will need to iterate over the folders first and then over the placemarks in each folder.
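
The multi-folder case can also be handled without nested loops by searching for every Placemark at any depth. The following is a minimal sketch that swaps in the standard library's xml.etree.ElementTree for pykml, so it runs with no extra installs. The function name `placemark_rows` and the inline two-folder sample are hypothetical, and the sketch assumes each placemark holds a single Polygon with an outer boundary.

```python
import xml.etree.ElementTree as ET

# KML 2.2 namespace, taken from the xmlns attribute of the file in the question.
NS = {"kml": "http://www.opengis.net/kml/2.2"}


def placemark_rows(kml_text):
    """Return one dict per Placemark found in any Folder, at any depth."""
    root = ET.fromstring(kml_text)
    rows = []
    # ".//" searches all descendants, so sibling or nested folders are covered.
    for place in root.findall(".//kml:Placemark", NS):
        row = {item.get("name"): item.text
               for item in place.findall(".//kml:SimpleData", NS)}
        coords = place.findtext(
            "kml:Polygon/kml:outerBoundaryIs/kml:LinearRing/kml:coordinates",
            default="", namespaces=NS)
        # split() with no argument splits on any run of whitespace.
        row["Coordinates"] = coords.split()
        rows.append(row)
    return rows


# Hypothetical two-folder document, trimmed to the elements used above.
SAMPLE = """<kml xmlns="http://www.opengis.net/kml/2.2"><Document>
<Folder><Placemark><ExtendedData><SchemaData>
<SimpleData name="RecAreaName">Whistler</SimpleData>
</SchemaData></ExtendedData><Polygon><outerBoundaryIs><LinearRing>
<coordinates>-123.05,50.09,0 -123.05,50.07,0</coordinates>
</LinearRing></outerBoundaryIs></Polygon></Placemark></Folder>
<Folder><Placemark><ExtendedData><SchemaData>
<SimpleData name="RecAreaName">Example Area</SimpleData>
</SchemaData></ExtendedData><Polygon><outerBoundaryIs><LinearRing>
<coordinates>-122.9,50.0,0</coordinates>
</LinearRing></outerBoundaryIs></Polygon></Placemark></Folder>
</Document></kml>"""

rows = placemark_rows(SAMPLE)
```

For the single-folder file in the question, the pykml version is: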

from pykml import parser
import pandas as pd

with open('RecAreaPolygons.kml', 'r', encoding="utf-8") as f:
    root = parser.parse(f).getroot()

places = []
# Iterating over Placemark yields every sibling placemark, not just the first
for place in root.Document.Folder.Placemark:
    # split() with no argument splits on any run of whitespace
    coords = place.Polygon.outerBoundaryIs.LinearRing.coordinates.text.split()
    # Map each SimpleData "name" attribute to its text (None when empty)
    data = {item.get("name"): item.text for item in
            place.ExtendedData.SchemaData.SimpleData}
    places.append({"RecAreaName": data.get('RecAreaName'),
                   "RecAreaCategory": data.get('RecAreaCategory'),
                   "Province": data.get('Province'),
                   "Comments": data.get('Comments'),
                   "Coordinates": coords})
df = pd.DataFrame(places)
print(df)

Output:

  RecAreaName   RecAreaCategory Province Comments  Coordinates
0      Whistler     World Class       BC     None  [-123.052382,50.094969,0, -123.050613,50.07531...
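
The question also asks about storing the result in a database, which the code above does not cover. Here is a minimal sketch, assuming SQLite through the standard library's sqlite3 module (the question does not say which database is in use). The table name `rec_areas`, the function name `store_rows`, and the shortened sample row are hypothetical. Coordinates are joined back into one space-separated string because a Python list cannot be bound to a single SQLite column directly.

```python
import sqlite3


def store_rows(conn, rows):
    """Create the table if needed and insert one record per placemark dict."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS rec_areas (
               RecAreaName TEXT, RecAreaCategory TEXT,
               Province TEXT, Comments TEXT, Coordinates TEXT)""")
    conn.executemany(
        "INSERT INTO rec_areas VALUES (?, ?, ?, ?, ?)",
        [(r.get("RecAreaName"), r.get("RecAreaCategory"), r.get("Province"),
          r.get("Comments"), " ".join(r.get("Coordinates", [])))
         for r in rows])
    conn.commit()


# Sample row shaped like the dicts built in the answer (coordinates shortened).
sample = [{"RecAreaName": "Whistler", "RecAreaCategory": "World Class",
           "Province": "BC", "Comments": None,
           "Coordinates": ["-123.052382,50.094969,0",
                           "-123.050613,50.07531199999999,0"]}]

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
store_rows(conn, sample)
stored = conn.execute(
    "SELECT RecAreaName, Province, Coordinates FROM rec_areas").fetchall()
```

If the placemarks are already in a DataFrame with these columns, `df.to_dict("records")` produces dicts of the shape `store_rows` expects.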