I have a Digital Road Map dataset containing coordinates of nodes interconnected through road network and Node number. Dataset in xlsx Dataset has three columns : Col1-**source **, Colm 2 Target and Column 3- geometry. geometry is a linestring of road having start point coordinate, an end point coordinate and few intermediate point coordinates. Source and Target columns are Node number of starting node and end node of each road network. I want to extract only coordinate of starting node and end node from each row. Then arrange the filtered dataset such that each source and each target has the respective coordinates beside it. The sample output file is desired sample output
I am looking for code in shapely, most of the info is on one linestring. Since my data has more than a million rows so I am not able to find a relevant code that iterates through entire dataset.
CodePudding user response:
Do you mean:
df[['Start', 'end']] = df['geometry'].str.split(',', expand=True)
CodePudding user response:
- your sample data is unusable as it is an image. Have simulated some
- pick first and last point from LINESTRING
- structure as columns (in
df
) - reshape
df
asdf2
as your desired structure
import io
import shapely.geometry, shapely.wkt
import pandas as pd
import numpy as np
# sample data...
df = pd.read_csv(
io.StringIO(
'''source,target,geometry
0,100,"LINESTRING (5.897759230176348 49.44266714130711, 6.242751092156993 49.90222565367873, 5.674051954784829 49.5294835475575)"
1,101,"LINESTRING (13.59594567226444 48.87717194273715, 12.51844038254671 54.470370591848, 6.658229607783568 49.20195831969157)"
2,102,"LINESTRING (16.71947594571444 50.21574656839354, 23.42650841644439 50.30850576435745, 22.77641889821263 49.02739533140962, 14.60709842291953 51.74518809671997)"
3,103,"LINESTRING (18.62085859546164 54.68260569927078, 23.79919884613338 52.69109935160657, 20.89224450041863 54.31252492941253)"
4,104,"LINESTRING (5.606975945670001 51.03729848896978, 6.589396599970826 51.85202912048339, 3.31501148496416 51.34577662473805, 5.988658074577813 51.85161570902505)"
5,105,"LINESTRING (4.799221632515724 49.98537303323637, 6.043073357781111 50.12805166279423, 3.31501148496416 51.34577662473805, 6.15665815595878 50.80372101501058, 3.314971144228537 51.34575511331991)"
6,106,"LINESTRING (3.31501148496416 51.34577662473805, 3.830288527043137 51.62054454203195, 6.905139601274129 53.48216217713065, 4.705997348661185 53.09179840759776)"
7,107,"LINESTRING (7.092053256873896 53.14404328064489, 3.830288527043137 51.62054454203195, 6.842869500362383 52.22844025329755, 3.31501148496416 51.34577662473805)"
8,108,"LINESTRING (6.589396599970826 51.85202912048339, 6.905139601274129 53.48216217713065, 3.314971144228537 51.34575511331991, 5.988658074577813 51.85161570902505)"
9,109,"LINESTRING (5.606975945670001 51.03729848896978, 4.286022983425084 49.90749664977255)"'''
)
)
# pick first and last point from each linestring as columns
df = df.join(
df["geometry"]
.apply(lambda ls: np.array(shapely.wkt.loads(ls).coords)[[0, -1]])
.apply(
lambda x: {
f"{c}_point": shapely.geometry.Point(x[i])
for i, c in enumerate(df.columns)
if c != "geometry"
}
)
.apply(pd.Series)
)
# reshape to row wise
df2 = pd.melt(
df,
id_vars=["source", "target"],
value_vars=["source_point", "target_point"],
value_name="point",
)
df2["node_number"] = np.where(
df2["variable"] == "source_point", df2["source"], df2["target"]
)
df2 = df2.drop(columns=["source", "target", "variable"])
output
point | node_number |
---|---|
POINT (5.897759230176348 49.44266714130711) | 0 |
POINT (13.59594567226444 48.87717194273715) | 1 |
POINT (16.71947594571444 50.21574656839354) | 2 |
POINT (18.62085859546164 54.68260569927078) | 3 |
POINT (5.606975945670001 51.03729848896978) | 4 |
POINT (4.799221632515724 49.98537303323637) | 5 |
POINT (3.31501148496416 51.34577662473805) | 6 |
POINT (7.092053256873896 53.14404328064489) | 7 |
POINT (6.589396599970826 51.85202912048339) | 8 |
POINT (5.606975945670001 51.03729848896978) | 9 |
POINT (5.674051954784829 49.5294835475575) | 100 |
POINT (6.658229607783568 49.20195831969157) | 101 |
POINT (14.60709842291953 51.74518809671997) | 102 |
POINT (20.89224450041863 54.31252492941253) | 103 |
POINT (5.988658074577813 51.85161570902505) | 104 |
POINT (3.314971144228537 51.34575511331991) | 105 |
POINT (4.705997348661185 53.09179840759776) | 106 |
POINT (3.31501148496416 51.34577662473805) | 107 |
POINT (5.988658074577813 51.85161570902505) | 108 |
POINT (4.286022983425084 49.90749664977255) | 109 |