Linestring end coordinates in a .csv file along with source and target id-CodePudding

I have a Digital Road Map dataset containing coordinates of nodes interconnected through road network and Node number. Dataset in xlsx Dataset has three columns : Col1-**source **, Colm 2 Target and Column 3- geometry. geometry is a linestring of road having start point coordinate, an end point coordinate and few intermediate point coordinates. Source and Target columns are Node number of starting node and end node of each road network. I want to extract only coordinate of starting node and end node from each row. Then arrange the filtered dataset such that each source and each target has the respective coordinates beside it. The sample output file is desired sample output

I am looking for code in shapely, most of the info is on one linestring. Since my data has more than a million rows so I am not able to find a relevant code that iterates through entire dataset.

CodePudding user response：

Do you mean:

df[['Start', 'end']] = df['geometry'].str.split(',', expand=True)

CodePudding user response：

your sample data is unusable as it is an image. Have simulated some
pick first and last point from LINESTRING
structure as columns (in df)
reshape df as df2 as your desired structure

import io
import shapely.geometry, shapely.wkt
import pandas as pd
import numpy as np

# sample data...
df = pd.read_csv(
    io.StringIO(
        '''source,target,geometry
0,100,"LINESTRING (5.897759230176348 49.44266714130711, 6.242751092156993 49.90222565367873, 5.674051954784829 49.5294835475575)"
1,101,"LINESTRING (13.59594567226444 48.87717194273715, 12.51844038254671 54.470370591848, 6.658229607783568 49.20195831969157)"
2,102,"LINESTRING (16.71947594571444 50.21574656839354, 23.42650841644439 50.30850576435745, 22.77641889821263 49.02739533140962, 14.60709842291953 51.74518809671997)"
3,103,"LINESTRING (18.62085859546164 54.68260569927078, 23.79919884613338 52.69109935160657, 20.89224450041863 54.31252492941253)"
4,104,"LINESTRING (5.606975945670001 51.03729848896978, 6.589396599970826 51.85202912048339, 3.31501148496416 51.34577662473805, 5.988658074577813 51.85161570902505)"
5,105,"LINESTRING (4.799221632515724 49.98537303323637, 6.043073357781111 50.12805166279423, 3.31501148496416 51.34577662473805, 6.15665815595878 50.80372101501058, 3.314971144228537 51.34575511331991)"
6,106,"LINESTRING (3.31501148496416 51.34577662473805, 3.830288527043137 51.62054454203195, 6.905139601274129 53.48216217713065, 4.705997348661185 53.09179840759776)"
7,107,"LINESTRING (7.092053256873896 53.14404328064489, 3.830288527043137 51.62054454203195, 6.842869500362383 52.22844025329755, 3.31501148496416 51.34577662473805)"
8,108,"LINESTRING (6.589396599970826 51.85202912048339, 6.905139601274129 53.48216217713065, 3.314971144228537 51.34575511331991, 5.988658074577813 51.85161570902505)"
9,109,"LINESTRING (5.606975945670001 51.03729848896978, 4.286022983425084 49.90749664977255)"'''
    )
)

# pick first and last point from each linestring as columns
df = df.join(
    df["geometry"]
    .apply(lambda ls: np.array(shapely.wkt.loads(ls).coords)[[0, -1]])
    .apply(
        lambda x: {
            f"{c}_point": shapely.geometry.Point(x[i])
            for i, c in enumerate(df.columns)
            if c != "geometry"
        }
    )
    .apply(pd.Series)
)

# reshape to row wise
df2 = pd.melt(
    df,
    id_vars=["source", "target"],
    value_vars=["source_point", "target_point"],
    value_name="point",
)
df2["node_number"] = np.where(
    df2["variable"] == "source_point", df2["source"], df2["target"]
)
df2 = df2.drop(columns=["source", "target", "variable"])

output

point	node_number
POINT (5.897759230176348 49.44266714130711)	0
POINT (13.59594567226444 48.87717194273715)	1
POINT (16.71947594571444 50.21574656839354)	2
POINT (18.62085859546164 54.68260569927078)	3
POINT (5.606975945670001 51.03729848896978)	4
POINT (4.799221632515724 49.98537303323637)	5
POINT (3.31501148496416 51.34577662473805)	6
POINT (7.092053256873896 53.14404328064489)	7
POINT (6.589396599970826 51.85202912048339)	8
POINT (5.606975945670001 51.03729848896978)	9
POINT (5.674051954784829 49.5294835475575)	100
POINT (6.658229607783568 49.20195831969157)	101
POINT (14.60709842291953 51.74518809671997)	102
POINT (20.89224450041863 54.31252492941253)	103
POINT (5.988658074577813 51.85161570902505)	104
POINT (3.314971144228537 51.34575511331991)	105
POINT (4.705997348661185 53.09179840759776)	106
POINT (3.31501148496416 51.34577662473805)	107
POINT (5.988658074577813 51.85161570902505)	108
POINT (4.286022983425084 49.90749664977255)	109