I have two dataframes: df1 contains the coordinates of points, and df2 contains the start and end coordinates of lines. For example:
df1
point | lon | lat |
---|---|---|
P01 | 3 | 48 |
P02 | 9 | 44 |
df2
line | lon.start | lat.start | lon.end | lat.end |
---|---|---|---|---|
L01 | 3 | 48 | 3 | 49 |
I would like to replace the line variable in df2 with point.start and point.end, creating the column point.start by looking up lon.start and lat.start in df1, and doing the same for point.end. The result should look like this:
point.start | point.end | lon.start | lat.start | lon.end | lat.end |
---|---|---|---|---|---|
P01 | P05 | 3 | 48 | 3 | 49 |
How could I do this?
CodePudding user response:
Using merge may help you here.
import pandas as pd

# generate sample data
df1 = pd.DataFrame([['P01', 3, 48], ['P02', 9, 44], ['P03', 3, 49]], columns=('point', 'lon', 'lat'))
df2 = pd.DataFrame([['L01', 3, 48, 3, 49], ['L02', 4, 48, 4, 49]], columns=('line', 'lon.start', 'lat.start', 'lon.end', 'lat.end'))
# handle start points: look up the point name for each line's start coordinates
dfx = pd.merge(df1.rename(columns={'lon': 'lon.start', 'lat': 'lat.start', 'point': 'point.start'}),
               df2[['line', 'lon.start', 'lat.start']],
               on=['lon.start', 'lat.start'])
# handle end points: same lookup for the end coordinates
dfy = pd.merge(df1.rename(columns={'lon': 'lon.end', 'lat': 'lat.end', 'point': 'point.end'}),
               df2[['line', 'lon.end', 'lat.end']],
               on=['lon.end', 'lat.end'])
# merge both based on lines
df = pd.merge(dfx, dfy, on='line')
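The code above discards every line whose start or end point is missing from df1. Replacing the last merge by
df = pd.merge(dfx, dfy, on='line', how='outer')
keeps lines for which only one of the two points was found. Lines with neither point matched are still dropped, because the first two merges are inner joins; to keep every row of df2, pass how='right' to those two merges as well.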
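To keep every row of df2 whether or not its points exist in df1, one option is to use right joins in the two lookup merges. The following is a minimal sketch under the assumptions that the sample data is as above and that each coordinate pair appears at most once in df1; the name df_all is only illustrative.

```python
import pandas as pd

# sample data (same values as in the answer above)
df1 = pd.DataFrame([['P01', 3, 48], ['P02', 9, 44], ['P03', 3, 49]],
                   columns=['point', 'lon', 'lat'])
df2 = pd.DataFrame([['L01', 3, 48, 3, 49], ['L02', 4, 48, 4, 49]],
                   columns=['line', 'lon.start', 'lat.start', 'lon.end', 'lat.end'])

# how='right' keeps every row of the right-hand frame (df2),
# leaving point.start as NaN where no point matches
dfx = pd.merge(df1.rename(columns={'lon': 'lon.start', 'lat': 'lat.start', 'point': 'point.start'}),
               df2[['line', 'lon.start', 'lat.start']],
               on=['lon.start', 'lat.start'], how='right')

# same lookup for the end coordinates
dfy = pd.merge(df1.rename(columns={'lon': 'lon.end', 'lat': 'lat.end', 'point': 'point.end'}),
               df2[['line', 'lon.end', 'lat.end']],
               on=['lon.end', 'lat.end'], how='right')

# both frames now contain every line, so a plain merge on 'line' keeps them all
df_all = pd.merge(dfx, dfy, on='line')
print(df_all)
```

Here L02 stays in the result with missing values in point.start and point.end, because neither of its coordinate pairs exists in df1.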
CodePudding user response:
You could do:
df3 = df2.drop(columns="line")
for suffix in ("start", "end"):
    df3 = df3.merge(
        df1.rename(columns={col: f"{col}.{suffix}" for col in df1.columns}),
        on=[f"{col}.{suffix}" for col in ("lon", "lat")],
        how="left"
    )
If the order of the columns is important, you have to reorder them afterwards, for example to match the layout in the question:
df3 = df3[["point.start", "point.end", "lon.start", "lat.start", "lon.end", "lat.end"]]
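The loop can also be wrapped in a small function. This is only a sketch: the name attach_point_names is hypothetical, and the validate="many_to_one" argument is an addition that makes pandas raise an error if the same coordinates occur more than once in the points frame, which would otherwise silently duplicate rows.

```python
import pandas as pd

def attach_point_names(points, lines):
    """Hypothetical helper: replace 'line' with point.start / point.end columns."""
    out = lines.drop(columns="line")
    for suffix in ("start", "end"):
        # rename point/lon/lat to point.<suffix>/lon.<suffix>/lat.<suffix>
        lookup = points.rename(columns={col: f"{col}.{suffix}" for col in points.columns})
        out = out.merge(
            lookup,
            on=[f"lon.{suffix}", f"lat.{suffix}"],
            how="left",              # keep every line, matched or not
            validate="many_to_one",  # assumption: coordinates are unique in points
        )
    # point names first, then the coordinate columns in their original order
    coord_cols = [c for c in lines.columns if c != "line"]
    return out[["point.start", "point.end"] + coord_cols]

# sample data (same values as in the first answer)
df1 = pd.DataFrame([['P01', 3, 48], ['P02', 9, 44], ['P03', 3, 49]],
                   columns=['point', 'lon', 'lat'])
df2 = pd.DataFrame([['L01', 3, 48, 3, 49], ['L02', 4, 48, 4, 49]],
                   columns=['line', 'lon.start', 'lat.start', 'lon.end', 'lat.end'])

result = attach_point_names(df1, df2)
print(result)
```

Because the merges are left joins, the result has the same number of rows as df2; lines without a matching point get missing values in the point columns.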