I have 2 csv files containing bathymetry data - single and multibeam
CSV 1 which is the multibeam data looks like:
X | Y | Z |
---|---|---|
626066.4 | 234058.2 | 6.69 |
626066.4 | 234059.2 | 6.89 |
626066.4 | 234060.2 | 7.06 |
And CSV 2 which contains the singlebeam data looks like:
x | Y | A | B | C |
---|---|---|---|---|
627839 | 232463.4 | 14.22 | 14.46 | 14.71 |
627839 | 232463.1 | 14.22 | 14.46 | 14.71 |
I would like to use pandas to merge the 2 CSVs based as follows:
- The X and Y coordinates of CSV2 are combined with the pre-existing X,Y coordinates of CSV 1
- The specified Z value of CSV 2 (A, B or C) is combined with CSV 1 Z value
This way I will have a combined xyz dataset of a survey area which will evidentally be more accurate due to a combination of multi and single beam data.
For clarification, what I want to end up with results wise is:
X | Y | Z |
---|---|---|
626066.4 | 234058.2 | 6.69 |
626066.4 | 234059.2 | 6.89 |
626066.4 | 234060.2 | 7.06 |
627839 | 232463.4 | 14.22 |
627839 | 232463.1 | 14.22 |
I have tried the below code snippet, but need a way of combining the X's, Y's and Z's based on specified columns of CSV 2.
import pandas as pd
# Read the files into two dataframes.
df1 = pd.read_csv('CSV1.csv')
df2 = pd.read_csv('CSV2.csv')
# Merge the two dataframes, using _ID column as key
df3 = pd.merge(df1, df2, on = 'X')
df3.set_index('X', inplace = True)
# Write it to a new CSV file
df3.to_csv('CSV3.csv')
CodePudding user response:
IIUC you want to be able to tell you function which column is to be handled as Z
in df2
and then concat both lists:
def concat_df_on_z(df1, df2, z_col):
df2 = df2[['X', 'Y', z_col]].rename(columns={z_col: 'Z'})
return pd.concat([df1, df2])
df3 = concat_df_on_z(df1, df2, 'B')
print(df3)
Output:
X Y Z
0 626066.4 234058.2 6.69
1 626066.4 234059.2 6.89
2 626066.4 234060.2 7.06
0 627839.0 232463.4 14.46
1 627839.0 232463.1 14.46