Use pandas to combine 2 CSV files-CodePudding

I have 2 csv files containing bathymetry data - single and multibeam

CSV 1 which is the multibeam data looks like:

X	Y	Z
626066.4	234058.2	6.69
626066.4	234059.2	6.89
626066.4	234060.2	7.06

And CSV 2 which contains the singlebeam data looks like:

x	Y	A	B	C
627839	232463.4	14.22	14.46	14.71
627839	232463.1	14.22	14.46	14.71

I would like to use pandas to merge the 2 CSVs based as follows:

The X and Y coordinates of CSV2 are combined with the pre-existing X,Y coordinates of CSV 1
The specified Z value of CSV 2 (A, B or C) is combined with CSV 1 Z value

This way I will have a combined xyz dataset of a survey area which will evidentally be more accurate due to a combination of multi and single beam data.

For clarification, what I want to end up with results wise is:

X	Y	Z
626066.4	234058.2	6.69
626066.4	234059.2	6.89
626066.4	234060.2	7.06
627839	232463.4	14.22
627839	232463.1	14.22

I have tried the below code snippet, but need a way of combining the X's, Y's and Z's based on specified columns of CSV 2.

import pandas as pd

# Read the files into two dataframes.
df1 = pd.read_csv('CSV1.csv')
df2 = pd.read_csv('CSV2.csv')

# Merge the two dataframes, using _ID column as key
df3 = pd.merge(df1, df2, on = 'X')
df3.set_index('X', inplace = True)

# Write it to a new CSV file
df3.to_csv('CSV3.csv')

CodePudding user response：

IIUC you want to be able to tell you function which column is to be handled as Z in df2 and then concat both lists:

def concat_df_on_z(df1, df2, z_col):
    df2 = df2[['X', 'Y', z_col]].rename(columns={z_col: 'Z'})
    return pd.concat([df1, df2])

df3 = concat_df_on_z(df1, df2, 'B')
print(df3)

Output:

          X         Y      Z
0  626066.4  234058.2   6.69
1  626066.4  234059.2   6.89
2  626066.4  234060.2   7.06
0  627839.0  232463.4  14.46
1  627839.0  232463.1  14.46