I'm currently working with two data sets, as shown below:
Data set 1:
Point_ID | Record | Difference (m) |
---|---|---|
'2804AJGCA57' | 'Record003 - 220428_103738_Scanner_1 - 2804AJGCA57' | '0.035240' |
'2804AJGCA28' | 'Record003 - 220428_103738_Scanner_1 - 2804AJGCA28' | '0.030961' |
'2804AJGCA29' | 'Record003 - 220428_103738_Scanner_1 - 2804AJGCA29' | '0.030219' |
Data set 2:
Point_ID | Easting | Northing | Elevation_OD |
---|---|---|---|
'2804AJGCA1' | '200305.3884' | '80809.76627' | '7.25913' |
'2804AJGCA2' | '200304.9855' | '80809.20396' | '7.23274' |
'2804AJGCA3' | '200304.3783' | '80808.51888' | '7.20207' |
Essentially, I need to compare the 1st column of both tables and if the 'Point_ID' from the 2nd data set is found within the 1st, I need to add the 'Difference (m)' column onto the corresponding 'Point_ID' row, in the 2nd data set, or a new table.
I hope that this makes sense. Currently I have the following code:
%% Import CSVs
GCA_data = readtable('Input/Test/GCA&GCP_Results_Flight1.csv', 'Delimiter',';', 'Format','%s %s %s'); %Insert pathway to GCA_Results csv.
XYZ_data= readtable('Input/Test/GCA&GCP_Flight1.csv','Delimiter',',','Format','%s %s %s %s'); %Insert pathway to the GCA XYZ file inputted into RiProcess.
%% Pre - Settings
ids = GCA_data.Object1; %Identifies all points that were used within the GCA calculations
nids = numel(ids); % Identifies the number of unique point ids.
gca_table = [];
for ii = 1:nids;
ID = ids{ii}; %Speicifies the point ID.
idx = ismember()
end
CodePudding user response:
this is tagged as Python so I'll give you a python code solution:
import numpy as np
import pandas as pd
GCA_data=pd.read_csv('GCA_Results csv', sep=';', header=True) XYZ_data=pd.read_csv('GCA&GCP_Flight1.csv', sep=',', header=True)
GCA_data=GCA_data.set_index(GCA_data['Point_ID']) XYZ_data=XYZ_data.set_index(XYZ_data['Point_ID']) Anyname= pd.concat([XYZ_data,GCA_data['Difference (m)']], axis=1).reset_index(drop=True) Anyname.to_csv('Anyname',index=False)
CodePudding user response:
Hope I understood your question correct, as their no match between both table points I am compare by starts match. By first converting the df1's pointID and difference columns to dict.
For ex, 2804AJGCA28 == 2804AJGCA2
Code:
dic = pd.Series(df1['Difference (m)'].values,index=df1['Point_ID']).to_dict()
df2['Difference (m)'] = df2['Point_ID'].apply(lambda x: [ v for k,v in dic.items() if x in k])
df2
Output:
Point_ID Easting Northing Elevation_OD Difference (m)
0 2804AJGCA1 200305.3884 80809.76627 7.25913 []
1 2804AJGCA2 200304.9855 80809.2039 7.23274 [0.030961, 0.030219]
2 2804AJGCA3 200304.3783 80808.51888 7.20207 []
Otherwise, if you have match between pointID you can remove loop and startswith logic and just
df2['Point_ID'].apply(lambda x: pd.Series(df1['Difference (m)'].values,index=df1['Point_ID']).to_dict()[x] )