Matlab - Assigning matching variables between two data sets and creating a new table-CodePudding

I'm currently working with two data sets, as shown below:

Data set 1:

Point_ID	Record	Difference (m)
'2804AJGCA57'	'Record003 - 220428_103738_Scanner_1 - 2804AJGCA57'	'0.035240'
'2804AJGCA28'	'Record003 - 220428_103738_Scanner_1 - 2804AJGCA28'	'0.030961'
'2804AJGCA29'	'Record003 - 220428_103738_Scanner_1 - 2804AJGCA29'	'0.030219'

Data set 2:

Point_ID	Easting	Northing	Elevation_OD
'2804AJGCA1'	'200305.3884'	'80809.76627'	'7.25913'
'2804AJGCA2'	'200304.9855'	'80809.20396'	'7.23274'
'2804AJGCA3'	'200304.3783'	'80808.51888'	'7.20207'

Essentially, I need to compare the 1st column of both tables and if the 'Point_ID' from the 2nd data set is found within the 1st, I need to add the 'Difference (m)' column onto the corresponding 'Point_ID' row, in the 2nd data set, or a new table.

I hope that this makes sense. Currently I have the following code:

%% Import CSVs

GCA_data = readtable('Input/Test/GCA&GCP_Results_Flight1.csv', 'Delimiter',';', 'Format','%s %s %s'); %Insert pathway to GCA_Results csv.

XYZ_data= readtable('Input/Test/GCA&GCP_Flight1.csv','Delimiter',',','Format','%s %s %s %s'); %Insert pathway to the GCA XYZ file inputted into RiProcess.


%% Pre - Settings

ids = GCA_data.Object1; %Identifies all points that were used within the GCA calculations

nids = numel(ids); % Identifies the number of unique point ids.

gca_table = [];

for ii = 1:nids; 
   
    ID = ids{ii}; %Speicifies the point ID. 
    
    idx = ismember()
end

CodePudding user response：

this is tagged as Python so I'll give you a python code solution:

import numpy as np

import pandas as pd

GCA_data=pd.read_csv('GCA_Results csv', sep=';', header=True) XYZ_data=pd.read_csv('GCA&GCP_Flight1.csv', sep=',', header=True)

GCA_data=GCA_data.set_index(GCA_data['Point_ID']) XYZ_data=XYZ_data.set_index(XYZ_data['Point_ID']) Anyname= pd.concat([XYZ_data,GCA_data['Difference (m)']], axis=1).reset_index(drop=True) Anyname.to_csv('Anyname',index=False)

CodePudding user response：

Hope I understood your question correct, as their no match between both table points I am compare by starts match. By first converting the df1's pointID and difference columns to dict.

For ex, 2804AJGCA28 == 2804AJGCA2

Code:

dic = pd.Series(df1['Difference (m)'].values,index=df1['Point_ID']).to_dict()
df2['Difference (m)'] = df2['Point_ID'].apply(lambda x:  [ v for k,v in dic.items() if x in k])
df2

Output:

      Point_ID      Easting    Northing Elevation_OD    Difference (m)
0   2804AJGCA1  200305.3884 80809.76627 7.25913                     []
1   2804AJGCA2  200304.9855 80809.2039  7.23274   [0.030961, 0.030219]
2   2804AJGCA3  200304.3783 80808.51888 7.20207                     []

Otherwise, if you have match between pointID you can remove loop and startswith logic and just

df2['Point_ID'].apply(lambda x: pd.Series(df1['Difference (m)'].values,index=df1['Point_ID']).to_dict()[x] )