I have a pandas dataframe with some points and their coordinates (X, Y, Z), following this structure:
>>> inFile
X Y Z
0 728049.8355 4.395285e+06 201.3366
1 728049.9077 4.395285e+06 201.3108
2 728049.9014 4.395285e+06 201.3106
3 728049.9788 4.395285e+06 201.2823
I also have a NumPy array that contains, for each point, the row indices of its neighbours:
>>> indices_Neighbours
array([array([3], dtype=int64),
array([2,3], dtype=int64),
array([3], dtype=int64),
array([0], dtype=int64)])
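A small reproducible version of this data, as a sketch: the Y values are only printed rounded (4.395285e+06), so 4395285.0 is an assumed stand-in, and the ragged neighbour lists are stored in an explicitly created object array, because NumPy 1.24 and later raise a ValueError for np.array on lists of unequal length (and equal-length lists would silently become a 2-D array).

```python
import numpy as np
import pandas as pd

# Sample points; 4395285.0 approximates the rounded Y value (assumption).
inFile = pd.DataFrame({
    "X": [728049.8355, 728049.9077, 728049.9014, 728049.9788],
    "Y": [4395285.0] * 4,
    "Z": [201.3366, 201.3108, 201.3106, 201.2823],
})

# Ragged neighbour lists: fill a 1-D object array element by element so each
# entry stays a separate int64 array, whatever the list lengths are.
neighbour_lists = [[3], [2, 3], [3], [0]]
indices_Neighbours = np.empty(len(neighbour_lists), dtype=object)
for i, idx in enumerate(neighbour_lists):
    indices_Neighbours[i] = np.array(idx, dtype=np.int64)
```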
My objective is to create a new column in the DataFrame that contains, for each point, an array with the X, Y, Z coordinates of its neighbouring points:
>>> inFile
X Y Z Neighbours_Coordinates
0 728049.8355 4.395285e+06 201.3366 [[728049.9788,4.395285e+06,201.2823]]
1 728049.9077 4.395285e+06 201.3108 [[728049.9014,4.395285e+06,201.3106],[728049.9788,4.395285e+06,201.2823]]
2 728049.9014 4.395285e+06 201.3106 [[728049.9788,4.395285e+06,201.2823]]
3 728049.9788 4.395285e+06 201.2823 [[728049.8355,4.395285e+06,201.3366]]
Because the coordinate file is quite large (several GB), I am trying to avoid a for loop. I was thinking of a solution along these lines instead, but it doesn't work for me:
inFile['Neighbours_Coordinates'] =inFile.apply(lambda x: np.array(inFile.X.iloc[x.indices_Neighbors],inFile.Y.iloc[x.indices_Neighbors],inFile.Z.iloc[x.indices_Neighbors]), axis=1)
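A likely reason the attempt above fails: if indices_Neighbors has not been added to inFile as a column, x.indices_Neighbors raises an AttributeError on the row. Beyond that, np.array takes the data as its first positional argument and dtype as its second, so passing the X, Y and Z selections as three separate arguments raises a TypeError rather than combining them. A minimal check with made-up values (x, y, z are hypothetical stand-ins for the three selections):

```python
import numpy as np

x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])
z = np.array([5.0, 6.0])

# np.array(object, dtype=None, ...) does not accept three data arguments.
try:
    np.array(x, y, z)
    failed = False
except TypeError:
    failed = True

# Stacking the coordinate columns side by side gives one (X, Y, Z) row per point.
stacked = np.stack([x, y, z], axis=1)  # shape (2, 3)
```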
In the end, the best solution I found was to add the NumPy array to the DataFrame as a column (indices_Neighbors) and build each row's result with NumPy's stack function:
inFile['indices_Neighbors'] = indices_Neighbours
inFile['Neighbours_Coordinates'] = inFile.apply(lambda x: np.stack([inFile.X.iloc[x.indices_Neighbors], inFile.Y.iloc[x.indices_Neighbors], inFile.Z.iloc[x.indices_Neighbors]], axis=1), axis=1)
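Note that DataFrame.apply with axis=1 still calls the lambda once per row in Python, so on a multi-gigabyte file it may be slow. A sketch of a more vectorized alternative, assuming the neighbour indices are positional row numbers (as the .iloc calls imply): concatenate all neighbour lists once, gather every coordinate with a single fancy-indexing operation, then split the result back into one block per point. The helper name neighbour_coordinates is hypothetical.

```python
import numpy as np
import pandas as pd


def neighbour_coordinates(df, neighbours):
    """Hypothetical helper: one (k_i, 3) array of neighbour X, Y, Z per point.

    `neighbours` is a sequence (e.g. a 1-D object array) of integer arrays
    holding positional row indices into `df`, like indices_Neighbours.
    """
    coords = df[["X", "Y", "Z"]].to_numpy()              # shape (n_points, 3)
    lengths = np.array([len(a) for a in neighbours], dtype=np.int64)
    flat = np.concatenate(list(neighbours)).astype(np.int64)  # all indices, flattened
    gathered = coords[flat]                              # single vectorized gather
    # Cut the gathered rows back into one block per point.
    blocks = np.split(gathered, np.cumsum(lengths)[:-1])
    # Store the blocks in a 1-D object array so pandas keeps one array per row;
    # this loop only stores references, the heavy lifting is the gather above.
    out = np.empty(len(blocks), dtype=object)
    for i, block in enumerate(blocks):
        out[i] = block
    return out


# Usage, assuming inFile and indices_Neighbours as in the question:
# inFile["Neighbours_Coordinates"] = neighbour_coordinates(inFile, indices_Neighbours)
```

For very large files it may also be worth keeping the flat gathered array and the cumulative offsets instead of storing one small array per row, since per-row object storage carries noticeable memory overhead.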