Assign value on large pandas dataframe based on index stored on two others dataframes-CodePudding

I have three dataframes that I am comparing, where I have stored several data, one where is the information of my interest, which is the one I want to complete. The second one where is the column with the coordinates that I want to add to my general dataframe and the third one where are stored the indexes of the two previous dataframes where the values correspond.

It is a little confusing, but I put an example where you can see it better:

Dataframe 1:

index	n_tree
247	1
248	2

Dataframe 2:

index	coords
1400	(20,47)
1401	(30,85)

dataframe 3:

index	index_dataframe_1	index_dataframe_2
0	247	1401

My intention is that my general dataframe contains the correct coordinate column. as follow:

index	n_tree	coords
247	1	(30,85)

I have tried to assign it with .iloc, .loc, .at but I get the following error:

 for idx, rw in dataframe_3.iterrows():
        coords = dataframe_1.loc[rw.index_dataframe_2, "coords"]
        dataframe_2.loc[int(rw.index_dataframe_1), "coords"] = coords

ValueError: Must have equal len keys and value when setting with an iterable.

CodePudding user response：

You can perform two merges:

(df3.merge(df1, left_on='index_dataframe_1', right_index=True)
    .merge(df2, left_on='index_dataframe_2', right_index=True)
    [['n_tree', 'coords']]
)

output:

       n_tree   coords
index                 
0           1  (30,85)

inputs:

>>> df1
       n_tree
index        
247         1
248         2

>>> df2
        coords
index         
1400   (20,47)
1401   (30,85)

>>> df3
       index_dataframe_1  index_dataframe_2
index                                      
0                    247               1401

CodePudding user response：

Use 2 inner joins by .merge():

(Assuming index in your dataframes are data columns instead of row indexes):

df_out = (df1.merge(df3, left_on='index', right_on='index_dataframe_1', suffixes=('', '_y'))       
             .merge(df2, left_on='index_dataframe_2', right_on='index', suffixes=('', '_z'))
          )


df_out = df_out[['index', 'n_tree', 'coords']]

Result:

print(df_out)


   index  n_tree   coords
0    247       1  (30,85)

CodePudding user response：

I think this could work for you:

import pandas as pd
import numpy as np

df1 = pd.DataFrame({'index': [247, 248], 'n_tree': [1, 2]}).set_index('index')
df2 = pd.DataFrame({'index': [1400, 1401], 'coords': [(20,47), (30,85)]}).set_index('index')
df3 = pd.DataFrame({'index': [0], 'index_dataframe_1': [247], 'index_dataframe_2': [1401]}).set_index('index')

mapping = dict(zip(df3.index_dataframe_1, df3.index_dataframe_2))

l = list()
for i in df1.index:
    m = mapping.get(i, np.nan)
    if m is not np.nan:
        l.append(df2.at[m, 'coords'])
    else:
        l.append(np.nan)
df1['coords'] = l

print(df1)

Result:

       n_tree    coords
index                  
247         1  (30, 85)
248         2       NaN