I have three dataframes that I am comparing, where I have stored several data, one where is the information of my interest, which is the one I want to complete. The second one where is the column with the coordinates that I want to add to my general dataframe and the third one where are stored the indexes of the two previous dataframes where the values correspond.
It is a little confusing, but I put an example where you can see it better:
Dataframe 1:
index | n_tree |
---|---|
247 | 1 |
248 | 2 |
Dataframe 2:
index | coords |
---|---|
1400 | (20,47) |
1401 | (30,85) |
dataframe 3:
index | index_dataframe_1 | index_dataframe_2 |
---|---|---|
0 | 247 | 1401 |
My intention is that my general dataframe contains the correct coordinate column. as follow:
index | n_tree | coords |
---|---|---|
247 | 1 | (30,85) |
I have tried to assign it with .iloc, .loc, .at but I get the following error:
for idx, rw in dataframe_3.iterrows():
coords = dataframe_1.loc[rw.index_dataframe_2, "coords"]
dataframe_2.loc[int(rw.index_dataframe_1), "coords"] = coords
ValueError: Must have equal len keys and value when setting with an iterable.
CodePudding user response:
You can perform two merges:
(df3.merge(df1, left_on='index_dataframe_1', right_index=True)
.merge(df2, left_on='index_dataframe_2', right_index=True)
[['n_tree', 'coords']]
)
output:
n_tree coords
index
0 1 (30,85)
inputs:
>>> df1
n_tree
index
247 1
248 2
>>> df2
coords
index
1400 (20,47)
1401 (30,85)
>>> df3
index_dataframe_1 index_dataframe_2
index
0 247 1401
CodePudding user response:
Use 2 inner joins by .merge()
:
(Assuming index
in your dataframes are data columns instead of row indexes):
df_out = (df1.merge(df3, left_on='index', right_on='index_dataframe_1', suffixes=('', '_y'))
.merge(df2, left_on='index_dataframe_2', right_on='index', suffixes=('', '_z'))
)
df_out = df_out[['index', 'n_tree', 'coords']]
Result:
print(df_out)
index n_tree coords
0 247 1 (30,85)
CodePudding user response:
I think this could work for you:
import pandas as pd
import numpy as np
df1 = pd.DataFrame({'index': [247, 248], 'n_tree': [1, 2]}).set_index('index')
df2 = pd.DataFrame({'index': [1400, 1401], 'coords': [(20,47), (30,85)]}).set_index('index')
df3 = pd.DataFrame({'index': [0], 'index_dataframe_1': [247], 'index_dataframe_2': [1401]}).set_index('index')
mapping = dict(zip(df3.index_dataframe_1, df3.index_dataframe_2))
l = list()
for i in df1.index:
m = mapping.get(i, np.nan)
if m is not np.nan:
l.append(df2.at[m, 'coords'])
else:
l.append(np.nan)
df1['coords'] = l
print(df1)
Result:
n_tree coords
index
247 1 (30, 85)
248 2 NaN