I have a DataFrame named data and another one called indexes:
import pandas as pd

data = pd.DataFrame(
    [
        [1, 11, 21, 31, 41],
        [11, 16, 18, 26, 36],
        [21, 26, 30, 40, 51],
        [1, 11, 61, 71, 76],
        [41, 51, 61, 71, 81],
        [1, 2, 3, 4, 11]
    ]
)
indexes = pd.DataFrame(
    {
        'col1': [0, 2, 2, 1, 0, 3],
        'col2': [0, 2, 2, 1, 0, 4],
        'col3': [1, 2, 0, 1, 0, 4],
    }
)
Here are the printed values:
    0   1   2   3   4
0   1  11  21  31  41
1  11  16  18  26  36
2  21  26  30  40  51
3   1  11  61  71  76
4  41  51  61  71  81
5   1   2   3   4  11

   col1  col2  col3
0     0     0     1
1     2     2     2
2     2     2     0
3     1     1     1
4     0     0     0
5     3     4     4
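I want to obtain a result DataFrame with the same dimensions as indexes. Each value in a column of indexes (col1, col2, col3) should be used as the column position in data for the same row, and the result should hold the corresponding value from data. For example, row 1 of col1 is 2, so the result there is the value in row 1, column 2 of data, which is 18.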
Here is the exact result that I am looking for:
result = pd.DataFrame(
    {
        'col1': [1, 18, 30, 11, 41, 4],
        'col2': [1, 18, 30, 11, 41, 11],
        'col3': [11, 18, 21, 11, 41, 11],
    }
)
When printed:
   col1  col2  col3
0     1     1    11
1    18    18    18
2    30    30    21
3    11    11    11
4    41    41    41
5     4    11    11
I know that it should probably be something along the lines of result = indexes.apply(lambda x: ...) with iloc used somewhere, but I just don't know how to put the pieces together.
Thank you.
CodePudding user response:
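Use a lookup applied to each column of the indexes DataFrame. The function factorizes the column, reorders the columns of data to match the unique values, and then picks one value per row with NumPy indexing: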
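import numpy as np

def f(x):
    # idx: position of each value among the column's unique values (cols)
    idx, cols = pd.factorize(x)
    # reorder data's columns to match cols, then pick one value per row
    return data.reindex(cols, axis=1).to_numpy()[np.arange(len(x)), idx]

df = indexes.apply(f)
print(df)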
   col1  col2  col3
0     1     1    11
1    18    18    18
2    30    30    21
3    11    11    11
4    41    41    41
5     4    11    11
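If the values in indexes are always valid positional column numbers of data, and both frames have the same number of rows (both are assumptions that hold for the sample data in the question), the same lookup can probably be done without apply. The following is only a minimal sketch using np.take_along_axis; the names values and result are illustrative:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    [
        [1, 11, 21, 31, 41],
        [11, 16, 18, 26, 36],
        [21, 26, 30, 40, 51],
        [1, 11, 61, 71, 76],
        [41, 51, 61, 71, 81],
        [1, 2, 3, 4, 11],
    ]
)
indexes = pd.DataFrame(
    {
        "col1": [0, 2, 2, 1, 0, 3],
        "col2": [0, 2, 2, 1, 0, 4],
        "col3": [1, 2, 0, 1, 0, 4],
    }
)

# For every row i and every column j of indexes, take the element of data
# at row i and column position indexes.iloc[i, j].
# Assumption: indexes holds integer positions, not column labels.
values = np.take_along_axis(data.to_numpy(), indexes.to_numpy(), axis=1)

# Wrap the array back into a DataFrame shaped and labelled like indexes.
result = pd.DataFrame(values, index=indexes.index, columns=indexes.columns)
print(result)
```

This works on the whole array at once instead of column by column, which may matter for large frames; for a handful of columns the apply version above is just as reasonable.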
CodePudding user response:
Another possible solution:
def f(x):
    # pair each row position with the column position stored in x
    return [data.iloc[s] for s in zip(np.arange(len(x)), x)]

indexes.apply(f)
Output:
   col1  col2  col3
0     1     1    11
1    18    18    18
2    30    30    21
3    11    11    11
4    41    41    41
5     4    11    11