Use dataframe of indexes to call another dataframe

Time:08-13

I have a DataFrame named data and another one called indexes:

import pandas as pd

data = pd.DataFrame(
    [
        [1, 11, 21, 31, 41],
        [11, 16, 18, 26, 36],
        [21, 26, 30, 40, 51],
        [1, 11, 61, 71, 76],
        [41, 51, 61, 71, 81],
        [1, 2, 3, 4, 11]
    ]
)

indexes = pd.DataFrame(
    {
        'col1': [0, 2, 2, 1, 0, 3],
        'col2': [0, 2, 2, 1, 0, 4],
        'col3': [1, 2, 0, 1, 0, 4],
    }
)

Here are the printed values:

    0   1   2   3   4
0   1  11  21  31  41
1  11  16  18  26  36
2  21  26  30  40  51
3   1  11  61  71  76
4  41  51  61  71  81
5   1   2   3   4  11

   col1  col2  col3
0     0     0     1
1     2     2     2
2     2     2     0
3     1     1     1
4     0     0     0
5     3     4     4

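I want to obtain a result DataFrame with the same shape as indexes, where each value in indexes is used as a column position in the same row of data. In other words, for row i and column c, the result holds data.iloc[i, indexes.loc[i, c]]. For example, row 1 of col1 is 2, so the result takes data.iloc[1, 2], which is 18.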
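The rule above can be pinned down as a plain double loop, where entry (i, j) of indexes names a column position inside row i of data. The sketch below is a slow reference version that is mainly useful for checking faster approaches; the function name lookup_loop is hypothetical, and both frames are rebuilt from the question so the snippet runs on its own.

```python
import pandas as pd

data = pd.DataFrame(
    [
        [1, 11, 21, 31, 41],
        [11, 16, 18, 26, 36],
        [21, 26, 30, 40, 51],
        [1, 11, 61, 71, 76],
        [41, 51, 61, 71, 81],
        [1, 2, 3, 4, 11],
    ]
)
indexes = pd.DataFrame(
    {
        "col1": [0, 2, 2, 1, 0, 3],
        "col2": [0, 2, 2, 1, 0, 4],
        "col3": [1, 2, 0, 1, 0, 4],
    }
)


def lookup_loop(data, indexes):
    """Hypothetical reference helper: result[i, j] = data[i, indexes[i, j]], all by position."""
    result = indexes.copy()
    for i in range(indexes.shape[0]):
        for j in range(indexes.shape[1]):
            # indexes.iat[i, j] is a column position within row i of data
            result.iat[i, j] = data.iat[i, indexes.iat[i, j]]
    return result


result = lookup_loop(data, indexes)
print(result)
```

The loop visits every cell from Python, so it is only practical for small frames.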

Here is the exact result that I am looking for:

result = pd.DataFrame(
    {
        'col1': [1, 18, 30, 11, 41, 4],
        'col2': [1, 18, 30, 11, 41, 11],
        'col3': [11, 18, 21, 11, 41, 11],
    }
)

When printed:

   col1  col2  col3
0     1     1    11
1    18    18    18
2    30    30    21
3    11    11    11
4    41    41    41
5     4    11    11

I suspect it should be something along the lines of result = indexes.apply(lambda x: ...) with iloc somewhere, but I don't know how to put the pieces together.

Thank you

Answer:

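Apply a lookup to every column of indexes. This pd.factorize plus reindex pattern is the one the pandas documentation suggests in place of the deprecated DataFrame.lookup. It treats the values in indexes as column labels of data, which here are also the positions because data has the default column labels 0 to 4: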

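import numpy as np
import pandas as pd

def f(x):
    # idx: a code per row (position in cols); cols: the distinct column labels found in x
    idx, cols = pd.factorize(x)
    # keep those columns of data in that order, then take one cell per row
    return data.reindex(cols, axis=1).to_numpy()[np.arange(len(x)), idx]

df = indexes.apply(f)
print(df)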
   col1  col2  col3
0     1     1    11
1    18    18    18
2    30    30    21
3    11    11    11
4    41    41    41
5     4    11    11
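To see what f does, it can help to run its steps by hand on a single column. The sketch below does this for col1 (rebuilt as a standalone Series named x, a name chosen only for this illustration) and relies on data having the default column labels 0 to 4, so a label and a position are the same number.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    [
        [1, 11, 21, 31, 41],
        [11, 16, 18, 26, 36],
        [21, 26, 30, 40, 51],
        [1, 11, 61, 71, 76],
        [41, 51, 61, 71, 81],
        [1, 2, 3, 4, 11],
    ]
)
x = pd.Series([0, 2, 2, 1, 0, 3], name="col1")  # same values as indexes["col1"]

# Step 1: one code per value, plus the distinct values in order of first appearance
codes, uniques = pd.factorize(x)
print(codes.tolist())    # [0, 1, 1, 2, 0, 3]
print(uniques.tolist())  # [0, 2, 1, 3]

# Step 2: reorder data's columns to follow uniques, so code k points at label uniques[k]
sub = data.reindex(uniques, axis=1).to_numpy()

# Step 3: pair row i with codes[i] to read exactly one cell per row
picked = sub[np.arange(len(x)), codes]
print(picked.tolist())   # [1, 18, 30, 11, 41, 4]
```

The detour through factorize is what lets the pattern work with arbitrary column labels (strings, for example): the codes are always positions into the reindexed array, whatever the labels are.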

Answer:

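Another possible solution, pairing each row position with the column position stored in indexes and reading that cell with iloc: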

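import numpy as np

def f(x):
    # x is one column of indexes; read data at (row i, column position x[i]) for each row
    return [data.iloc[i, j] for i, j in zip(np.arange(len(x)), x)]

print(indexes.apply(f))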

Output:

   col1  col2  col3
0     1     1    11
1    18    18    18
2    30    30    21
3    11    11    11
4    41    41    41
5     4    11    11
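Both answers call a Python function once per column. Because every value in indexes is a column position within the same row of data, the whole result can also be computed in one NumPy call with np.take_along_axis, which selects along the given axis using an index array with the same number of dimensions. This is a sketch of an alternative not taken from either answer; it assumes data and indexes have the same number of rows, aligned by position, and that every index value is a valid column position (0 to 4 here).

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    [
        [1, 11, 21, 31, 41],
        [11, 16, 18, 26, 36],
        [21, 26, 30, 40, 51],
        [1, 11, 61, 71, 76],
        [41, 51, 61, 71, 81],
        [1, 2, 3, 4, 11],
    ]
)
indexes = pd.DataFrame(
    {
        "col1": [0, 2, 2, 1, 0, 3],
        "col2": [0, 2, 2, 1, 0, 4],
        "col3": [1, 2, 0, 1, 0, 4],
    }
)

# Row i of the (6, 3) index array lists the column positions to read from row i
# of the (6, 5) data array; the sizes must match (or broadcast) on axis 0.
values = np.take_along_axis(data.to_numpy(), indexes.to_numpy(), axis=1)

# Give the result the row labels and column names of indexes
result = pd.DataFrame(values, index=indexes.index, columns=indexes.columns)
print(result)
```

The same selection can be written with plain advanced indexing as data.to_numpy()[np.arange(len(data))[:, None], indexes.to_numpy()], where the (6, 1) column of row numbers broadcasts against the (6, 3) index array.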