Home > OS >  Apply custom function to 2 dataframes in pandas
Apply custom function to 2 dataframes in pandas

Time:08-11

I got these 2 dataframes

data = {
    'col1': {1: 9, 2: 20, 3: 35, 4: 47, 5: 30, 6: 8},
    'col2': {1: 1, 2: 5, 3: 30, 4: 45, 5: 46, 6: 85},
    'col3': {1: 11, 2: 18, 3: 22, 4: 27, 5: 32, 6: 90},
}

thresholds = {
    0: {1: 0, 2: 10, 3: 20, 4: 0, 5: 40, 6: 0},
    1: {1: 10, 2: 15, 3: 25, 4: 10, 5: 50, 6: 1},
    2: {1: 20, 2: 17, 3: 29, 4: 60, 5: 60, 6: 2},
    3: {1: 30, 2: 25, 3: 39, 4: 70, 5: 70, 6: 3},
    4: {1: 40, 2: 35, 3: 50, 4: 75, 5: 80, 6: 10},
}

data = pd.DataFrame(data)
thresholds = pd.DataFrame(thresholds)

where the amount of rows is the same for both dataframes.

I need to obtain another dataframe that introduces one column of data (for example 'col1') in thresholds, and it fits every element of the column in its corresponding position in the row, returning all the indexes.

Here are the dataframes when printed.

data:

   col1  col2  col3
1     9     1    11
2    20     5    18
3    35    30    22
4    47    45    27
5    30    46    32
6     8    85    90

thresholds:

    0   1   2   3   4
1   0  10  20  30  40
2  10  15  17  25  35
3  20  25  29  39  50
4   0  10  60  70  75
5  40  50  60  70  80
6   0   1   2   3  10

result:

   col1  col2  col3
1     1     1     2
2     3     0     3
3     3     3     1
4     2     2     2
5     0     1     0
6     4     5     5

Examples: result['col1'][1] is 1 because data['col1'][1] --> 9 would be in position 1 when introduced in row 1 of thresholds: [0, 10, 20, 30, 40]

result['col2'][6] is 5 because data['col2'][6] --> 85 would be in position 5 when introduced in row 6 of thresholds: [0, 1, 2, 3, 10]

I am looking for the result dataframe using pandas. I found a function in pandas named searchsorted, and I know that I need to use .apply() (or .applymap()), and use lambda somewhere. I just don't know how to put the pieces together.

for loops are no good, too slow. The real dataframes are large, with thousands of rows and columns.

Here is the result dataframe in code (in case needed):

result = pd.DataFrame(
    {
        'col1': {1: 1, 2: 3, 3: 3, 4: 2, 5: 0, 6: 4},
        'col2': {1: 1, 2: 0, 3: 3, 4: 2, 5: 1, 6: 5},
        'col3': {1: 2, 2: 3, 3: 1, 4: 2, 5: 0, 6: 5},
    }
)

CodePudding user response:

In your case apply the searchsorted from numpy

result = data.apply(lambda x : pd.Series(np.searchsorted(thresholds.loc[x.name],x)),axis=1)
Out[13]: 
   0  1  2
1  1  1  2
2  3  0  3
3  3  3  1
4  2  2  2
5  0  1  0
6  4  5  5
  • Related