I got these 2 dataframes
data = {
'col1': {1: 9, 2: 20, 3: 35, 4: 47, 5: 30, 6: 8},
'col2': {1: 1, 2: 5, 3: 30, 4: 45, 5: 46, 6: 85},
'col3': {1: 11, 2: 18, 3: 22, 4: 27, 5: 32, 6: 90},
}
thresholds = {
0: {1: 0, 2: 10, 3: 20, 4: 0, 5: 40, 6: 0},
1: {1: 10, 2: 15, 3: 25, 4: 10, 5: 50, 6: 1},
2: {1: 20, 2: 17, 3: 29, 4: 60, 5: 60, 6: 2},
3: {1: 30, 2: 25, 3: 39, 4: 70, 5: 70, 6: 3},
4: {1: 40, 2: 35, 3: 50, 4: 75, 5: 80, 6: 10},
}
data = pd.DataFrame(data)
thresholds = pd.DataFrame(thresholds)
where the amount of rows
is the same for both dataframes.
I need to obtain another dataframe that introduces one column of data
(for example 'col1') in thresholds
, and it fits every element of the column in its corresponding position in the row, returning all the indexes.
Here are the dataframes when printed.
data
:
col1 col2 col3
1 9 1 11
2 20 5 18
3 35 30 22
4 47 45 27
5 30 46 32
6 8 85 90
thresholds
:
0 1 2 3 4
1 0 10 20 30 40
2 10 15 17 25 35
3 20 25 29 39 50
4 0 10 60 70 75
5 40 50 60 70 80
6 0 1 2 3 10
result
:
col1 col2 col3
1 1 1 2
2 3 0 3
3 3 3 1
4 2 2 2
5 0 1 0
6 4 5 5
Examples: result['col1'][1]
is 1
because data['col1'][1] --> 9
would be in position 1
when introduced in row 1
of thresholds
: [0, 10, 20, 30, 40]
result['col2'][6]
is 5
because data['col2'][6] --> 85
would be in position 5
when introduced in row 6
of thresholds
: [0, 1, 2, 3, 10]
I am looking for the result
dataframe using pandas
. I found a function in pandas named searchsorted
, and I know that I need to use .apply()
(or .applymap()
), and use lambda
somewhere. I just don't know how to put the pieces together.
for
loops are no good, too slow. The real dataframes are large, with thousands of rows and columns.
Here is the result
dataframe in code (in case needed):
result = pd.DataFrame(
{
'col1': {1: 1, 2: 3, 3: 3, 4: 2, 5: 0, 6: 4},
'col2': {1: 1, 2: 0, 3: 3, 4: 2, 5: 1, 6: 5},
'col3': {1: 2, 2: 3, 3: 1, 4: 2, 5: 0, 6: 5},
}
)
CodePudding user response:
In your case apply the searchsorted
from numpy
result = data.apply(lambda x : pd.Series(np.searchsorted(thresholds.loc[x.name],x)),axis=1)
Out[13]:
0 1 2
1 1 1 2
2 3 0 3
3 3 3 1
4 2 2 2
5 0 1 0
6 4 5 5