I have an input DataFrame and want to modify its 'Spcx' column. For this I have defined a list 'Spaces', sorted in ascending order:
import pandas as pd
import numpy as np

if __name__ == "__main__":
    Example = [[   0],
               [   0],
               [0.14],
               [0.10],
               [0.10],
               [0.10],
               [0.13],
               [0.16],
               [0.24],
               [0.21],
               [0.14],
               [0.14]]
    Example = pd.DataFrame(data=Example, columns=['Spcx'])
    Spaces = [0, 0.100, 0.125, 0.150, 0.175, 0.200, 0.225, 0.250, 0.275, 0.300]
    Spaces = np.array(Spaces)  # convert to numpy array
    Example["Spcx"] = Spaces[np.searchsorted(Spaces, Example["Spcx"], side='left')]
What I want is for each value in Example['Spcx'] to be compared against the intervals of 'Spaces' and replaced with the left edge of the interval it falls in, for example:
0    --> Spaces [0 - 0.100]     --> 0
0.10 --> Spaces [0.100 - 0.125] --> 0.100
0.14 --> Spaces [0.125 - 0.150] --> 0.125
The result should look like this:
Spcx
0
0
0.125
0.1
0.1
0.1
0.125
0.15
0.225
0.2
0.125
0.125
CodePudding user response:
One approach is simply to use side='right' and subtract 1:
Spaces = np.array(Spaces) # convert to numpy array
Example["Spcx"] = Spaces[np.searchsorted(Spaces, Example["Spcx"], side='right') - 1]
print(Example)
Output
Spcx
0 0.000
1 0.000
2 0.125
3 0.100
4 0.100
5 0.100
6 0.125
7 0.150
8 0.225
9 0.200
10 0.125
11 0.125
From the documentation on np.searchsorted, assuming that a is a sorted array, it will return:
side | returned index i satisfies |
---|---|
left | a[i-1] < v <= a[i] |
right | a[i-1] <= v < a[i] |
Basically, "right" returns i such that i - 1 is the index of the last value that is less than or equal to the one being searched for.
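To make the difference between the two sides concrete, here is a minimal sketch on a shortened copy of the bins. The names `a`, `v`, `left`, `right` and `floor_values` are chosen for illustration only and are not part of the question's code.

```python
import numpy as np

# Shortened, ascending copy of the bins (illustrative only)
a = np.array([0.0, 0.100, 0.125, 0.150])
# Two values that sit exactly on a bin edge, two that fall strictly inside a bin
v = np.array([0.0, 0.10, 0.13, 0.14])

# side="left":  a[i-1] <  v <= a[i]
left = np.searchsorted(a, v, side="left")
# side="right": a[i-1] <= v <  a[i]
right = np.searchsorted(a, v, side="right")

# Index of the last bin edge that is <= v, then the edge itself
floor_values = a[right - 1]

print(left)          # indices with side="left"
print(right - 1)     # indices with side="right", shifted by one
print(floor_values)  # left edge of the interval each value falls in
```

With side="left", values strictly inside an interval land on the upper edge, which is why the original attempt rounded 0.14 up to 0.150. Note that a value below the first bin edge would give index -1 after the subtraction, which NumPy interprets as the last element, so this sketch assumes all values are at least a[0].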
CodePudding user response:
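You can try pd.cut with the right=False option, so that the intervals are closed on the left, and labels=False, so that it returns integer bin indices: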
Spaces[pd.cut(Example['Spcx'], Spaces, right=False, labels=False)]
Output:
array([0. , 0. , 0.125, 0.1 , 0.1 , 0.1 , 0.125, 0.15 , 0.225,
0.2 , 0.125, 0.125])
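One edge case may be worth checking before relying on this: with right=False the intervals are half-open, [a, b), so a value equal to the last edge (0.300 here) falls into no bin. The sketch below, using the hypothetical names `spaces`, `values` and `codes`, shows what pd.cut returns in that situation.

```python
import numpy as np
import pandas as pd

spaces = np.array([0, 0.100, 0.125, 0.150, 0.175, 0.200, 0.225, 0.250, 0.275, 0.300])
# 0.14 lies inside [0.125, 0.150); 0.300 is exactly the last edge
values = pd.Series([0.14, 0.300])

# labels=False returns the integer bin index, or NaN when no bin matches
codes = pd.cut(values, spaces, right=False, labels=False)

print(codes)
```

Because of the NaN, the result has a float dtype and can no longer be used directly to index the bin array. The searchsorted variant with side='right' and a subtraction of 1 would map 0.300 to 0.300 instead.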
CodePudding user response:
You want the numpy digitize function:
import pandas as pd
import numpy as np

if __name__ == "__main__":
    Example = [[   0],
               [   0],
               [0.14],
               [0.10],
               [0.10],
               [0.10],
               [0.13],
               [0.16],
               [0.24],
               [0.21],
               [0.14],
               [0.14]]
    Example = pd.DataFrame(data=Example, columns=['Spcx'])
    Spaces = [0, 0.100, 0.125, 0.150, 0.175, 0.200, 0.225, 0.250, 0.275, 0.300]
    Spaces = np.array(Spaces)  # convert to numpy array
    # digitize returns i such that Spaces[i-1] <= x < Spaces[i]; subtract 1 for the left edge
    Example["Spcx"] = Spaces[np.digitize(Example["Spcx"], Spaces) - 1]
    print(Example)
Output:
Spcx
0 0.000
1 0.000
2 0.125
3 0.100
4 0.100
5 0.100
6 0.125
7 0.150
8 0.225
9 0.200
10 0.125
11 0.125
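For monotonically increasing bins, np.digitize with its default right=False gives the same indices as np.searchsorted with side='right', so this answer and the searchsorted one are interchangeable here. A short check on the question's data, with illustrative variable names (`spaces`, `values`, `idx_digitize`, `idx_searchsorted`, `same`, `floored`):

```python
import numpy as np

spaces = np.array([0, 0.100, 0.125, 0.150, 0.175, 0.200, 0.225, 0.250, 0.275, 0.300])
values = np.array([0, 0, 0.14, 0.10, 0.10, 0.10, 0.13, 0.16, 0.24, 0.21, 0.14, 0.14])

# Note the swapped argument order: digitize(x, bins) vs searchsorted(bins, x)
idx_digitize = np.digitize(values, spaces)
idx_searchsorted = np.searchsorted(spaces, values, side="right")

# Both index arrays agree on this data
same = np.array_equal(idx_digitize, idx_searchsorted)

# Left edge of the interval each value falls in
floored = spaces[idx_digitize - 1]

print(same)
print(floored)
```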