Correct use of numpy searchsorted routine

Time:10-26

I have an input DataFrame and want to modify its 'Spcx' column. For this I have defined a list 'Spaces', sorted in ascending order:

import pandas as pd
import numpy as np
if __name__ == "__main__":

    Example = [[   0],
               [   0],
               [0.14],
               [0.10],
               [0.10],
               [0.10],
               [0.13],
               [0.16],
               [0.24],
               [0.21],
               [0.14],
               [0.14]]
            
    Example = pd.DataFrame(data = Example, columns = ['Spcx'])
    
    Spaces = [0, 0.100, 0.125, 0.150, 0.175, 0.200, 0.225, 0.250, 0.275, 0.300]

    Spaces = np.array(Spaces)  # convert to numpy array
    Example["Spcx"] = Spaces[np.searchsorted(Spaces, Example["Spcx"], side = 'left')]

What I am looking for is that each value of Example['Spcx'] is compared against the intervals of 'Spaces' and replaced by the left edge of the interval it falls in, for example:

0    -> Spaces [0 - 0.100]     -> 0
0.10 -> Spaces [0.100 - 0.125] -> 0.100
0.14 -> Spaces [0.125 - 0.150] -> 0.125

The result should look like this:

Spcx    
0       
0       
0.125   
0.1     
0.1        
0.1      
0.125   
0.15
0.225      
0.2        
0.125   
0.125   

CodePudding user response:

One approach is simply to use side='right' and subtract 1:

Spaces = np.array(Spaces)  # convert to numpy array
Example["Spcx"] = Spaces[np.searchsorted(Spaces, Example["Spcx"], side='right') - 1]
print(Example)

Output

     Spcx
0   0.000
1   0.000
2   0.125
3   0.100
4   0.100
5   0.100
6   0.125
7   0.150
8   0.225
9   0.200
10  0.125
11  0.125

From the documentation on np.searchsorted, assuming that a is a sorted array, it will return:

side     returned index i satisfies
left     a[i-1] < v <= a[i]
right    a[i-1] <= v < a[i]

Basically, side='right' returns i such that i - 1 is the index of the last value that is less than or equal to the one being searched for.
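
To make the difference between the two sides concrete, here is a minimal sketch on a shortened version of the bin edges. The names edges, values, left, right and floor are illustrative only and do not come from the question.

```python
import numpy as np

# Shortened bin edges and a few sample values (illustrative names)
edges = np.array([0, 0.100, 0.125, 0.150])
values = np.array([0.0, 0.10, 0.14])

# side='left': first index i with values <= edges[i]
left = np.searchsorted(edges, values, side="left")    # [0, 1, 3]
# side='right': first index i with values < edges[i]
right = np.searchsorted(edges, values, side="right")  # [1, 2, 3]

# edges[left] would round 0.14 UP to 0.150, which is the question's problem;
# edges[right - 1] picks the left edge of the containing interval instead.
floor = edges[right - 1]                               # [0.0, 0.1, 0.125]
print(left, right, floor)
```

Values that sit exactly on an edge (0.0 and 0.10 here) are where the two sides disagree; for values strictly between edges they return the same index.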

CodePudding user response:

You can try pd.cut with the right=False option (Spaces must already be a NumPy array for the indexing to work):

Spaces[pd.cut(Example['Spcx'], Spaces, right=False, labels=False)]

Output:

array([0.   , 0.   , 0.125, 0.1  , 0.1  , 0.1  , 0.125, 0.15 , 0.225,
       0.2  , 0.125, 0.125])
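
One thing to watch with this approach: pd.cut returns NaN for values that fall outside the bins, and with right=False the last edge itself counts as outside. The sample data in the question never reaches 0.300, so it is not affected, but the following sketch shows one possible way to handle that case. The names spaces, values, codes, in_range and result are hypothetical, and leaving out-of-range values as NaN is an assumption about the desired behaviour.

```python
import numpy as np
import pandas as pd

# Shortened bin edges; 0.15 is the last edge and 0.5 lies beyond it
spaces = np.array([0, 0.100, 0.125, 0.150])
values = pd.Series([0.0, 0.14, 0.15, 0.5])

# Integer bin codes; out-of-range values become NaN (so dtype is float)
codes = pd.cut(values, spaces, right=False, labels=False)

# Only look up the left edge where a bin was actually found
in_range = codes.notna()
result = pd.Series(np.nan, index=values.index)
result[in_range] = spaces[codes[in_range].astype(int).to_numpy()]
print(result)
```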

CodePudding user response:

You want the numpy digitize function:

import pandas as pd
import numpy as np
if __name__ == "__main__":

    Example = [[   0],
               [   0],
               [0.14],
               [0.10],
               [0.10],
               [0.10],
               [0.13],
               [0.16],
               [0.24],
               [0.21],
               [0.14],
               [0.14]]
            
    Example = pd.DataFrame(data = Example, columns = ['Spcx'])
    
    Spaces = [0, 0.100, 0.125, 0.150, 0.175, 0.200, 0.225, 0.250, 0.275, 0.300]

    Spaces = np.array(Spaces)  # convert to numpy array
    Example["Spcx"] = Spaces[np.digitize(Example["Spcx"],Spaces)-1]
    print(Example)

Output:

     Spcx
0   0.000
1   0.000
2   0.125
3   0.100
4   0.100
5   0.100
6   0.125
7   0.150
8   0.225
9   0.200
10  0.125
11  0.125
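
For monotonically increasing bins, np.digitize with its default right=False returns the same indices as np.searchsorted with side='right', so this answer and the searchsorted one are interchangeable. The sketch below checks that on a subset of the question's values and also shows a caveat both share: a value below the first edge yields index -1, which NumPy interprets as the last element. Clamping with np.clip is only one possible guard, and the names values, below, raw and safe are illustrative.

```python
import numpy as np

spaces = np.array([0, 0.100, 0.125, 0.150, 0.175,
                   0.200, 0.225, 0.250, 0.275, 0.300])
values = np.array([0, 0.14, 0.10, 0.13, 0.16, 0.24, 0.21])

# Both calls give the index of the interval's left edge after subtracting 1
idx_digitize = np.digitize(values, spaces) - 1
idx_searchsorted = np.searchsorted(spaces, values, side="right") - 1
print(np.array_equal(idx_digitize, idx_searchsorted))  # True
print(spaces[idx_digitize])

# Caveat: a value below the first edge gives index -1 (wraps to the last edge)
below = np.array([-0.05])
raw = np.digitize(below, spaces) - 1         # [-1]
# Assumed policy: clamp such values to the first edge instead
safe = np.clip(raw, 0, len(spaces) - 1)      # [0]
print(spaces[raw], spaces[safe])
```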