Pandas: searchsorted on a list field on dataframe

Time:07-14

I have a pandas DataFrame with a column of sorted integer lists, like this:

>>> test
      timestamp
0  [1, 2, 3, 4]
1  [1, 3, 5, 7]
2  [2, 4, 6, 8]
3  [1, 5, 5, 5]
4  [3, 4, 5, 6]

For each row, I want to find the position of the element just before a constant value. For example, when passing the constant value 5, I should get an additional column in the df like this:

res
3
1
2
0
1

I'm trying with searchsorted, but I'm not able to make it work:

test['res'] =  np.searchsorted(test['timestamp'][test.index] , 5)
...
TypeError: '<' not supported between instances of 'list' and 'int'

If I pass a single row's list (index 0) instead of the whole column, it works:

np.searchsorted(test['timestamp'][0] , 5)
4

But I can't figure out how to pass the index so that it works for every row:

test['res'] =  np.searchsorted(test['timestamp'][test.index] , 5)
...
TypeError: '<' not supported between instances of 'list' and 'int'

I also put the index in a column x and tried referencing it like this, to no avail:

test['x'] = test.index
test['res'] =  np.searchsorted(test['timestamp'][test['x']] , 5)
...
TypeError: '<' not supported between instances of 'list' and 'int'

How can I use searchsorted in this scenario?

CodePudding user response:

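I think you're simply looking for pd.Series.apply, which calls np.searchsorted once per row, on that row's list:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'timestamp': [
        [1, 2, 3, 4],
        [1, 3, 5, 7],
        [2, 4, 6, 8],
        [1, 5, 5, 5],
        [3, 4, 5, 6]
    ]})

# Each x is one row's sorted list; searchsorted returns the insertion point for 5.
df['res'] = df['timestamp'].apply(lambda x: np.searchsorted(x, 5))

print(df)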

Output

      timestamp  res
0  [1, 2, 3, 4]    4
1  [1, 3, 5, 7]    2
2  [2, 4, 6, 8]    2
3  [1, 5, 5, 5]    1
4  [3, 4, 5, 6]    2
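
The original attempt fails because the column holds whole Python lists, so passing the Series to np.searchsorted ends up comparing each list with the integer 5. Note also that the output above is the insertion point, i.e. the index of the first element that is >= 5, not the element before it. If "the element earlier" in the question means the index of the last element strictly less than the constant (an assumption on my part), a minimal sketch is to subtract 1 from the result:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'timestamp': [
        [1, 2, 3, 4],
        [1, 3, 5, 7],
        [2, 4, 6, 8],
        [1, 5, 5, 5],
        [3, 4, 5, 6],
    ]
})

# With the default side='left', searchsorted returns the first position whose
# element is >= 5, so subtracting 1 gives the index of the last element < 5.
# A result of -1 would mean that no element in the list is smaller than 5.
df['res'] = df['timestamp'].apply(lambda x: int(np.searchsorted(x, 5)) - 1)

print(df['res'].tolist())  # [3, 1, 1, 0, 1]
```

Under that reading, this matches the expected column in the question for every row except row 2: the question lists 2 there, but in [2, 4, 6, 8] the last element below 5 is 4, at index 1.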