I have a pandas DataFrame with a column of sorted integer lists, like this:
>>> test
timestamp
0 [1, 2, 3, 4]
1 [1, 3, 5, 7]
2 [2, 4, 6, 8]
3 [1, 5, 5, 5]
4 [3, 4, 5, 6]
I want to find the index of the element just before a constant value. For example, when passing the constant value 5, I should get an additional column like this:
res
3
1
2
0
1
I'm trying with searchsorted, but I'm not able to make it work:
test['res'] = np.searchsorted(test['timestamp'][test.index] , 5)
...
TypeError: '<' not supported between instances of 'list' and 'int'
If I pass a single row of the column (index 0), it works:
np.searchsorted(test['timestamp'][0] , 5)
3
But I can't figure out how to pass the index so that it works for every row:
test['res'] = np.searchsorted(test['timestamp'][test.index] , 5)
...
TypeError: '<' not supported between instances of 'list' and 'int'
I also put the index in a column x and tried referencing it like this, to no avail:
test['x'] = test.index
test['res'] = np.searchsorted(test['timestamp'][test['x']] , 5)
...
TypeError: '<' not supported between instances of 'list' and 'int'
How can I use searchsorted in this scenario?
CodePudding user response:
I think you're simply looking for pd.Series.apply, which calls searchsorted on each row's list separately:
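import numpy as np
import pandas as pd

df = pd.DataFrame({
    'timestamp': [
        [1, 2, 3, 4],
        [1, 3, 5, 7],
        [2, 4, 6, 8],
        [1, 5, 5, 5],
        [3, 4, 5, 6]
    ]})
# Apply searchsorted to each list individually instead of to the whole column
df['res'] = df['timestamp'].apply(lambda x: np.searchsorted(x, 5))
print(df)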
Output
timestamp res
0 [1, 2, 3, 4] 4
1 [1, 3, 5, 7] 2
2 [2, 4, 6, 8] 2
3 [1, 5, 5, 5] 1
4 [3, 4, 5, 6] 2
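A note on the numbers above: np.searchsorted returns the insertion point, not the index of the preceding element, so the result is one higher than what the question asks for. The call on the whole column most likely fails because the column is treated as one array whose elements are lists, so a list ends up being compared with 5. Below is a minimal sketch of the adjustment, assuming "the element before the constant" means the last element strictly less than it. The names target and insert_at are my own, not from the question. Under that assumption row 2 comes out as 1 rather than the 2 listed in the question, so check which definition you actually need.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'timestamp': [
        [1, 2, 3, 4],
        [1, 3, 5, 7],
        [2, 4, 6, 8],
        [1, 5, 5, 5],
        [3, 4, 5, 6],
    ]
})

target = 5  # the constant from the question (variable name is illustrative)

# With side='left', the insertion point equals the number of elements
# strictly less than target in each sorted list.
insert_at = df['timestamp'].apply(
    lambda x: int(np.searchsorted(x, target, side='left'))
)

# Index of the last element strictly less than target.
# A value of -1 means no element in the list is smaller than target.
df['res'] = insert_at - 1

print(df)
```

If you want the last element less than or equal to the constant instead, passing side='right' before subtracting 1 should give that.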