Home > Enterprise >  Pandas: return first row where column value satisfies condition against a list of values
Pandas: return first row where column value satisfies condition against a list of values

Time:01-09

I have a dataframe df and a list of floats T. df.B is a time series of values sorted in chronological order, where the 0th index is the most recent timestamp and the last index is the oldest timestamp.

df = pd.DataFrame({'A': [1.1, 2.2, 3.3, 4.4], 'B': [5.5, 6.6, 7.7, 8.8]})
T = [t1, t2, ..., tn] # floats

What I am looking to do

I would like to compare the values of column B against the list of values T, one t at a time, and return the first row of df that satisfies the condition against t. By first row of df I mean walk through the timeseries (essentially) and find the first instance in time where the values in df.B become larger than the value t for any t in T.

What I've attempted:

df.loc[df.apply(lambda x: x.B >= T, axis=1)]
# => TypeError: unhashable type: 'numpy.ndarray'

df2 = df.query('B >= @T')
# => 'Lengths must match to compare'

[ df[df['B'] >= t] for t in T ]
# => Technically this works and then I can iterate again to retrieve the first row, but I get the warning -- pydevd warning: Computing repr of a (list) was slow 

EDIT, an example:

T = [3.5, 4.5, 8.0, 8.5, 10.0, 11.0]
df.B = [5.5, 8.8, 6.6, 7.7]

# I'm hoping that the expected output would have the rows corresponding to the following values in `df.B`:
[7.7, 7.7, 8.8, 8.8, None, None]

CodePudding user response:

You can sort values in column B, and then use numpy.searchsorted:

import numpy as np
sorted_values = np.sort(df.B)
np.append(sorted_values, np.nan)[np.searchsorted(sorted_values, T)]
# array([5.5, 5.5, 5.5, 7.7, nan, nan])

To get data frame rows, first find the indices:

indices = np.searchsorted(df.B, T)
indices
# array([0, 0, 0, 2, 4, 4], dtype=int32)

Then retrieve corresponding rows:

[df.iloc[i] if i < len(df) else None for i in indices]

[A    1.1
B    5.5
Name: 0, dtype: float64, A    1.1
B    5.5
Name: 0, dtype: float64, A    1.1
B    5.5
Name: 0, dtype: float64, A    3.3
B    7.7
Name: 2, dtype: float64, None, None]
  • Related