I have the following dataframe and list of values:
df = pd.DataFrame({"A": [1, 4, 6, 2, 7, 4]}, index=pd.date_range(start='1/1/2020', end='1/06/2020'))
values_to_find = [4, 6, 2]
A
2020-01-01 1
2020-01-02 4
2020-01-03 6
2020-01-04 2
2020-01-05 7
2020-01-06 4
I want to get the index (iloc) of the first value 4 in the dataframe if the next two rows in the dataframe has values 6 and 2. In this case, the desired output is 2 as it is the second index (iloc) in the dataframe.
I tried using np.flatnonzero(df['A'] == values_to_find)
but np.flatnonzero()
only works with a single value, otherwise values_to_find
must be of the same size as the dataframe.
CodePudding user response:
We could do this in two steps:
Find where the sequence of values happen. We'll use
df.rolling
to iterate a window of 3 consecutive rows. Then, we'll use.apply
to find where the condition is True (is this array of values equal to the sequence we're looking for?). Consider thatvalues_to_find
should be anp.array
(np.array(values_to_find)
):>>> df_condition = df.rolling(3).apply(lambda g: all(g.astype(int) == values_to_find), raw=False) >>> df_condition A 2020-01-01 NaN 2020-01-02 NaN 2020-01-03 0.0 2020-01-04 1.0 2020-01-05 0.0 2020-01-06 0.0
Find the index where the condition is True for the first time. We'll just compare the values in the "A" column to 1 and locate the index of the first row that matches this condition:
>>> df.loc[df_condition["A"] == 1].iloc[0] A 2 Name: 2020-01-04 00:00:00, dtype: int64
CodePudding user response:
Looking at this solution, the following function may help you:
def find_sub_loc(x,y):
occ = [i for i, a in enumerate(x) if a == y[0]]
for b in occ:
if x[b:b len(y)] == y:
return (b 1)
break
if len(occ)-1 == occ.index(b):
return -1
break
Example: In your case find_sub_loc(list(df.A.values), values_to_find)
will return the index value 2