Home > OS >  Find the first index of consecutive values in correct order in a pandas dataframe
Find the first index of consecutive values in correct order in a pandas dataframe

Time:03-03

I have the following dataframe and list of values:

df = pd.DataFrame({"A": [1, 4, 6, 2, 7, 4]}, index=pd.date_range(start='1/1/2020', end='1/06/2020'))
values_to_find = [4, 6, 2]
            A
2020-01-01  1
2020-01-02  4
2020-01-03  6
2020-01-04  2
2020-01-05  7
2020-01-06  4

I want to get the index (iloc) of the first value 4 in the dataframe if the next two rows in the dataframe has values 6 and 2. In this case, the desired output is 2 as it is the second index (iloc) in the dataframe.

I tried using np.flatnonzero(df['A'] == values_to_find) but np.flatnonzero() only works with a single value, otherwise values_to_find must be of the same size as the dataframe.

CodePudding user response:

We could do this in two steps:

  1. Find where the sequence of values happen. We'll use df.rolling to iterate a window of 3 consecutive rows. Then, we'll use .apply to find where the condition is True (is this array of values equal to the sequence we're looking for?). Consider that values_to_find should be a np.array (np.array(values_to_find)):

    >>> df_condition = df.rolling(3).apply(lambda g: all(g.astype(int) == values_to_find), raw=False)
    >>> df_condition
                  A
    2020-01-01  NaN
    2020-01-02  NaN
    2020-01-03  0.0
    2020-01-04  1.0
    2020-01-05  0.0
    2020-01-06  0.0
    
  2. Find the index where the condition is True for the first time. We'll just compare the values in the "A" column to 1 and locate the index of the first row that matches this condition:

    >>> df.loc[df_condition["A"] == 1].iloc[0]
    A    2
    Name: 2020-01-04 00:00:00, dtype: int64
    

CodePudding user response:

Looking at this solution, the following function may help you:

def find_sub_loc(x,y):
  occ = [i for i, a in enumerate(x) if a == y[0]]
  for b in occ:
        if x[b:b len(y)] == y:
            return (b 1)
            break
        if len(occ)-1 ==  occ.index(b):
            return -1
            break

Example: In your case find_sub_loc(list(df.A.values), values_to_find) will return the index value 2

  • Related