How to get first occurrence of specific value of one column but only if it's there at least 5 c-CodePudding

lets assume I have pandas dataframe looking like this:

lst = [45.45454545454545, 45.45454545454545, 45.45454545454545, 45.45454545454545, 45.45454545454545, 36.36363636363637, 36.36363636363637, 36.36363636363637, 27.27272727272727, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 27.27272727272727, 0.0, 0.0, 27.27272727272727, 0.0, 0.0, 0.0, 0.0, 27.27272727272727, 0.0, 0.0, 0.0, 36.36363636363637, 0.0, 27.27272727272727, 0.0, 27.27272727272727, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 27.27272727272727, 27.27272727272727, 54.54545454545454, 27.27272727272727, 36.36363636363637, 36.36363636363637, 54.54545454545454, 36.36363636363637, 45.45454545454545, 45.45454545454545, 36.36363636363637, 36.36363636363637, 45.45454545454545, 45.45454545454545, 36.36363636363637, 45.45454545454545, 36.36363636363637, 45.45454545454545, 36.36363636363637, 45.45454545454545, 36.36363636363637, 36.36363636363637, 36.36363636363637, 0.0, 36.36363636363637, 27.27272727272727, 0.0, 36.36363636363637, 0.0, 36.36363636363637, 36.36363636363637, 0.0, 0.0, 27.27272727272727, 0.0, 36.36363636363637, 0.0, 0.0, 0.0, 0.0, 36.36363636363637, 36.36363636363637, 0.0, 36.36363636363637, 36.36363636363637, 27.27272727272727, 27.27272727272727, 36.36363636363637, 36.36363636363637, 36.36363636363637, 36.36363636363637, 0.0, 27.27272727272727, 0.0, 0.0, 0.0, 27.27272727272727, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 27.27272727272727, 36.36363636363637, 0.0, 0.0, 0.0, 0.0, 0.0]
df = pd.DataFrame(lst,columns =['%'])
df.index.name='Time/ps'
df

Now, I'd like to know first occurrence at what "Time/ps" the "%" decreases to 0.0 but only under condition that the 0 (zeroes) are there at least 5 consecutive rows below each other. I tried to do it with this code and it works partially:

for k, v in df[df['%'] == 0.000000].groupby((df['%'] != 0.000000).cumsum()):
    print(f'[group {k}]')
    print(v)
    print('\n')

However, the problem is that I don't know how to get information when there are 0.0's in the "%"-column at least 5 consecutive rows under each other. This code prints all occurrences and I could scroll it but I want to do it automatically. My desired output would look something like this: Time/ps: 9

Thank You for suggestions

CodePudding user response：

The naive answer could be to take your code and do something like this:

for k, v in df[df['%'] == 0.000000].groupby((df['%'] != 0.000000).cumsum()):
    if len(v) > 5:
       print("Time/ps:", k)
       break

A nicer way could be like this:

df[df['%'] == 0.000000].groupby((df['%'] != 0.000000).cumsum()).filter(lambda x: len(x) > 5)

I took you groupby code and then used filter to filter out groups with length less than 5.

This gives this dataframe:


    Time/ps     
9       0.0
10      0.0
11      0.0
12      0.0
13      0.0
14      0.0
15      0.0
16      0.0
17      0.0
35      0.0
36      0.0
37      0.0
38      0.0
39      0.0
40      0.0
41      0.0
99      0.0
100     0.0
101     0.0
102     0.0
103     0.0
104     0.0
105     0.0
106     0.0

You said you want the first occurrence:

index = df[df['%'] == 0.000000].groupby((df['%'] != 0.000000).cumsum()).filter(lambda x: len(x) > 5).iloc[0].name
print('Time/ps:', index)
# Time/ps: 9

CodePudding user response：

Provided that all values are non-negative, you can use a rolling window sum of length 5 and find the first zero using argmin:

k = 5
df.index[df['%'].rolling(k).sum().argmin() - k   1]

(if there may be negative values, you need to do .abs() before rolling())

CodePudding user response：

import pandas as pd
import numpy as np

number = 0.0 #45.45454545454545
df = pd.DataFrame(lst, columns=['p']) #use more convenient name of column
df['dif'] = np.abs(df.p.diff(1))   np.abs(df.p.diff(2))   np.abs(df.p.diff(3))   np.abs(df.p.diff(4))
positions = df.index[(df.p==number) & (df.dif==0.0)]
first_index = positions[0] - 4