lets assume I have pandas dataframe looking like this:
lst = [45.45454545454545, 45.45454545454545, 45.45454545454545, 45.45454545454545, 45.45454545454545, 36.36363636363637, 36.36363636363637, 36.36363636363637, 27.27272727272727, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 27.27272727272727, 0.0, 0.0, 27.27272727272727, 0.0, 0.0, 0.0, 0.0, 27.27272727272727, 0.0, 0.0, 0.0, 36.36363636363637, 0.0, 27.27272727272727, 0.0, 27.27272727272727, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 27.27272727272727, 27.27272727272727, 54.54545454545454, 27.27272727272727, 36.36363636363637, 36.36363636363637, 54.54545454545454, 36.36363636363637, 45.45454545454545, 45.45454545454545, 36.36363636363637, 36.36363636363637, 45.45454545454545, 45.45454545454545, 36.36363636363637, 45.45454545454545, 36.36363636363637, 45.45454545454545, 36.36363636363637, 45.45454545454545, 36.36363636363637, 36.36363636363637, 36.36363636363637, 0.0, 36.36363636363637, 27.27272727272727, 0.0, 36.36363636363637, 0.0, 36.36363636363637, 36.36363636363637, 0.0, 0.0, 27.27272727272727, 0.0, 36.36363636363637, 0.0, 0.0, 0.0, 0.0, 36.36363636363637, 36.36363636363637, 0.0, 36.36363636363637, 36.36363636363637, 27.27272727272727, 27.27272727272727, 36.36363636363637, 36.36363636363637, 36.36363636363637, 36.36363636363637, 0.0, 27.27272727272727, 0.0, 0.0, 0.0, 27.27272727272727, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 27.27272727272727, 36.36363636363637, 0.0, 0.0, 0.0, 0.0, 0.0]
df = pd.DataFrame(lst,columns =['%'])
df.index.name='Time/ps'
df
Now, I'd like to know first occurrence at what "Time/ps" the "%" decreases to 0.0 but only under condition that the 0 (zeroes) are there at least 5 consecutive rows below each other. I tried to do it with this code and it works partially:
for k, v in df[df['%'] == 0.000000].groupby((df['%'] != 0.000000).cumsum()):
print(f'[group {k}]')
print(v)
print('\n')
However, the problem is that I don't know how to get information when there are 0.0's in the "%"-column at least 5 consecutive rows under each other. This code prints all occurrences and I could scroll it but I want to do it automatically. My desired output would look something like this: Time/ps: 9
Thank You for suggestions
CodePudding user response:
The naive answer could be to take your code and do something like this:
for k, v in df[df['%'] == 0.000000].groupby((df['%'] != 0.000000).cumsum()):
if len(v) > 5:
print("Time/ps:", k)
break
A nicer way could be like this:
df[df['%'] == 0.000000].groupby((df['%'] != 0.000000).cumsum()).filter(lambda x: len(x) > 5)
I took you groupby code and then used filter to filter out groups with length less than 5.
This gives this dataframe:
Time/ps
9 0.0
10 0.0
11 0.0
12 0.0
13 0.0
14 0.0
15 0.0
16 0.0
17 0.0
35 0.0
36 0.0
37 0.0
38 0.0
39 0.0
40 0.0
41 0.0
99 0.0
100 0.0
101 0.0
102 0.0
103 0.0
104 0.0
105 0.0
106 0.0
You said you want the first occurrence:
index = df[df['%'] == 0.000000].groupby((df['%'] != 0.000000).cumsum()).filter(lambda x: len(x) > 5).iloc[0].name
print('Time/ps:', index)
# Time/ps: 9
CodePudding user response:
Provided that all values are non-negative, you can use a rolling window sum of length 5 and find the first zero using argmin
:
k = 5
df.index[df['%'].rolling(k).sum().argmin() - k 1]
(if there may be negative values, you need to do .abs()
before rolling()
)
CodePudding user response:
import pandas as pd
import numpy as np
number = 0.0 #45.45454545454545
df = pd.DataFrame(lst, columns=['p']) #use more convenient name of column
df['dif'] = np.abs(df.p.diff(1)) np.abs(df.p.diff(2)) np.abs(df.p.diff(3)) np.abs(df.p.diff(4))
positions = df.index[(df.p==number) & (df.dif==0.0)]
first_index = positions[0] - 4