count sequence of negative values in a pandas column and keep going if one index between sequences i-CodePudding

I have a two lists with a sequence of negative counts in a dataframe following this example. One represents soil moisture readings, and the other counts how many values in the list are negative in a sequence. If the value is above 0, it returns 0 and starts over when a new sequence of negatives begins.

I take these lists and turn them into a dataframe like so:

soilMoisture = [-0.017, -0.015, -0.15, -0.020, -0.040, 0.11, -0.005, -0.050]
sequenceNeg = [1,2,3,4,5,0,1,2]
df = pd.DataFrame()
df['soilMoisture'] = soilMoisture
df['sequenceNeg'] = sequenceNeg

However, I would like to still count the 0 in-between the sequences if the number to the left and the right are also negative. For example, I would like sequence negative to be the following:

soilMoisture = [-0.017, -0.015, -0.15, -0.020, -0.040, 0.11, -0.005, -0.050]
sequenceNeg = [1,2,3,4,5,6,7,8]

However, I would still like these sequences to be seperated if there are two zeros in-between. For example, given another list like so:

soilMoisture = [-0.16, -0.7, -0.06, -0.07, -0.04, 0.11, 0.51, -0.09, -0.02]
sequenceNeg = [1,2,3,4,5,0,0,1,2]

How do I write a script that would count the number as negative if the index directly to the left and the right are still negative, but maintain the negative sequence count if there are two non-negatives in a row?

Thanks

CodePudding user response：

Here's an approach that identifies consecutive positive values by checking the prior (shift(1)) and subsequent (shift(-1)) values. Once we have consecutive positives, we use those to separate the values into "groups" which from your example would be three groups, first group (g1) are the values before the consecutive positives, g2 are the consecutive positives, g3 are after (single non-consecutive positive values do not create their own groups)

[-0.16, 0.7, -0.06, -0.07, -0.04, 0.11, 0.51, -0.09, -0.02]
    g1   g1     g1     g1     g1    g2    g2     g3     g3

Then use groupby and cumcount to make a cumsum that resets within each group. Finally set the consecutive positive groups to 0

import pandas as pd

df = pd.DataFrame({
    'soilMoisture':[-0.017, 0.015, -0.15, -0.020, -0.040, 0.11, 0.005, -0.050]
})

#find consecutive positives (curr value must be pos, and either before or after value)
is_pos = df['soilMoisture'].ge(0)
consec_pos = (is_pos & is_pos.shift(1)) | (is_pos & is_pos.shift(-1))

#group values by changes in consec positives
consec_group = consec_pos.ne(consec_pos.shift(1)).cumsum()

#count consecutive negatives within each group, then set consec_pos groups to 0
#(I think this can be done w/out a groupby but I can't figure it out)
df['sequenceNeg'] = consec_pos.groupby(consec_group).cumcount().add(1)
df.loc[consec_pos,'sequenceNeg'] = 0

print(df)

Output

   soilMoisture  sequenceNeg
0        -0.017            1
1         0.015            2
2        -0.150            3
3        -0.020            4
4        -0.040            5
5         0.110            0
6         0.005            0
7        -0.050            1