I have a two lists with a sequence of negative counts in a dataframe following this example. One represents soil moisture readings, and the other counts how many values in the list are negative in a sequence. If the value is above 0, it returns 0 and starts over when a new sequence of negatives begins.
I take these lists and turn them into a dataframe like so:
soilMoisture = [-0.017, -0.015, -0.15, -0.020, -0.040, 0.11, -0.005, -0.050]
sequenceNeg = [1,2,3,4,5,0,1,2]
df = pd.DataFrame()
df['soilMoisture'] = soilMoisture
df['sequenceNeg'] = sequenceNeg
However, I would like to still count the 0 in-between the sequences if the number to the left and the right are also negative. For example, I would like sequence negative to be the following:
soilMoisture = [-0.017, -0.015, -0.15, -0.020, -0.040, 0.11, -0.005, -0.050]
sequenceNeg = [1,2,3,4,5,6,7,8]
However, I would still like these sequences to be seperated if there are two zeros in-between. For example, given another list like so:
soilMoisture = [-0.16, -0.7, -0.06, -0.07, -0.04, 0.11, 0.51, -0.09, -0.02]
sequenceNeg = [1,2,3,4,5,0,0,1,2]
How do I write a script that would count the number as negative if the index directly to the left and the right are still negative, but maintain the negative sequence count if there are two non-negatives in a row?
Thanks
CodePudding user response:
Here's an approach that identifies consecutive positive values by checking the prior (shift(1)
) and subsequent (shift(-1)
) values. Once we have consecutive positives, we use those to separate the values into "groups" which from your example would be three groups, first group (g1) are the values before the consecutive positives, g2 are the consecutive positives, g3 are after (single non-consecutive positive values do not create their own groups)
[-0.16, 0.7, -0.06, -0.07, -0.04, 0.11, 0.51, -0.09, -0.02]
g1 g1 g1 g1 g1 g2 g2 g3 g3
Then use groupby
and cumcount
to make a cumsum that resets within each group. Finally set the consecutive positive groups to 0
import pandas as pd
df = pd.DataFrame({
'soilMoisture':[-0.017, 0.015, -0.15, -0.020, -0.040, 0.11, 0.005, -0.050]
})
#find consecutive positives (curr value must be pos, and either before or after value)
is_pos = df['soilMoisture'].ge(0)
consec_pos = (is_pos & is_pos.shift(1)) | (is_pos & is_pos.shift(-1))
#group values by changes in consec positives
consec_group = consec_pos.ne(consec_pos.shift(1)).cumsum()
#count consecutive negatives within each group, then set consec_pos groups to 0
#(I think this can be done w/out a groupby but I can't figure it out)
df['sequenceNeg'] = consec_pos.groupby(consec_group).cumcount().add(1)
df.loc[consec_pos,'sequenceNeg'] = 0
print(df)
Output
soilMoisture sequenceNeg
0 -0.017 1
1 0.015 2
2 -0.150 3
3 -0.020 4
4 -0.040 5
5 0.110 0
6 0.005 0
7 -0.050 1