Home > database >  Cumulative sum based on a boolean
Cumulative sum based on a boolean

Time:03-28

I have a dataframe that looks like this:

finalcheck
False
True
True
True
False
True

I want to create another column in this dataframe which gives me a cumulative sum based on finalcheck variable like so:

finalcheck position
False 1
True 2
True 3
True 4
False 1
True 2

I have tried multiple ways of achieving this but the closest I could get was by using a groupby with the cumsum function in pandas:

df['position'] = df.groupby((df.finalcheck).cumsum())

This did not give me the desired output. Could you help me identify where I am going wrong?

CodePudding user response:

It's a gap-and-island problem. Every time you encounter a False, you start a new island. Then within each island, you just number the row sequentially:

# The islands
s = df["finalcheck"].eq(False).cumsum()

# Within each island, label the rows sequentially
df['position'] = s.groupby(s).cumcount()   1

CodePudding user response:

summ = 0
for i in range(0,len(df)):

    if df["finalcheck"].iloc[i] == "True":
        summ = summ   1
        df["position"].iloc[i] = summ
    else:
        summ = 0

If you're thinking outside of groupby, you can try this

  • Related