I have a dataframe that looks like this:
finalcheck |
---|
False |
True |
True |
True |
False |
True |
I want to create another column in this dataframe which gives me a cumulative sum based on finalcheck variable like so:
finalcheck | position |
---|---|
False | 1 |
True | 2 |
True | 3 |
True | 4 |
False | 1 |
True | 2 |
I have tried multiple ways of achieving this but the closest I could get was by using a groupby with the cumsum function in pandas:
df['position'] = df.groupby((df.finalcheck).cumsum())
This did not give me the desired output. Could you help me identify where I am going wrong?
CodePudding user response:
It's a gap-and-island problem. Every time you encounter a False, you start a new island. Then within each island, you just number the row sequentially:
# The islands
s = df["finalcheck"].eq(False).cumsum()
# Within each island, label the rows sequentially
df['position'] = s.groupby(s).cumcount() 1
CodePudding user response:
summ = 0
for i in range(0,len(df)):
if df["finalcheck"].iloc[i] == "True":
summ = summ 1
df["position"].iloc[i] = summ
else:
summ = 0
If you're thinking outside of groupby, you can try this