Getting the sum of rows until a certain point-CodePudding

I would like to have some code that would add one from the row above until a new 'SCU_KEY' comes up. For example here is code and what I would like:

df = pd.DataFrame({'SCU_KEY' : [3, 3, 3, 5, 5, 5, 7, 8, 8, 8, 8], 'count':[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]})

Expected output:


df = pd.DataFrame({'SCU_KEY' : [3, 3, 3, 5, 5, 5, 7, 8, 8, 8, 8], 'count':[1, 2, 3, 1, 2, 3, 1, 1, 2, 3, 4]})

CodePudding user response：

This will do the job-

import pandas as pd
df = pd.DataFrame({'SCU_KEY' : [3, 3, 3, 5, 5, 5, 7, 8, 8, 8, 8], 'count':[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]})

for item in set(df['SCU_KEY']):
    inc = 0
    for i in range(len(df.index)):
        if df['SCU_KEY'][i] == item:
            df['count'][i]  = inc
            inc  = 1

P.S.- As others have mentioned, it's a good practice to show your work before asking others for solution. It shows your effort which everyone appreciates and encourages to help you.

CodePudding user response：

You can try this:

import pandas as pd

df = pd.DataFrame({
    'SCU_KEY': [3, 3, 3, 5, 5, 5, 7, 8, 8, 8, 8],
    'count': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
})

s = df['SCU_KEY']
df['count'] = s.groupby(s).cumcount()   1
print(df)

It gives:

   SCU_KEY  count
0         3      1
1         3      2
2         3      3
3         5      1
4         5      2
5         5      3
6         7      1
7         8      1
8         8      2
9         8      3
10        8      4

This assumes that values of the SCU_KEY column cannot reappear once they change, or that they can reappear but then you want to continue counting them where you left off.

If, instead, each contiguous sequence of repeating values should be counted starting from 1, then you can use this instead:

s = df['SCU_KEY']
df['count'] = s.groupby((s.shift() != s).cumsum()).cumcount()   1

For the above dataframe the result will be the same as before, but you can add, say, 3 at the end of the SCU_KEY column to see the difference.