I have a dataframe 'df', with the following structure:
Input:
ID | Product | Price |
---|---|---|
1 | P1 | 10 |
2 | P1 | 11 |
3 | P2 | 12 |
4 | P2 | 12 |
5 | P2 | 15 |
Expected Output:
ID | Product | Price | Distinct_Running_Count |
---|---|---|---|
1 | P1 | 10 | 1 |
2 | P1 | 11 | 2 |
3 | P2 | 12 | 1 |
4 | P2 | 12 | 1 |
5 | P2 | 15 | 2 |
Problem:
I want to create a new column called 'Distinct_Running_Count' with the following logic:
- Within each 'Product', keep a running count of distinct 'Price' values
- A product with no price change keeps 'Distinct_Running_Count' at 1
- Every subsequent price change increments 'Distinct_Running_Count'
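For reference, the sample frame from the question can be reproduced like this (a minimal sketch; the column names follow the tables above):

```python
import pandas as pd

# Input data as shown in the question's table
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Product": ["P1", "P1", "P2", "P2", "P2"],
    "Price": [10, 11, 12, 12, 15],
})
```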
Solutions Tried:
df['Distinct_Running_Count'] = df.groupby(['Product', 'Price']).cumcount() + 1
df['Distinct_Running_Count'] = df.groupby('Product')['Price'].transform('nunique')
Issue:
The above solutions return either a plain running count or the total number of unique prices per product, but not the expected result.
CodePudding user response:
You can compare each row with the previous row in the Price
column within each product group and take the cumulative sum of the changes:
df['Distinct_Running_Count'] = (
    df.groupby('Product')['Price']
      # Flag rows where the price differs from the previous row in the group.
      # fillna(col) makes the first row of each group compare equal to itself,
      # so it is not flagged; cumsum the flags and add 1 to start counting at 1.
      .transform(lambda col: col.ne(col.shift().fillna(col)).cumsum().add(1))
)
print(df)
ID Product Price Distinct_Running_Count
0 1 P1 10 1
1 2 P1 11 2
2 3 P2 12 1
3 4 P2 12 1
4 5 P2 15 2
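As a side note, the same result can be reached with a slightly shorter transform: `ne` against the NaN produced by `shift` already evaluates to True on the first row of each group, so the cumulative sum starts at 1 without the `fillna`/`add(1)` steps (a sketch, assuming the same sample frame):

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Product": ["P1", "P1", "P2", "P2", "P2"],
    "Price": [10, 11, 12, 12, 15],
})

# The first row of each group compares against NaN, which counts as a
# change, so the cumsum itself starts at 1 with no extra adjustment.
df["Distinct_Running_Count"] = (
    df.groupby("Product")["Price"]
      .transform(lambda c: c.ne(c.shift()).cumsum())
)
```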