I have a df like this:

name  class  date      value
Andy  A      20220101  0
Andy  A      20220103  1
Andy  A      20220104  0
Bob   Z      20221120  0
Bob   Z      20221121  0
Bob   Z      20221125  0
Bob   Z      20221127  1
Column value only takes the values 0 or 1. Within each group (defined by the two columns name and class), column date is sorted in ascending order. For each group I am trying to calculate the ratio: the number of times value changes (0 -> 1 or 1 -> 0) divided by the number of dates with data.
For the above dataframe, group (Andy, A) changes 2 times over 3 days, so the ratio is 2/3. Group (Bob, Z) changes 1 time over 4 days, so the ratio is 1/4 = 0.25.
Is there an efficient way to do this in Pandas? Thanks
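For reference, the sample frame above can be reconstructed like this (the dtypes are my assumption; date is kept as an integer exactly as shown in the question):

```python
import pandas as pd

# Reconstruction of the sample data from the question
df = pd.DataFrame({
    'name':  ['Andy', 'Andy', 'Andy', 'Bob', 'Bob', 'Bob', 'Bob'],
    'class': ['A', 'A', 'A', 'Z', 'Z', 'Z', 'Z'],
    'date':  [20220101, 20220103, 20220104,
              20221120, 20221121, 20221125, 20221127],
    'value': [0, 1, 0, 0, 0, 0, 1],
})
```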
CodePudding user response:
This works (note the groupby is on both name and class, as the question defines the groups):

# mark contiguous runs of equal values: a new run id starts at each change
# (the leading NaN from diff() compares unequal to 0, which is fine here)
df['changes'] = df['value'].diff().ne(0).cumsum()
# per group, count the runs and the number of rows
aggregated = df.groupby(['name', 'class']).agg({'changes': 'nunique', 'value': 'size'})
# n runs means n - 1 changes, so subtract 1
aggregated['changes'] -= 1
# find the ratio
final = aggregated['changes'] / aggregated['value']

name  class
Andy  A        0.666667
Bob   Z        0.250000
dtype: float64
CodePudding user response:
You can try:

out = (df
       .groupby(['name', 'class'])
       .apply(lambda g: g['value'].shift().bfill().ne(g['value']).sum() / g['date'].nunique())
       .round(2)
       .to_frame('ratio')
       .reset_index())
print(out)
name class ratio
0 Andy A 0.67
1 Bob Z 0.25
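Since value only takes 0 or 1, the per-group change count is simply the sum of absolute first differences, which avoids apply entirely. A minimal sketch (column names as in the question; the helper column chg is mine):

```python
import pandas as pd

df = pd.DataFrame({
    'name':  ['Andy', 'Andy', 'Andy', 'Bob', 'Bob', 'Bob', 'Bob'],
    'class': ['A', 'A', 'A', 'Z', 'Z', 'Z', 'Z'],
    'date':  [20220101, 20220103, 20220104,
              20221120, 20221121, 20221125, 20221127],
    'value': [0, 1, 0, 0, 0, 0, 1],
})

# abs of the per-group first difference is 1 exactly where value flips;
# the NaN at each group's first row is ignored by sum()
df['chg'] = df.groupby(['name', 'class'])['value'].diff().abs()

out = (df.groupby(['name', 'class'])
         .agg(changes=('chg', 'sum'), days=('date', 'nunique')))
out['ratio'] = out['changes'] / out['days']
```

The grouped diff never crosses group boundaries, so no boundary correction is needed.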