I have a df like this:

name  class  date      value
Andy  A      20220101  0
Andy  A      20220103  1
Andy  A      20220104  0
Bob   Z      20221120  0
Bob   Z      20221121  0
Bob   Z      20221125  0
Bob   Z      20221127  1
Column value only takes the values 0 or 1. Within each group (defined by the two columns name and class), column date is sorted in ascending order. For each group I am trying to calculate the ratio: the number of times value changes (0 -> 1 or 1 -> 0) divided by the number of dates with data.
For the above dataframe, group (Andy, A) changes 2 times over 3 days, so the ratio is 2/3. Group (Bob, Z) changes 1 time over 4 days, so the ratio is 1/4 = 0.25.
Is there an efficient way to do this in Pandas? Thanks
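For reference, the sample frame above can be reconstructed like this (the dtypes are my assumption; date is kept as an integer exactly as shown in the question):

```python
import pandas as pd

# Reconstruction of the sample data from the question
df = pd.DataFrame({
    'name':  ['Andy', 'Andy', 'Andy', 'Bob', 'Bob', 'Bob', 'Bob'],
    'class': ['A', 'A', 'A', 'Z', 'Z', 'Z', 'Z'],
    'date':  [20220101, 20220103, 20220104,
              20221120, 20221121, 20221125, 20221127],
    'value': [0, 1, 0, 0, 0, 0, 1],
})
```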
CodePudding user response:
This works (note the groupby is on both name and class, as the question defines the groups):

# mark contiguous runs of equal values: a new run id starts at each change
# (the leading NaN from diff() compares unequal to 0, which is fine here)
df['changes'] = df['value'].diff().ne(0).cumsum()
# per group, count the runs and the number of rows
aggregated = df.groupby(['name', 'class']).agg({'changes': 'nunique', 'value': 'size'})
# n runs means n - 1 changes, so subtract 1
aggregated['changes'] -= 1
# find the ratio
final = aggregated['changes'] / aggregated['value']

name  class
Andy  A        0.666667
Bob   Z        0.250000
dtype: float64
CodePudding user response:
You can try:

out = (df
       .groupby(['name', 'class'])
       .apply(lambda g: g['value'].shift().bfill().ne(g['value']).sum() / g['date'].nunique())
       .round(2)
       .to_frame('ratio')
       .reset_index())
print(out)
name class ratio
0 Andy A 0.67
1 Bob Z 0.25
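Since value only takes 0 or 1, the per-group change count is simply the sum of absolute first differences, which avoids apply entirely. A minimal sketch (column names as in the question; the helper column chg is mine):

```python
import pandas as pd

df = pd.DataFrame({
    'name':  ['Andy', 'Andy', 'Andy', 'Bob', 'Bob', 'Bob', 'Bob'],
    'class': ['A', 'A', 'A', 'Z', 'Z', 'Z', 'Z'],
    'date':  [20220101, 20220103, 20220104,
              20221120, 20221121, 20221125, 20221127],
    'value': [0, 1, 0, 0, 0, 0, 1],
})

# abs of the per-group first difference is 1 exactly where value flips;
# the NaN at each group's first row is ignored by sum()
df['chg'] = df.groupby(['name', 'class'])['value'].diff().abs()

out = (df.groupby(['name', 'class'])
         .agg(changes=('chg', 'sum'), days=('date', 'nunique')))
out['ratio'] = out['changes'] / out['days']
```

The grouped diff never crosses group boundaries, so no boundary correction is needed.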