Home > Software design >  Count the amount of times a boolean goes from True to False in a column grouped by year in another c
Count the amount of times a boolean goes from True to False in a column grouped by year in another c

Time:04-29

I have a pandas dataframe with a year column and a boolean column, and want to count how many times true changes to false by year.

Year Boolean
2019 True
2019 False
2019 True
2019 False
2020 True
2020 True
2020 False
2021 False
2021 True
2021 True
2021 False

My expected outcome would be an array or another df that looks like this

Year Changes
2019 2
2020 1
2021 1

I've used something like this to count the total number of True to False changes, but am unsure how to group them by year

(df.Col2 & (df.Col2 != df.Col2.shift(1))).sum()

CodePudding user response:

You need to count per group, for this you can group by year, convert the booleans to integers and count the -1 values after a diff:

df.groupby('Year')['Boolean'].apply(lambda s: s.astype(int).diff().eq(-1).sum())

variant with your initial approach:

df.groupby('Year')['Boolean'].apply(lambda s: (s&s.ne(s.shift())).sum())

output:

Year
2019    2
2020    1
2021    1
Name: Boolean, dtype: int64

CodePudding user response:

You can do:

df['Changes'] = df.groupby('Year').apply(lambda b: ~b & b.shift(1).fillna(False))

df at this point:

    Year    Boolean Changes
0   2019    True    False
1   2019    False   True
2   2019    True    False
3   2019    False   True
4   2020    True    False
5   2020    True    False
6   2020    False   True
7   2021    False   False
8   2021    True    False
9   2021    True    False
10  2021    False   True

and then group and sum:

df = df.groupby('Year',as_index=False)['Changes'].sum()

Output:

    Year    Changes
0   2019    2
1   2020    1
2   2021    1
  • Related