I have a pandas dataframe with a year column and a boolean column, and want to count how many times true changes to false by year.
Year | Boolean |
---|---|
2019 | True |
2019 | False |
2019 | True |
2019 | False |
2020 | True |
2020 | True |
2020 | False |
2021 | False |
2021 | True |
2021 | True |
2021 | False |
My expected outcome would be an array or another df that looks like this
Year | Changes |
---|---|
2019 | 2 |
2020 | 1 |
2021 | 1 |
I've used something like this to count the total number of True to False changes, but am unsure how to group them by year
(df.Col2 & (df.Col2 != df.Col2.shift(1))).sum()
CodePudding user response:
You need to count per group, for this you can group by year, convert the booleans to integers and count the -1 values after a diff
:
df.groupby('Year')['Boolean'].apply(lambda s: s.astype(int).diff().eq(-1).sum())
variant with your initial approach:
df.groupby('Year')['Boolean'].apply(lambda s: (s&s.ne(s.shift())).sum())
output:
Year
2019 2
2020 1
2021 1
Name: Boolean, dtype: int64
CodePudding user response:
You can do:
df['Changes'] = df.groupby('Year').apply(lambda b: ~b & b.shift(1).fillna(False))
df at this point:
Year Boolean Changes
0 2019 True False
1 2019 False True
2 2019 True False
3 2019 False True
4 2020 True False
5 2020 True False
6 2020 False True
7 2021 False False
8 2021 True False
9 2021 True False
10 2021 False True
and then group and sum:
df = df.groupby('Year',as_index=False)['Changes'].sum()
Output:
Year Changes
0 2019 2
1 2020 1
2 2021 1