I have a large df. I need to count groups of the 'years' column if the year restarts to 1.
I have tried
df["count"] = df.groupby((df["year"]>=df["year"].shift(1)).cumsum()).cumcount() 1
but this doesn't give the expected result. Below is an example df and the expected output.
year month alpha
1 0 1.24036
1 1 1.17023
1 0 1.24036
1 1 1.17023
1 0 1.24036
1 1 1.17023
2 0 1.24036
2 1 1.17023
2 0 1.24036
2 1 1.17023
2 0 1.24036
2 1 1.17023
1 0 1.24036
1 1 1.17023
1 0 1.24036
1 1 1.17023
1 0 1.24036
1 1 1.17023
2 0 1.24036
2 1 1.17023
2 0 1.24036
2 1 1.17023
2 0 1.24036
2 1 1.17023
Expected output:
year month alpha count
1 0 1.24036 1
1 1 1.17023 1
1 0 1.24036 1
1 1 1.17023 1
1 0 1.24036 1
1 1 1.17023 1
2 0 1.24036 1
2 1 1.17023 1
2 0 1.24036 1
2 1 1.17023 1
2 0 1.24036 1
2 1 1.17023 1
1 0 1.24036 2
1 1 1.17023 2
1 0 1.24036 2
1 1 1.17023 2
1 0 1.24036 2
1 1 1.17023 2
2 0 1.24036 2
2 1 1.17023 2
2 0 1.24036 2
2 1 1.17023 2
2 0 1.24036 2
2 1 1.17023 2
CodePudding user response:
You don't really want a groupby.cumcount
, you want to enumerate the groups.
You can use:
m = df['year'].eq(1) & df['year'].shift().ne(1)
df['count'] = m.cumsum()
Or, if you want to ensure having 1 as the first count even if the year does not necessarily start with 1:
df['count'] = m.cumsum() ~m.iloc[0]
output:
year month alpha count
0 1 0 1.24036 1
1 1 1 1.17023 1
2 1 0 1.24036 1
3 1 1 1.17023 1
4 1 0 1.24036 1
5 1 1 1.17023 1
6 2 0 1.24036 1
7 2 1 1.17023 1
8 2 0 1.24036 1
9 2 1 1.17023 1
10 2 0 1.24036 1
11 2 1 1.17023 1
12 1 0 1.24036 2
13 1 1 1.17023 2
14 1 0 1.24036 2
15 1 1 1.17023 2
16 1 0 1.24036 2
17 1 1 1.17023 2
18 2 0 1.24036 2
19 2 1 1.17023 2
20 2 0 1.24036 2
21 2 1 1.17023 2
22 2 0 1.24036 2
23 2 1 1.17023 2
CodePudding user response:
If you have only years 1 and 2, you can detect the rows where current year is less than the previous year - this will indicate a change point:
df['count'] = (df["year"] < df["year"].shift()).cumsum() 1
Example:
df = pd.DataFrame({"year": [1, 1, 2, 2, 1, 2, 1, 1, 2, 2]})
year count
0 1 1
1 1 1
2 2 1
3 2 1
4 1 2
5 2 2
6 1 3
7 1 3
8 2 3
9 2 3