Home > Software design >  Pandas: pandas count groups if column value less than previous
Pandas: pandas count groups if column value less than previous

Time:08-31

I have a large df. I need to count groups of the 'years' column if the year restarts to 1.

I have tried

df["count"] = df.groupby((df["year"]>=df["year"].shift(1)).cumsum()).cumcount() 1

but this doesn't give the expected result. Below is an example df and the expected output.

year month   alpha
1      0     1.24036
1      1     1.17023
1      0     1.24036
1      1     1.17023
1      0     1.24036
1      1     1.17023
2      0     1.24036
2      1     1.17023
2      0     1.24036
2      1     1.17023
2      0     1.24036
2      1     1.17023
1      0     1.24036
1      1     1.17023
1      0     1.24036
1      1     1.17023
1      0     1.24036
1      1     1.17023
2      0     1.24036
2      1     1.17023
2      0     1.24036
2      1     1.17023
2      0     1.24036
2      1     1.17023

Expected output:

year month   alpha     count
1      0     1.24036     1
1      1     1.17023     1
1      0     1.24036     1    
1      1     1.17023     1
1      0     1.24036     1
1      1     1.17023     1
2      0     1.24036     1
2      1     1.17023     1
2      0     1.24036     1
2      1     1.17023     1
2      0     1.24036     1
2      1     1.17023     1
1      0     1.24036     2
1      1     1.17023     2
1      0     1.24036     2 
1      1     1.17023     2
1      0     1.24036     2
1      1     1.17023     2
2      0     1.24036     2
2      1     1.17023     2
2      0     1.24036     2
2      1     1.17023     2
2      0     1.24036     2
2      1     1.17023     2

CodePudding user response:

You don't really want a groupby.cumcount, you want to enumerate the groups.

You can use:

m = df['year'].eq(1) & df['year'].shift().ne(1)
df['count'] = m.cumsum()

Or, if you want to ensure having 1 as the first count even if the year does not necessarily start with 1:

df['count'] = m.cumsum()   ~m.iloc[0]

output:

    year  month    alpha  count
0      1      0  1.24036      1
1      1      1  1.17023      1
2      1      0  1.24036      1
3      1      1  1.17023      1
4      1      0  1.24036      1
5      1      1  1.17023      1
6      2      0  1.24036      1
7      2      1  1.17023      1
8      2      0  1.24036      1
9      2      1  1.17023      1
10     2      0  1.24036      1
11     2      1  1.17023      1
12     1      0  1.24036      2
13     1      1  1.17023      2
14     1      0  1.24036      2
15     1      1  1.17023      2
16     1      0  1.24036      2
17     1      1  1.17023      2
18     2      0  1.24036      2
19     2      1  1.17023      2
20     2      0  1.24036      2
21     2      1  1.17023      2
22     2      0  1.24036      2
23     2      1  1.17023      2

CodePudding user response:

If you have only years 1 and 2, you can detect the rows where current year is less than the previous year - this will indicate a change point:

df['count'] = (df["year"] < df["year"].shift()).cumsum()   1

Example:

df = pd.DataFrame({"year": [1, 1, 2, 2, 1, 2, 1, 1, 2, 2]})
   year  count
0     1      1
1     1      1
2     2      1
3     2      1
4     1      2
5     2      2
6     1      3
7     1      3
8     2      3
9     2      3
  • Related