I wanted to assign the unique id based on the value from the column. For ex. i have a table like this:
df = pd.DataFrame({'A': [0,0,0,0,0,0,0,1,1,1,1,1,1,0,0,0,0,0,0,1,1,1,0,0,0,0,1,1,1]}
Eventually I would like to have my output table looks like this:
A | id | |
---|---|---|
1 | 0 | 1 |
2 | 0 | 1 |
3 | 0 | 1 |
4 | 0 | 1 |
5 | 0 | 1 |
6 | 0 | 1 |
7 | 1 | 2 |
8 | 1 | 2 |
9 | 1 | 2 |
10 | 1 | 2 |
11 | 1 | 2 |
12 | 1 | 2 |
13 | 0 | 3 |
14 | 0 | 3 |
15 | 0 | 3 |
16 | 0 | 3 |
17 | 0 | 3 |
18 | 0 | 3 |
19 | 1 | 4 |
20 | 1 | 4 |
21 | 1 | 4 |
22 | 0 | 5 |
23 | 0 | 5 |
24 | 0 | 5 |
25 | 0 | 5 |
26 | 1 | 6 |
27 | 1 | 6 |
28 | 1 | 6 |
I tried data.groupby(['a'], sort=False).ngroup() 1
but its not working as what I want. Any help and guidance will be appreciated! thanks!
CodePudding user response:
diff
cumsum
:
df['id'] = df.A.diff().ne(0).cumsum()
df
A id
0 0 1
1 0 1
2 0 1
3 0 1
4 0 1
5 0 1
6 0 1
7 1 2
8 1 2
9 1 2
10 1 2
11 1 2
12 1 2
13 0 3
14 0 3
15 0 3
16 0 3
17 0 3
18 0 3
19 1 4
20 1 4
21 1 4
22 0 5
23 0 5
24 0 5
25 0 5
26 1 6
27 1 6
28 1 6
CodePudding user response:
import pdrle
df["id"] = pdrle.get_id(df["A"]) 1
df
# A id
# 0 0 1
# 1 0 1
# 2 0 1
# 3 0 1
# 4 0 1
# 5 0 1
# 6 0 1
# 7 1 2
# 8 1 2
# 9 1 2
# 10 1 2
# 11 1 2
# 12 1 2
# 13 0 3
# 14 0 3
# 15 0 3
# 16 0 3
# 17 0 3
# 18 0 3
# 19 1 4
# 20 1 4
# 21 1 4
# 22 0 5
# 23 0 5
# 24 0 5
# 25 0 5
# 26 1 6
# 27 1 6
# 28 1 6