I'm trying to create a relationship between repeated ID's in dataframe. For example take 91, so 91 is repeated 4 times so for first 91 entry first column row value will be updated to A and second will be updated to B then for next row of 91, first will be updated to B and second will updated to C then for next first will be C and second will be D and so on and this same relationship will be there for all duplicated ID's. For ID's that are not repeated first will marked as A.
id | first | other |
---|---|---|
11 | 0 | 0 |
09 | 0 | 0 |
91 | 0 | 0 |
91 | 0 | 0 |
91 | 0 | 0 |
91 | 0 | 0 |
15 | 0 | 0 |
15 | 0 | 0 |
12 | 0 | 0 |
01 | 0 | 0 |
01 | 0 | 0 |
01 | 0 | 0 |
Expected output:
id | first | other |
---|---|---|
11 | A | 0 |
09 | A | 0 |
91 | A | B |
91 | B | C |
91 | C | D |
91 | D | E |
15 | A | B |
15 | B | C |
12 | A | 0 |
01 | A | B |
01 | B | C |
01 | C | D |
I using df.iterrows()
for this but that's becoming very messy code and will be slow if dataset increases is there any easy way of doing it.
CodePudding user response:
You can perform a mapping using a cumcount
per group as source:
from string import ascii_uppercase
# mapping dictionary
# this is an example, you can use any mapping
d = dict(enumerate(ascii_uppercase))
# {0: 'A', 1: 'B', 2: 'C'...}
g = df.groupby('id')
c = g.cumcount()
m = g['id'].transform('size').gt(1)
df['first'] = c.map(d)
df.loc[m, 'other'] = c[m].add(1).map(d)
Output:
id first other
0 11 A 0
1 9 A 0
2 91 A B
3 91 B C
4 91 C D
5 91 D E
6 15 A B
7 15 B C
8 12 A 0
9 1 A B
10 1 B C
11 1 C D
CodePudding user response:
Given:
id
0 12
1 9
2 91
3 91
4 91
5 91
6 15
7 15
8 12
9 1
10 1
11 1
Doing:
# Count ids per group
df['first'] = df.groupby('id').cumcount()
# convert to letters and make other col
m = df.groupby('id').filter(lambda x: len(x)>1).index
df.loc[m, 'other'] = df['first'].add(66).apply(chr)
df['first'] = df['first'].add(65).apply(chr)
# fill in missing with 0
df['other'] = df['other'].fillna(0)
Output:
id first other
0 11 A 0
1 9 A 0
2 91 A B
3 91 B C
4 91 C D
5 91 D E
6 15 A B
7 15 B C
8 12 A 0
9 1 A B
10 1 B C
11 1 C D