Mark repeated id with a-b relationship in dataframe-CodePudding

I'm trying to create a relationship between repeated ID's in dataframe. For example take 91, so 91 is repeated 4 times so for first 91 entry first column row value will be updated to A and second will be updated to B then for next row of 91, first will be updated to B and second will updated to C then for next first will be C and second will be D and so on and this same relationship will be there for all duplicated ID's. For ID's that are not repeated first will marked as A.

id	first	other
11	0	0
09	0	0
91	0	0
91	0	0
91	0	0
91	0	0
15	0	0
15	0	0
12	0	0
01	0	0
01	0	0
01	0	0

Expected output:

id	first	other
11	A	0
09	A	0
91	A	B
91	B	C
91	C	D
91	D	E
15	A	B
15	B	C
12	A	0
01	A	B
01	B	C
01	C	D

I using df.iterrows() for this but that's becoming very messy code and will be slow if dataset increases is there any easy way of doing it.

CodePudding user response：

You can perform a mapping using a cumcount per group as source:

from string import ascii_uppercase

# mapping dictionary
# this is an example, you can use any mapping
d = dict(enumerate(ascii_uppercase))
# {0: 'A', 1: 'B', 2: 'C'...}

g = df.groupby('id')
c = g.cumcount()
m = g['id'].transform('size').gt(1)

df['first'] = c.map(d)
df.loc[m, 'other'] = c[m].add(1).map(d)

Output:

    id first other
0   11     A     0
1    9     A     0
2   91     A     B
3   91     B     C
4   91     C     D
5   91     D     E
6   15     A     B
7   15     B     C
8   12     A     0
9    1     A     B
10   1     B     C
11   1     C     D

CodePudding user response：

Given:

Doing:

# Count ids per group
df['first'] = df.groupby('id').cumcount()
# convert to letters and make other col
m = df.groupby('id').filter(lambda x: len(x)>1).index
df.loc[m, 'other'] = df['first'].add(66).apply(chr)
df['first'] = df['first'].add(65).apply(chr)
# fill in missing with 0
df['other'] = df['other'].fillna(0)

Output:

    id first other
0   11     A     0
1    9     A     0
2   91     A     B
3   91     B     C
4   91     C     D
5   91     D     E
6   15     A     B
7   15     B     C
8   12     A     0
9    1     A     B
10   1     B     C
11   1     C     D