Home > Software design >  Mark repeated id with a-b relationship in dataframe
Mark repeated id with a-b relationship in dataframe

Time:07-07

I'm trying to create a relationship between repeated ID's in dataframe. For example take 91, so 91 is repeated 4 times so for first 91 entry first column row value will be updated to A and second will be updated to B then for next row of 91, first will be updated to B and second will updated to C then for next first will be C and second will be D and so on and this same relationship will be there for all duplicated ID's. For ID's that are not repeated first will marked as A.

id first other
11 0 0
09 0 0
91 0 0
91 0 0
91 0 0
91 0 0
15 0 0
15 0 0
12 0 0
01 0 0
01 0 0
01 0 0

Expected output:

id first other
11 A 0
09 A 0
91 A B
91 B C
91 C D
91 D E
15 A B
15 B C
12 A 0
01 A B
01 B C
01 C D

I using df.iterrows() for this but that's becoming very messy code and will be slow if dataset increases is there any easy way of doing it.

CodePudding user response:

You can perform a mapping using a cumcount per group as source:

from string import ascii_uppercase

# mapping dictionary
# this is an example, you can use any mapping
d = dict(enumerate(ascii_uppercase))
# {0: 'A', 1: 'B', 2: 'C'...}

g = df.groupby('id')
c = g.cumcount()
m = g['id'].transform('size').gt(1)

df['first'] = c.map(d)
df.loc[m, 'other'] = c[m].add(1).map(d)

Output:

    id first other
0   11     A     0
1    9     A     0
2   91     A     B
3   91     B     C
4   91     C     D
5   91     D     E
6   15     A     B
7   15     B     C
8   12     A     0
9    1     A     B
10   1     B     C
11   1     C     D

CodePudding user response:

Given:

    id
0   12
1    9
2   91
3   91
4   91
5   91
6   15
7   15
8   12
9    1
10   1
11   1

Doing:

# Count ids per group
df['first'] = df.groupby('id').cumcount()
# convert to letters and make other col
m = df.groupby('id').filter(lambda x: len(x)>1).index
df.loc[m, 'other'] = df['first'].add(66).apply(chr)
df['first'] = df['first'].add(65).apply(chr)
# fill in missing with 0
df['other'] = df['other'].fillna(0)

Output:

    id first other
0   11     A     0
1    9     A     0
2   91     A     B
3   91     B     C
4   91     C     D
5   91     D     E
6   15     A     B
7   15     B     C
8   12     A     0
9    1     A     B
10   1     B     C
11   1     C     D
  • Related