Whats the efficient way to assign groups in python?-CodePudding

I have many groups such as:

BB = ['AA', 'BB', 'CC']
BR = ['LL', 'PP', 'KK']

I have a different column:

df.Sig
'AA'
'BB'
'CC'
'LL'
'PP'
'KK'

I want to create a different column which represents the group:

df.Sig	Group
'AA'	'BB'
'BB'	'BB'
'CC'	'BB'
'LL'	'BR'
'PP'	'BR'
'KK'	'BR'

I am currently using:

df.iloc[df['Sig'].isin(BB), 'Group'] = 'BB'

This is not the most efficient, nor is it fastest, way to do this.

Is there any other one liner that I can use to achieve the same, but faster?

CodePudding user response：

You can create a look-up table and apply that:

import pandas 


mapping = {'AA': 'BB', 'BB': 'BB', 'CC': 'BB', 'LL': 'BR', 'PP': 'BR', 'KK': 'BR'}

s = pd.Series(['AA', 'BB', 'CC', 'LL', 'PP', 'KK'])
t = s.map(mapping)
print(s)
# 0    AA
# 1    BB
# 2    CC
# 3    LL
# 4    PP
# 5    KK
# dtype: object
print(t)
# 0    BB
# 1    BB
# 2    BB
# 3    BR
# 4    BR
# 5    BR
# dtype: object

Obviously, the mapping can be constructed from the lists too.

CodePudding user response：

Use a dictionary of lists as starting point, then invert it and map:

d = {'BB': ['AA', 'BB', 'CC'],
     'BR': ['LL', 'PP', 'KK']}

d2 = {v:k for k,l in d.items() for v in l}
# {'AA': 'BB', 'BB': 'BB', 'CC': 'BB', 'LL': 'BR', 'PP': 'BR', 'KK': 'BR'}

df['Group'] = df['df.Sig'].map(d2)

Or melt and merge:

d = {'BB': ['AA', 'BB', 'CC'],
     'BR': ['LL', 'PP', 'KK']}

df = df.merge(pd.DataFrame(d).melt(var_name='Group', value_name='df.Sig'))

output:

  df.Sig Group
0     AA    BB
1     BB    BB
2     CC    BB
3     LL    BR
4     PP    BR
5     KK    BR