I have many groups such as:
BB = ['AA', 'BB', 'CC']
BR = ['LL', 'PP', 'KK']
I have a different column:
df.Sig |
---|
'AA' |
'BB' |
'CC' |
'LL' |
'PP' |
'KK' |
I want to create a different column which represents the group:
df.Sig | Group |
---|---|
'AA' | 'BB' |
'BB' | 'BB' |
'CC' | 'BB' |
'LL' | 'BR' |
'PP' | 'BR' |
'KK' | 'BR' |
I am currently using:
df.iloc[df['Sig'].isin(BB), 'Group'] = 'BB'
This is not the most efficient, nor is it fastest, way to do this.
Is there any other one liner that I can use to achieve the same, but faster?
CodePudding user response:
You can create a look-up table and apply that:
import pandas
mapping = {'AA': 'BB', 'BB': 'BB', 'CC': 'BB', 'LL': 'BR', 'PP': 'BR', 'KK': 'BR'}
s = pd.Series(['AA', 'BB', 'CC', 'LL', 'PP', 'KK'])
t = s.map(mapping)
print(s)
# 0 AA
# 1 BB
# 2 CC
# 3 LL
# 4 PP
# 5 KK
# dtype: object
print(t)
# 0 BB
# 1 BB
# 2 BB
# 3 BR
# 4 BR
# 5 BR
# dtype: object
Obviously, the mapping can be constructed from the list
s too.
CodePudding user response:
Use a dictionary of lists as starting point, then invert it and map
:
d = {'BB': ['AA', 'BB', 'CC'],
'BR': ['LL', 'PP', 'KK']}
d2 = {v:k for k,l in d.items() for v in l}
# {'AA': 'BB', 'BB': 'BB', 'CC': 'BB', 'LL': 'BR', 'PP': 'BR', 'KK': 'BR'}
df['Group'] = df['df.Sig'].map(d2)
d = {'BB': ['AA', 'BB', 'CC'],
'BR': ['LL', 'PP', 'KK']}
df = df.merge(pd.DataFrame(d).melt(var_name='Group', value_name='df.Sig'))
output:
df.Sig Group
0 AA BB
1 BB BB
2 CC BB
3 LL BR
4 PP BR
5 KK BR