I have the following series and would like to replace a1 with a, b1 with b, and c1 with c.
data = pd.Series([['a1', 'b1', 'c1'], ['b1', 'a1', 'c1'], ['c1', 'a1' ,'b1']])
Out[132]:
0 [a1, b1, c1]
1 [b1, a1, c1]
2 [c1, a1, b1]
dtype: object
The expected result is as below.
0 [a, b, c]
1 [b, a, c]
2 [c, a, b]
dtype: object
The following code does what I want, but it does not seem like a nice way to do it.
for i, s in enumerate(data):
    temp = ['a' if x == 'a1' else x for x in s]
    temp = ['b' if x == 'b1' else x for x in temp]
    temp = ['c' if x == 'c1' else x for x in temp]
    data.iloc[i] = temp
Is there a better way of doing this? I assume pandas has a built-in function for this.
I tried it with replace, but it does not help.
data.replace('a1', 'a')
data.replace('b1', 'b')
data.replace('c1', 'c')
Thank you for any comment in advance.
Answer:
Create a dictionary of replacements and use a list comprehension with dict.get. The second argument, y, is the default that get returns when the key does not exist, so values without a replacement are kept as they are:
d = {'a1': 'a', 'b1': 'b', 'c1': 'c'}
data = data.apply(lambda x: [d.get(y, y) for y in x])
# alternative solution
# data = data.map(lambda x: [d.get(y, y) for y in x])
print(data)
0 [a, b, c]
1 [b, a, c]
2 [c, a, b]
dtype: object
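For reference, here is a self-contained sketch of the dictionary approach that can be run on its own. The helper name replace_in_lists is hypothetical (it is not a pandas function); it only wraps the list comprehension so the fallback behaviour of dict.get is easy to see.

```python
import pandas as pd


def replace_in_lists(series, mapping):
    """Return a new Series with each list element looked up in `mapping`.

    Elements that have no entry in `mapping` are kept unchanged.
    Hypothetical helper for illustration only; not part of pandas.
    """
    return series.apply(lambda items: [mapping.get(item, item) for item in items])


data = pd.Series([['a1', 'b1', 'c1'], ['b1', 'a1', 'c1'], ['c1', 'a1', 'b1']])
d = {'a1': 'a', 'b1': 'b', 'c1': 'c'}

result = replace_in_lists(data, d)
print(result.tolist())

# A value with no key in the dictionary ('zz') passes through untouched.
mixed = replace_in_lists(pd.Series([['a1', 'zz']]), d)
print(mixed.tolist())
```

Because the function returns a new Series, the original data is left unmodified unless you assign the result back to it.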
Or:
data = pd.Series([[d.get(y,y) for y in x] for x in data], index=data.index)
If performance is not important:
data = data.explode().replace(d).groupby(level=0).agg(list)
print(data)
0 [a, b, c]
1 [b, a, c]
2 [c, a, b]
dtype: object
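As a quick sanity check, the sketch below runs the plain comprehension and the explode-based variant on the sample data and compares the results. The variable names are only illustrative. One difference worth keeping in mind: explode turns an empty list into NaN, so a row holding an empty list would not come back unchanged from the explode route.

```python
import pandas as pd

data = pd.Series([['a1', 'b1', 'c1'], ['b1', 'a1', 'c1'], ['c1', 'a1', 'b1']])
d = {'a1': 'a', 'b1': 'b', 'c1': 'c'}

# Plain-Python comprehension, rebuilt into a Series with the original index.
via_comprehension = pd.Series(
    [[d.get(y, y) for y in x] for x in data], index=data.index
)

# One row per list element, replace values, then regroup by the original index.
via_explode = data.explode().replace(d).groupby(level=0).agg(list)

# Both routes should give the same lists for this sample data.
same = via_comprehension.tolist() == via_explode.tolist()
print(same)
```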