I am adding a column named 'Name' to a DataFrame. It is a string built by concatenating a few other columns.
Now I want to replace certain characters with certain values. Let's say:
& -> and
< -> less than
> -> greater than
' -> this is an apostrophe
" -> this is a double quotation
How can I efficiently apply this replacement to the entire column? Also, can I put it in a function, since I need to apply the same replacement to 4 other columns as well?
I tried this:
import pandas as pd

df = pd.DataFrame({'A': ['bat<', 'foo>', 'bait&'],
                   'B': ['abc', 'bar', 'xyz']})
df.replace({'A': r'<','A':r'>','A':r'&'}, {'A': 'less than','A': 'greater than','A': 'and'}, regex=True, inplace=True)
I was expecting this:
                 A    B
0     batless than  abc
1  foogreater than  bar
2          baitand  xyz
But I got this instead:
         A    B
0     bat<  abc
1     foo>  bar
2  baitand  xyz
CodePudding user response:
One can use pandas.DataFrame.apply with a custom lambda function that calls pandas.Series.str.replace, as follows:
regex = r'(<|>|&)'
df_new = df.apply(lambda x: x.str.replace(regex, lambda m: 'less than' if m.group(1) == '<' else 'greater than' if m.group(1) == '>' else 'and', regex=True))
[Out]:
                 A    B
0     batless than  abc
1  foogreater than  bar
2          baitand  xyz
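Since every pattern in this answer is a single literal character, the regex alternation could arguably be swapped for a translation table. This is a different technique from the one above, shown only as a sketch: it assumes all keys are exactly one character long, and the `cols` list is a hypothetical choice of columns to clean.

```python
import pandas as pd

df = pd.DataFrame({'A': ['bat<', 'foo>', 'bait&'],
                   'B': ['abc', 'bar', 'xyz']})

# str.maketrans accepts a dict whose keys are single characters;
# the values may be strings of any length.
table = str.maketrans({'<': 'less than', '>': 'greater than', '&': 'and'})

cols = ['A', 'B']  # hypothetical: the columns that need cleaning
df_new = df.copy()
df_new[cols] = df_new[cols].apply(lambda col: col.str.translate(table))
print(df_new)
```

No regex escaping is needed here, but the approach stops working as soon as a key is longer than one character.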
CodePudding user response:
Your replacement dict has three keys named A, so all but the last are overwritten before pandas ever sees them. Use a nested dict instead to make multiple replacements in one column:
df.replace({'A': {r'<': 'less than', r'>': 'greater than', r'&': 'and'}}, regex=True, inplace=True)
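To see why only & was replaced in the question, it may help to print the two dicts on their own. The snippet below is a small sketch reusing the question's frame; the variable names `to_replace`, `value` and `fixed` are just illustrative.

```python
import pandas as pd

# Duplicate keys in a dict literal collapse: the last value wins.
to_replace = {'A': r'<', 'A': r'>', 'A': r'&'}
value = {'A': 'less than', 'A': 'greater than', 'A': 'and'}
print(to_replace)  # {'A': '&'}
print(value)       # {'A': 'and'}

df = pd.DataFrame({'A': ['bat<', 'foo>', 'bait&'],
                   'B': ['abc', 'bar', 'xyz']})

# Nested dict: column name -> {pattern: replacement}
fixed = df.replace({'A': {r'<': 'less than',
                          r'>': 'greater than',
                          r'&': 'and'}}, regex=True)
print(fixed)
```

So the original call was effectively asking pandas to replace & with and in column A, and nothing else.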
CodePudding user response:
You can use a dictionary for the mapping, but it has to look like this:
mapping = {'<': 'less than', '>': 'greater than', '&': 'and'}
Then you can join the keys into a regex and proceed similarly to Gonçalo Peres's answer:
df.apply(lambda col: col.str.replace("|".join(mapping),
                                     lambda match: mapping.get(match.group()),
                                     regex=True))  # a callable replacement requires regex=True
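The question also asks for something reusable across several columns. One possible way to wrap the mapping approach is sketched below. The helper name `replace_chars` and the `MAPPING` constant are hypothetical, and `re.escape` is added as a precaution in case a key ever contains a regex metacharacter such as `.` or `|`.

```python
import re
import pandas as pd

# Mapping taken from the question; extend as needed.
MAPPING = {'&': 'and', '<': 'less than', '>': 'greater than',
           "'": 'this is an apostrophe', '"': 'this is a double quotation'}

def replace_chars(df, columns, mapping=MAPPING):
    """Return a copy of df with every mapping key replaced in the given columns."""
    # Escape each key so it is matched literally, then join into one alternation.
    pattern = "|".join(re.escape(key) for key in mapping)
    out = df.copy()
    for col in columns:
        out[col] = out[col].str.replace(
            pattern, lambda m: mapping[m.group()], regex=True)
    return out

df = pd.DataFrame({'A': ['bat<', 'foo>', 'bait&'],
                   'B': ['abc', 'bar', 'xyz']})
result = replace_chars(df, ['A'])
print(result)
```

The same call can then be repeated with a different list of column names, for example `replace_chars(df, ['Name', ...])` with whatever other columns need the treatment. Columns not listed are left untouched, and the input frame is not modified.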