Is there a regex pattern that can change different values based on different matches in python

Time:11-26

I am appending a column named 'Name' to a DataFrame. Its value is a string built by concatenating a few other columns.

Now I want to replace certain characters with specific values. Let's say:

& -> and
< -> less than
> -> greater than
' -> this is an apostrophe
" -> this is a double quotation

How can I efficiently apply this regex to the entire column? Also, can I wrap it in a function, since I need to apply the same replacement to 4 other columns as well?

I tried this:

import pandas as pd

df = pd.DataFrame({'A': ['bat<', 'foo>', 'bait&'],
                   'B': ['abc', 'bar', 'xyz']})
df.replace({'A': r'<','A':r'>','A':r'&'}, {'A': 'less than','A': 'greater than','A': 'and'}, regex=True, inplace=True)

I expected this:

                 A    B
0     batless than  abc
1  foogreater than  bar
2          baitand  xyz

But this happened.

         A    B
0     bat<  abc
1     foo>  bar
2  baitand  xyz

Answer:

One can use pandas.DataFrame.apply with a custom lambda function that calls pandas.Series.str.replace with a callable replacement, as follows:

regex = r'(<|>|&)'

df_new = df.apply(lambda x: x.str.replace(regex, lambda m: 'less than' if m.group(1) == '<' else 'greater than' if m.group(1) == '>' else 'and', regex=True))

[Out]:

                 A    B
0     batless than  abc
1  foogreater than  bar
2          baitand  xyz

Answer:

Each of your replacement dicts has three keys named 'A', and a Python dict literal keeps only the last value for a repeated key, so all but the last replacement are overwritten. Use a nested dict instead to make multiple replacements in one column:

df.replace({'A': {r'<': 'less than', r'>': 'greater than', r'&': 'and'}}, regex=True, inplace=True)

See pandas.DataFrame.replace
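To make the duplicate-key problem concrete, and to reuse the nested-dict approach on several columns at once, here is a minimal sketch. It assumes the five replacements listed in the question, and `cols` is a hypothetical list standing in for your other columns. The keys are passed through `re.escape` so each character is matched literally even if it has a regex meaning.

```python
import re

import pandas as pd

# Assumption: the five replacements listed in the question.
mapping = {'&': 'and', '<': 'less than', '>': 'greater than',
           "'": 'this is an apostrophe', '"': 'this is a double quotation'}

# A dict literal keeps only the last value for a repeated key,
# which is why the original call only replaced '&'.
assert {'A': r'<', 'A': r'>', 'A': r'&'} == {'A': r'&'}

df = pd.DataFrame({'A': ['bat<', 'foo>', 'bait&'],
                   'B': ['abc', 'bar', 'xyz']})

# Hypothetical list of columns to clean; extend it with your other columns.
cols = ['A', 'B']

# Escape each key so it matches literally, then build one nested dict
# of the form {column: {pattern: replacement}} covering every column in cols.
patterns = {re.escape(k): v for k, v in mapping.items()}
df = df.replace({col: patterns for col in cols}, regex=True)
print(df)
```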

Answer:

You can use a dictionary for the mapping, but it has to look like this:

mapping = {'<': 'less than', '>': 'greater than', '&': 'and'}

Then you can join the keys into a single regex alternation and proceed similarly to Gonçalo Peres's answer:

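import re
pattern = "|".join(map(re.escape, mapping))  # escape the keys so each one matches literally
# A callable replacement needs regex=True (pandas >= 2.0 defaults to regex=False)
df.apply(lambda col: col.str.replace(pattern, lambda match: mapping[match.group()], regex=True))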
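Since the question also asks for a function that can be reused on other columns, the mapping approach can be wrapped in a small helper. This is a sketch, not part of the answer above: `replace_special_chars` and `MAPPING` are hypothetical names, the mapping covers the five characters from the question, and column `C` is invented test data. Only the listed columns are touched, which also avoids calling `.str` on non-string columns.

```python
import re

import pandas as pd

# Assumption: the five replacements listed in the question.
MAPPING = {'&': 'and', '<': 'less than', '>': 'greater than',
           "'": 'this is an apostrophe', '"': 'this is a double quotation'}


def replace_special_chars(df, columns, mapping):
    """Hypothetical helper: return a copy of df in which every key of
    `mapping` found in `columns` is replaced by its value."""
    # One alternation of literal (escaped) keys, compiled once.
    pattern = re.compile("|".join(map(re.escape, mapping)))
    out = df.copy()
    for col in columns:
        # The callable looks up each matched character in the dict;
        # callable replacements require regex=True.
        out[col] = out[col].str.replace(
            pattern, lambda m: mapping[m.group()], regex=True)
    return out


df = pd.DataFrame({'A': ['bat<', 'foo>', 'bait&'],
                   'B': ['abc', 'bar', 'xyz'],
                   'C': ['say "hi"', "it's", 'a<b']})
cleaned = replace_special_chars(df, ['A', 'C'], MAPPING)
print(cleaned)
```

Because all keys are combined into one compiled alternation and the replacement is a dict lookup, each column is processed with a single `str.replace` call rather than one call per character.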