How to perform certain logics if a column matches regular expression in pandas dataframe?

Time:07-09

I am struggling with mixed data types in an Excel file provided by my team members. The column contains text, whole numbers, and decimals. The data is in a pandas DataFrame that contains other columns too. Below is an example.

Column
Does not apply
2
5
0.07
0.45
7% offset

I want to create logic where, if a cell is a whole number, a $ sign is added before it. If a cell is less than 1, multiply it by 100 and add %; if a cell has decimal digits, also multiply it by 100 and add %; otherwise, if it contains text, do nothing. I tried a solution that helped me write a basic regular expression to grab the positive integers, but it does not work completely.

This is how it should look:

Column
Does not apply
$2
$5
7%
45%
7% offset

Additional comment: I apologize, but I had to add one more rule here. If the cell contains text anywhere, whether at the beginning or in the middle, do nothing.

CodePudding user response:

My regex suggestion for separating the numbers is:

^[0-9]+$ for integers

and

[0-9]+[.][0-9]+ for decimal numbers.
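
As a minimal sketch of how a whole-number pattern and a decimal pattern could be applied, the snippet below uses re.fullmatch, which succeeds only when the entire string matches, so a cell such as "7% offset" falls through to text. It assumes every cell is already a string; the helper name classify and the constants INT_RE and DEC_RE are hypothetical.

```python
import re

INT_RE = re.compile(r'[0-9]+')            # whole numbers such as "2" or "15"
DEC_RE = re.compile(r'[0-9]+[.][0-9]+')   # decimals such as "0.07"

def classify(value):
    """Return 'int', 'decimal', or 'text' for a string cell (hypothetical helper)."""
    if INT_RE.fullmatch(value):
        return 'int'
    if DEC_RE.fullmatch(value):
        return 'decimal'
    # Anything else, including digits mixed with text, counts as text
    return 'text'

for cell in ['Does not apply', '2', '5', '0.07', '0.45', '7% offset']:
    print(cell, '->', classify(cell))
```

Using fullmatch instead of match avoids the need for explicit ^ and $ anchors in the patterns.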

CodePudding user response:

There are other ways than regex to do this, but this is a regex-y way.

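import re

def modify_value(x):
    # Doesn't start with a digit: leave the text as is
    if re.match(r'\D', x):
        return x
    # Starts with digits followed by %: already a percentage
    elif re.match(r'\d+%', x):
        return x
    # Consists only of zeros: just add %
    elif re.fullmatch(r'0+', x):
        return x + '%'
    # Decimal number: multiply by 100 and add %
    elif re.fullmatch(r'\d+\.\d+', x):
        return f'{float(x) * 100:.0f}%'
    # Whole number: add $ in front
    elif re.fullmatch(r'\d+', x):
        return '$' + x
    # Anything else (digits mixed with text): leave as is
    return x

# Assumes every cell is a string; on pandas >= 2.1, DataFrame.map replaces applymap
df = df.applymap(modify_value)
print(df)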

# Output:

           Column
0  Does not apply
1              $2
2              $5
3              7%
4             45%
5              0%
6       6%-offset
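
For comparison, one of the non-regex routes could look like the sketch below. It relies on pd.to_numeric with errors='coerce', which turns every cell that is not purely numeric into NaN, so any cell containing text is left untouched. The sample data mirrors the question; the helper name format_number is hypothetical, and treating every non-whole number as a percentage is an assumption about the intended rule.

```python
import pandas as pd

df = pd.DataFrame({'Column': ['Does not apply', '2', '5', '0.07', '0.45', '7% offset']})

# NaN wherever the cell is not purely numeric (text anywhere in the cell)
numbers = pd.to_numeric(df['Column'], errors='coerce')

def format_number(n):
    """Hypothetical helper: percentage for values below 1 or with decimals, else dollars."""
    if n < 1 or n != int(n):
        return f'{n * 100:.0f}%'
    return f'${int(n)}'

is_number = numbers.notna()
result = df['Column'].copy()
# Overwrite only the numeric cells; text cells keep their original value
result.loc[is_number] = numbers[is_number].map(format_number)
df['Column'] = result
print(df)
```

Because the conversion is done by pandas rather than by pattern matching, this version also copes with cells that Excel delivered as real integers or floats instead of strings.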