How to map and replace a pandas column with a dictionary

Time:12-16

I am new to programming, and especially to regex. I have run into a problem mapping dictionary items onto a pandas DataFrame column.

A minimal reproducible example is as follows (my original dataset is much larger).

My CSV file looks like this:

id color status
1 red "this is equal to the / number 3"
2 yellow you should visit the url \n http:13/color/findings/7
67 green conver it to a new value
7 blue "this is equal to the / number 13"
8 green conver it to a new value
23 white you should visit the url \n http:13/color/findings/67

I would like to make the status of each row more generic, so that the result looks like this:

id color status
1 red "this is equal to a number"
2 yellow you should visit the corresponding website
67 green conver it to a new value
7 blue "this is equal to a number"
8 green conver it to a new value
23 white you should visit the corresponding website

The method I would like to use is to create a dictionary whose keys are patterns for the original status comments and whose values are the corresponding generic comments, and then replace one with the other:

my_dict = {
'"this is equal to the \/ number \d+"' : '"this is equal to a number"',
'you should visit the url \\n http:\d+\/color\/findings\/\d+' : 'you should visit the corresponding website',
'conver it to a new value' : 'conver it to a new value'
}

As a first attempt, I tried to replace them by mapping:

    df['status'] = [next((v for k,v in my_dict.items() if k in x), float('nan')) for x in df['status'].tolist()]

This only maps the status whose text matches its key literally: "conver it to a new value".

I also tried:

    dictkeys_pattern = re.compile('|'.join(my_dict), re.IGNORECASE)

    status_found = df['status'].str.findall(my_dict)

    if status_found:
        stat = []
        for i in status_found:
            for k, v in my_dict.items():
                if re.match(k, i, re.IGNORECASE):
                    stat.append(v)
    else:
        stat = None

However, status_found is an empty Series.

Could someone help me and show me what I am doing wrong?

CodePudding user response:

Try this:

import pandas as pd

# Sample data based on the question
dic = {'id': [1, 2, 3, 4, 5, 6],
       'color': ['red', 'yellow', 'green', 'blue', 'green', 'white'],
       'status': ['this is equal to the / number 3', 'you should visit the url \n http:13/color/findings/7', 'conver it to a new value',
                  'this is equal to the / number 13', 'conver it to a new value', 'you should visit the url \n http:13/color/findings/6']}
df = pd.DataFrame(dic)
print(df)

# Create a new list holding the generic status for every row
status1 = ['this is equal to a number', 'you should visit the corresponding website', 'conver it to a new value',
           'this is equal to a number', 'conver it to a new value', 'you should visit the corresponding website']

# Replace the old column contents with the new list (the column name stays the same)
df['status'] = status1

print('\n', df)

CodePudding user response:

I don't know how many different cases you would like to modify, but you could simply write a function that replaces the status whenever a regex pattern is found. You can then create a new column by passing the cleaner function to .apply(), like this:

import pandas as pd
import re

# Create sample data
d = {'color': ['red', 'yellow', 'green'],
     'status': ['"this is equal to the / number 3"', 'you should visit the url \n http:13/color/findings/7',
                'conver it to a new value']}
df = pd.DataFrame(d)

# Define cleaner function
def cleaner(x):
    # Replace the whole string when it contains " number" followed by digits
    new_x = re.sub(r'.*( number\s?\d+).*', '"this is equal to a number"', x)

    return new_x

# Create new column with cleaner function
df['status_cleaned'] = df['status'].apply(cleaner)

print(df)

Output:

   status                                              status_cleaned
0  "this is equal to the / number 3"                   "this is equal to a number"
1  you should visit the url \n http:13/color/find...   you should visit the url \n http:13/color/find...
2  conver it to a new value                            conver it to a new value

To handle the other cases you would like to modify, simply add further re.sub() lines to the cleaner function.
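Instead of one re.sub() line per case, the same idea can be driven by a dictionary of patterns, which is close to what the question set out to do. The sketch below is only an illustration: the name generalize, the PATTERNS table and the exact patterns are assumptions based on the sample rows, and it assumes the status strings contain a real newline character (so \s+ can match it). It uses re.search rather than `k in x`, because `in` only tests for a literal substring and never interprets \d+ as a regex. That is likely why the first attempt in the question only matched the plain-text key.

```python
import re

# Hypothetical pattern -> replacement table; the patterns are assumptions
# derived from the sample rows in the question.
PATTERNS = {
    r'this is equal to the / number \d+': '"this is equal to a number"',
    r'you should visit the url\s+http:\d+/color/findings/\d+': 'you should visit the corresponding website',
}

def generalize(status):
    """Return the generic text for the first pattern found, else the input unchanged."""
    for pattern, replacement in PATTERNS.items():
        # re.search interprets the key as a regex; `pattern in status` would not
        if re.search(pattern, status, re.IGNORECASE):
            return replacement
    return status

samples = [
    '"this is equal to the / number 13"',
    'you should visit the url \n http:13/color/findings/67',
    'conver it to a new value',
]
cleaned = [generalize(s) for s in samples]
print(cleaned)
```

With a DataFrame, the same function could be passed to .apply() in place of cleaner, for example df['status'].apply(generalize).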

CodePudding user response:

You can use str.replace

dataf["status"] = (
    dataf["status"]
    .str.replace(r"http:\d+/color/findings/\d+", "website", regex=True)
    .str.replace(r"\d+|/", "", regex=True)
)

First we replace the URLs with the word "website", then we remove the remaining digits and forward slashes.
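If the goal is to swap each whole status for its generic text rather than strip fragments, Series.replace also accepts a dictionary of patterns when regex=True. The following is a minimal sketch, not part of the answer above: the mapping and its patterns are assumptions based on the sample data, and each pattern is anchored with ^ and $ so that the entire cell is replaced.

```python
import pandas as pd

df = pd.DataFrame({
    'status': [
        'this is equal to the / number 3',
        'you should visit the url \n http:13/color/findings/7',
        'conver it to a new value',
    ]
})

# Hypothetical mapping; ^...$ anchors make each pattern cover the whole cell,
# and \s+ absorbs the " \n " between "url" and "http".
mapping = {
    r'^this is equal to the / number \d+$': 'this is equal to a number',
    r'^you should visit the url\s+http:\d+/color/findings/\d+$': 'you should visit the corresponding website',
}

# Cells that match no pattern are left unchanged
df['status'] = df['status'].replace(mapping, regex=True)
result = df['status'].tolist()
print(result)
```

Rows that already hold a generic status, such as "conver it to a new value", need no entry in the mapping.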
