Map pandas series with string pattern

Time:10-23

Hi, I would like to map a pandas Series using string patterns as the keys:

s=pd.DataFrame([['AMcU8', 10], ['AM8v', 15], ['ASw9', 14],['ASw7', 14]], columns = ['Code', 'Quantity'])

s["newcode"]=s["Code"].map({"AM.*8.*" : "AM8", "AS.*9.*" : "AS9"})

but I get this:

   Code  Quantity newcode
0  AMcU8       10     NaN
1  AM8v        15     NaN
2  ASw9        14     NaN
3  ASw7        14     NaN

instead of:

   Code  Quantity newcode
0  AMcU8       10     AM8
1  AM8v        15     AM8
2  ASw9        14     AS9
3  ASw7        14     NaN

Any idea? It's fine to get NaN when no pattern matches.

CodePudding user response:

You can use Series.replace with the regex parameter set to your mapping dictionary:

s["newcode"] = s["Code"].replace(regex={"AM.*8.*":"AM8", "AS.*9.*": "AS9"})

which produces:

    Code    Quantity    newcode
0   AMcU8   10          AM8
1   AM8v    15          AM8
2   ASw9    14          AS9
3   ASw7    14          ASw7

Note that values matching none of the patterns are left unchanged, so row 3 keeps ASw7 instead of becoming NaN.
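If you do want NaN for the rows that match nothing, as the question asks, one possible sketch is to build a boolean mask with Series.str.match and pass it to Series.where after the replacement. The names patterns and matched are only illustrative, and this assumes every pattern is meant to match from the start of the string:

```python
import pandas as pd

s = pd.DataFrame(
    [["AMcU8", 10], ["AM8v", 15], ["ASw9", 14], ["ASw7", 14]],
    columns=["Code", "Quantity"],
)
patterns = {"AM.*8.*": "AM8", "AS.*9.*": "AS9"}

# True where the code matches at least one pattern (str.match anchors at the start).
# Non-capturing groups keep the alternation from mixing the patterns together.
matched = s["Code"].str.match("|".join(f"(?:{p})" for p in patterns))

# Replace as before, then turn every row that matched nothing into NaN.
s["newcode"] = s["Code"].replace(regex=patterns).where(matched)
```

Keep in mind that replace(regex=...) substitutes wherever the pattern is found in the string, while str.match only tests from the start, so the two can disagree on codes that contain a pattern in the middle.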

CodePudding user response:

To my knowledge, there is no built-in function that performs this operation directly.

You can do it with apply() and the re module, iterating through your mapping dictionary as follows:

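import re

mapping = {"AM.*8" : "AM8", "AS.*9" : "AS9"}

def regex_mapping(x):
    # Return the replacement for the first pattern that matches at the start of x.
    for k, v in mapping.items():
        if re.match(k, x):
            # Return v itself: re.sub(k, v, x) would keep trailing text ("AM8v" stays "AM8v").
            return v
    return x  # no pattern matched, keep the original value

s['Code'].apply(regex_mapping)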

Output:

0     AM8
1     AM8
2     AS9
3    ASw7
Name: Code, dtype: object

CodePudding user response:

As far as I know, you can't provide regex keys to Series.map().
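A quick way to see this, as far as I understand the behavior: map() looks each value up as an exact dictionary key, so a pattern string is just another literal key that never equals any code. The names codes, by_pattern and by_exact below are only for illustration:

```python
import pandas as pd

codes = pd.Series(["AMcU8", "AM8v", "ASw9", "ASw7"])

# Keys are compared for equality, so the pattern strings match nothing.
by_pattern = codes.map({"AM.*8.*": "AM8", "AS.*9.*": "AS9"})

# A key that equals a value exactly does hit; everything else becomes NaN.
by_exact = codes.map({"AMcU8": "AM8"})
```

Here by_pattern is NaN in every row, which is what the question shows, while by_exact fills only the first row.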

However, this does what you need:

import re
import pandas as pd

s = pd.DataFrame([['AMcU8', 10], ['AM8', 15], ['ASw9', 14], ['ASw7', 14]], columns=['Code', 'Quantity'])


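def regex_replace(x, mapping: dict):
    # Return the replacement for the first pattern that matches at the start of x.
    for regex, replacement in mapping.items():
        if re.match(regex, x):
            return replacement
    return x  # no pattern matched, keep the original value


# Extra keyword arguments to apply() are forwarded to regex_replace.
s["newcode"] = s["Code"].apply(regex_replace, mapping={"AM.*8": "AM8", "AS.*9": "AS9"})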

Or, if you apply this to large DataFrames frequently, you can compile the patterns once up front to make it a bit faster:

import re
import pandas as pd
from functools import partial

s = pd.DataFrame([['AMcU8', 10], ['AM8', 15], ['ASw9', 14], ['ASw7', 14]], columns=['Code', 'Quantity'])


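def regex_replace(mapping: dict, x):
    # The keys of mapping are precompiled patterns, so match() is called on them directly.
    for regex, replacement in mapping.items():
        if regex.match(x):
            return replacement
    return x  # no pattern matched, keep the original value

# Bind the compiled mapping once; apply() then passes each value in as x.
mapping = partial(regex_replace, {re.compile("AM.*8"): "AM8", re.compile("AS.*9"): "AS9"})
s["newcode"] = s["Code"].apply(mapping)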
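As an alternative to calling a Python function once per row, here is a rough sketch that loops over the patterns instead and lets Series.str.match test the whole column at once. The helper map_by_pattern is hypothetical, not a pandas function, and it returns NaN rather than the original code when nothing matches, which is what the question asked for:

```python
import pandas as pd


def map_by_pattern(codes: pd.Series, patterns: dict) -> pd.Series:
    """Return the label of the first pattern matching each value, else NaN."""
    result = pd.Series(float("nan"), index=codes.index, dtype=object)
    for pattern, label in patterns.items():
        # Only fill rows that are still unset, so earlier patterns win.
        hit = codes.str.match(pattern, na=False) & result.isna()
        result[hit] = label
    return result


s = pd.DataFrame(
    [["AMcU8", 10], ["AM8v", 15], ["ASw9", 14], ["ASw7", 14]],
    columns=["Code", "Quantity"],
)
s["newcode"] = map_by_pattern(s["Code"], {"AM.*8": "AM8", "AS.*9": "AS9"})
```

Whether this is actually faster than apply() depends on how many patterns and rows you have, so it is worth timing on your own data.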
