Hi, I would like to map a pandas Series using regex patterns as the dictionary keys:
import pandas as pd
s = pd.DataFrame([['AMcU8', 10], ['AM8v', 15], ['ASw9', 14], ['ASw7', 14]], columns=['Code', 'Quantity'])
s["newcode"] = s["Code"].map({"AM.*8.*": "AM8", "AS.*9.*": "AS9"})
but I get this:
    Code  Quantity  newcode
0  AMcU8        10      NaN
1   AM8v        15      NaN
2   ASw9        14      NaN
3   ASw7        14      NaN
instead of:
    Code  Quantity  newcode
0  AMcU8        10      AM8
1   AM8v        15      AM8
2   ASw9        14      AS9
3   ASw7        14      NaN
Any idea? It's fine to get NaN when no pattern matches.
CodePudding user response:
You can use Series.replace with the regex parameter set to your mapping dictionary:
s["newcode"] = s["Code"].replace(regex={"AM.*8.*":"AM8", "AS.*9.*": "AS9"})
which produces:
    Code  Quantity  newcode
0  AMcU8        10      AM8
1   AM8v        15      AM8
2   ASw9        14      AS9
3   ASw7        14     ASw7
Note that values matching no pattern are left unchanged (ASw7 stays ASw7 rather than becoming NaN).
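Since the question accepts NaN for codes that match no pattern, one way to get that from Series.replace is to mask the unmatched rows afterwards. This is only a sketch, assuming the patterns contain no capture groups; the names patterns and matched are illustrative and not part of the answer above:

```python
import pandas as pd

s = pd.DataFrame(
    [["AMcU8", 10], ["AM8v", 15], ["ASw9", 14], ["ASw7", 14]],
    columns=["Code", "Quantity"],
)
patterns = {"AM.*8.*": "AM8", "AS.*9.*": "AS9"}  # illustrative name

# True where at least one of the patterns is found in the code.
matched = s["Code"].str.contains("|".join(patterns), regex=True)

# Replace as before, then turn rows where nothing matched into NaN.
s["newcode"] = s["Code"].replace(regex=patterns).where(matched)
```

With the sample data, the ASw7 row ends up as NaN in newcode instead of keeping its original value.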
CodePudding user response:
To my knowledge, there is no direct function to perform this operation. You can do this using apply() and the re module, iterating through your mapping dictionary as follows:
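import re

mapping = {"AM.*8": "AM8", "AS.*9": "AS9"}

def regex_mapping(x):
    # Try each pattern in order; re.match anchors at the start of the string.
    for k, v in mapping.items():
        if re.match(k, x):
            # Return the mapped value itself; re.sub would keep trailing text such as the "v" in "AM8v".
            return v
    # No pattern matched: leave the value unchanged.
    return x

s['Code'].apply(regex_mapping)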
Output:
0     AM8
1     AM8
2     AS9
3    ASw7
Name: Code, dtype: object
CodePudding user response:
As far as I know, you can't provide regex keys to Series.map().
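To illustrate why the original attempt returns only NaN: when given a dictionary, Series.map looks each value up as an exact key. A small sketch (the variable names are illustrative):

```python
import pandas as pd

codes = pd.Series(["AMcU8", "AM8v", "ASw9", "ASw7"])

# Exact keys are found by plain dictionary lookup; other values become NaN.
exact = codes.map({"AMcU8": "AM8", "ASw9": "AS9"})

# Regex-looking keys are treated as literal strings, so no code equals them.
regex_keys = codes.map({"AM.*8.*": "AM8", "AS.*9.*": "AS9"})
```

Here exact has values only for AMcU8 and ASw9, while regex_keys is NaN in every row.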
However, this does what you need:
import re
import pandas as pd

s = pd.DataFrame([['AMcU8', 10], ['AM8', 15], ['ASw9', 14], ['ASw7', 14]], columns=['Code', 'Quantity'])

def regex_replace(x, mapping: dict):
    # Return the replacement for the first pattern that matches the start of x.
    for regex, replacement in mapping.items():
        if re.match(regex, x):
            return replacement
    # No pattern matched: keep the original value.
    return x

s["newcode"] = s["Code"].apply(regex_replace, mapping={"AM.*8": "AM8", "AS.*9": "AS9"})
If you apply this to large DataFrames frequently, precompiling the patterns and binding them with functools.partial should be a bit faster:
import re
import pandas as pd
from functools import partial

s = pd.DataFrame([['AMcU8', 10], ['AM8', 15], ['ASw9', 14], ['ASw7', 14]], columns=['Code', 'Quantity'])

def regex_replace(patterns: dict, x: str):
    # Keys are precompiled patterns, so they are not looked up or compiled per call.
    for regex, replacement in patterns.items():
        if regex.match(x):
            return replacement
    # No pattern matched: keep the original value.
    return x

# Bind the compiled patterns once; the resulting callable takes only the value.
mapping = partial(regex_replace, {re.compile("AM.*8"): "AM8", re.compile("AS.*9"): "AS9"})
s["newcode"] = s["Code"].apply(mapping)
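Because the question is fine with NaN for unmatched codes, the same idea can be adapted to return NaN instead of the original value. Series.map also accepts a function, so this stays close to the original attempt. A minimal sketch, where PATTERNS and lookup are hypothetical names chosen for illustration:

```python
import re

import numpy as np
import pandas as pd

# Precompiled patterns mapped to their replacement codes (illustrative name).
PATTERNS = {re.compile("AM.*8"): "AM8", re.compile("AS.*9"): "AS9"}

def lookup(code):
    """Return the replacement of the first matching pattern, else NaN."""
    for regex, replacement in PATTERNS.items():
        if regex.match(code):
            return replacement
    return np.nan

codes = pd.Series(["AMcU8", "AM8v", "ASw9", "ASw7"], name="Code")
newcode = codes.map(lookup)
```

With the question's data, the first three codes map to AM8, AM8 and AS9, and ASw7 becomes NaN.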