How can I pass additional arguments into pandas's apply function?

Some string columns in my DataFrame need to be normalized to use the delimiter '|'. For example, values in the 'name' column such as 'a/b/c' need to become 'a|b|c', and 'M/F' in the 'sex' column needs to become 'M|F'.

import re

columns_to_be_normalized = ['name', 'sex']
delimiters = ['/', ';', ',']

def normalize(column_text):
    # Replace the first delimiter that actually occurs in the text with '|'.
    for delimiter in delimiters:
        normalized_column_text = re.sub(delimiter, '|', column_text)
        if column_text != normalized_column_text:
            return normalized_column_text
    return column_text

for column in columns_to_be_normalized:
    df[column] = df[column].apply(normalize)

My question is: how do I pass the variable delimiters into the normalize function so that I can use it in the regex? I need to pass it as an argument because the delimiters can change depending on some conditions.

CodePudding user response:

Define normalize with a named parameter:

import re

def normalize(column_text, delimiters=None):
    if delimiters is None:
        delimiters = ['/']  # define the default here
    for delimiter in delimiters:
        normalized_column_text = re.sub(delimiter, '|', column_text)
        # Return as soon as one delimiter changed the text.
        if column_text != normalized_column_text:
            return normalized_column_text
    return column_text

Then use:

df[column] = df[column].apply(normalize, delimiters=['/', ';', ','])
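
As a rough, self-contained sketch of the call above: Series.apply forwards extra keyword arguments to the function, and it also accepts positional extras through its args tuple. The sample Series and the names delims, by_keyword, by_args and by_partial are made up for illustration, and re.escape is added here on the assumption that the delimiters are literal characters rather than regex patterns.

```python
import re
from functools import partial

import pandas as pd


def normalize(column_text, delimiters=None):
    if delimiters is None:
        delimiters = ['/']
    for delimiter in delimiters:
        # Assumption: delimiters are literal characters, so escape them.
        normalized = re.sub(re.escape(delimiter), '|', column_text)
        if normalized != column_text:
            return normalized
    return column_text


# Hypothetical sample data, not taken from the question.
sample = pd.Series(['a/b/c', 'x;y', 'plain'])
delims = ['/', ';', ',']

# 1. Extra keyword arguments are passed through to normalize.
by_keyword = sample.apply(normalize, delimiters=delims)

# 2. Extra positional arguments go in the args tuple.
by_args = sample.apply(normalize, args=(delims,))

# 3. functools.partial binds the argument before apply sees the function.
by_partial = sample.apply(partial(normalize, delimiters=delims))

print(by_keyword.tolist())
```

All three calls should give the same result; the keyword form is usually the most readable when the function has more than one optional parameter.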

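Note that you don't need to call apply on each item, though. pandas' str.replace takes care of the loop for you and, with the combined pattern below, replaces every listed delimiter in one pass (normalize stops after the first delimiter that matches):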

import re
delimiters = ['/', ';', ',']
regex = '|'.join(map(re.escape, delimiters))

df[columns_to_be_normalized] = (
    df[columns_to_be_normalized]
    .apply(lambda s: s.str.replace(regex, '|', regex=True))
)
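
To make the vectorized version concrete, here is a minimal sketch run on a small DataFrame. The data, including the extra 'age' column, is invented for illustration; only the column names 'name' and 'sex' and the delimiter list come from the question.

```python
import re

import pandas as pd

# Hypothetical sample data; 'age' is there to show untouched columns.
df = pd.DataFrame({
    'name': ['a/b/c', 'd;e', 'f,g/h'],
    'sex': ['M/F', 'M', 'F'],
    'age': [30, 41, 25],
})
columns_to_be_normalized = ['name', 'sex']
delimiters = ['/', ';', ',']

# Build one alternation pattern; re.escape keeps each delimiter literal.
regex = '|'.join(map(re.escape, delimiters))

# Apply the replacement column by column; each s is a Series.
df[columns_to_be_normalized] = df[columns_to_be_normalized].apply(
    lambda s: s.str.replace(regex, '|', regex=True)
)

print(df)
```

A value that mixes delimiters, such as 'f,g/h', ends up fully normalized here, whereas the per-item normalize function returns after the first delimiter that matches. Which behavior is wanted depends on the data.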