Apply str title to df columns values from dictionary values

Time:10-25

I have a dictionary that maps column names to functions. I have written a function that should capitalize the values in a df column with str.title().

import pandas as pd
 
data= [["English","john","smith","ohio","united states","","","manufacturing","National","Residental","","",""]]
df= pd.DataFrame(data,columns=['Communication_Language__c','firstName', 'lastName', 'state', 'country', 'company', 'email', 'industry', 'System_Type__c', 'AccountType', 'customerSegment', 'Existing_Customer__c', 'GDPR_Email_Permission__c'])

  Communication_Language__c firstName lastName state        country company email       industry System_Type__c AccountType customerSegment Existing_Customer__c GDPR_Email_Permission__c
0                   English      john    smith  ohio  united states                manufacturing       National  Residental
def capitalize (column,df_temp):
    if df_temp[column].notna():
        df_temp[column]=df[column].str.title()
    return df_temp

def required():
    # something
    pass

parsing_map={
"firstName":[capitalize,required],
"lastName":capitalize,
"state":capitalize,
"country": [capitalize,required],
"industry":capitalize,
"System_Type__c":capitalize,
"AccountType":capitalize,
"customerSegment":capitalize,
}

I wrote the function below to apply str.title(), but is there a way to apply it to the df columns without naming them all?

def capitalize (column,df_temp):
    if df_temp[column].notna():
        df_temp[column]=df[column].str.title()
    return df_temp

What would be the best way to reference the dictionary function mapping to apply str.title() to all of the contents in the columns with a function "capitalize"?

desired output

data= [["English","John","Smith","Ohio","United States","","","Manufacturing","National","Residental","","",""]]
df= pd.DataFrame(data,columns=['Communication_Language__c','firstName', 'lastName', 'state', 'country', 'company', 'email', 'industry', 'System_Type__c', 'AccountType', 'customerSegment', 'Existing_Customer__c', 'GDPR_Email_Permission__c'])

  Communication_Language__c firstName lastName state        country company email       industry System_Type__c AccountType customerSegment Existing_Customer__c GDPR_Email_Permission__c
0                   English      John    Smith  Ohio  United States                Manufacturing       National  Residental

CodePudding user response:

Normally you would use apply for this, e.g.

cols_to_capitalize = list(parsing_map.keys())
df[cols_to_capitalize] = df[cols_to_capitalize].apply(lambda x: x.str.title())
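As a quick check of what this does, here is a minimal sketch on a made-up two-row frame (the sample rows and the stand-in parsing_map values are assumptions for illustration only). It also shows that .str.title() leaves missing values missing, so no explicit notna() guard is needed:

```python
import numpy as np
import pandas as pd

# Made-up sample data; the second row has a missing state.
df = pd.DataFrame({
    "firstName": ["john", "mary ann"],
    "state": ["ohio", np.nan],
    "email": ["john@example.com", "mary@example.com"],
})

# Only the keys matter for this approach, so a stand-in mapping is enough.
parsing_map = {"firstName": None, "state": None}

cols_to_capitalize = list(parsing_map.keys())
df[cols_to_capitalize] = df[cols_to_capitalize].apply(lambda x: x.str.title())

print(df)
```

Columns that are not keys of parsing_map (email here) are left untouched.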

If you want to keep your method dictionary, I would suggest that you write the methods to act on a column, not on the dataframe. Something like this:

data= [["English","john","smith","ohio","united states","","","manufacturing","National","Residental","","",""]]
df= pd.DataFrame(data,columns=['Communication_Language__c','firstName', 'lastName', 'state', 'country', 'company', 'email', 'industry', 'System_Type__c', 'AccountType', 'customerSegment', 'Existing_Customer__c', 'GDPR_Email_Permission__c'])

def capitalize(col):
    if col.notna().all():
        return col.str.title()
    return col

def required(col):
    # TODO do stuff
    return col

parsing_map={
    "firstName":[capitalize,required],
    "lastName":[capitalize],
    "state":[capitalize],
    "country": [capitalize,required],
    "industry":[capitalize],
    "System_Type__c":[capitalize],
    "AccountType":[capitalize],
    "customerSegment":[capitalize],
}


for col_name, fns in parsing_map.items():
    for fn in fns:
        df[col_name] = fn(df[col_name])

You could also pass the full df into these methods if they need to access other columns, but having each one still return only a single column keeps the design clearer.
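A rough sketch of that variant, assuming each function takes the frame plus a column name and hands back one column. The body of required is a hypothetical check made up for this example, since the question leaves it unspecified:

```python
import pandas as pd

df = pd.DataFrame({
    "firstName": ["john"],
    "lastName": ["smith"],
    "country": ["united states"],
})

def capitalize(df, col_name):
    # The full frame is available, but only the transformed column is returned.
    return df[col_name].str.title()

def required(df, col_name):
    # Hypothetical rule: fail loudly if the column has missing or empty values.
    col = df[col_name]
    if col.isna().any() or (col == "").any():
        raise ValueError(f"{col_name} is required but has missing values")
    return col

parsing_map = {
    "firstName": [capitalize, required],
    "lastName": [capitalize],
    "country": [capitalize, required],
}

for col_name, fns in parsing_map.items():
    for fn in fns:
        df[col_name] = fn(df, col_name)
```

Because every function returns exactly one column, the loop stays the same no matter how many other columns a function reads.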

But you should think carefully whether you really need to reinvent the .apply functionality.

CodePudding user response:

Suggestion: Create a list of columns you want to include and then use apply

cols = ['firstName', 'lastName', 'state', 'country', 'industry', 'System_Type__c', 'AccountType', 'customerSegment']
# fillna("") replaces missing values before title-casing; apply returns a new frame, so assign it back
df = df.apply(lambda col: col.fillna("").str.title() if col.name in cols else col)

EDIT: Yes, but put a string instead of a reference to your function in your parsing_map:

parsing_map = {
    "firstName": "capitalize",
    "lastName": "capitalize",
    "state": "capitalize",
    "country": "capitalize",
    "industry": "capitalize",
    "System_Type__c": "capitalize",
    "AccountType": "capitalize",
    "customerSegment": "capitalize",
}

df = df.apply(lambda col: col.fillna("").str.title() if parsing_map.get(col.name) == "capitalize" else col)

If you use a dict with lists as values (the [] default keeps columns that are missing from the map from raising a TypeError):

df = df.apply(lambda col: col.fillna("").str.title() if "capitalize" in parsing_map.get(col.name, []) else col)
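If the map holds strings, something still has to turn each name back into a function once there is more than one rule. One way to do that, sketched with a small lookup dict (FUNCTIONS and apply_rules are hypothetical names, not from the question, and required is just a placeholder):

```python
import pandas as pd

def capitalize(col):
    # Replace missing values with "" and title-case the rest.
    return col.fillna("").str.title()

def required(col):
    # Placeholder: the question does not say what this should do.
    return col

# Hypothetical registry mapping the string names to the actual functions.
FUNCTIONS = {"capitalize": capitalize, "required": required}

parsing_map = {
    "firstName": ["capitalize", "required"],
    "state": ["capitalize"],
}

def apply_rules(col):
    # Columns missing from parsing_map get an empty rule list and pass through.
    for name in parsing_map.get(col.name, []):
        col = FUNCTIONS[name](col)
    return col

df = pd.DataFrame({"firstName": ["john"], "state": ["ohio"], "email": ["a@b.c"]})
result = df.apply(apply_rules)
print(result)
```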

CodePudding user response:

def capitalize(df):
    # Title-cases every column, so all columns must hold strings
    for col in df.columns:
        df[col] = df[col].str.title()
    return df