I have a dictionary that maps column names to functions. I have written a function that should capitalize the values in a df column with str.title():
import pandas as pd
data= [["English","john","smith","ohio","united states","","","manufacturing","National","Residental","","",""]]
df= pd.DataFrame(data,columns=['Communication_Language__c','firstName', 'lastName', 'state', 'country', 'company', 'email', 'industry', 'System_Type__c', 'AccountType', 'customerSegment', 'Existing_Customer__c', 'GDPR_Email_Permission__c'])
Communication_Language__c firstName lastName state country company email industry System_Type__c AccountType customerSegment Existing_Customer__c GDPR_Email_Permission__c
0 English john smith ohio united states manufacturing National Residental
def capitalize(column, df_temp):
    # .notna() returns a Series, so reduce it with .all() before using it in an if
    if df_temp[column].notna().all():
        df_temp[column] = df_temp[column].str.title()
    return df_temp

def required():
    # something
    pass
parsing_map = {
    "firstName": [capitalize, required],
    "lastName": capitalize,
    "state": capitalize,
    "country": [capitalize, required],
    "industry": capitalize,
    "System_Type__c": capitalize,
    "AccountType": capitalize,
    "customerSegment": capitalize,
}
I wrote the function below to apply str.title(), but is there a way to apply it to the df columns without naming them all?
def capitalize(column, df_temp):
    # .notna() returns a Series, so reduce it with .all() before using it in an if
    if df_temp[column].notna().all():
        df_temp[column] = df_temp[column].str.title()
    return df_temp
What would be the best way to use the dictionary's function mapping to apply str.title() to the contents of every column that is mapped to the function "capitalize"?
Desired output:
data= [["English","John","Smith","Ohio","United States","","","Manufacturing","National","Residental","","",""]]
df= pd.DataFrame(data,columns=['Communication_Language__c','firstName', 'lastName', 'state', 'country', 'company', 'email', 'industry', 'System_Type__c', 'AccountType', 'customerSegment', 'Existing_Customer__c', 'GDPR_Email_Permission__c'])
Communication_Language__c firstName lastName state country company email industry System_Type__c AccountType customerSegment Existing_Customer__c GDPR_Email_Permission__c
0 English John Smith Ohio United States Manufacturing National Residental
CodePudding user response:
Normally you would use apply for this, e.g.
cols_to_capitalize = list(parsing_map.keys())
df[cols_to_capitalize] = df[cols_to_capitalize].apply(lambda x: x.str.title())
If you want to keep your method dictionary, I would suggest that you write the methods to act on a column, not on the dataframe. Something like this:
data= [["English","john","smith","ohio","united states","","","manufacturing","National","Residental","","",""]]
df= pd.DataFrame(data,columns=['Communication_Language__c','firstName', 'lastName', 'state', 'country', 'company', 'email', 'industry', 'System_Type__c', 'AccountType', 'customerSegment', 'Existing_Customer__c', 'GDPR_Email_Permission__c'])
def capitalize(col):
    if col.notna().all():
        return col.str.title()
    return col

def required(col):
    # TODO do stuff
    return col
parsing_map = {
    "firstName": [capitalize, required],
    "lastName": [capitalize],
    "state": [capitalize],
    "country": [capitalize, required],
    "industry": [capitalize],
    "System_Type__c": [capitalize],
    "AccountType": [capitalize],
    "customerSegment": [capitalize],
}
for col_name, fns in parsing_map.items():
    for fn in fns:
        df[col_name] = fn(df[col_name])
You could also pass the full df into these methods if they need to access other columns, but having each one still return only a single column would keep the design clearer.
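As a rough sketch of that variant, the functions below receive the whole DataFrame plus a column name, yet each still returns a single Series. The `col_name` parameter, the `capitalize_if_english` rule, and the two-row sample data are made up for illustration and are not part of the original question.

```python
import pandas as pd

df = pd.DataFrame({
    "Communication_Language__c": ["English", "German"],
    "firstName": ["john", "hans"],
    "lastName": ["smith", "meier"],
})

def capitalize(frame, col_name):
    # Receives the full frame but returns only the one transformed column.
    return frame[col_name].str.title()

def capitalize_if_english(frame, col_name):
    # Hypothetical rule: look at another column to decide which rows to change.
    is_english = frame["Communication_Language__c"] == "English"
    # Series.where keeps values where the condition is True and
    # substitutes the title-cased values where it is False.
    return frame[col_name].where(~is_english, frame[col_name].str.title())

parsing_map = {
    "firstName": [capitalize],
    "lastName": [capitalize_if_english],
}

for col_name, fns in parsing_map.items():
    for fn in fns:
        df[col_name] = fn(df, col_name)
```

Because every function hands back one column, the loop stays the same as before; only the call signature changes.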
But you should think carefully about whether you really need to reinvent the .apply functionality.
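If the map is kept mainly as a record of which columns get which treatment, one possible middle ground is to read the column list out of it and still let apply do the work. This is only a sketch: the `"company": [required]` entry and the one-row sample frame are assumptions added to show that columns without `capitalize` are left alone.

```python
import pandas as pd

def capitalize(col):
    return col.str.title()

def required(col):
    # Placeholder, the original leaves this unimplemented.
    return col

parsing_map = {
    "firstName": [capitalize, required],
    "company": [required],  # hypothetical entry without capitalize
    "state": [capitalize],
}

df = pd.DataFrame(
    [["john", "acme corp", "ohio"]],
    columns=["firstName", "company", "state"],
)

# Pick only the columns whose function list contains capitalize.
cols_to_capitalize = [name for name, fns in parsing_map.items() if capitalize in fns]

# Title-case all of them in a single apply call.
df[cols_to_capitalize] = df[cols_to_capitalize].apply(lambda col: col.str.title())
```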
CodePudding user response:
Suggestion: Create a list of the columns you want to include and then use apply:
import numpy as np
cols = ['firstName', 'lastName', 'state', 'country', 'industry', 'System_Type__c', 'AccountType', 'customerSegment']
df = df.apply(lambda col: col.replace(np.nan, "").str.title() if col.name in cols else col)
EDIT: Yes, but put a string instead of a reference to your function in your parsing_map
parsing_map = {
    "firstName": "capitalize",
    "lastName": "capitalize",
    "state": "capitalize",
    "country": "capitalize",
    "industry": "capitalize",
    "System_Type__c": "capitalize",
    "AccountType": "capitalize",
    "customerSegment": "capitalize",
}
df = df.apply(lambda col: col.replace(np.nan, "").str.title() if parsing_map.get(col.name) == "capitalize" else col)
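If you use a dict with lists as values, give .get an empty list as the default so that columns missing from the map do not raise a TypeError:
df = df.apply(lambda col: col.replace(np.nan, "").str.title() if "capitalize" in parsing_map.get(col.name, []) else col)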
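If more than one transformation ends up being named by a string, a small lookup table from name to function may be easier to extend than chained conditions inside the lambda. The sketch below assumes a hypothetical `transforms` registry and a hypothetical `strip_spaces` helper, neither of which appears in the question.

```python
import pandas as pd

def capitalize(col):
    return col.str.title()

def strip_spaces(col):
    # Hypothetical second transformation, added only for illustration.
    return col.str.strip()

# Hypothetical registry: maps the strings used in parsing_map to callables.
transforms = {"capitalize": capitalize, "strip": strip_spaces}

parsing_map = {
    "firstName": ["strip", "capitalize"],
    "state": ["capitalize"],
}

df = pd.DataFrame(
    [[" john ", "ohio", "a@b.c"]],
    columns=["firstName", "state", "email"],
)

# Apply each named transformation in order; unmapped columns are untouched.
for col_name, names in parsing_map.items():
    for name in names:
        df[col_name] = transforms[name](df[col_name])
```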
CodePudding user response:
def capitalize(df):
    for col in df.columns:
        df[col] = df[col].str.title()
    return df
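One caveat with looping over every column: the .str accessor raises an AttributeError on non-string columns such as integers. A minimal sketch that skips those columns and works on a copy, assuming a hypothetical helper name `capitalize_text_columns` and a made-up `age` column:

```python
import pandas as pd

def capitalize_text_columns(df):
    # Work on a copy so the caller's frame is not modified in place.
    out = df.copy()
    for col in out.columns:
        # Only touch columns that pandas reports as holding strings.
        if pd.api.types.is_string_dtype(out[col]):
            out[col] = out[col].str.title()
    return out

df = pd.DataFrame({"firstName": ["john"], "age": [42]})
result = capitalize_text_columns(df)
```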