Defined Function not Applying to Dataframe Column

Time:10-16

I have the following function. It runs without errors, but it doesn't actually change anything in my dataframe. Any ideas why this isn't working?

Technology is a column with values such as AT&T, HP, NaN, SAP, GORDON, etc. I am trying to title-case each row (e.g. GORDON -> Gordon) while leaving known acronyms untouched (e.g. AT&T, not At&t; HP, not Hp). I also need to avoid cases where an acronym merely appears inside a larger word (e.g. Sapori Trattoria, not SAPori Trattoria).

import numpy as np
import pandas as pd
data = [['HP', 10], ['GORDON', 15], ['AT&T', 14], [np.nan, 9]]
db = pd.DataFrame(data, columns=['Technology', 'Age'])

acronyms = {'HP', 'GE', 'TBD', 'AT&T'}

def title_case_not_acronyms(orig_str):
    words = orig_str.split(" ")
    words_tc = [word if word in acronyms else word.title() for word in words]
    return " ".join(words)

db['Technology'] = db['Technology'].astype(str).apply(title_case_not_acronyms)

CodePudding user response:

Your function builds the title-cased list words_tc but never uses it, so it returns the same string that was passed to it.

You need to return " ".join(words_tc) rather than " ".join(words):

def title_case_not_acronyms(orig_str):
    words = orig_str.split(" ")
    words_tc = [word if word in acronyms else word.title() for word in words]
    return " ".join(words_tc)
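
For reference, here is a minimal end-to-end sketch of the corrected function applied to the question's data. Two things here are assumptions that are not in the original post: the extra SAPORI TRATTORIA row with SAP added to the set (to show that whole-word matching leaves larger words alone), and the use of Series.map with na_action="ignore" in place of astype(str). The reason for the second change is that astype(str) turns NaN into the string "nan", which the function would then title-case to "Nan".

```python
import numpy as np
import pandas as pd

# Assumption: SAP is added here only to illustrate the "larger word" case.
acronyms = {"HP", "GE", "TBD", "AT&T", "SAP"}

def title_case_not_acronyms(orig_str):
    # Whole words are compared against the set, so "SAPORI" does not match "SAP".
    words = orig_str.split(" ")
    words_tc = [word if word in acronyms else word.title() for word in words]
    return " ".join(words_tc)

data = [["HP", 10], ["GORDON", 15], ["AT&T", 14], [np.nan, 9], ["SAPORI TRATTORIA", 3]]
db = pd.DataFrame(data, columns=["Technology", "Age"])

# na_action="ignore" skips missing values, so NaN stays NaN
# instead of being converted to the string "nan".
db["Technology"] = db["Technology"].map(title_case_not_acronyms, na_action="ignore")

result = db["Technology"].tolist()
print(result)
```

If missing values should instead become some placeholder text, a fillna call before the map would be one way to do that.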

CodePudding user response:

You can also use:

db['Technology'] = db['Technology'].str.split(' ', expand=True).apply(lambda x: ' '.join([a if a in acronyms else a.title() for a in x.dropna()]), axis=1)
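
To see what that one-liner does at each stage, it can be unpacked as in the sketch below. The names parts and join_row are hypothetical, chosen only for illustration, and the SAPORI TRATTORIA row is an added example. One behavior worth noticing: because dropna() removes every cell of a missing row, a NaN in the original column ends up as an empty string rather than staying NaN.

```python
import numpy as np
import pandas as pd

acronyms = {"HP", "GE", "TBD", "AT&T"}

data = [["HP", 10], ["GORDON", 15], ["AT&T", 14], [np.nan, 9], ["SAPORI TRATTORIA", 3]]
db = pd.DataFrame(data, columns=["Technology", "Age"])

# Step 1: split on spaces; expand=True spreads the words over columns,
# padding shorter rows with missing values.
parts = db["Technology"].str.split(" ", expand=True)

def join_row(row):
    # Step 2: drop the padding, title-case non-acronyms, and rejoin the words.
    return " ".join(a if a in acronyms else a.title() for a in row.dropna())

# Step 3: apply the helper row by row (axis=1).
db["Technology"] = parts.apply(join_row, axis=1)

result = db["Technology"].tolist()
print(result)
```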