I have the following function. It runs, but it doesn't actually change anything in my dataframe. Any ideas why this isn't working?
Technology is a column with values such as AT&T, HP, NaN, SAP, GORDON, etc. I am trying to apply title case to each row (e.g. GORDON -> Gordon) while leaving acronyms untouched (e.g. AT&T, not At&t; HP, not Hp). I also need to avoid matching an acronym that happens to be part of a longer word (e.g. Sapori Trattoria, not SAPori Trattoria).
import numpy as np
import pandas as pd

data = [['HP', 10], ['GORDON', 15], ['AT&T', 14], [np.nan, 9]]
db = pd.DataFrame(data, columns=['Technology', 'Age'])
acronyms = {'HP', 'GE', 'TBD', 'AT&T'}

def title_case_not_acronyms(orig_str):
    words = orig_str.split(" ")
    words_tc = [word if word in acronyms else word.title() for word in words]
    return " ".join(words)

db['Technology'] = db['Technology'].astype(str).apply(title_case_not_acronyms)
Answer:
Your function returns the same string that was passed to it: it builds the title-cased list words_tc but then joins the original words. Return " ".join(words_tc) rather than " ".join(words):
def title_case_not_acronyms(orig_str):
    words = orig_str.split(" ")
    # Keep known acronyms as-is; title-case every other word
    words_tc = [word if word in acronyms else word.title() for word in words]
    return " ".join(words_tc)
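A minimal sketch of the corrected function in use, assuming the question's sample data plus a hypothetical 'SAPORI TRATTORIA' row and 'SAP' added to the acronym set, so the whole-word check is actually exercised. It also swaps astype(str).apply for Series.map with na_action='ignore': astype(str) turns NaN into the string 'nan', which title() would then turn into 'Nan'.

```python
import numpy as np
import pandas as pd

# Question's sample data plus a hypothetical row to test acronyms inside longer words
data = [['HP', 10], ['GORDON', 15], ['AT&T', 14], [np.nan, 9], ['SAPORI TRATTORIA', 20]]
db = pd.DataFrame(data, columns=['Technology', 'Age'])
acronyms = {'HP', 'GE', 'TBD', 'AT&T', 'SAP'}  # 'SAP' added for illustration

def title_case_not_acronyms(orig_str):
    # Compare whole words only, so 'SAPORI' is not matched by the acronym 'SAP'
    words = orig_str.split(" ")
    words_tc = [word if word in acronyms else word.title() for word in words]
    return " ".join(words_tc)

# na_action='ignore' passes NaN through untouched instead of calling the function on it
db['Technology'] = db['Technology'].map(title_case_not_acronyms, na_action='ignore')
print(db['Technology'].tolist())
```

Because membership is tested against whole words from split(), SAPORI is title-cased to Sapori even though SAP is in the acronym set, and the missing value stays missing.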
Answer:
You can also split the column into one column per word and rebuild each row:
db['Technology'] = db['Technology'].str.split(' ', expand=True).apply(
    lambda x: ' '.join([a if a in acronyms else a.title() for a in x.dropna()]),
    axis=1,
)
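A self-contained run of this alternative, as a sketch on the question's sample data plus a hypothetical 'SAPORI TRATTORIA' row. One behavioral difference worth knowing: a NaN row comes back as an empty string rather than NaN, because dropna() removes every piece of that row before the join.

```python
import numpy as np
import pandas as pd

data = [['HP', 10], ['GORDON', 15], ['AT&T', 14], [np.nan, 9], ['SAPORI TRATTORIA', 20]]
db = pd.DataFrame(data, columns=['Technology', 'Age'])
acronyms = {'HP', 'GE', 'TBD', 'AT&T'}

# One column per word; shorter rows are padded with missing values
words = db['Technology'].str.split(' ', expand=True)

# Drop the padding in each row, title-case non-acronyms, and join the words back
db['Technology'] = words.apply(
    lambda row: ' '.join(w if w in acronyms else w.title() for w in row.dropna()),
    axis=1,
)
print(db['Technology'].tolist())
```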