I received a pandas DataFrame that I have to work with and optimize. It has many columns, and my goal is to go through one specific column across all rows.
For example, I have:
Association | Year |
---|---|
National Basketball Association | 1991 |
Major League Baseball | 2001 |
I now want to go through the Association column and change every "National Basketball Association" to "NBA", every "Major League Baseball" to "MLB", and so on.
What would be the most efficient approach for this? I tried using if statements, which did not feel very efficient.
Thank you guys in advance!
CodePudding user response:
You could use a regex to automatically convert your strings to acronyms. The following regex removes the lowercase letters (and any trailing whitespace) that follow each word's leading capital letter:
df['Acronym'] = df['Association'].str.replace(r'(?<=\b[A-Z])([a-z]+\s*)',
                                              '', regex=True)
Output:
Association Year Acronym
0 National Basketball Association 1991 NBA
1 Major League Baseball 2001 MLB
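For reference, here is a minimal self-contained sketch of the regex approach above. The DataFrame is rebuilt from the two sample rows in the question, and the `pattern` variable name is only for illustration. Note that the pattern assumes every word starts with a capital letter, so names containing lowercase words such as "of" would not reduce cleanly to initials.

```python
import pandas as pd

# Sample data rebuilt from the question's example rows.
df = pd.DataFrame({
    "Association": ["National Basketball Association", "Major League Baseball"],
    "Year": [1991, 2001],
})

# After each word-initial capital letter, drop the lowercase letters that
# follow it, plus any trailing whitespace, so only the initials remain.
pattern = r"(?<=\b[A-Z])[a-z]+\s*"
df["Acronym"] = df["Association"].str.replace(pattern, "", regex=True)

print(df)
```

The replacement runs on the whole column at once, so no explicit loop over the rows is needed.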
CodePudding user response:
You can try pandas.Series.replace, but this requires you to define every mapping manually.
df['Association'] = df['Association'].replace({"National Basketball Association": "NBA",
"Major League Baseball": "MLB"})
print(df)
Association Year
0 NBA 1991
1 MLB 2001
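As a rough sketch of how the dictionary approach behaves when a value has no entry in the mapping, the example below compares `Series.replace` with `Series.map`. The third row ("Some Other League") and the `acronyms` name are hypothetical, added only to show the unmapped case.

```python
import pandas as pd

# The first two rows come from the question; the third is a hypothetical
# value with no entry in the mapping.
df = pd.DataFrame({
    "Association": [
        "National Basketball Association",
        "Major League Baseball",
        "Some Other League",
    ],
    "Year": [1991, 2001, 2010],
})

acronyms = {
    "National Basketball Association": "NBA",
    "Major League Baseball": "MLB",
}

# Series.replace swaps exact matches and leaves every other value unchanged.
replaced = df["Association"].replace(acronyms)

# Series.map looks each value up in the dict; values without an entry
# become NaN, so they are filled back in from the original column.
mapped = df["Association"].map(acronyms).fillna(df["Association"])

print(replaced.tolist())
print(mapped.tolist())
```

Both variants work on the whole column without an explicit loop or chained if statements. Which one is faster on a large frame is worth timing on your own data rather than assuming.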