Reformatting Strings from Scraped Data in order to Satisfy Keyword Argument

Time:10-30

I am working on a baseball analysis project where I web-scrape the real-time lineups for a given team, on a given date.

I am currently facing an issue with the names in the scraped dataframe: in seemingly random cases, a player name comes back in a different format and is unusable. (I pass each player name into a statistics function, which only works if the name is formatted correctly.)

Example:

     Freddie Freeman
     Ozzie Albies
     Ronald Acuna
     Austin RileyA. A.Riley 
     Dansby Swanson
     Adam Duvall
     Joc PedersonJ. J.Pederson

As you can see, most of the names are formatted normally. In a few cases, however, the player's full name is followed immediately by their first initial and a period, then the first initial and a period again, and then their last name. If I could turn Austin RileyA. A.Riley into Austin Riley, everything would work.

This is a consistent theme across all the teams and data that I pull. Sometimes there are a few players whose names are formatted in exactly this way: FirstName LastName, then the first initial and a period stuck onto the last name, then a space, the first initial, a period, and the last name again.

I am trying to figure out a way to reformat the names so that they are usable, and to do so in a way that generalizes to any possible name.

CodePudding user response:

If the theme is really consistent you could do something like this:

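name_list = ['Freddie Freeman',
         'Ozzie Albies',
         'Ronald Acuna',
         'Austin RileyA. A.Riley ',
         'Dansby Swanson',
         'Adam Duvall',
         'Joc PedersonJ. J.Pederson']
new_list = []
for n in name_list:
    dot = n.find('.')
    if dot == -1:
        # find() returns -1 when there is no '.', so keep clean names as they are;
        # slicing with n[:-2] would chop two characters off the end.
        new_list.append(n.strip())
    else:
        # The character just before the first '.' is the duplicated initial; drop it too.
        new_list.append(n[:dot - 1])
new_list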

There are several ways to achieve this (including regex, which I would not recommend). The example I have posted is the best in my opinion (see the find() documentation).
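For completeness, here is a rough sketch of what a regex version might look like. One reason to consider it: the find() approach cuts at the first period, so a clean name that contains a period of its own (J.D. Martinez, say) would be mangled. The pattern below only rewrites a name when the whole string matches the garbled shape, and leaves everything else alone. The shape itself is an assumption inferred from the two examples in the question, and clean_name and GARBLED are just illustrative names; names with suffixes or other layouts may need a different pattern.

```python
import re

# Assumed garbled shape (inferred from the question's examples):
#   "<First> <Last><F>. <F>.<Last>"  where <F> is the first letter of <First>.
# Group 1 = clean "First Last", group 2 = first initial, group 3 = last name.
GARBLED = re.compile(r"^((\w)\S* (.+?))\2\. \2\.\3$")


def clean_name(raw):
    """Return 'First Last' when raw matches the garbled shape, else raw stripped."""
    name = raw.strip()
    match = GARBLED.match(name)
    return match.group(1) if match else name


name_list = ['Freddie Freeman',
             'Ozzie Albies',
             'Ronald Acuna',
             'Austin RileyA. A.Riley ',
             'Dansby Swanson',
             'Adam Duvall',
             'Joc PedersonJ. J.Pederson']

new_list = [clean_name(n) for n in name_list]
print(new_list)
# ['Freddie Freeman', 'Ozzie Albies', 'Ronald Acuna', 'Austin Riley',
#  'Dansby Swanson', 'Adam Duvall', 'Joc Pederson']
```

Because the backreferences require the initial and the last name to repeat exactly, a string that merely contains a period does not match and is returned unchanged (apart from stripped whitespace).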
