Split dataframe string (when string can hold n values of that cell variable), into multiple columns

Time:03-03

I'm currently working on a dataset with a lot of contact data, where Emails is one of the variables.

A cell in the Emails column can have more than one email (1 to n) and they are all separated by a comma and a space.

For contacts with exactly two emails, the process is quite straightforward. One can split the string and create a new column for the secondary email as follows:

email_df[['Emails', 'SecondaryEmail']] = email_df['Emails'].str.split(', ', expand=True)

However, this won't work with more than two emails. What is the most efficient way to split the emails into columns holding one email each (each with a different name) when the number of emails per cell can range from 1 to n? In this case n is limited to around 10, but that won't always be the case.
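To make the limitation concrete, here is a minimal sketch with made-up sample data (the `Name` column and the addresses are assumptions, not part of my real dataset). The split produces one column per email, so assigning it to a fixed pair of column names fails as soon as any row has a third email:

```python
import pandas as pd

# Hypothetical sample data for illustration only
email_df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cy"],
    "Emails": ["ann@x.com", "bob@x.com, bob@y.com", "cy@x.com, cy@y.com, cy@z.com"],
})

# expand=True returns one column per split piece, so the width follows the longest row
wide = email_df["Emails"].str.split(", ", expand=True)
print(wide.shape)

# Assigning that result to exactly two column names no longer lines up
try:
    email_df[["Emails", "SecondaryEmail"]] = wide
    failed = False
except ValueError as exc:
    failed = True
    print("Assignment failed:", exc)
```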

CodePudding user response:

Use Series.str.split with expand=True, together with DataFrame.pop to remove the Emails column after processing:

df = email_df.join(email_df.pop('Emails').str.split(', ', expand=True).add_prefix('Email'))
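As a rough illustration of what that one-liner does, here is the same chain broken into steps on a small made-up frame (the `Name` column and the addresses are assumptions for the example). Note that `add_prefix` keeps the zero-based positions, so the new columns come out as `Email0`, `Email1`, and so on, and rows with fewer emails are padded with missing values:

```python
import pandas as pd

# Hypothetical sample data for illustration only
email_df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cy"],
    "Emails": ["ann@x.com", "bob@x.com, bob@y.com", "cy@x.com, cy@y.com, cy@z.com"],
})

# pop() removes 'Emails' from email_df in place and returns it as a Series
emails = email_df.pop("Emails")

# One column per email; the column labels are 0, 1, 2, ...
parts = emails.str.split(", ", expand=True)

# Turn the integer labels into names: Email0, Email1, Email2, ...
parts = parts.add_prefix("Email")

# join() aligns on the index, so each row gets its own emails back
df = email_df.join(parts)
print(df)
```

If one-based names are preferred, the columns could be renamed afterwards, for example with `parts.columns = [f"Email{i + 1}" for i in range(parts.shape[1])]` before the join.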