Split a dataframe string column (where each cell can hold n values) into multiple columns

I'm currently working on a dataset with a lot of contact data, and Emails is one of the variables.

A cell in the Emails column can have more than one email (1 to n) and they are all separated by a comma and a space.

For contacts with at most two emails, the process is quite straightforward. One can split the string and create a new column for the secondary email as follows:

email_df[['Emails', 'SecondaryEmail']] = email_df['Emails'].str.split(', ', expand=True)

However, this won't work when a contact has more than 2 emails. What is the most efficient way to split the emails into columns that each hold a single email (each with a different name), when the number of emails per cell can range from 1 to n? In this case n is limited to around 10, but that won't always be the case.

CodePudding user response:

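Use Series.str.split with expand=True, together with DataFrame.pop to remove the Emails column after processing. The split produces as many columns as the longest cell needs, and add_prefix names them Email0, Email1, and so on: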

df = email_df.join(email_df.pop('Emails').str.split(', ', expand=True).add_prefix('Email'))
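To make the behaviour concrete, here is a minimal sketch of the same idea on made-up data. The sample rows, the `Name` column, and the 1-based `Email1..EmailN` naming are assumptions for illustration only; they are not part of the original dataset. Rows with fewer emails than the longest row get missing values in the extra columns.

```python
import pandas as pd

# Hypothetical sample data; the real email_df comes from the asker's dataset.
email_df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cy"],
    "Emails": [
        "ann@x.com",
        "bob@x.com, bob@y.com",
        "cy@x.com, cy@y.com, cy@z.com",
    ],
})

# Split on ", " into as many columns as the longest row needs.
parts = email_df["Emails"].str.split(", ", expand=True)

# Optional: rename the integer columns to Email1..EmailN (1-based).
parts.columns = [f"Email{i + 1}" for i in range(parts.shape[1])]

# Drop the original column and attach the split ones, aligned on the index.
df = email_df.drop(columns="Emails").join(parts)
print(df)
```

Because the number of output columns is taken from the data itself, the same code should work whether the longest cell holds 2 emails or 10.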