I'm currently working on a dataset with a lot of contact data, where Emails is one of the variables.
A cell in the Emails column can have more than one email (1 to n) and they are all separated by a comma and a space.
For contacts with exactly two emails, the process is quite straightforward. One can split the string and create a new column for the secondary email as follows:
email_df[['Emails', 'SecondaryEmail']] = email_df['Emails'].str.split(', ', expand=True)
However, this won't work with more than two emails. What is the most efficient way to split the emails into columns that hold one email each, with a distinct name per column, when the number of emails can range from 1 to n? (In this case n is limited to around 10, but that won't always be the case.)
CodePudding user response:
Use Series.str.split with expand=True, together with DataFrame.pop, which removes the Emails column after processing:
df = email_df.join(email_df.pop('Emails').str.split(', ', expand=True).add_prefix('Email'))
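For illustration, here is a minimal sketch of the same approach on a small made-up frame. The Name column, the sample addresses, and the 1-based column names (Email1, Email2, ...) are assumptions for the example, not part of the original data; add_prefix('Email') alone would give Email0, Email1, ... instead.

```python
import pandas as pd

# Hypothetical sample data; the 'Name' column and the addresses are made up.
email_df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cy"],
    "Emails": ["a@x.com", "b@x.com, b2@x.com", "c@x.com, c2@x.com, c3@x.com"],
})

# pop() removes 'Emails' from email_df and returns it as a Series;
# split(expand=True) yields one column per email, with missing values
# filling the rows that have fewer emails.
parts = email_df.pop("Emails").str.split(", ", expand=True)

# Rename the integer columns 0, 1, 2, ... to Email1, Email2, Email3, ...
parts.columns = [f"Email{i + 1}" for i in range(parts.shape[1])]

# join() aligns on the index, so each row gets its own emails back.
df = email_df.join(parts)
print(df)
```

The number of new columns follows the longest row, so nothing needs to be known about n in advance. Note that pop modifies email_df in place; work on a copy if the original Emails column is still needed.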