Split pandas dataframe column of type string into multiple columns based on number of ','

Time:01-26

Let's say I have a pandas dataframe that looks like this:

import pandas as pd
data = {'name': ['Tom, Jeffrey, Henry', 'Nick, James', 'Chris', 'David, Oscar']}
df = pd.DataFrame(data)
df
    name
0   Tom, Jeffrey, Henry
1   Nick, James
2   Chris
3   David, Oscar

I know I can split the names into separate columns using the comma as separator, like so:

df[["name1", "name2", "name3"]] = df["name"].str.split(", ", expand=True)
df
    name                name1   name2   name3
0   Tom, Jeffrey, Henry Tom     Jeffrey Henry
1   Nick, James         Nick    James   None
2   Chris               Chris   None    None
3   David, Oscar        David   Oscar   None

However, if the name column contains a row with 4 names, as below, the code above raises ValueError: Columns must be same length as key

data = {'name': ['Tom, Jeffrey, Henry', 'Nick, James', 'Chris', 'David, Oscar', 'Jim, Jones, William, Oliver']}
  
# Create DataFrame
df = pd.DataFrame(data)
df
    name
0   Tom, Jeffrey, Henry
1   Nick, James
2   Chris
3   David, Oscar
4   Jim, Jones, William, Oliver

How can I automatically split the name column into n separate columns based on the ',' separator? The desired output would be this:

        name                          name1  name2    name3   name4
0       Tom, Jeffrey, Henry           Tom    Jeffrey  Henry   None
1       Nick, James                   Nick   James    None    None
2       Chris                         Chris  None     None    None
3       David, Oscar                  David  Oscar    None    None
4       Jim, Jones, William, Oliver   Jim    Jones    William Oliver

CodePudding user response:

Use DataFrame.join to attach the split result as a new DataFrame, with rename to generate the new column names:

# map the integer column labels 0, 1, 2, ... to name1, name2, name3, ...
f = lambda x: f'name{x + 1}'
df = df.join(df["name"].str.split(", ", expand=True).rename(columns=f))
print(df)
                          name  name1    name2    name3   name4
0          Tom, Jeffrey, Henry    Tom  Jeffrey    Henry    None
1                  Nick, James   Nick    James     None    None
2                        Chris  Chris     None     None    None
3                 David, Oscar  David    Oscar     None    None
4  Jim, Jones, William, Oliver    Jim    Jones  William  Oliver
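
The same idea can be written without the lambda, which may make it easier to see why the ValueError goes away: the column names are built from the actual width of the split result instead of being hard-coded. This is only a sketch under the question's assumptions (names separated by exactly ", "); the variable names parts and out are introduced here for illustration and do not appear in the original post.

```python
import pandas as pd

data = {'name': ['Tom, Jeffrey, Henry', 'Nick, James', 'Chris',
                 'David, Oscar', 'Jim, Jones, William, Oliver']}
df = pd.DataFrame(data)

# Split once; the result has as many columns as the longest row has names.
parts = df["name"].str.split(", ", expand=True)

# Build name1..nameN from the real number of columns, so the labels
# always match the width of the split result.
parts.columns = [f"name{i + 1}" for i in range(parts.shape[1])]

# Put the new columns next to the original one (indexes are identical).
out = pd.concat([df, parts], axis=1)
print(out)
```

Rows with fewer names than the longest row are padded with missing values, as in the output shown above.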