Home > Software design >  How to split a column and assign values to different specific columns in pandas?
How to split a column and assign values to different specific columns in pandas?

Time:04-04

I have a dataframe that looks like this:

data = {'Name': ['Mat', 'Sally', 'Bob'],
     "What's Important for you?": ["Public Transport, Universities & Gyms, Shops", "Universities & Gyms", "Public Transport, Shops"]}

df = pd.DataFrame.from_dict(data)
Name What's important to you?
Mat Public Transport, Universities & Gyms, Shops
Sally Universities & Gyms
Bob Public Transport, Shops

I want to split the What's important to you? column into multiple columns, output shown below.

data = {'Name': ['Mat', 'Sally', 'Bob'],
        "impo1": ["Public Transport", " ", "Public Transport"],
        "impo2": ["Universities & Gyms", "Universities & Gyms", ""],
        "impo3": ["Shops", " ", "Shops"]}

df1 = pd.DataFrame.from_dict(data)
df1
Name impo1 impo2 impo3
Mat Public Transport Universities & Gyms Shops
Sally Universities & Gyms
Bob Public Transport Shops

CodePudding user response:

Use:

#split column by ', ' to DataFrame
df = df1["What's Important for you?"].str.split(', ', expand=True)

#get all unique values to Series
vals = df.stack().drop_duplicates().reset_index(drop=True)

#if necessary add columns in df by unique values vals and change order of values
df = (df.reindex(np.arange(len(vals)), axis=1)
        .apply(lambda x: vals.where(vals.isin(x)), axis=1)
        .add_prefix('impo'))
#add to Name column
df = df1[['Name']].join(df)
print (df)
    Name             impo0                impo1  impo2
0    Mat  Public Transport  Universities & Gyms  Shops
1  Sally               NaN  Universities & Gyms    NaN
2    Bob  Public Transport                  NaN  Shops

CodePudding user response:

You can split, explode, and pivot:

(df
 .assign(**{"What's Important for you?":
            df["What's Important for you?"].str.split(', ')})
 .explode("What's Important for you?")
 .assign(col=lambda d: d["What's Important for you?"].factorize()[0] 1)
 .pivot('Name', 'col', "What's Important for you?")
 .add_prefix('impo')
)

output:

col               impo1                impo2  impo3
Name                                               
Bob    Public Transport                  NaN  Shops
Mat    Public Transport  Universities & Gyms  Shops
Sally               NaN  Universities & Gyms    NaN

CodePudding user response:

for this you can use regex and split function to split the 1 column into 2.

  • Related