I have a dataframe that looks like this:
data = {'Name': ['Mat', 'Sally', 'Bob'],
"What's Important for you?": ["Public Transport, Universities & Gyms, Shops", "Universities & Gyms", "Public Transport, Shops"]}
df = pd.DataFrame.from_dict(data)
Name | What's important to you? |
---|---|
Mat | Public Transport, Universities & Gyms, Shops |
Sally | Universities & Gyms |
Bob | Public Transport, Shops |
I want to split the What's important to you?
column into multiple columns, output shown below.
data = {'Name': ['Mat', 'Sally', 'Bob'],
"impo1": ["Public Transport", " ", "Public Transport"],
"impo2": ["Universities & Gyms", "Universities & Gyms", ""],
"impo3": ["Shops", " ", "Shops"]}
df1 = pd.DataFrame.from_dict(data)
df1
Name | impo1 | impo2 | impo3 |
---|---|---|---|
Mat | Public Transport | Universities & Gyms | Shops |
Sally | Universities & Gyms | ||
Bob | Public Transport | Shops |
CodePudding user response:
Use:
#split column by ', ' to DataFrame
df = df1["What's Important for you?"].str.split(', ', expand=True)
#get all unique values to Series
vals = df.stack().drop_duplicates().reset_index(drop=True)
#if necessary add columns in df by unique values vals and change order of values
df = (df.reindex(np.arange(len(vals)), axis=1)
.apply(lambda x: vals.where(vals.isin(x)), axis=1)
.add_prefix('impo'))
#add to Name column
df = df1[['Name']].join(df)
print (df)
Name impo0 impo1 impo2
0 Mat Public Transport Universities & Gyms Shops
1 Sally NaN Universities & Gyms NaN
2 Bob Public Transport NaN Shops
CodePudding user response:
You can split
, explode
, and pivot
:
(df
.assign(**{"What's Important for you?":
df["What's Important for you?"].str.split(', ')})
.explode("What's Important for you?")
.assign(col=lambda d: d["What's Important for you?"].factorize()[0] 1)
.pivot('Name', 'col', "What's Important for you?")
.add_prefix('impo')
)
output:
col impo1 impo2 impo3
Name
Bob Public Transport NaN Shops
Mat Public Transport Universities & Gyms Shops
Sally NaN Universities & Gyms NaN
CodePudding user response:
for this you can use regex and split function to split the 1 column into 2.