I have a dataframe which looks somthing like this:
df
name age symptoms
x 18 fever, headache, cough
y 23 cough
z 89 HTN
a 43 TNT, HTN
I want to create a new dataframe which looks something like this, depending on the symptoms column.
df2
name age symptoms fever headache cough HTN TNT
x 18 fever, headache, cough 1 1 1 0 0
y 23 cough 0 0 1 0 0
z 89 HTN 0 0 0 1 0
a 43 TNT, HTN 0 0 0 1 1
So it actually creates a new column for every new value found.
I tried it using:
split_ = df["symptoms"].str.split(expand=True)
but how do I create new columns and add the 0 and 1 value in the dataframe.
CodePudding user response:
Try pd.DataFrame.join
with str.get_dummies
with sep
:
df.join(df['symptoms'].str.get_dummies(sep=', '))
Output:
name age symptoms HTN TNT cough fever headache
0 x 18 fever, headache, cough 0 0 1 1 1
1 y 23 cough 0 0 1 0 0
2 z 89 HTN 1 0 0 0 0
3 a 43 TNT, HTN 1 1 0 0 0
pd.Series.str.get_dummies
has the sep
argument, where you could specify the separator you want for get_dummies
.
CodePudding user response:
You could use a combination of pandas.get_dummies
and join
:
df.join(pd.get_dummies(df['symptoms'].str.split(', ').explode()).groupby(level=0).max())
output:
name age symptoms HTN TNT cough fever headache
0 x 18 fever, headache, cough 0 0 1 1 1
1 y 23 cough 0 0 1 0 0
2 z 89 HTN 1 0 0 0 0
3 a 43 TNT, HTN 1 1 0 0 0
CodePudding user response:
df.join(pd.get_dummies(df['symptoms']))
name age symptoms HTN TNT, HTN cough \
0 x 18 fever, headache, cough 0 0 0
1 y 23 cough 0 0 1
2 z 89 HTN 1 0 0
3 a 43 TNT, HTN 0 1 0
fever, headache, cough
0 1
1 0
2 0
3 0
You could also crosstab
pd.crosstab([df.name, df.age], df.symptoms).reset_index()
symptoms name age HTN TNT, HTN cough fever, headache, cough
0 a 43 0 1 0 0
1 x 18 0 0 0 1
2 y 23 0 0 1 0
3 z 89 1 0 0 0