Original dataframe:
import pandas as pd

df2 = pd.DataFrame(
    {"id": [1001, 1002, 1003, 1004],
     "is_male": [1, 0, 0, 1],
     "is_elder": [0, 0, 0, 1],
     "timeline": ["b1", "b2", "c", "b1"]}
)
It looks like:
id | is_male | is_elder | timeline |
---|---|---|---|
1001 | 1 | 0 | b1 |
1002 | 0 | 0 | b2 |
1003 | 0 | 0 | c |
1004 | 1 | 1 | b1 |
The timeline column has three distinct values in this example (the real data has more than 10). I want each distinct value to become a new 0/1 column, here b1, b2 and c.
Expected output:
id | is_male | is_elder | timeline | b1 | b2 | c |
---|---|---|---|---|---|---|
1001 | 1 | 0 | b1 | 1 | 0 | 0 |
1002 | 0 | 0 | b2 | 0 | 1 | 0 |
1003 | 0 | 0 | c | 0 | 0 | 1 |
1004 | 1 | 1 | b1 | 1 | 0 | 0 |
What I have tried is to define three functions, one per value, which is not an effective approach when there are many distinct values:
def f(df):
    if df['timeline'] == 'b1':
        val = 1
    else:
        val = 0
    return val

df2['b1'] = df2.apply(f, axis=1)
Is there a smarter way to do this?
CodePudding user response:
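You can use pandas.get_dummies to build a dummy (one-hot) dataframe from the timeline column, then use pandas.concat with axis=1 to combine the original dataframe and the dummy dataframe into one. On pandas 2.0 and later, get_dummies returns boolean columns by default, so pass dtype=int to get 0/1 values:
>>> pd.concat([df2, pd.get_dummies(df2['timeline'], dtype=int)], axis=1)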
     id  is_male  is_elder timeline  b1  b2  c
0  1001        1         0       b1   1   0  0
1  1002        0         0       b2   0   1  0
2  1003        0         0        c   0   0  1
3  1004        1         1       b1   1   0  0
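For reference, here is a self-contained sketch of the same idea that can be run as a script. It swaps pd.concat for DataFrame.join, which also aligns on the index; the names dummies and result are only illustrative, and dtype=int is an assumption that you want integer 0/1 columns rather than booleans.

```python
import pandas as pd

df2 = pd.DataFrame(
    {
        "id": [1001, 1002, 1003, 1004],
        "is_male": [1, 0, 0, 1],
        "is_elder": [0, 0, 0, 1],
        "timeline": ["b1", "b2", "c", "b1"],
    }
)

# One indicator column per distinct value found in `timeline`.
# dtype=int asks for 0/1 integers instead of the boolean default in pandas 2.x.
dummies = pd.get_dummies(df2["timeline"], dtype=int)

# join aligns on the row index and keeps the original `timeline` column.
result = df2.join(dummies)
print(result)
```

Because get_dummies discovers the distinct values itself, the same code should work unchanged when timeline has more than 10 values.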