How to convert distinct values in a column to new dummy columns?

Original dataframe:

import pandas as pd

df2 = pd.DataFrame(
    {"id": [1001, 1002, 1003, 1004],
     "is_male": [1, 0, 0, 1],
     "is_elder": [0, 0, 0, 1],
     "timeline": ["b1", "b2", "c", "b1"]}
)

It looks like:

id	is_male	is_elder	timeline
1001	1	0	b1
1002	0	0	b2
1003	0	0	c
1004	1	1	b1

The timeline column has three distinct values here (the real data has more than 10). I want each distinct value to become a new column: b1, b2, c.

Expected output:

id	is_male	is_elder	timeline	b1	b2	c
1001	1	0	b1	1	0	0
1002	0	0	b2	0	1	0
1003	0	0	c	0	0	1
1004	1	1	b1	1	0	0

What I have tried is defining one function per value, which is not an efficient approach when there are many distinct values:

def f(df):
    if df['timeline'] == 'b1':
        val = 1
    else:
        val = 0
    return val

df2['b1'] = df2.apply(f, axis=1)

Could anyone suggest a smarter way to do this?

CodePudding user response:

You can use pandas.get_dummies to build a dummy dataframe from the timeline column, then use pandas.concat to join the original dataframe and the dummy dataframe into one, like below:

>>> pd.concat([df2, pd.get_dummies(df2['timeline'], dtype=int)], axis=1)
     id  is_male  is_elder timeline  b1  b2  c
0  1001        1         0       b1   1   0  0
1  1002        0         0       b2   0   1  0
2  1003        0         0        c   0   0  1
3  1004        1         1       b1   1   0  0
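
For reference, here is the same approach as a self-contained script, a minimal sketch that assumes the sample dataframe from the question. The names dummies and result are only illustrative. Passing dtype=int asks for 0/1 values; without it, recent pandas versions return boolean columns instead.

```python
import pandas as pd

# Sample data from the question.
df2 = pd.DataFrame(
    {"id": [1001, 1002, 1003, 1004],
     "is_male": [1, 0, 0, 1],
     "is_elder": [0, 0, 0, 1],
     "timeline": ["b1", "b2", "c", "b1"]}
)

# One indicator column per distinct value in "timeline".
# dtype=int requests 0/1 integers rather than booleans.
dummies = pd.get_dummies(df2["timeline"], dtype=int)

# Keep the original columns and append the indicator columns side by side.
result = pd.concat([df2, dummies], axis=1)
print(result)
```

This scales to any number of distinct values, since get_dummies creates the columns from whatever values it finds. df2.join(dummies) should give the same result here, because both frames share the same index and have no overlapping column names.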