How to convert the distinct values of a column into new dummy (indicator) columns?

Time:06-20

Original dataframe:

import pandas as pd

df2 = pd.DataFrame(
    {"id":[1001,1002,1003,1004],
     "is_male":[1,0,0,1],
     "is_elder":[0,0,0,1],
     "timeline":["b1","b2","c","b1"]}
    )

It looks like:

id is_male is_elder timeline
1001 1 0 b1
1002 0 0 b2
1003 0 0 c
1004 1 1 b1

There are three distinct values in timeline here (the real data has more than 10). What I want is for each distinct value to become a new column: b1, b2, c.

Expected output:

id is_male is_elder timeline b1 b2 c
1001 1 0 b1 1 0 0
1002 0 0 b2 0 1 0
1003 0 0 c 0 0 1
1004 1 1 b1 1 0 0

What I have tried is to define one function per value (three here), which is not an effective approach when there are many distinct values:

def f(df):
    if df['timeline'] == 'b1':
        val = 1
    else:
        val = 0
    return val

df2['b1'] = df2.apply(f, axis=1)

Could anyone suggest a smarter way to do it?

CodePudding user response:

You can use pandas.get_dummies to build a dummy dataframe from the timeline column, then use pandas.concat to join the original dataframe and the dummy dataframe into one, like below:

>>> pd.concat([df2, pd.get_dummies(df2['timeline'], dtype=int)], axis=1)
     id  is_male  is_elder timeline  b1  b2  c
0  1001        1         0       b1   1   0  0
1  1002        0         0       b2   0   1  0
2  1003        0         0        c   0   0  1
3  1004        1         1       b1   1   0  0
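
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same get_dummies plus concat approach. It assumes a reasonably recent pandas: from version 2.0 on, get_dummies returns boolean columns unless a dtype is given, so dtype=int is passed to get 0/1 values. The names dummies and result are only illustrative.

```python
import pandas as pd

df2 = pd.DataFrame(
    {
        "id": [1001, 1002, 1003, 1004],
        "is_male": [1, 0, 0, 1],
        "is_elder": [0, 0, 0, 1],
        "timeline": ["b1", "b2", "c", "b1"],
    }
)

# One indicator column per distinct value found in `timeline`.
# dtype=int asks for 0/1 integers instead of booleans.
dummies = pd.get_dummies(df2["timeline"], dtype=int)

# Both frames share the same index, so concatenating along axis=1
# lines the indicator columns up with the original rows.
result = pd.concat([df2, dummies], axis=1)
print(result)
```

This scales to any number of distinct values without writing one function per value. If a value in timeline could clash with an existing column name, passing prefix="timeline" to get_dummies should produce names like timeline_b1 instead.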