I'm new to Python and pandas, and I'm struggling with a problem.
Here is the dataset:
import pandas as pd

data = {'col1': ['a','b','a','c'], 'col2': [None,None,'a',None], 'col3': [None,'a',None,'b'], 'col4': ['a',None,'b',None], 'col5': ['b','c','c',None]}
df = pd.DataFrame(data)
I need to create three new columns, one for each unique value found in col1 to col4 (a, b and c). For each row, a new column should be 1 if any of col1, col2, col3 or col4 contains a value equal to that column's header, and 0 otherwise.
The expected output looks like this:
data = {'col1': ['a','b','a','c'], 'col2': [None,None,'a',None], 'col3': [None,'a',None,'b'], 'col4': ['a',None,'b',None], 'col5': ['b','c','c',None], 'a':[1,1,1,0],'b':[0,1,1,1],'c':[0,1,1,1]}
df = pd.DataFrame(data)
I was able to create a new column holding 1 where col1 equals 'a' (and 0 elsewhere) with the code below:
df['a'] = (df['col1'] == 'a').astype(int)  # 1 where col1 is 'a', else 0
But this only checks col1, so I would have to repeat it for every column and every value.
Is there a way to do this for all columns at once?
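A minimal sketch generalizing the single-column comparison to several columns and values at once, assuming the data from the question. `DataFrame.eq` compares every cell with a scalar in one call, and `any(axis=1)` marks a row when at least one of the compared columns matches. The names `cols` and `values` are illustrative; `cols` follows the question's wording (col1 to col4), so replace it with `list(df.columns)` to search all five columns, which is what both answers below do.

```python
import pandas as pd

data = {'col1': ['a', 'b', 'a', 'c'], 'col2': [None, None, 'a', None],
        'col3': [None, 'a', None, 'b'], 'col4': ['a', None, 'b', None],
        'col5': ['b', 'c', 'c', None]}
df = pd.DataFrame(data)

# Columns to search (assumption: the question's "col1 to col4").
cols = ['col1', 'col2', 'col3', 'col4']

# Distinct non-missing values across those columns, in sorted order.
values = sorted(v for v in pd.unique(df[cols].to_numpy().ravel()) if pd.notna(v))

for v in values:
    # eq(v) compares every cell with v; any(axis=1) is True if any column in the row matches.
    df[v] = df[cols].eq(v).any(axis=1).astype(int)

print(df[values])
```

Missing cells never compare equal to a string, so None needs no special handling here.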
Answer:
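Try pd.get_dummies combined with groupby. With an empty prefix and prefix_sep, every dummy column is named after the value alone (a, b, c), so the duplicate names coming from different source columns can be merged by grouping on the column labels and taking the maximum: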
df = pd.concat([df,
                pd.get_dummies(df, prefix='', prefix_sep='').groupby(level=0, axis=1).max()],
               axis=1)
Output:
   col1  col2  col3  col4  col5  a  b  c
0     a  None  None     a     b  1  1  0
1     b  None     a  None     c  1  1  1
2     a     a  None     b     c  1  1  1
3     c  None     b  None  None  0  1  1
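In pandas 2.1 and later, `groupby(..., axis=1)` is deprecated, and since pandas 2.0 `get_dummies` returns booleans instead of 0/1 integers, so the code above may warn and show True/False. A sketch of the same get_dummies approach that avoids both, assuming the question's data: it transposes so the duplicate labels become row labels, groups on them, transposes back and casts to int. The names `dummies`, `indicators` and `result` are illustrative.

```python
import pandas as pd

data = {'col1': ['a', 'b', 'a', 'c'], 'col2': [None, None, 'a', None],
        'col3': [None, 'a', None, 'b'], 'col4': ['a', None, 'b', None],
        'col5': ['b', 'c', 'c', None]}
df = pd.DataFrame(data)

# One indicator column per (source column, value) pair; the empty prefix and
# separator name each indicator after the value only, so 'a' appears several times.
dummies = pd.get_dummies(df, prefix='', prefix_sep='')

# Transpose, group the duplicate labels (now row labels) and take the max,
# then transpose back: a replacement for groupby(level=0, axis=1).max().
indicators = dummies.T.groupby(level=0).max().T.astype(int)

result = pd.concat([df, indicators], axis=1)
print(result)
```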
Answer:
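Alternatively, stack all columns into a single Series labelled by the original row, one-hot encode it, and take the maximum per row:
pd.concat([df, pd.get_dummies(df.stack().droplevel(1)).groupby(level=0).max()], axis=1)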
Output:
   col1  col2  col3  col4  col5  a  b  c
0     a  None  None     a     b  1  1  0
1     b  None     a  None     c  1  1  1
2     a     a  None     b     c  1  1  1
3     c  None     b  None  None  0  1  1
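A sketch of the stacking approach with the intermediate steps named, assuming the question's data (`stacked`, `indicators` and `result` are illustrative names). Two additions are not in the one-liner: an explicit `dropna()`, because older pandas drops missing values in `stack()` by default while the newer stack implementation keeps them, and a `reindex` with `fill_value=0`, so a row whose cells are all missing would get zeros instead of NaN after the concat.

```python
import pandas as pd

data = {'col1': ['a', 'b', 'a', 'c'], 'col2': [None, None, 'a', None],
        'col3': [None, 'a', None, 'b'], 'col4': ['a', None, 'b', None],
        'col5': ['b', 'c', 'c', None]}
df = pd.DataFrame(data)

# Stack into one Series indexed by (row, column), then drop the column level
# so each value is labelled only with its original row; discard missing cells.
stacked = df.stack().droplevel(1).dropna()

# One indicator column per distinct value; the per-row max collapses the
# several entries that came from the same original row.
indicators = (pd.get_dummies(stacked)
              .groupby(level=0).max()
              .reindex(df.index, fill_value=0)  # rows with no values -> zeros
              .astype(int))

result = pd.concat([df, indicators], axis=1)
print(result)
```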