I am trying to split the comma-separated values in each row and encode them in another df of indicator columns. If a value appears for an ID, its column should be 1 in the new df, and 0 if the value is not present.
ID Values
0 1,2,3
1 2,5,7,8
2 4,5,10
Results
ID 1 2 3 4 5 6 7 8 9 10
0 1 1 1 0 0 0 0 0 0 0
1 0 1 0 0 1 0 1 1 0 0
2 0 0 0 1 1 0 0 0 0 1
CodePudding user response:
After get_dummies, we still need reindex to add back the columns for values that never occur (6 and 9 here):
s = df.join(df.pop('Values').str.get_dummies(',')).set_index('ID')
s.columns = s.columns.astype(int)
s = s.reindex(columns=range(1, s.columns.max() + 1), fill_value=0)
s
Out[61]:
1 2 3 4 5 6 7 8 9 10
ID
0 1 1 1 0 0 0 0 0 0 0
1 0 1 0 0 1 0 1 1 0 0
2 0 0 0 1 1 0 0 0 0 1
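For anyone who wants to run this end to end, here is a self-contained sketch of the same get_dummies plus reindex idea. The way `df` is built is an assumption (the question only shows the printed frame), and the names `dummies` and `result` are just illustrative:

```python
import pandas as pd

# Sample data rebuilt by hand from the question (assumed construction).
df = pd.DataFrame({"ID": [0, 1, 2], "Values": ["1,2,3", "2,5,7,8", "4,5,10"]})

# One indicator column per distinct value; the column labels come back as strings.
dummies = df["Values"].str.get_dummies(",")
dummies.columns = dummies.columns.astype(int)

# Add all-zero columns for the values that never occur (6 and 9 here)
# and put the columns in numeric order.
result = dummies.reindex(columns=range(1, dummies.columns.max() + 1), fill_value=0)
result.index = df["ID"]

print(result)
```

This version leaves `df` untouched instead of using pop, which may be preferable if the original column is still needed afterwards.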
CodePudding user response:
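We can convert the Values column to tuples using eval and then use the get_dummies method. Note that eval executes arbitrary code, so ast.literal_eval is the safer choice if the data is not trusted. Columns for values that never occur (6 and 9) are not created, so a reindex is still needed to match the expected result exactly: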
>>> df['Values'] = df['Values'].apply(eval)
>>> # one indicator row per value, then collapse back to one row per original index
>>> pd.get_dummies(df['Values'].apply(pd.Series).stack().astype(int)).groupby(level=0).sum()
1 2 3 4 5 7 8 10
0 1 1 1 0 0 0 0 0
1 0 1 0 0 1 1 1 0
2 0 0 0 1 1 0 0 1
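If you would rather not call eval at all, one possible alternative (a different technique from the one in this answer) is to split the strings and explode them into one row per value. This is only a sketch: the construction of `df` is assumed from the question, and `exploded` and `wide` are illustrative names:

```python
import pandas as pd

# Sample data rebuilt by hand from the question (assumed construction).
df = pd.DataFrame({"ID": [0, 1, 2], "Values": ["1,2,3", "2,5,7,8", "4,5,10"]})

# Split each string into a list, then give every value its own row,
# keeping ID as the (repeated) index.
exploded = df.set_index("ID")["Values"].str.split(",").explode().astype(int)

# One indicator row per value, then collapse to one row per ID.
# max() keeps the result 0/1 even if a value is repeated within a row.
wide = pd.get_dummies(exploded, dtype=int).groupby(level=0).max()

# Add the columns for values that never occur (6 and 9 here).
wide = wide.reindex(columns=range(1, wide.columns.max() + 1), fill_value=0)

print(wide)
```

Using max instead of sum is a design choice here: it records presence rather than a count, which matches the 1/0 encoding asked for in the question.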