This is the data, a DataFrame named data:
Positions
0 ST, LW
1 CF, RW, ST
2 LW, CAM
3 GK
4 CAM, CM
5 CB
6 ST
7 CM, CDM
8 LW, CF
9 GK
Here is another list:
lst = ['CAM',
'CB',
'LB',
'LWB',
'LW',
'CF',
'ST',
'RWB',
'GK',
'RM',
'LM',
'CM',
'CDM',
'RW',
'RB']
This problem is much like one-hot encoding.
For every row, fill 1 in a column if the row contains that element of the list, and 0 if it does not.
The result's shape is (10, 15): the list has 15 elements and the data has 10 rows.
Below is a demo of part of the result.
CAM CB LB LW CF
0 0 0 1 0
0 0 0 0 1
CodePudding user response:
Assuming the column holds strings, you can use str.get_dummies, then reindex:
out = (data['Positions'].str.get_dummies(sep=', ')
.reindex(lst, axis=1, fill_value=0)
)
If you have lists:
out = (pd
.get_dummies(data['Positions'].explode(), dtype=int)  # dtype=int keeps 0/1 instead of booleans
.groupby(level=0).max()
.reindex(lst, axis=1, fill_value=0)
)
Output:
CAM CB LB LWB LW CF ST RWB GK RM LM CM CDM RW RB
0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0
1 0 0 0 0 0 1 1 0 0 0 0 0 0 1 0
2 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
4 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0
5 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
6 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0
8 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
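For reference, here is a minimal self-contained sketch of the string-based variant, assuming the Positions column holds comma-separated strings exactly as shown in the question. The sample DataFrame is rebuilt by hand from the question's data, so treat it as an illustration rather than the asker's actual setup.

```python
import pandas as pd

# Sample data copied from the question; each row is one comma-separated string.
data = pd.DataFrame({'Positions': [
    'ST, LW', 'CF, RW, ST', 'LW, CAM', 'GK', 'CAM, CM',
    'CB', 'ST', 'CM, CDM', 'LW, CF', 'GK',
]})
lst = ['CAM', 'CB', 'LB', 'LWB', 'LW', 'CF', 'ST', 'RWB',
       'GK', 'RM', 'LM', 'CM', 'CDM', 'RW', 'RB']

# Split each string on ', ' into indicator columns, then force the column
# order of lst; positions that never occur become all-zero columns.
out = (data['Positions'].str.get_dummies(sep=', ')
       .reindex(lst, axis=1, fill_value=0))

print(out.shape)
print(out)
```

The reindex step is what guarantees one column per entry of lst, since get_dummies on its own only creates columns for values that actually appear in the data.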
CodePudding user response:
import pandas as pd
positions = [
    ['ST', 'LW'],
    ['CF', 'RW', 'ST'],
    ['LW', 'CAM'],
    ['GK'],
    ['CAM', 'CM'],
    ['CB'],
    ['ST'],
    ['CM', 'CDM'],
    ['LW', 'CF'],
    ['GK']
]
data = pd.DataFrame({'Positions': positions})
list_ = ['CAM', 'CB', 'LB', 'LWB', 'LW', 'CF', 'ST', 'RWB', 'GK', 'RM', 'LM', 'CM', 'CDM', 'RW', 'RB']
result = pd.DataFrame(0, index=range(data.shape[0]), columns=list_)
for i, row in enumerate(data.Positions):
    for e in row:
        result.loc[i, e] = 1
print(result)
prints
index | CAM | CB | LB | LWB | LW | CF | ST | RWB | GK | RM | LM | CM | CDM | RW | RB |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
2 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
4 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
5 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
6 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
8 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
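The same table can also be built without assigning cell by cell, by testing membership for each (row, position) pair. This is only a sketch of an alternative, assuming the positions are already Python lists as in the answer above; the variable names mirror that answer.

```python
import pandas as pd

positions = [
    ['ST', 'LW'], ['CF', 'RW', 'ST'], ['LW', 'CAM'], ['GK'], ['CAM', 'CM'],
    ['CB'], ['ST'], ['CM', 'CDM'], ['LW', 'CF'], ['GK'],
]
list_ = ['CAM', 'CB', 'LB', 'LWB', 'LW', 'CF', 'ST', 'RWB',
         'GK', 'RM', 'LM', 'CM', 'CDM', 'RW', 'RB']

# For each row, emit 1 if the position from list_ occurs in that row, else 0.
rows = [[int(p in row) for p in list_] for row in positions]
result = pd.DataFrame(rows, columns=list_)

print(result)
```

One difference in behavior: a position that is not in list_ is silently ignored here, whereas result.loc[i, e] = 1 would add a new column for it.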