I have the following issue. I have a data frame like this:
ID | feature |
---|---|
Person_1 | 18 |
Person_1 | 19 |
Person_1 | 23 |
Person_1 | 59 |
Person_2 | 11 |
Person_2 | 23 |
Person_2 | 59 |
Person_3 | 11 |
Person_3 | 18 |
Person_3 | 1001 |
Person_3 | 1239 |
Person_4 | 23 |
Person_4 | 6531 |
Person_4 | 19843 |
Person_4 | 200012 |
…… | |
Person_60 | …. |
Each feature is on its own row. I also have a list of all the features a person could have:
features |
---|
11 |
18 |
19 |
23 |
59 |
1001 |
1239 |
6531 |
19843 |
200012 |
I need the output to look like this:
ID | 11 | 18 | 19 | 23 | 59 | 1001 | 1239 | 6531 | 19843 | 200012 |
---|---|---|---|---|---|---|---|---|---|---|
Person_1 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
Person_2 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
Person_3 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
Person_4 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 |
Each person should be on a single row, with one column per feature from the list: 1 if the person has that feature, 0 otherwise.
I've tried something like this, but it's not even close.
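```python
# Assumes df has the 'ID' and 'feature' columns shown above and feature_list holds the possible features
for person in df['ID'].unique():  # visit each person once
    person_features = set(df.loc[df['ID'] == person, 'feature'])
    for feature in feature_list:
        if feature in person_features:  # test one feature, not the whole list
            print('1')
        else:
            print('0')
```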
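One way this loop idea could be finished is to collect a list of 0/1 flags per person instead of printing, and then build a DataFrame from those lists. A minimal sketch; the small `df` and `feature_list` below are hypothetical stand-ins for the question's data, not the full table:

```python
import pandas as pd

# Hypothetical sample: a few rows shaped like the question's table
df = pd.DataFrame({
    "ID": ["Person_1", "Person_1", "Person_2", "Person_3", "Person_3"],
    "feature": [18, 19, 11, 11, 1001],
})
feature_list = [11, 18, 19, 1001]

rows = {}
# Visit each person once, then test every feature in the list against that person's set
for person in df["ID"].unique():
    person_features = set(df.loc[df["ID"] == person, "feature"])
    rows[person] = [1 if feature in person_features else 0 for feature in feature_list]

# Persons become the index, features (in feature_list order) become the columns
result = pd.DataFrame.from_dict(rows, orient="index", columns=feature_list)
print(result)
```

This filters `df` once per person, so it gets slower as the number of people grows; a vectorized `groupby` avoids that.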
I'm a bit lost. Could you help me figure out how to approach this problem?
Thank you very much
Answer:
There are a number of ways you could do this. Here's one.
Starting with

```python
import pandas as pd

df = pd.DataFrame([
    ["Person_1", 1],
    ["Person_1", 2],
    ["Person_2", 1],
    ["Person_3", 3],
], columns=["ID", "feature"])
```
which looks like
```
         ID  feature
0  Person_1        1
1  Person_1        2
2  Person_2        1
3  Person_3        3
```
you should use a `groupby` and `unstack`:

```python
df = df.groupby(["ID", "feature"]).size().unstack(fill_value=0).reset_index()
```
which yields
```
feature        ID  1  2  3
0        Person_1  1  1  0
1        Person_2  1  0  0
2        Person_3  0  0  1
```
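Applied to the question's own data, two adjustments may be needed to match the requested table exactly: `reindex(columns=feature_list, fill_value=0)` makes the columns exactly the feature list, in that order (a listed feature that nobody has would otherwise be missing), and `clip(upper=1)` turns a duplicated ID/feature row into 1 rather than a count of 2. A minimal sketch, assuming the data look like the question's sample (the elided rows up to Person_60 are not reproduced); `pd.crosstab` is shown as one alternative way to build the same count table:

```python
import pandas as pd

# Rows from the question's sample; the elided "……" rows are not included
df = pd.DataFrame({
    "ID": ["Person_1"] * 4 + ["Person_2"] * 3 + ["Person_3"] * 4 + ["Person_4"] * 4,
    "feature": [18, 19, 23, 59,
                11, 23, 59,
                11, 18, 1001, 1239,
                23, 6531, 19843, 200012],
})
feature_list = [11, 18, 19, 23, 59, 1001, 1239, 6531, 19843, 200012]

# Count table as in the answer, then force the column set/order and cap counts at 1
wide = (
    df.groupby(["ID", "feature"]).size()
      .unstack(fill_value=0)
      .reindex(columns=feature_list, fill_value=0)
      .clip(upper=1)
)

# Alternative: crosstab builds the same ID x feature count table in one call
wide_ct = (
    pd.crosstab(df["ID"], df["feature"])
      .reindex(columns=feature_list, fill_value=0)
      .clip(upper=1)
)

print(wide)
```

Calling `.reset_index()` on `wide` turns `ID` back into a regular column, as in the answer above.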