How to transform a data frame by grouping by an individual and checking if a feature exists from a l

Time:10-20

I have the following issue. I have a data frame like this:

ID feature
Person_1 18
Person_1 19
Person_1 23
Person_1 59
Person_2 11
Person_2 23
Person_2 59
Person_3 11
Person_3 18
Person_3 1001
Person_3 1239
Person_4 23
Person_4 6531
Person_4 19843
Person_4 200012
……
Person_60 ….

Each feature is on its own row. I also have a list of all the features a person could have:

features
11
18
19
23
59
1001
1239
6531
19843
200012

I need the output to look like this:

          11  18  19  23  59  1001  1239  6531  19843  200012
Person_1   0   1   1   1   1     0     0     0      0       0
Person_2   1   0   0   1   1     0     0     0      0       0
Person_3   1   1   0   0   0     1     1     0      0       0
Person_4   0   0   0   1   0     0     0     1      1       1

Each person should be in a single row, with a 1 or 0 for every feature in the list depending on whether that person has it.

I've tried something like this, but it's not even close.

for i in pd.DataFrame[~ df.duplicated(subset=['id'])]:
  for Feature in feature_list:
    if feature_list in df['feature'].unique():
      print('1')
    else:
      print('0')

I'm a bit lost. Could you help me with how to approach this problem?

Thank you very much

CodePudding user response:

There are a number of ways you could do this. Here's one way.

Starting with

import pandas as pd

df = pd.DataFrame([
    ["Person_1", 1],
    ["Person_1", 2],
    ["Person_2", 1],
    ["Person_3", 3],
], columns=["ID", "feature"])

which looks like

         ID  feature
0  Person_1        1
1  Person_1        2
2  Person_2        1
3  Person_3        3

you can use a groupby and unstack (size() counts the rows for each ID/feature pair, and unstack turns the feature values into columns):

df = df.groupby(["ID", "feature"]).size().unstack(fill_value=0).reset_index()

which yields

feature        ID  1  2  3
0        Person_1  1  1  0
1        Person_2  1  0  0
2        Person_3  0  0  1
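
Note that the groupby/unstack result only has columns for features that actually occur in the data, and size() would give a count above 1 if an ID/feature pair were repeated. To get the layout asked for in the question, with one column per entry in the feature list, something like the following sketch might work. It swaps in pd.crosstab instead of groupby/unstack, uses only the first three people from the question as sample data, and assumes the list is held in a variable called feature_list (the name used in the question's attempt):

```python
import pandas as pd

# Sample rows copied from the question (Person_1 to Person_3 only).
df = pd.DataFrame(
    [
        ["Person_1", 18], ["Person_1", 19], ["Person_1", 23], ["Person_1", 59],
        ["Person_2", 11], ["Person_2", 23], ["Person_2", 59],
        ["Person_3", 11], ["Person_3", 18], ["Person_3", 1001], ["Person_3", 1239],
    ],
    columns=["ID", "feature"],
)

# Full list of possible features, as given in the question.
feature_list = [11, 18, 19, 23, 59, 1001, 1239, 6531, 19843, 200012]

result = (
    pd.crosstab(df["ID"], df["feature"])          # count rows per ID/feature pair
    .clip(upper=1)                                # repeated pairs still become 1
    .reindex(columns=feature_list, fill_value=0)  # add columns for unseen features
)
print(result)
```

The reindex step is what adds the all-zero columns (6531, 19843 and 200012 in this sample) and fixes the column order to match the list. Features in the data that are not in feature_list would be dropped by it.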