Assign values based on duplicated value of another column and length of the list of another column

Time:11-19

I have a dataframe like this:

df:

         Collection                     ID
0   [{'tom': 'one'}, {'tom': 'two'}]    10
1   [{'nick': 'one'}]                   10
2   [{'julie': 'one'}]                  14

When the 'ID' column has duplicated values, I want to add a new column 'status' that is 1 for whichever of the duplicate rows has the longer list in the 'Collection' column, and 0 for the others.

The resulting df should look like this:

         Collection                     ID  status
0   [{'tom': 'one'}, {'tom': 'two'}]    10  1
1   [{'nick': 'one'}]                   10  0
2   [{'julie': 'one'}]                  14  1

I tried np.where, which is the closest match to my problem that I found on Stack Overflow, but I have not been able to find an alternative to df['Collection'].str.len() that gives me the length of each list.

df['status']=np.where(df["Collection"].str.len() > 1, 1, 0)

Thanks in advance.

The df as a dict (df.to_dict()):

{'Collection': {0: [{'tom': 'one'}, {'tom': 'two'}],
  1: [{'nick': 'one'}],
  2: [{'julie': 'one'}]},
 'ID': {0: 10, 1: 10, 2: 14}}
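
For anyone who wants to reproduce this, the dict above can be loaded back with pd.DataFrame. The short sketch below also checks that Series.str.len() does return the length of list-valued cells, not only of strings. The variable names data and lengths are just for illustration.

```python
import pandas as pd

# The dict from the question: outer keys are columns, inner keys are index labels
data = {
    'Collection': {0: [{'tom': 'one'}, {'tom': 'two'}],
                   1: [{'nick': 'one'}],
                   2: [{'julie': 'one'}]},
    'ID': {0: 10, 1: 10, 2: 14},
}
df = pd.DataFrame(data)

# .str.len() works on list-valued cells as well as on strings
lengths = df['Collection'].str.len()
print(lengths.tolist())  # [2, 1, 1]
```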

CodePudding user response:

IIUC, you can do:

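df.loc[df.assign(l=df['Collection'].apply(len)).groupby('ID')['l'].idxmax(), 'status'] = 1
df['status'] = df['status'].fillna(0).astype(int)

Selecting the 'l' column before calling idxmax() keeps the list-valued 'Collection' column out of the aggregation. Without that selection, newer versions of pandas will probably need numeric_only=True passed to idxmax().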

Output:

                         Collection  ID  status
0  [{'tom': 'one'}, {'tom': 'two'}]  10       1
1                 [{'nick': 'one'}]  10       0
2                [{'julie': 'one'}]  14       1

CodePudding user response:

A possible solution:

# list length per row
df['status'] = df['Collection'].map(len)

# 1 where the row's length equals the maximum length within its ID group, else 0
df['status'] = (df['status']
                .eq(df.groupby('ID')['status'].transform('max'))
                .astype(int))

Output:

                         Collection  ID  status
0  [{'tom': 'one'}, {'tom': 'two'}]  10       1
1                 [{'nick': 'one'}]  10       0
2                [{'julie': 'one'}]  14       1
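
One difference between the two answers does not show up on the sample data: when two rows with the same ID have lists of equal length, idxmax marks only the first of them, while comparing each length against the group maximum marks all of them. Below is a minimal sketch on made-up data with such a tie. The frame and the column names first_only and all_max are hypothetical, and transform('max') is just one way to write the group-maximum comparison.

```python
import pandas as pd

# Made-up data: both rows with ID 10 hold lists of length 2 (a tie)
df = pd.DataFrame({
    'Collection': [[{'a': 1}, {'a': 2}], [{'b': 1}, {'b': 2}], [{'c': 1}]],
    'ID': [10, 10, 14],
})
lengths = df['Collection'].str.len()

# idxmax approach: exactly one row per ID, the first one when lengths tie
first_idx = lengths.groupby(df['ID']).idxmax()
df['first_only'] = df.index.isin(first_idx).astype(int)

# group-maximum comparison: every row that reaches its group's maximum
df['all_max'] = lengths.eq(lengths.groupby(df['ID']).transform('max')).astype(int)

print(df[['ID', 'first_only', 'all_max']])
```

Which behaviour is wanted depends on how ties should be treated, which the question does not specify.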