Assign unique keys in a pandas DataFrame, with rows that satisfy a condition sharing a common key

I have a data frame like this:

import pandas as pd

d = {'id1': ['3def', 'xyx1', 'roro', 'p344o', 'fjfje'],
     'id2': ['hhyy', 'drdr', 'xyx1', 'xyx1', 'xyx1'],
     'score': [100, 90, 100, 90, 70]}
df1 = pd.DataFrame(d)
df1
    id1     id2     score
0   3def    hhyy    100
1   xyx1    drdr    90
2   roro    xyx1    100
3   p344o   xyx1    90
4   fjfje   xyx1    70

My end goal is to give a unique key to each row. The catch is that rows whose id2 equals another row's id1 and whose score is above 80 should share a key with that row (in my case, indexes 1, 2 and 3). So my desired result looks like this:

    id1     id2     score    uid
0   3def    hhyy    100      1
1   xyx1    drdr    90       2
2   roro    xyx1    100      2
3   p344o   xyx1    90       2
4   fjfje   xyx1    70       3

I have tried the following, but it does not do the trick:

msk = (((df1['id1'].isin(df1['id2']).shift(-1)) | (df1['id2'].isin(df1['id2']).shift(-1))) &\
           (df1['score'] > 80).shift(-1))

df1['uid'] = msk.cumsum()

print(df1)
     id1      id2   score   uid
0   3def    hhyy    100     1
1   xyx1    drdr    90      2
2   roro    xyx1    100     3
3   p344o   xyx1    90      3
4   fjfje   xyx1    70      3

Any help on how to solve this would be very appreciated.

CodePudding user response：

You can use the following algorithm:

create a uid per group of id1
map those uids onto the values of id2, and replace a row's uid with the mapped one when its score meets the criterion
finally, fix the non-sequential uids by subtracting the extra difference

# step 1
df1['uid'] = df1.groupby('id1', sort=False).ngroup().add(1)

# step 2
alt_uid = df1['id2'].map(df1.set_index('id1')['uid']).convert_dtypes()

# step 3
df1['uid'] = df1['uid'].mask(df1['score'].gt(80) & alt_uid.notna(), alt_uid)

# final uid
df1['uid'] -= df1['uid'].diff().sub(1).fillna(0).clip(lower=0)

output:

     id1   id2  score  uid
0   3def  hhyy    100    1
1   xyx1  drdr     90    2
2   roro  xyx1    100    2
3  p344o  xyx1     90    2
4  fjfje  xyx1     70    3

output with intermediary steps:

     id1   id2  score  uid  uid_step1  uid_step2  uid_step3
0   3def  hhyy    100    1          1       <NA>          1
1   xyx1  drdr     90    2          2       <NA>          2
2   roro  xyx1    100    2          3          2          2
3  p344o  xyx1     90    2          4          2          2
4  fjfje  xyx1     70    3          5          2          5
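
One caveat on the final step: it subtracts the jump only at the row where the jump occurs, so the rows after it are not shifted down. The sketch below assumes one extra hypothetical row (`zzzz`/`qqqq`) appended to the sample data to show the leftover gap, and tries `pd.factorize` as one possible alternative for the renumbering, since it hands out consecutive codes in order of first appearance. The names `first_uid` and `merged` are illustrative, and `convert_dtypes()` is left out here for simplicity.

```python
import pandas as pd

# Sample data from the question plus one hypothetical extra row (zzzz/qqqq).
df1 = pd.DataFrame({
    'id1': ['3def', 'xyx1', 'roro', 'p344o', 'fjfje', 'zzzz'],
    'id2': ['hhyy', 'drdr', 'xyx1', 'xyx1', 'xyx1', 'qqqq'],
    'score': [100, 90, 100, 90, 70, 50],
})

# Step 1: one provisional uid per id1, in order of appearance.
first_uid = df1.groupby('id1', sort=False).ngroup() + 1

# Step 2: look up the uid of the row whose id1 equals this row's id2
# (NaN when there is no such row; assumes id1 values are unique).
alt_uid = df1['id2'].map(pd.Series(first_uid.values, index=df1['id1']))

# Step 3: adopt the looked-up uid when the score criterion is met.
merged = first_uid.mask(df1['score'].gt(80) & alt_uid.notna(), alt_uid)

# Renumbering as in the answer: corrects only the row where the jump happens.
diff_based = merged - merged.diff().sub(1).fillna(0).clip(lower=0)

# Alternative renumbering: consecutive codes in order of first appearance.
df1['uid'] = pd.factorize(merged)[0] + 1

print(diff_based.tolist())
print(df1['uid'].tolist())
```

On the original five rows both renumberings give the same result; they differ only when more rows follow the first gap.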

CodePudding user response：

I think it's easier to write a custom function:

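def func(data):
    # idx is the current key; cur_id is the id1 of the row that started it
    idx, cur_id, result = 0, None, []
    for x, y, z in data.values:  # x = id1, y = id2, z = score
        # start a new key unless this row's id2 matches cur_id and score > 80
        if not ((cur_id == y) and (z > 80)):
            cur_id = x
            idx += 1
        result.append(idx)
    return result

# pass only the three columns the loop unpacks
df1["new"] = func(df1[["id1", "id2", "score"]])

print(df1)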

     id1   id2  score  new
0   3def  hhyy    100    1
1   xyx1  drdr     90    2
2   roro  xyx1    100    2
3  p344o  xyx1     90    2
4  fjfje  xyx1     70    3
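
Note that the loop compares each row's id2 only with the id1 of the most recent row that started a new key, so it relies on linked rows coming directly after the row they refer to. A minimal sketch of that assumption follows; `assign_keys` is the same loop rewritten with `zip` over named columns, and `reversed_rows` is a hypothetical two-row frame, neither of which appears in the answer above.

```python
import pandas as pd

def assign_keys(data):
    """Same loop logic as above, iterating over named columns."""
    idx, cur_id, result = 0, None, []
    for x, y, z in zip(data['id1'], data['id2'], data['score']):
        # start a new key unless this row's id2 matches cur_id and score > 80
        if not (cur_id == y and z > 80):
            cur_id = x
            idx += 1
        result.append(idx)
    return result

# Sample frame from the question: linked rows follow the row they refer to.
ordered = pd.DataFrame({
    'id1': ['3def', 'xyx1', 'roro', 'p344o', 'fjfje'],
    'id2': ['hhyy', 'drdr', 'xyx1', 'xyx1', 'xyx1'],
    'score': [100, 90, 100, 90, 70],
})
ordered_keys = assign_keys(ordered)

# Hypothetical frame: the referring row comes BEFORE the row it refers to.
reversed_rows = pd.DataFrame({
    'id1': ['roro', 'xyx1'],
    'id2': ['xyx1', 'drdr'],
    'score': [100, 90],
})
reversed_keys = assign_keys(reversed_rows)

print(ordered_keys)   # [1, 2, 2, 2, 3]
print(reversed_keys)  # [1, 2]
```

In the second frame the two rows get different keys even though `roro` points at `xyx1` with a score above 80, so sorting the frame first may be needed if the data is not already in that order.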