Home > Software design >  mask linked data across two columns
mask linked data across two columns

Time:11-10

This is an subset of huge dataset, I want that all the numbers in ID-2 columns linked directly or indirectly to 1 in ID-1 to be grouped next to 1, same with 2 and so on.

   ID1  ID2
0    1    2
1    1    4
2    2    6
3    2    5
4    3    7

to:

  ID1  ID2
0   1    2
1        4
2        6
3        5
4   3    7

CodePudding user response:

You can use boolean indexing:

# map ID2 values to ID1 or keep ID1
s = df['ID1'].map(df.set_index('ID2')['ID1']).fillna(df['ID1'])

# mask duplicated values
df.loc[s.duplicated(), 'ID1'] = ''

output:

  ID1  ID2
0   1    2
1        4
2        6
3        5
4   3    7

If you prefer a new column:

s = df['ID1'].map(df.set_index('ID2')['ID1']).fillna(df['ID1'])

df['new'] = df['ID1'].mask(s.duplicated(), '')

output:

   ID1  ID2 new
0    1    2   1
1    1    4    
2    2    6    
3    2    5    
4    3    7   3
  • Related