Home > database >  Match two columns in dataframe
Match two columns in dataframe

Time:05-26

I have two columns in dataframe df

  ID      Name
AXD2     SAM S
AXD2       SAM
SCA4       JIM
SCA4 JIM JONES
ASCQ      JOHN

I need the output to get a unique id and should match the first name only,

  ID  Name
AXD2 SAM S
SCA4   JIM
ASCQ  JOHN

Any suggestions?

CodePudding user response:

You can use groupby with agg and get first of Name

df.groupby(['ID']).agg(first_name=('Name', 'first')).reset_index()

CodePudding user response:

Use drop_duplicates:

out = df.drop_duplicates('ID', ignore_index=True)
print(out)

# Output
     ID   Name
0  AXD2  SAM S
1  SCA4    JIM
2  ASCQ   JOHN

CodePudding user response:

You can use cumcount() to find the first iteration name of the ID

df['RN'] = df.groupby(['ID']).cumcount()   1
df = df.loc[df['RN'] == 1]
df[['ID', 'Name']]
  • Related