Dropping duplicate rows ignoring case (lowercase or uppercase)

Time:02-17

I have a data frame with one column (Col). I'm trying to remove duplicate records regardless of letter case, for example:

    df = pd.DataFrame({'Col': ['Appliance Identification', 'Natural Language','Social networks',
                                  'natural language', 'Personal robot', 'Social Networks', 'Natural language']})

output:

Col
0   Appliance Identification
1   Natural Language
2   Social networks
3   natural language
4   Personal robot
5   Social Networks
6   Natural language

Expected Output:

Col
0   Appliance Identification
1   Social networks
2   Personal robot
3   Natural language

How can the duplicates be dropped case-insensitively?

CodePudding user response:

You could group by the lowercased values and keep the first row of each group:

df.groupby(df['Col'].str.lower(), as_index=False, sort=False).first()

output:

                        Col
0  Appliance Identification
1          Natural Language
2           Social networks
3            Personal robot
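
For anyone who wants to run the grouping approach end to end, here is a minimal self-contained sketch. The names `key` and `deduped` are only illustrative, and renaming the grouping key is my own addition so that it does not share a name with the `Col` column; the idea is otherwise the same as the one-liner above.

```python
import pandas as pd

df = pd.DataFrame({'Col': ['Appliance Identification', 'Natural Language', 'Social networks',
                           'natural language', 'Personal robot', 'Social Networks', 'Natural language']})

# Lowercased copy of the column, used only as the grouping key.
# It is renamed so it cannot be confused with the real 'Col' column.
key = df['Col'].str.lower().rename('key')

# sort=False keeps groups in order of first appearance;
# first() takes the first original spelling seen in each group.
deduped = (
    df.groupby(key, sort=False)['Col']
      .first()
      .reset_index(drop=True)
      .to_frame()
)

print(deduped)
```

Note that this keeps the first spelling encountered for each value, so `Natural Language` survives rather than `Natural language`.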

CodePudding user response:

Convert the values to lowercase, flag the repeats with Series.duplicated, and invert the mask with ~ in boolean indexing so that only the first occurrence of each value is kept:

df = df[~df['Col'].str.lower().duplicated()]
print(df)
                        Col
0  Appliance Identification
1          Natural Language
2           Social networks
4            Personal robot
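
If it matters which spelling survives, the `keep` argument of `Series.duplicated` controls that. Below is a small sketch that wraps the mask approach in a helper; the function name `drop_case_insensitive_duplicates` is hypothetical, and resetting the index at the end is an assumption on my part (the question's expected output shows a fresh 0..3 index).

```python
import pandas as pd

df = pd.DataFrame({'Col': ['Appliance Identification', 'Natural Language', 'Social networks',
                           'natural language', 'Personal robot', 'Social Networks', 'Natural language']})


def drop_case_insensitive_duplicates(frame, column, keep='first'):
    """Hypothetical helper: drop rows whose `column` value repeats when case is ignored."""
    # True for every row that repeats an already-seen lowercased value
    # (with keep='last', every row except the final occurrence is flagged).
    mask = frame[column].str.lower().duplicated(keep=keep)
    # Invert the mask to keep the non-duplicates, then renumber the rows.
    return frame[~mask].reset_index(drop=True)


first_kept = drop_case_insensitive_duplicates(df, 'Col')               # keeps rows 0, 1, 2, 4
last_kept = drop_case_insensitive_duplicates(df, 'Col', keep='last')   # keeps rows 0, 4, 5, 6

print(first_kept)
print(last_kept)
```

With `keep='last'` the surviving spellings are `Social Networks` and `Natural language`, and the rows stay in their original relative order.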