I have a DataFrame with one column (Col). I'm trying to remove duplicate records regardless of case (lowercase or uppercase), for example:
import pandas as pd
df = pd.DataFrame({'Col': ['Appliance Identification', 'Natural Language', 'Social networks',
                           'natural language', 'Personal robot', 'Social Networks', 'Natural language']})
output:
Col
0 Appliance Identification
1 Natural Language
2 Social networks
3 natural language
4 Personal robot
5 Social Networks
6 Natural language
Expected Output:
Col
0 Appliance Identification
1 Social networks
2 Personal robot
3 Natural language
How can I drop these duplicates case-insensitively?
Answer:
You could use:
df.groupby(df['Col'].str.lower(), as_index=False, sort=False).first()
output:
Col
0 Appliance Identification
1 Natural Language
2 Social networks
3 Personal robot
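For reference, here is a self-contained sketch of the same groupby idea. It selects the Col column before aggregating and resets the index, so the result comes back as a plain DataFrame numbered 0..n-1 without relying on as_index. It also shows .last() in place of .first(), which keeps the last spelling seen in each group instead of the first. The names key, first_spelling and last_spelling are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'Col': ['Appliance Identification', 'Natural Language', 'Social networks',
                           'natural language', 'Personal robot', 'Social Networks', 'Natural language']})

# Group on a lowercased copy of the column; the values in df are not modified.
key = df['Col'].str.lower()

# sort=False keeps the groups in the order of their first appearance.
# .first() returns the first spelling seen in each group, .last() the last one.
first_spelling = df.groupby(key, sort=False)['Col'].first().reset_index(drop=True).to_frame()
last_spelling = df.groupby(key, sort=False)['Col'].last().reset_index(drop=True).to_frame()

print(first_spelling)
print(last_spelling)
```

With the default sort=True the groups would instead be ordered alphabetically by the lowercased key.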
Answer:
Convert the values to lowercase, flag duplicates with Series.duplicated, and invert the mask with ~ for boolean indexing:
df = df[~df['Col'].str.lower().duplicated()]
print(df)
Col
0 Appliance Identification
1 Natural Language
2 Social networks
4 Personal robot
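A self-contained sketch of the duplicated approach with the keep parameter spelled out. keep='first' (the default) keeps the first occurrence of each lowercased value, keep='last' keeps the last one (for example "Natural language" rather than "Natural Language"), and reset_index(drop=True) renumbers the rows 0..n-1 as in the expected output. The last part assumes the data may contain non-ASCII text: str.casefold() folds case more aggressively than str.lower(), so "Straße" and "STRASSE" match after casefold but not after lower. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'Col': ['Appliance Identification', 'Natural Language', 'Social networks',
                           'natural language', 'Personal robot', 'Social Networks', 'Natural language']})

# Lowercased comparison key; df itself keeps the original spelling.
key = df['Col'].str.lower()

# duplicated() marks every repeat as True; ~ inverts it to select the rows to keep.
keep_first = df[~key.duplicated(keep='first')].reset_index(drop=True)
keep_last = df[~key.duplicated(keep='last')].reset_index(drop=True)

print(keep_first)
print(keep_last)

# casefold() also maps characters such as 'ß' to 'ss', which lower() does not.
words = pd.Series(['Straße', 'STRASSE'])
lower_dupes = words.str.lower().duplicated().tolist()
casefold_dupes = words.str.casefold().duplicated().tolist()
print(lower_dupes, casefold_dupes)
```

Note that keep='last' keeps rows in their original relative order, so the surviving rows are ordered by the position of each value's last occurrence.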