I want to drop a column from a dataframe if all of its values are distinct, i.e. none of them repeat.
For example:
   ID   NAME  VALUE1  VALUE2  VALUE3
0   1  Alpha     100      A1     ULV
1   2  Alpha     100      A1     SMU
2   3  Alpha     200      A2      UT
Column ID would get dropped since none of its values repeat, and the dataframe would turn into this:
    NAME  VALUE1  VALUE2  VALUE3
0  Alpha     100      A1     ULV
1  Alpha     100      A1     SMU
2  Alpha     200      A2      UT
How could I do this?
CodePudding user response:
You can use a list comprehension to keep only the columns that contain at least one duplicated value:
import pandas

# Recreate example dataframe
df = pandas.DataFrame({
    'ID': [1, 2, 3],
    'NAME': ['Alpha', 'Alpha', 'Alpha'],
    'VALUE1': [100, 100, 200],
    'VALUE2': ['A1', 'A1', 'A2'],
    'VALUE3': ['ULV', 'SMU', 'UT'],
})

# Keep a column only if at least one of its values is a repeat of an earlier one
df = df[[col for col in df.columns if df[col].duplicated().any()]]
Output (VALUE3 is dropped as well, because its values are also all distinct):
    NAME  VALUE1  VALUE2
0  Alpha     100      A1
1  Alpha     100      A1
2  Alpha     200      A2
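The expected output in the question keeps VALUE3 even though its values are all distinct, so it may be useful to exempt some columns from the check. The following is only a sketch of one way to do that: the function name drop_all_unique_columns and its keep parameter are hypothetical names chosen for illustration, and it uses Series.is_unique in place of duplicated().any().

```python
import pandas as pd


def drop_all_unique_columns(df, keep=()):
    """Return a copy of df without the columns whose values are all distinct.

    `keep` is a hypothetical parameter: columns listed there are never dropped.
    """
    # Series.is_unique is True when no value in the column repeats
    to_drop = [
        col for col in df.columns
        if df[col].is_unique and col not in keep
    ]
    return df.drop(columns=to_drop)


# Recreate the example dataframe from the question
df = pd.DataFrame({
    'ID': [1, 2, 3],
    'NAME': ['Alpha', 'Alpha', 'Alpha'],
    'VALUE1': [100, 100, 200],
    'VALUE2': ['A1', 'A1', 'A2'],
    'VALUE3': ['ULV', 'SMU', 'UT'],
})

# Strict rule: every all-distinct column is dropped (ID and VALUE3)
result = drop_all_unique_columns(df)

# Exempt VALUE3 so that only ID is dropped, as in the question's expected output
protected = drop_all_unique_columns(df, keep=('VALUE3',))
```

With the strict rule, result has the columns NAME, VALUE1 and VALUE2; with VALUE3 exempted, protected has NAME, VALUE1, VALUE2 and VALUE3. The original df is left unchanged because drop returns a new dataframe.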