Dropping Column with all distinct values in pandas

Time:06-16

I want to drop a column from the dataframe if all values are distinct and none repeat.

For example:

    ID  NAME    VALUE1  VALUE2  VALUE3
 0  1   Alpha   100     A1      ULV
 1  2   Alpha   100     A1      SMU
 2  3   Alpha   200     A2      UT

Column ID would get dropped since no values repeat and it would turn into this:

    NAME    VALUE1  VALUE2  VALUE3
0   Alpha   100     A1      ULV
1   Alpha   100     A1      SMU
2   Alpha   200     A2      UT

How could I do this?

CodePudding user response:

You can use a list comprehension to keep only the columns that contain at least one duplicated value:

import pandas

# Recreate example dataframe
df = pandas.DataFrame({
    'ID': [1,2,3],
    'NAME': ['Alpha', 'Alpha', 'Alpha'],
    'VALUE1': [100, 100, 200],
    'VALUE2': ['A1', 'A1', 'A2'],
    'VALUE3': ['ULV', 'SMU', 'UT'],
})

df = df[[col for col in df.columns if df[col].duplicated().any()]]

Output (VALUE3 is dropped along with ID, because its three values are also all distinct):

    NAME  VALUE1 VALUE2
0  Alpha     100     A1
1  Alpha     100     A1
2  Alpha     200     A2
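
As a possible alternative to the list comprehension, the same rule can be sketched by comparing each column's count of unique values with the number of rows. This is only an illustration on the example data from the question; the names all_distinct and result are made up here and are not part of the original answer.

```python
import pandas as pd

# Recreate the example dataframe from the question
df = pd.DataFrame({
    'ID': [1, 2, 3],
    'NAME': ['Alpha', 'Alpha', 'Alpha'],
    'VALUE1': [100, 100, 200],
    'VALUE2': ['A1', 'A1', 'A2'],
    'VALUE3': ['ULV', 'SMU', 'UT'],
})

# A column is "all distinct" when its number of unique values
# equals the number of rows (hypothetical helper name)
all_distinct = df.nunique() == len(df)

# Keep every column that is not all distinct
result = df.loc[:, ~all_distinct]
print(result)
```

Note that nunique() ignores missing values by default (dropna=True), so a column containing NaN is not counted as all distinct by this check.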