Find which column has unique values that can help distinguish the rows with Pandas

Time:03-14

I have the following dataframe, which contains 2 rows:

index  name      food   color   number   year   hobby  music
0      Lorenzo   pasta  blue     5        1995  art    jazz
1      Lorenzo   pasta  blue     3        1995  art    jazz

I want to write code that tells me which column distinguishes between these two rows.
For example, in this dataframe the column "number" is the one that distinguishes between the two rows.
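For the two-row case, a minimal sketch is to compare the first row with the second, column by column, and keep the names where they differ. The frame below simply rebuilds the example table. Note that `ne` treats two NaN values as different, so a column holding NaN in both rows would be reported.

```python
import pandas as pd

# Rebuild the example dataframe from the question
df = pd.DataFrame({
    "name": ["Lorenzo", "Lorenzo"],
    "food": ["pasta", "pasta"],
    "color": ["blue", "blue"],
    "number": [5, 3],
    "year": [1995, 1995],
    "hobby": ["art", "art"],
    "music": ["jazz", "jazz"],
})

# Element-wise "not equal" between row 0 and row 1;
# the result is a boolean Series indexed by column name
differs = df.iloc[0].ne(df.iloc[1])

# Keep only the column names where the two rows differ
distinct_cols = differs[differs].index.tolist()
print(distinct_cols)  # ['number']
```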

Until now I have done this very simply, by going through the columns one by one with iloc and looking at the values:

duplicates.iloc[:,3]
>>>
0  blue
1  blue

It's important to take into account that:

  1. This has to run inside a for loop; on each iteration I check a newly generated dataframe.
  2. There may be more than 2 rows to check.
  3. There may be more than 1 column that can distinguish between the rows.

I thought the way to check this would be to take one column at a time, get its unique values, and check whether they are equal to each other, similar to this:

for n in range(len(df.columns)):
    tmp = df.iloc[:, n]  # the n-th column as a Series

and then I thought of checking whether all the values in that temporary column are the same, but I got stuck here because sometimes I have many rows, and also I need.
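One way to finish that column loop is sketched below, assuming "distinguishes the rows" means every row has a different value in that column. Instead of comparing values pairwise, it counts the distinct values in each column with `Series.nunique` and compares that count to the number of rows. The three-row frame is a hypothetical example, not data from the question.

```python
import pandas as pd

# Hypothetical frame with more than two rows
df = pd.DataFrame({
    "name": ["Lorenzo", "Lorenzo", "Lorenzo"],
    "number": [5, 3, 7],
    "year": [1995, 1995, 1996],
})

distinguishing = []
for n in range(len(df.columns)):
    tmp = df.iloc[:, n]  # the n-th column as a Series
    # The column tells every row apart when it has as many distinct
    # values as there are rows (dropna=False counts NaN as a value)
    if tmp.nunique(dropna=False) == len(tmp):
        distinguishing.append(df.columns[n])

print(distinguishing)  # ['number']
```

Here "year" is left out because 1995 appears twice, even though it is not constant. If a column only needs to be "not all the same", the condition would be `tmp.nunique(dropna=False) > 1` instead.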

My end goal: inside the for loop, identify the column that has a different value in each row of the temporary dataframe, and can therefore distinguish between the rows.
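A sketch of how that check could sit inside the loop, covering more than two rows and more than one matching column: the hypothetical helper `distinguishing_columns` counts distinct values for all columns at once with `DataFrame.nunique` and returns every column whose count equals the number of rows. The `generated` list is a made-up stand-in for the dataframes produced on each iteration.

```python
import pandas as pd


def distinguishing_columns(frame: pd.DataFrame) -> list:
    """Hypothetical helper: columns whose values differ in every row."""
    counts = frame.nunique(dropna=False)  # distinct values per column
    return counts[counts == len(frame)].index.tolist()


# Hypothetical stand-ins for the dataframes generated in the loop
generated = [
    pd.DataFrame({"name": ["Lorenzo", "Lorenzo"], "number": [5, 3]}),
    pd.DataFrame({
        "color": ["blue", "red", "green"],
        "year": [1995, 1996, 1997],
        "hobby": ["art", "art", "music"],
    }),
]

results = []
for frame in generated:
    cols = distinguishing_columns(frame)
    results.append(cols)
    print(cols)
# ['number']
# ['color', 'year']
```

Because the helper returns a list, it naturally reports several columns when more than one of them tells the rows apart, and an empty list when none does.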

Answer:

You can apply the duplicated method to every column, then keep only the columns that contain no repeated value:

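import pandas as pd

# True for each column that contains at least one repeated value
s = df.apply(pd.Series.duplicated).any()
# Columns with no repeated value, i.e. a different value in every row
s[~s].index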

Output: ['number']
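To see what each step of the answer produces, here is a self-contained run on the question's example frame. `DataFrame.apply` passes each column to `Series.duplicated`, which (with its default `keep='first'`) marks a value True when it already appeared higher up in that column; `.any()` then collapses that to one flag per column.

```python
import pandas as pd

# The example dataframe from the question
df = pd.DataFrame({
    "name": ["Lorenzo", "Lorenzo"],
    "food": ["pasta", "pasta"],
    "color": ["blue", "blue"],
    "number": [5, 3],
    "year": [1995, 1995],
    "hobby": ["art", "art"],
    "music": ["jazz", "jazz"],
})

# Boolean frame: True where a value repeats an earlier one in its column
dup = df.apply(pd.Series.duplicated)

# One flag per column: does the column repeat any value?
s = dup.any()

# Invert the flags to keep columns without any repeated value
result = s[~s].index
print(list(result))  # ['number']
```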
