Capture the first three unique values from each row in a pandas dataframe

Time:03-28

I have a pandas dataframe like below:

import pandas as pd

df = pd.DataFrame({'col1': ['A', 'C'],
                   'col2': ['A', 'B'],
                   'col3': ['B', 'B'],
                   'col4': ['A', 'C'],
                   'col5': ['C', 'F'],
                   'col6': ['D', 'D'],
                   'col7': ['E', 'G'],
                   'col8': ['E', 'H']})

col1    col2    col3    col4    col5    col6    col7    col8
A       A       B       A       C       D       E       E
C       B       B       C       F       D       G       H

I need to generate another dataframe in which each row holds the first three unique values of the corresponding row of the dataframe above, in order of appearance.

So this is what I need:

fea1    fea2    fea3
A       B       C
C       B       F

I spent hours on this and was not able to find a solution. Does anyone know how to achieve this? Thanks a lot in advance.

CodePudding user response:

In your case, call unique on each row and keep the first three values:

df.apply(lambda x: pd.Series(x.unique()[:3]), axis=1)
Out[96]: 
   0  1  2
0  A  B  C
1  C  B  F
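
The result above has columns 0, 1, 2 rather than the fea1, fea2, fea3 asked for. A minimal sketch of adding the names, assuming the sample frame from the question (the variable name out is only illustrative):

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({'col1': ['A', 'C'], 'col2': ['A', 'B'],
                   'col3': ['B', 'B'], 'col4': ['A', 'C'],
                   'col5': ['C', 'F'], 'col6': ['D', 'D'],
                   'col7': ['E', 'G'], 'col8': ['E', 'H']})

# First three unique values per row, in order of appearance.
out = df.apply(lambda row: pd.Series(row.unique()[:3]), axis=1)

# Rename the positional columns 0, 1, 2 to fea1, fea2, fea3.
out.columns = [f'fea{i + 1}' for i in range(out.shape[1])]
print(out)
```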

CodePudding user response:

Another way is to build the new frame from the per-row unique values:

pd.DataFrame(df.agg(lambda x: x.unique()[:3], axis=1).to_list(), columns=['fea1', 'fea2', 'fea3'])

  fea1 fea2 fea3
0    A    B    C
1    C    B    F

CodePudding user response:

Another option is to use drop_duplicates (with result_type='expand' this works only if every row has at least 3 unique values, so that every row yields the same number of entries):

out = df.apply(lambda x: x.drop_duplicates().to_numpy()[:3], axis=1, result_type='expand')

For the general case:

out = pd.DataFrame(df.apply(lambda x: x.drop_duplicates().to_numpy()[:3], axis=1).tolist())

Output:

   0  1  2
0  A  B  C
1  C  B  F
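
To illustrate the general case, here is a small sketch on a hypothetical frame (df_short is not from the question) whose first row has only two unique values. It uses tolist() instead of to_numpy() so that the frame is built from plain Python lists; as far as I know, the shorter row is then padded with a missing value:

```python
import pandas as pd

# Hypothetical frame: the first row has only two unique values (A, B).
df_short = pd.DataFrame({'col1': ['A', 'C'],
                         'col2': ['A', 'B'],
                         'col3': ['B', 'F'],
                         'col4': ['A', 'C']})

# Collect up to three unique values per row as plain Python lists.
rows = df_short.apply(lambda x: x.drop_duplicates().tolist()[:3], axis=1)

# Build the frame from the (possibly ragged) lists; short rows are padded.
out = pd.DataFrame(rows.tolist(), index=df_short.index)
print(out)
```

Whether the padding shows up as None or NaN may depend on the pandas version, so pd.isna is the safer way to check for it.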