I have a pandas DataFrame like the one below:
import pandas as pd

df = pd.DataFrame({'col1': ['A', 'C'],
                   'col2': ['A', 'B'],
                   'col3': ['B', 'B'],
                   'col4': ['A', 'C'],
                   'col5': ['C', 'F'],
                   'col6': ['D', 'D'],
                   'col7': ['E', 'G'],
                   'col8': ['E', 'H']})
col1 col2 col3 col4 col5 col6 col7 col8
A A B A C D E E
C B B C F D G H
I need to generate another DataFrame in which each row holds the first three unique values of the corresponding row of the original DataFrame, in order of first appearance.
This is what I need:
fea1 fea2 fea3
A B C
C B F
I spent hours on this and was not able to find a solution. Does anyone know how to achieve this? Thanks a lot in advance.
CodePudding user response:
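In your case, take the unique values of each row (unique keeps them in order of first appearance) and keep the first three: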
df = df.apply(lambda x : pd.Series(x.unique()[:3]),axis=1)
Out[96]:
0 1 2
0 A B C
1 C B F
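For anyone who wants to run this end to end, here is a minimal sketch, assuming the sample frame from the question. The helper name first_n_unique and the fea1..fea3 renaming are illustrative choices, not part of the answer above.

```python
import pandas as pd

# Sample frame copied from the question
df = pd.DataFrame({
    'col1': ['A', 'C'], 'col2': ['A', 'B'], 'col3': ['B', 'B'], 'col4': ['A', 'C'],
    'col5': ['C', 'F'], 'col6': ['D', 'D'], 'col7': ['E', 'G'], 'col8': ['E', 'H'],
})

def first_n_unique(row, n=3):
    # Hypothetical helper: Series.unique() returns values in order of
    # first appearance, so slicing keeps the first n distinct values.
    return pd.Series(row.unique()[:n])

# axis=1 passes each row to the helper; returning a Series expands it into columns
out = df.apply(first_n_unique, axis=1)

# Rename the positional columns 0, 1, 2 to fea1, fea2, fea3
out.columns = [f'fea{i + 1}' for i in range(out.shape[1])]
print(out)
```

Returning a pd.Series from the function lets apply align rows of different lengths, so a row with fewer than three unique values should simply get NaN in the missing positions.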
CodePudding user response:
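Another approach is to collect the first three unique values of each row and build a new DataFrame with the desired column names:
pd.DataFrame(df.agg(lambda x: x.unique()[:3], axis=1).to_list(), columns=['fea1', 'fea2', 'fea3'])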
fea1 fea2 fea3
0 A B C
1 C B F
CodePudding user response:
Another option is to use drop_duplicates. This version works only if every row has at least 3 unique values:
out = df.apply(lambda x: x.drop_duplicates().to_numpy()[:3], axis=1, result_type='expand')
For the general case, where some rows may have fewer than 3 unique values:
out = pd.DataFrame(df.apply(lambda x: x.drop_duplicates().to_numpy()[:3], axis=1).tolist())
Output:
0 1 2
0 A B C
1 C B F
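To see why the general form matters, here is a small sketch on a made-up frame (ragged_df is hypothetical, not from the question) in which the second row has only two unique values. It uses tolist() inside the lambda instead of to_numpy(), which is a minor variation on the answer above.

```python
import pandas as pd

# Hypothetical frame: the second row contains only two distinct values
ragged_df = pd.DataFrame({
    'col1': ['A', 'X'],
    'col2': ['B', 'X'],
    'col3': ['C', 'Y'],
    'col4': ['A', 'Y'],
})

# Each row becomes a plain list of at most three unique values
rows = ragged_df.apply(lambda x: x.drop_duplicates().tolist()[:3], axis=1)

# Building the frame from a list of lists pads the shorter rows with missing values
out = pd.DataFrame(rows.tolist())
print(out)
```

The short row ends up with a missing value in the last column rather than raising an error, which is the behaviour the general form is meant to provide.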