How can I remove duplicate cells in each row of a pandas DataFrame?

I need to update a pandas DataFrame as shown below. Is there any way to do this? [Thank you all for your time and effort. I am sorry that my question caused confusion; I have updated it. Thanks again.]

Sample 1:

import pandas as pd

# original DataFrame
data = {'row_1': ['x', 'y', 'x', 'y'], 'row_2': ['a', 'b', 'a', None]}
data = pd.DataFrame.from_dict(data, orient='index')
print(data)

# desired DataFrame, derived from data
data1 = {'row_1': ['x', 'y'], 'row_2': ['a', 'b']}
data1 = pd.DataFrame.from_dict(data1, orient='index')
print(data1)

Sample 2:

import pandas as pd

# original DataFrame
data = {'row_1': ['x', 'y', 'p', 'x'], 'row_2': ['a', 'b', 'a', None]}
data = pd.DataFrame.from_dict(data, orient='index')
print(data)

# desired DataFrame, derived from data
data1 = {'row_1': ['x', 'y', 'p'], 'row_2': ['a', 'b']}
data1 = pd.DataFrame.from_dict(data1, orient='index')
print(data1)

CodePudding user response:

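data = data.apply(lambda row: pd.Series(row.dropna().unique()), axis=1)

This should be what you are looking for. Use dropna to get rid of the NaN values, then keep only the unique elements. Applying this logic to each row and wrapping each result in pd.Series gives back a DataFrame, with shorter rows padded with NaN. (Without the pd.Series wrapper, apply returns a Series of arrays rather than a DataFrame.)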
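A minimal sketch of this approach run on Sample 2 from the question, assuming each per-row result is wrapped in pd.Series so that apply hands back a DataFrame. The helper name dedupe_row is made up for illustration.

```python
import pandas as pd

# Sample 2 from the question
data = pd.DataFrame.from_dict(
    {'row_1': ['x', 'y', 'p', 'x'], 'row_2': ['a', 'b', 'a', None]},
    orient='index',
)

def dedupe_row(row):
    # hypothetical helper: drop missing cells, then keep the first
    # occurrence of each remaining value, in original order
    return pd.Series(row.dropna().unique())

# axis=1 calls dedupe_row once per row; rows of different lengths
# are aligned on position and padded with NaN
result = data.apply(dedupe_row, axis=1)
print(result)
```

Here row_1 should come out as x, y, p and row_2 as a, b followed by a missing value, which lines up with the desired frame for Sample 2.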

CodePudding user response:

You can use the duplicated method. Check out the example in the pandas API reference.
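One possible way to apply duplicated to this problem, sketched on Sample 2: Series.duplicated() flags every repeat after the first occurrence, so filtering a row by the inverted mask keeps each value once. The dropna step and the rebuild through pd.DataFrame.from_dict are assumptions on my part, not something the answer spells out.

```python
import pandas as pd

# Sample 2 from the question
data = pd.DataFrame.from_dict(
    {'row_1': ['x', 'y', 'p', 'x'], 'row_2': ['a', 'b', 'a', None]},
    orient='index',
)

cleaned = {}
for label, row in data.iterrows():
    row = row.dropna()               # ignore missing cells
    kept = row[~row.duplicated()]    # keep only the first occurrence of each value
    cleaned[label] = kept.tolist()

# rows may now have different lengths; from_dict pads the shorter ones
result = pd.DataFrame.from_dict(cleaned, orient='index')
print(result)
```

The explicit loop is slower than a vectorised call on large frames, but it makes the per-row mask easy to inspect.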

CodePudding user response:

You can just do this:

data = data.T.loc[data.T["row_1"].drop_duplicates().index, :].T

Output:

       0  1
row_1  x  y
row_2  a  b
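One thing to keep in mind: this one-liner keys the de-duplication on row_1 alone, so a whole column is dropped only when its row_1 value repeats. The sketch below runs the same steps on Sample 2 (split into named variables purely for readability) to show what that means when the rows have different duplicates.

```python
import pandas as pd

# Sample 2 from the question
data = pd.DataFrame.from_dict(
    {'row_1': ['x', 'y', 'p', 'x'], 'row_2': ['a', 'b', 'a', None]},
    orient='index',
)

transposed = data.T                                   # rows become columns
keep = transposed['row_1'].drop_duplicates().index    # positions where row_1 is not a repeat
result = transposed.loc[keep, :].T                    # select those positions, transpose back
print(result)

# row_1 is de-duplicated, but row_2 still holds a, b, a,
# because only row_1 decides which columns survive
```

So this matches the desired output for Sample 1, but for Sample 2 a per-row approach is needed if row_2 should be de-duplicated as well.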