I need to update a pandas DataFrame as shown below, so that each row keeps only its distinct, non-missing values. Is this possible by any means? [I highly appreciate everyone's time and effort. Sorry that my question caused confusion; I have updated it. Thanks again.]
Sample 1:
import pandas as pd
#original dataframe
data = {'row_1': ['x','y','x','y'], 'row_2': ['a', 'b', 'a', None]}
data=pd.DataFrame.from_dict(data, orient='index')
print(data)
#desired dataframe from data
data1 = {'row_1': ['x','y'], 'row_2': ['a', 'b']}
data1=pd.DataFrame.from_dict(data1, orient='index')
print(data1)
Sample 2:
import pandas as pd
#original dataframe
data = {'row_1': ['x','y','p','x'], 'row_2': ['a', 'b', 'a', None]}
data=pd.DataFrame.from_dict(data, orient='index')
print(data)
#desired dataframe from data
data1 = {'row_1': ['x','y','p'], 'row_2': ['a', 'b']}
data1=pd.DataFrame.from_dict(data1, orient='index')
print(data1)
CodePudding user response:
data = data.apply(lambda x: pd.Series(x.dropna().unique()), axis=1)
This is what you are looking for. Use dropna to get rid of the NaNs, then keep only the unique elements. Applying this logic to each row (axis=1) and wrapping the result in pd.Series makes apply return a DataFrame, with NaN padding any row that ends up shorter than the others.
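As a minimal sketch, the per-row approach can be run end to end on both samples from the question. The helper name unique_per_row is hypothetical, and the per-row result is wrapped in pd.Series so that apply assembles a DataFrame rather than a Series of arrays.

```python
import pandas as pd

def unique_per_row(df):
    # For each row: drop missing values, keep the first occurrence of each
    # remaining value, and wrap the result in a Series so that apply()
    # builds a DataFrame (shorter rows are padded with NaN).
    return df.apply(lambda row: pd.Series(row.dropna().unique()), axis=1)

# Sample 1 and Sample 2 from the question
sample1 = pd.DataFrame.from_dict(
    {'row_1': ['x', 'y', 'x', 'y'], 'row_2': ['a', 'b', 'a', None]},
    orient='index',
)
sample2 = pd.DataFrame.from_dict(
    {'row_1': ['x', 'y', 'p', 'x'], 'row_2': ['a', 'b', 'a', None]},
    orient='index',
)

result1 = unique_per_row(sample1)
result2 = unique_per_row(sample2)
print(result1)
print(result2)
```

Because values are compacted to the left within each row, the column positions of the result no longer correspond to the column positions of the input.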
CodePudding user response:
You can use the duplicated method. See the pandas API reference for an example.
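One way the duplicated method might be applied here is sketched below; this is an assumption about what the answer intends, and drop_repeats is a hypothetical helper name. It flags repeated values within each row, blanks them out, and then drops columns that are left entirely empty.

```python
import pandas as pd

def drop_repeats(df):
    # True wherever a value has already appeared earlier in the same row
    repeated = df.apply(lambda row: row.duplicated(), axis=1)
    # Replace repeated values with NaN, then drop columns that are all missing
    return df.mask(repeated).dropna(axis=1, how='all')

# Sample 2 from the question
sample2 = pd.DataFrame.from_dict(
    {'row_1': ['x', 'y', 'p', 'x'], 'row_2': ['a', 'b', 'a', None]},
    orient='index',
)

result = drop_repeats(sample2)
print(result)
```

Note that this keeps surviving values in their original columns instead of shifting them left, so it matches the desired output only when the repeats sit at the end of each row, as they do in the two samples.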
CodePudding user response:
You can just do this, which keeps only the columns whose row_1 value has not appeared before:
data = data.T.loc[data.T["row_1"].drop_duplicates().index, :].T
Output:
| | 0 | 1 |
|---|---|---|
| row_1 | x | y |
| row_2 | a | b |
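The output above corresponds to Sample 1. As a quick check under the same setup, the sketch below runs the same expression on Sample 2; the variable names sample2, transposed, and result are illustrative. Since whole columns are selected based on row_1 alone, the repeated 'a' in row_2 is kept.

```python
import pandas as pd

# Sample 2 from the question
sample2 = pd.DataFrame.from_dict(
    {'row_1': ['x', 'y', 'p', 'x'], 'row_2': ['a', 'b', 'a', None]},
    orient='index',
)

# Transpose so the original rows become columns, keep the positions where
# row_1 shows a value for the first time, then transpose back.
transposed = sample2.T
result = transposed.loc[transposed['row_1'].drop_duplicates().index, :].T
print(result)
```

If row_2 should also be reduced to its distinct values, as in the desired output for Sample 2, a per-row deduplication is needed instead.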