Below is the DataFrame:
import numpy as np
import pandas as pd

df = pd.DataFrame({'cd1' : ['PFE1', 'PFE25', np.nan, np.nan],
                   'cd2' : [np.nan, 'PFE28', 'PFE23', 'PFE14'],
                   'cd3' : ['PFE15', 'PFE2', 'PFE83', np.nan],
                   'cd4' : ['PFE25', np.nan, 'PFE39', 'PFE47'],
                   'cd5' : [np.nan, 'PFE21', 'PFE53', 'PFE15']})
df
cd1 cd2 cd3 cd4 cd5
PFE1 NaN PFE15 PFE25 NaN
PFE25 PFE28 PFE2 NaN PFE21
NaN PFE23 PFE83 PFE39 PFE53
NaN PFE14 NaN PFE47 PFE15
There are multiple tasks that I'm trying to do (I got some help from previous Stack Overflow questions, thanks for that!).
Combine multiple columns and remove duplicate values (there are no duplicates in this example):
df['combined'] = df.agg(lambda x: list(x.dropna()), axis=1)
df['Codes'] = list(map(set, df['combined']))
cd1 cd2 cd3 cd4 cd5 combined Codes
PFE1 NaN PFE15 PFE25 NaN [PFE1, PFE15, PFE25] {PFE25, PFE1, PFE15}
PFE25 PFE28 PFE2 NaN PFE21 [PFE25, PFE28, PFE2, PFE21] {PFE28, PFE21, PFE25, PFE2}
NaN PFE23 PFE83 PFE39 PFE53 [PFE23, PFE83, PFE39, PFE53] {PFE83, PFE23, PFE39, PFE53}
NaN PFE14 NaN PFE47 PFE15 [PFE14, PFE47, PFE15] {PFE14, PFE47, PFE15}
The aim is to sort the codes within each row. Below is the expected output:
Output_col
PFE1, PFE15, PFE25
PFE2, PFE21, PFE25, PFE28
PFE23, PFE39, PFE53, PFE83
PFE14, PFE15, PFE47
I tried sorting after agg, but it does not work:
df['combined'] = df.agg(lambda x: list(x.dropna()), axis=1).sort_values()
I also tried sorting the column directly, but that does not work either:
df['combined'] = df['combined'].sort_values()
If anyone has any clues, thanks for your help!
CodePudding user response:
I think this does what you want.
You need to add the sort inside the lambda function, so that each list itself is sorted rather than the column at the end.
I'm not sure if there's a neater way that avoids writing a helper function, but list.sort() does not return a new list; it modifies the existing one in place.
def sort_list(my_list: list) -> list:
    temp_list = my_list.copy()
    temp_list.sort()
    return temp_list
df.agg(lambda x: sort_list(list(x.dropna())), axis=1)
This produces the output:
0 [PFE1, PFE15, PFE25]
1 [PFE2, PFE21, PFE25, PFE28]
2 [PFE23, PFE39, PFE53, PFE83]
3 [PFE14, PFE15, PFE47]
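On the point about list.sort() modifying the list in place: the built-in sorted() returns a new list, so it may be a way to skip the helper function. A minimal sketch on a plain Python list (the variable names here are only for illustration):

```python
codes = ['PFE25', 'PFE28', 'PFE2', 'PFE21']

# list.sort() sorts in place and returns None, so its result cannot be used directly
in_place_result = codes.copy().sort()

# sorted() leaves the input untouched and returns a new, sorted list
ordered = sorted(codes)

print(in_place_result)  # None
print(ordered)          # ['PFE2', 'PFE21', 'PFE25', 'PFE28']
print(codes)            # original order is unchanged
```

Note that the strings are compared character by character, so this is an alphabetical order, not a numeric one.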
CodePudding user response:
The function sort_values() sorts a pandas Series/DataFrame by the values in the "sort by" column, i.e. it reorders the rows. It does not touch the contents of each record.
If you need to sort the values inside the lists that are stored as records in the column, you have to apply a function that iterates over the records:
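This is probably also why assigning the sorted column back appeared to do nothing: pandas aligns a Series on its index when it is assigned to a DataFrame column, so the reordered rows go back to their original positions. A small sketch with toy integer data (not the data from the question) to illustrate:

```python
import pandas as pd

toy = pd.DataFrame({'a': [3, 1, 2]})

# sort_values() reorders the rows but each value keeps its original index label
sorted_a = toy['a'].sort_values()
print(list(sorted_a.index))   # [1, 2, 0]
print(sorted_a.tolist())      # [1, 2, 3]

# Assigning back aligns on the index, so the column ends up in the original order
toy['b'] = sorted_a
print(toy['b'].tolist())      # [3, 1, 2]
```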
df['combined'] = df['combined'].apply(lambda x: sorted(x))
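Putting the pieces together, here is one possible sketch that goes from the original columns straight to the comma-separated output in the question. It assumes the codes should be de-duplicated with set() and sorted alphabetically; the column name Output_col comes from the expected output, and code_cols is just a helper name introduced here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'cd1': ['PFE1', 'PFE25', np.nan, np.nan],
                   'cd2': [np.nan, 'PFE28', 'PFE23', 'PFE14'],
                   'cd3': ['PFE15', 'PFE2', 'PFE83', np.nan],
                   'cd4': ['PFE25', np.nan, 'PFE39', 'PFE47'],
                   'cd5': [np.nan, 'PFE21', 'PFE53', 'PFE15']})

# Restrict to the code columns so that helper columns added later are ignored
code_cols = ['cd1', 'cd2', 'cd3', 'cd4', 'cd5']

# Per row: drop NaN, remove duplicates, sort alphabetically, join into one string
df['Output_col'] = df[code_cols].apply(
    lambda row: ', '.join(sorted(set(row.dropna()))), axis=1)

print(df['Output_col'].tolist())
```

If a list is preferred over a string, dropping the ', '.join(...) call and keeping sorted(set(row.dropna())) should give one sorted list per row.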