Sort Words in Pandas Column list

Time:05-27

Below is the DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame({'cd1' : ['PFE1', 'PFE25', np.nan, np.nan], 
                   'cd2' : [np.nan, 'PFE28', 'PFE23', 'PFE14'], 
                   'cd3' : ['PFE15', 'PFE2', 'PFE83', np.nan], 
                   'cd4' : ['PFE25', np.nan, 'PFE39', 'PFE47'], 
                   'cd5' : [np.nan, 'PFE21', 'PFE53', 'PFE15']})
df


cd1   cd2    cd3    cd4     cd5
PFE1  NaN    PFE15  PFE25   NaN
PFE25 PFE28  PFE2   NaN     PFE21
NaN   PFE23  PFE83  PFE39   PFE53
NaN   PFE14  NaN    PFE47   PFE15

There are multiple tasks I'm trying to do (with some help from previous Stack Overflow questions, thanks for that!).

Combine multiple columns and remove duplicate values (there are no duplicates in this example):

df['combined'] = df.agg(lambda x: list(x.dropna()), axis=1)
df['Codes'] = list(map(set, df['combined']))

cd1   cd2   cd3   cd4   cd5     combined                       Codes
PFE1  NaN   PFE15 PFE25 NaN     [PFE1, PFE15, PFE25]           {PFE25, PFE1, PFE15}
PFE25 PFE28 PFE2  NaN   PFE21   [PFE25, PFE28, PFE2, PFE21]    {PFE28, PFE21, PFE25, PFE2}
NaN   PFE23 PFE83 PFE39 PFE53   [PFE23, PFE83, PFE39, PFE53]   {PFE83, PFE23, PFE39, PFE53}
NaN   PFE14 NaN   PFE47 PFE15   [PFE14, PFE47, PFE15]          {PFE14, PFE47, PFE15}  

The aim is to sort the words in each row. Below is the expected output:

Output_col
PFE1,  PFE15, PFE25
PFE2,  PFE21, PFE25, PFE28
PFE23, PFE39, PFE53, PFE83
PFE14, PFE15, PFE47

I tried sorting after the agg call, but it didn't work:

df['combined'] = df.agg(lambda x: list(x.dropna()), axis=1).sort_values()

I also tried sorting the column directly, but that didn't work either:

df['combined'] = df['combined'].sort_values()

If anyone has any clues, thanks for your help!

Answer 1:

I think this does what you want.

You need to put the sort inside the lambda function so that each row's list is sorted, rather than sorting the column at the end.

I'm not sure if there's a neater way that avoids writing a helper function, but list.sort() doesn't return a new list: it sorts the existing list in place and returns None.

def sort_list(my_list: list) -> list:
    # Copy first so the caller's list is untouched, then sort the copy in place
    temp_list = my_list.copy()
    temp_list.sort()
    return temp_list

df.agg(lambda x: sort_list(list(x.dropna())), axis=1)

produces this output:

0            [PFE1, PFE15, PFE25]
1     [PFE2, PFE21, PFE25, PFE28]
2    [PFE23, PFE39, PFE53, PFE83]
3           [PFE14, PFE15, PFE47]
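As a side note on the helper above: Python's built-in sorted() already returns a new sorted list, whereas list.sort() sorts in place and returns None. The small plain-Python sketch below (the codes list is illustrative, taken from the second row) shows the difference.

```python
# Sample codes from the second row of the DataFrame (illustrative values)
codes = ["PFE25", "PFE28", "PFE2", "PFE21"]

# list.sort() sorts the list in place and returns None
result = codes.sort()
print(result)  # None
print(codes)   # ['PFE2', 'PFE21', 'PFE25', 'PFE28']

# sorted() leaves its input untouched and returns a new sorted list
original = ["PFE25", "PFE28", "PFE2", "PFE21"]
new_list = sorted(original)
print(new_list)  # ['PFE2', 'PFE21', 'PFE25', 'PFE28']
print(original)  # ['PFE25', 'PFE28', 'PFE2', 'PFE21']
```

Since sorted() accepts any iterable, including a pandas Series, the lambda in this answer could likely be written as lambda x: sorted(x.dropna()) without the sort_list helper.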

Answer 2:

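sort_values() sorts the rows of a pandas Series/DataFrame by their values (for a DataFrame, by the column(s) named in the by argument). It reorders whole records and never looks inside a list stored in a cell. Assigning the result back with df['combined'] = ... also realigns it on the index, so the column ends up unchanged.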
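A minimal sketch of that alignment effect, using a hypothetical one-column frame of plain strings (rather than lists) so the behaviour is easy to see:

```python
import pandas as pd

# Hypothetical single-column frame, just to illustrate index alignment
df = pd.DataFrame({"code": ["PFE25", "PFE1", "PFE15"]})

# sort_values() reorders the rows but keeps each row's original index label
sorted_rows = df["code"].sort_values()
print(sorted_rows.index.tolist())  # [1, 2, 0]

# Assigning back aligns on those labels, so every value returns to its old row
df["code_sorted"] = df["code"].sort_values()
print(df["code_sorted"].tolist())  # ['PFE25', 'PFE1', 'PFE15']
```

The same alignment happens when the column holds lists, which is why both attempts in the question appear to do nothing.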

If you need to sort the values inside the list stored in each record of the column, apply a function to every record:

df['combined'] = df['combined'].apply(lambda x: sorted(x))
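Putting the pieces together for the question's full goal (combine the cd columns, drop NaN, remove duplicates, sort, and produce the comma-separated Output_col from the expected output), one possible sketch follows. It assumes the original five-column DataFrame from the question; code_cols is a hypothetical name, and Series.str.join is used to turn each sorted list into a string.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'cd1': ['PFE1', 'PFE25', np.nan, np.nan],
                   'cd2': [np.nan, 'PFE28', 'PFE23', 'PFE14'],
                   'cd3': ['PFE15', 'PFE2', 'PFE83', np.nan],
                   'cd4': ['PFE25', np.nan, 'PFE39', 'PFE47'],
                   'cd5': [np.nan, 'PFE21', 'PFE53', 'PFE15']})

# Hypothetical list of the code columns to combine
code_cols = ['cd1', 'cd2', 'cd3', 'cd4', 'cd5']

# Per row: drop NaN, remove duplicates with set(), then sort into a new list
df['Codes'] = df[code_cols].apply(lambda row: sorted(set(row.dropna())), axis=1)

# Join each sorted list into one comma-separated string
df['Output_col'] = df['Codes'].str.join(', ')
print(df['Output_col'].tolist())
```

Note that sorted() compares the codes as strings, so a code like PFE100 would sort before PFE2. That matches the expected output for these particular codes, but a numeric sort key would be needed if natural ordering matters.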