Python Sorting elements in a List in pandas Dataframe-CodePudding

My company requires me to upload data as a list with quotations around it and its not the best but it is what it is. For Example if i have data that is 2 inches and 3 inches I have to upload it as ["2 in", "3 in"].

When I try and sort the elements in the list for each row I get this: [1, 2, , ", ", [, ], o, z] where it sorts each individual letter and number

Example of the DF I am trying to sort:
d = {'col1': ['["3 oz","1 oz","2 oz"]', '["1.2 in","1 in","1.3 in"]', '["10 in","22 in","3.4 in"]']}
df = pd.DataFrame(data=d)

What I have tried:

def sorted_alphanumeric(data):
    convert = lambda text: int(text) if text.isdigit() else text.lower()
    alphanum_key = lambda key: [ convert(c) for c in re.split('([0-9] )', key) ] 
    return sorted(data, key=alphanum_key)
df['col1'].apply(lambda x: sorted_alphanumeric((x)))

and 

from natsort import natsorted
df['col1'].apply(lambda x: natsorted(x))

and

df['col1'].apply(lambda x: sorted(x))

I am sure its something simple that I am missing after staring at this for 2 days but if you have any idea on how to solve it I would appreciate it.

CodePudding user response：

Because you have strings, you first need to split the data into chunks. Fo this remove the first 2 and last 2 characters [" ad "], then split on "," to get a list of the data.

Here is one way using apply:

from natsort import natsorted
(df['col1'].str[2:-2].str.split('","')
           .apply(lambda x: '["' '","'.join(natsorted(x)) '"]')
)

output (as a Series):

0        ["1 oz","2 oz","3 oz"]
1    ["1 in","1.2 in","1.3 in"]
2    ["3.4 in","10 in","22 in"]
Name: col1, dtype: object

For explicitness, the items are strings: '["1 oz","2 oz","3 oz"]'

NB. this is purely sorting on the number first and then on the unit as alphanumeric string, it does not take into account the meaning of the units

CodePudding user response：

from natsort import natsorted
df['col1'] = df['col1'].apply(lambda x: natsorted(eval(x)))
print(df)

                     col1
0      [1 oz, 2 oz, 3 oz]
1  [1 in, 1.2 in, 1.3 in]
2  [3.4 in, 10 in, 22 in]

CodePudding user response：

You can use eval to evaluate strings:

df['col1'].apply(lambda x: sorted(eval(x)))

However, in this way the lists are sorted in lexicographic order, so you have to write a more sophisticate function if you want them ordered by the numbers contained in them.

CodePudding user response：

You can use ast.literal_eval then sorting like below:

(Why using literal_eval and not using eval)

>>> from ast import literal_eval
>>> df['col1'] = df['col1'].apply(lambda x: sorted(literal_eval(x)))
>>> df
           col1
0      [1 oz, 2 oz, 3 oz]
1      [1 in, 1.2 in, 1.3 in]
2      [10 in, 22 in, 3.4 in]