My company requires me to upload data as a list with quotations around it and its not the best but it is what it is. For Example if i have data that is 2 inches and 3 inches I have to upload it as ["2 in", "3 in"].
When I try and sort the elements in the list for each row I get this: [1, 2, , ", ", [, ], o, z] where it sorts each individual letter and number
Example of the DF I am trying to sort:
d = {'col1': ['["3 oz","1 oz","2 oz"]', '["1.2 in","1 in","1.3 in"]', '["10 in","22 in","3.4 in"]']}
df = pd.DataFrame(data=d)
What I have tried:
def sorted_alphanumeric(data):
convert = lambda text: int(text) if text.isdigit() else text.lower()
alphanum_key = lambda key: [ convert(c) for c in re.split('([0-9] )', key) ]
return sorted(data, key=alphanum_key)
df['col1'].apply(lambda x: sorted_alphanumeric((x)))
and
from natsort import natsorted
df['col1'].apply(lambda x: natsorted(x))
and
df['col1'].apply(lambda x: sorted(x))
I am sure its something simple that I am missing after staring at this for 2 days but if you have any idea on how to solve it I would appreciate it.
CodePudding user response:
Because you have strings, you first need to split the data into chunks. Fo this remove the first 2 and last 2 characters ["
ad "]
, then split on ","
to get a list of the data.
Here is one way using apply:
from natsort import natsorted
(df['col1'].str[2:-2].str.split('","')
.apply(lambda x: '["' '","'.join(natsorted(x)) '"]')
)
output (as a Series):
0 ["1 oz","2 oz","3 oz"]
1 ["1 in","1.2 in","1.3 in"]
2 ["3.4 in","10 in","22 in"]
Name: col1, dtype: object
For explicitness, the items are strings: '["1 oz","2 oz","3 oz"]'
NB. this is purely sorting on the number first and then on the unit as alphanumeric string, it does not take into account the meaning of the units
CodePudding user response:
from natsort import natsorted
df['col1'] = df['col1'].apply(lambda x: natsorted(eval(x)))
print(df)
col1
0 [1 oz, 2 oz, 3 oz]
1 [1 in, 1.2 in, 1.3 in]
2 [3.4 in, 10 in, 22 in]
CodePudding user response:
You can use eval
to evaluate strings:
df['col1'].apply(lambda x: sorted(eval(x)))
However, in this way the lists are sorted in lexicographic order, so you have to write a more sophisticate function if you want them ordered by the numbers contained in them.
CodePudding user response:
You can use ast.literal_eval
then sorting like below:
(Why using literal_eval and not using eval)
>>> from ast import literal_eval
>>> df['col1'] = df['col1'].apply(lambda x: sorted(literal_eval(x)))
>>> df
col1
0 [1 oz, 2 oz, 3 oz]
1 [1 in, 1.2 in, 1.3 in]
2 [10 in, 22 in, 3.4 in]