If I have this pandas v1.3.4 DataFrame:
index col1 col2
1 ['1','2','3'] 'a'
2 ['2','4','2'] 'b'
3 ['5','2','1'] 'c'
4 ['3','2','1'] 'd'
How can I sort the list in each cell of col1 without changing the index or any other values (col2 in this case)? For this example, if I sort from lowest to highest (assuming lexicographic sorting matches numerical sorting), I would get:
index col1 col2
1 ['1','2','3'] 'a'
2 ['2','2','4'] 'b'
3 ['1','2','5'] 'c'
4 ['1','2','3'] 'd'
I don't particularly care which sorting approach I use; I just want lists with the same items to have the same order, so that they are recognised as equivalent in some downstream data visualisation.
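Because Python lists are unhashable, pandas methods such as duplicated or groupby cannot use them directly as keys. A minimal sketch, assuming col1 holds Python lists of integer strings, that sorts each list and freezes it as a tuple so equivalent lists compare and hash equally (the col1_key column name is hypothetical):

```python
import pandas as pd

# Rebuild the example frame from the question, with index 1..4
df = pd.DataFrame(
    {"col1": [['1', '2', '3'], ['2', '4', '2'], ['5', '2', '1'], ['3', '2', '1']],
     "col2": ['a', 'b', 'c', 'd']},
    index=[1, 2, 3, 4],
)

# Sort each list numerically and convert it to a tuple so it is hashable
df["col1_key"] = df["col1"].apply(lambda lst: tuple(sorted(lst, key=int)))

# Rows whose lists contain the same items now share the same key
print(df["col1_key"].duplicated(keep=False))
```

The original col1, col2 and the index are left untouched; only the helper column is added.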
Thanks!
Tim
User response:
Since you want to sort string representations of integers, use natsort (a third-party package, installable with pip install natsort):
from natsort import natsorted
df['col1'] = df['col1'].apply(natsorted)
Output:
index col1 col2
0 1 ['1', '2', '3'] 'a'
1 2 ['2', '2', '4'] 'b'
2 3 ['1', '2', '5'] 'c'
3 4 ['1', '2', '3'] 'd'
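If you would rather avoid the natsort dependency and every item is a plain integer string (an assumption; a non-numeric string would raise ValueError), a sketch using only the built-in sorted with key=int gives numeric ordering while keeping the stored values as strings:

```python
import pandas as pd

df = pd.DataFrame({"col1": [['10', '2', '1'], ['2', '4', '2']], "col2": ['a', 'b']})

# key=int compares the items as integers, but the list still holds the original strings
df["col1"] = df["col1"].apply(lambda lst: sorted(lst, key=int))
print(df["col1"].tolist())  # [['1', '2', '10'], ['2', '2', '4']]
```

With a plain sorted (no key), '10' would come before '2'.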
User response:
If col1 holds strings rather than actual lists, you could convert each value to a list with ast.literal_eval and then sort it with apply:
import ast
df.col1 = df.col1.apply(lambda x: sorted(ast.literal_eval(x)))
print(df)
Output:
col1 col2
index
1 [1, 2, 3] 'a'
2 [2, 2, 4] 'b'
3 [1, 2, 5] 'c'
4 [1, 2, 3] 'd'
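A self-contained sketch of the same idea, assuming col1 was loaded (for example from a CSV) as text such as "['1','2','3']". In that case literal_eval returns lists of strings, so key=int is added here to order them numerically:

```python
import ast

import pandas as pd

# Hypothetical input: col1 stored as the text form of a list
df = pd.DataFrame(
    {"col1": ["['1','2','3']", "['2','4','2']", "['5','2','1']", "['3','2','1']"],
     "col2": ['a', 'b', 'c', 'd']},
    index=pd.Index([1, 2, 3, 4], name="index"),
)

# Safely parse each string into a list, then sort its items numerically
df["col1"] = df["col1"].apply(lambda s: sorted(ast.literal_eval(s), key=int))
print(df)
```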
User response:
Or use a good old list comprehension:
df['col1'] = [sorted(i) for i in df.col1]
Example using the iris dataset:
import pandas as pd
iris = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')
iris['test'] = iris[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']].values.tolist()
iris['test2'] = [sorted(i) for i in iris.test]
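The list comprehension produces a plain Python list of the same length as the frame, which pandas assigns back by position, so the index and the other columns stay as they were. A minimal offline sketch using the question's frame instead of downloading iris:

```python
import pandas as pd

df = pd.DataFrame(
    {"col1": [['1', '2', '3'], ['2', '4', '2'], ['5', '2', '1'], ['3', '2', '1']],
     "col2": ['a', 'b', 'c', 'd']},
    index=[1, 2, 3, 4],
)

# Plain sorted() on strings is lexicographic; fine here because every item is a single digit
df["col1"] = [sorted(lst) for lst in df["col1"]]
print(df)
```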
User response:
In case you don't want to use any import (apart from pandas, of course):
import pandas as pd
df = pd.DataFrame({'col1': [['1', '2', '20'], ['2', '10', '2'], ['30', '2', '1'], ['3', '2', '1']]})
You can sort each list numerically using:
df[['col1']].apply(lambda x: sorted(map(int,x["col1"])), axis=1)
OUTPUT
0 [1, 2, 20]
1 [2, 2, 10]
2 [1, 2, 30]
3 [1, 2, 3]
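The same numeric sort can also be written directly on the column with Series.apply, without selecting a one-column DataFrame and applying row-wise. Either way, apply returns a new Series, so it has to be assigned back to change df. A sketch using this answer's frame:

```python
import pandas as pd

df = pd.DataFrame({'col1': [['1', '2', '20'], ['2', '10', '2'], ['30', '2', '1'], ['3', '2', '1']]})

# Convert each item to int, sort, and assign the result back to the column
df['col1'] = df['col1'].apply(lambda lst: sorted(map(int, lst)))
print(df['col1'])
```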
Or lexicographically, as strings (note that '10' then sorts before '2'):
df[['col1']].apply(lambda x: sorted(map(str,x["col1"])), axis=1)
OUTPUT
0 [1, 2, 20]
1 [10, 2, 2]
2 [1, 2, 30]
3 [1, 2, 3]