pd.dataframe - sort each list in a column of lists without changing index

Time:11-30

If I have this pandas v1.3.4 dataframe:

index         col1          col2
  1      ['1','2','3']       'a'
  2      ['2','4','2']       'b'
  3      ['5','2','1']       'c'
  4      ['3','2','1']       'd'

How can I sort each list in col1 without changing the index or any other values (col2 in this case)? For this example, if I sort from lowest to highest (assuming lexicographic sorting matches the numerical sorting), I would get:

index         col1          col2
  1      ['1','2','3']       'a'
  2      ['2','2','4']       'b'
  3      ['1','2','5']       'c'
  4      ['1','2','3']       'd'

I don't particularly care what sorting approach I take, I just want lists with the same items to have the same order so they are recognised as equivalent, for some downstream data visualisation.

Thanks!

Tim

CodePudding user response:

As you want to sort string representations of integers, use natsort (a third-party package):

from natsort import natsorted
df['col1'] = df['col1'].apply(natsorted)

output:

   index             col1 col2
0      1  ['1', '2', '3']  'a'
1      2  ['2', '2', '4']  'b'
2      3  ['1', '2', '5']  'c'
3      4  ['1', '2', '3']  'd'
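
For a rough idea of why the sort key matters, here is a minimal standard-library sketch. It assumes every item is a string of digits (as in the question) and uses key=int as a stand-in for natsorted on this kind of data, not as a general replacement for it. The sample list is made up for illustration.

```python
# Hypothetical sample with multi-digit strings
items = ['2', '10', '1']

# Plain sorted() compares strings character by character
lexicographic = sorted(items)

# key=int compares numeric values but keeps the items as strings
numeric = sorted(items, key=int)

print(lexicographic)  # ['1', '10', '2']
print(numeric)        # ['1', '2', '10']
```

With single-digit strings, as in the question, both orders come out the same.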

CodePudding user response:

If col1 holds strings (for example, the text "['1','2','3']") rather than actual lists, you could convert each value to a list with ast.literal_eval and then sort it with apply:

import ast
df.col1 = df.col1.apply(lambda x: sorted(ast.literal_eval(x)))
print(df)

Output:

            col1 col2
index
1      [1, 2, 3]  'a'
2      [2, 2, 4]  'b'
3      [1, 2, 5]  'c'
4      [1, 2, 3]  'd'
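
If it is unclear whether col1 holds real lists or their string form, a small helper could handle both. This is only a sketch: to_sorted_list is a hypothetical name, not part of pandas or ast, and it assumes each string value is a valid Python list literal.

```python
import ast

def to_sorted_list(value):
    """Return a sorted list from a list or from its string form."""
    # Parse only when the cell is text such as "['2','4','2']"
    if isinstance(value, str):
        value = ast.literal_eval(value)
    return sorted(value)

print(to_sorted_list("['2','4','2']"))  # ['2', '2', '4']
print(to_sorted_list(['5', '2', '1']))  # ['1', '2', '5']
```

It would be applied the same way as above, e.g. df['col1'] = df['col1'].apply(to_sorted_list).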

CodePudding user response:

Or use a good old list comprehension:

df['col1'] = [sorted(i) for i in df.col1]

Example using iris:

import pandas as pd
iris = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')
iris['test'] = iris[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']].values.tolist()
iris['test2'] = [sorted(i) for i in iris.test]
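
To check this against the data in the question, here is a self-contained sketch. The DataFrame is rebuilt by hand from the question's table (an assumption about how it was created), with the index set to 1 through 4.

```python
import pandas as pd

# Rebuild the question's frame; the index values 1..4 are set explicitly
df = pd.DataFrame(
    {
        'col1': [['1', '2', '3'], ['2', '4', '2'], ['5', '2', '1'], ['3', '2', '1']],
        'col2': ['a', 'b', 'c', 'd'],
    },
    index=[1, 2, 3, 4],
)

# Sort each list; a plain Python list is assigned by position,
# so the index and col2 are left untouched
df['col1'] = [sorted(i) for i in df['col1']]

print(df)
```

Rows 1 and 4 end up with identical lists, which is what the question needs for the downstream comparison.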

CodePudding user response:

In case you don't want to use any import (apart from pandas, of course):

import pandas as pd
df = pd.DataFrame({'col1': [['1', '2', '20'], ['2', '10', '2'], ['30', '2', '1'], ['3', '2', '1']]})

You can sort each list numerically using:

df['col1'].apply(lambda x: sorted(map(int, x)))

OUTPUT

0    [1, 2, 20]
1    [2, 2, 10]
2    [1, 2, 30]
3     [1, 2, 3]

Or as strings using:

df['col1'].apply(lambda x: sorted(map(str, x)))

OUTPUT

0    [1, 2, 20]
1    [10, 2, 2]
2    [1, 2, 30]
3     [1, 2, 3]
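
Note that apply returns a new Series, so the result has to be assigned back to change the DataFrame. A possible variation, assuming all items are digit strings, sorts numerically while keeping the items as strings by passing key=int to sorted:

```python
import pandas as pd

df = pd.DataFrame({'col1': [['1', '2', '20'], ['2', '10', '2'], ['30', '2', '1'], ['3', '2', '1']]})

# Sort by numeric value, keep each item as a string, and write the result back
df['col1'] = df['col1'].apply(lambda x: sorted(x, key=int))

print(df['col1'].tolist())
# [['1', '2', '20'], ['2', '2', '10'], ['1', '2', '30'], ['1', '2', '3']]
```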