Sort index list in same way as list of pandas dataframes is sorted by length in python?

Time:06-27

Following up on my earlier questions, I want to sort a list of pandas dataframes by some criterion (here len) and reorder the values of idx in the same way as the elements of lst are reordered. That is, if lst = [df1, df2, df3] and idx = [1,2,3], and the list sorted by len is lst_new = [df3, df1, df2], then I want idx_new = [3,1,2]. A small example to illustrate my problem:

import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                   columns=['a', 'b', 'c'])
df2 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [11, 12, 13]]),
                   columns=['a', 'b', 'c'])
df3 = pd.DataFrame(np.array([[1, 2, 3], ['x', 'y', 'z']]),
                   columns=['a', 'b', 'c'])

idx = [1,2,3]


lst = []

lst.append(df1)
lst.append(df2)
lst.append(df3)


lst = sorted(lst, key=len)

test = [i for j, i in sorted(zip(lst, idx))]
print(test)

This raises the error:

ValueError: Can only compare identically-labeled DataFrame objects

CodePudding user response:

Your initial attempt is close; it just needs the right key function for the sort. Here's how it can be done.

lst = [df1, df2, df3]  # Given the list of dataframes...

# Decorate each dataframe with its initial index
# and sort.
# Use a key that takes the length of the dataframe still.

#  Input here: [(1, df1), (2, df2), (3, df3)]
#  Output here: [(3, df3), (1, df1), (2, df2)]  (or whatever is the correct order)
lst_sort = sorted(enumerate(lst, start=1), key=lambda tup: len(tup[1]))

# now split the index and dataframe lists apart again if needed
# by using a trick where it feels like we use zip in reverse
indexes, dataframes = zip(*lst_sort)

If you want more examples, see the Sorting HOWTO in the Python docs.

Note: I've used start=1 in enumerate to get 1 as the first index, as in the question. Indexes in Python conventionally start at 0, which is also how lists are indexed, so consider 0-based indexing if that's more convenient.
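To illustrate the 0-based variant, here is a minimal sketch that sorts the positions of lst rather than the dataframes themselves, then uses those positions to reorder any parallel list. The frames below are simplified stand-ins for the ones in the question, and names is a hypothetical extra list added only for illustration.

```python
import pandas as pd

# Stand-in frames with 3, 4 and 2 rows, like df1, df2, df3 in the question.
lst = [
    pd.DataFrame({"a": [1, 4, 7]}),
    pd.DataFrame({"a": [1, 4, 7, 11]}),
    pd.DataFrame({"a": [1, 2]}),
]
idx = [1, 2, 3]
names = ["first", "second", "third"]  # hypothetical extra parallel list

# 0-based positions of lst, ordered by the length of the frame at each position.
# Only integers and lengths are compared, never the DataFrames themselves.
order = sorted(range(len(lst)), key=lambda k: len(lst[k]))

# Apply the same permutation to every parallel list.
lst_new = [lst[k] for k in order]
idx_new = [idx[k] for k in order]
names_new = [names[k] for k in order]

print(order)      # [2, 0, 1]
print(idx_new)    # [3, 1, 2]
print(names_new)  # ['third', 'first', 'second']
```

Since order is just a list of positions, idx can hold any labels, not only 1..n.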

CodePudding user response:

I found a somewhat more complicated solution:

import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                   columns=['a', 'b', 'c'])
df2 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [11, 12, 13]]),
                   columns=['a', 'b', 'c'])
df3 = pd.DataFrame(np.array([[1, 2, 3], ['x', 'y', 'z']]),
                   columns=['a', 'b', 'c'])

idx = [1,2,3]

lst = []

lst.append(df1)
lst.append(df2)
lst.append(df3)


lst_srt = sorted(lst, key=len)

idx_lst = []
for a in lst_srt:
    # Find the 1-based position of this dataframe in the original list.
    i = 0
    for b in lst:
        i = i + 1
        if a.equals(b):
            idx_lst.append(i)
            break

print(idx_lst)

print(lst_srt)

which prints:

[3, 1, 2]
[   a  b  c
0  1  2  3
1  x  y  z,    a  b  c
0  1  2  3
1  4  5  6
2  7  8  9,     a   b   c
0   1   2   3
1   4   5   6
2   7   8   9
3  11  12  13]
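One caveat with looking positions up via equals: if two dataframes in the list have identical content, the inner loop stops at the first match, so the same position can be reported twice. The sketch below uses made-up frames (a, b, c are not from the question) to show this, next to a variant that carries the position along through the sort.

```python
import pandas as pd

a = pd.DataFrame({"x": [1, 2]})
b = pd.DataFrame({"x": [1, 2]})  # same content as a, but a separate object
c = pd.DataFrame({"x": [1]})
lst = [a, b, c]

lst_srt = sorted(lst, key=len)  # stable sort, so the order is c, a, b

# Lookup by content: b matches a first, so position 1 is reported twice.
by_equals = []
for s in lst_srt:
    for pos, orig in enumerate(lst, start=1):
        if s.equals(orig):
            by_equals.append(pos)
            break

# Carrying the position through the sort keeps every position distinct.
pairs = sorted(enumerate(lst, start=1), key=lambda t: len(t[1]))
by_pairs = [pos for pos, _ in pairs]

print(by_equals)  # [3, 1, 1]
print(by_pairs)   # [3, 1, 2]
```

For lists where every dataframe is distinct, as in the example above, both approaches give the same result.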