Based on my question here and here, I want to sort a list of pandas dataframes by some criterion (here len) and then reorder the values of the idx variable in the same way as the values of lst. That is, if lst = [df1, df2, df3] and idx = [1,2,3], and the list ordered by len is lst_new = [df3, df1, df2], then idx_new = [3,1,2]. A small example to illustrate my problem is:
import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
columns=['a', 'b', 'c'])
df2 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [11, 12, 13]]),
columns=['a', 'b', 'c'])
df3 = pd.DataFrame(np.array([[1, 2, 3], ['x', 'y', 'z']]),
columns=['a', 'b', 'c'])
idx = [1,2,3]
lst = []
lst.append(df1)
lst.append(df2)
lst.append(df3)
lst = sorted(lst, key=len)
test = [i for j, i in sorted(zip(lst, idx))]
print(test)
This fails with the error message:
ValueError: Can only compare identically-labeled DataFrame objects
Answer:
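Your initial attempt is close; it just needs the right key function for the sort. Without a key, sorted compares the tuples element by element, starting with the DataFrames themselves, and comparing two differently labeled DataFrames is what raises the ValueError. Here's how it can be done: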
lst = [df1, df2, df3]  # the list of dataframes from the question

# Pair each dataframe with its original (1-based) position and sort the pairs.
# The key looks only at the length of the dataframe, so the dataframes
# themselves are never compared with each other.
# Input here:  [(1, df1), (2, df2), (3, df3)]
# Output here: [(3, df3), (1, df1), (2, df2)]
lst_sort = sorted(enumerate(lst, start=1), key=lambda tup: len(tup[1]))

# Split the sorted pairs back into separate sequences if needed,
# using zip(*...) as a kind of "zip in reverse".
# Note that zip returns tuples: indexes == (3, 1, 2)
indexes, dataframes = zip(*lst_sort)
If you want more examples, see the Sorting HOWTO in the Python docs.
Note: I've used start=1 here to get 1 as the first index, as in the question. By convention Python indexes start at 0 (that is how lists are indexed), so consider using 0-based indexing if that's more convenient.
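The zip-based line from the question can be fixed directly with the same kind of key. The sketch below assumes, as in the question, that idx holds one label per dataframe. It zips before sorting, because the question sorts lst first and only then zips, which attaches the labels to the wrong dataframes. The names pairs, lst_new and idx_new are just illustrative.

```python
import numpy as np
import pandas as pd

# Same three dataframes as in the question (lengths 3, 4 and 2).
df1 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                   columns=['a', 'b', 'c'])
df2 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [11, 12, 13]]),
                   columns=['a', 'b', 'c'])
df3 = pd.DataFrame(np.array([[1, 2, 3], ['x', 'y', 'z']]),
                   columns=['a', 'b', 'c'])

lst = [df1, df2, df3]
idx = [1, 2, 3]

# Zip BEFORE sorting so each label stays attached to its own dataframe,
# and sort on len(df) only, so the dataframes are never compared directly.
pairs = sorted(zip(lst, idx), key=lambda pair: len(pair[0]))

# Unpack the sorted pairs into the reordered list and the reordered labels.
lst_new = [df for df, _ in pairs]
idx_new = [label for _, label in pairs]

print(idx_new)  # [3, 1, 2]
```

Because sorted is stable, dataframes with equal length keep their original relative order.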
Answer:
I found a somewhat more complicated solution:
import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
columns=['a', 'b', 'c'])
df2 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [11, 12, 13]]),
columns=['a', 'b', 'c'])
df3 = pd.DataFrame(np.array([[1, 2, 3], ['x', 'y', 'z']]),
columns=['a', 'b', 'c'])
idx = [1,2,3]
lst = []
lst.append(df1)
lst.append(df2)
lst.append(df3)
lst_srt = sorted(lst, key=len)

# For each sorted dataframe, find its 1-based position in the original list
idx_lst = []
for a in lst_srt:
    i = 0
    for b in lst:
        i = i + 1
        if a.equals(b):
            idx_lst.append(i)
            break
print(idx_lst)
print(lst_srt)
Output:
[3, 1, 2]
[   a  b  c
0  1  2  3
1  x  y  z,    a  b  c
0  1  2  3
1  4  5  6
2  7  8  9,     a   b   c
0   1   2   3
1   4   5   6
2   7   8   9
3  11  12  13]
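The loop above matches dataframes by content with .equals, so if two dataframes in lst had identical contents, both would get the position of the first one. It also rescans lst for every element. A sketch of a different technique, an argsort on the lengths, avoids both issues. It assumes NumPy's np.argsort with kind="stable" (available in reasonably recent NumPy versions). The variable names order, lst_new and idx_new are illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                   columns=['a', 'b', 'c'])
df2 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [11, 12, 13]]),
                   columns=['a', 'b', 'c'])
df3 = pd.DataFrame(np.array([[1, 2, 3], ['x', 'y', 'z']]),
                   columns=['a', 'b', 'c'])

lst = [df1, df2, df3]
idx = [1, 2, 3]

# Positions that would sort the lengths; a stable sort keeps ties in
# their original order. Here the lengths are [3, 4, 2] -> order [2, 0, 1].
order = np.argsort([len(df) for df in lst], kind="stable").tolist()

# Apply the same permutation to both lists.
lst_new = [lst[k] for k in order]
idx_new = [idx[k] for k in order]

print(idx_new)  # [3, 1, 2]
```

Since the permutation is computed once from positions, it works for any labels in idx and never needs to compare dataframe contents.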