Sort a subset of Pandas DataFrame

Time:09-01

import pandas as pd
data = [[1, 1, 2, 1, 0], [ 2, 2, 2, 1, 4], [ 3, 1, 0, 1,4], [ 4, 1, 3, 1, 4], 
        [5, 1, 6, 1, 4], [ 6, 1, 2, 0, 4], [ 7, 1, 2, 7,4], [ 8, 1, 2, 1, 1], 
        [9, 1, 2, 1, 2], [10, 1, 2, 1, 3], [11, 1, 2, 1,5], [12, 1, 2, 1, 6]]
df = pd.DataFrame(data, columns=['Id','c1', 'c2','c3', 'c4'])

import scipy.spatial.distance
mat = scipy.spatial.distance.cdist(
    df[['c1','c2','c3','c4']], 
    df[['c1','c2','c3','c4']], 
    metric='euclidean'
)
new_df = pd.DataFrame(mat, index=df['Id'], columns=df['Id']) 

When I sort the full DataFrame, it works:

new_df.sort_values(by=1,ascending=True,kind="mergesort",axis=1)

But if I sort a subset of the DataFrame, it does not work:

i = 1
j = 2
new_dff = new_df[i:j]
new_dff.sort_values(by=1, ascending=True, kind="mergesort", axis=1)

CodePudding user response:

For a subset of rows, use DataFrame.loc, which selects by label:

i = 1
j = 2

new_dff=new_df.loc[i:j]
print (new_dff)
Id        1         2         3         4         5         6         7   \
Id                                                                         
1   0.000000  4.123106  4.472136  4.123106  5.656854  4.123106  7.211103   
2   4.123106  0.000000  2.236068  1.414214  4.123106  1.414214  6.082763   

Id        8         9         10        11        12  
Id                                                    
1   1.000000  2.000000  3.000000  5.000000  6.000000  
2   3.162278  2.236068  1.414214  1.414214  2.236068

Then sorting works as expected:

new_dff = new_dff.sort_values(by=1, ascending=True, kind="mergesort", axis=1)
print (new_dff)
Id        1         8         9         10        2         4         6   \
Id                                                                         
1   0.000000  1.000000  2.000000  3.000000  4.123106  4.123106  4.123106   
2   4.123106  3.162278  2.236068  1.414214  0.000000  1.414214  1.414214   

Id        3         11        5         12        7   
Id                                                    
1   4.472136  5.000000  5.656854  6.000000  7.211103  
2   2.236068  1.414214  4.123106  2.236068  6.082763  
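
The same steps can be tried on a smaller table. This is only a minimal sketch: the 3x3 frame `small` and its values are made up for illustration and stand in for new_df. It selects rows by label with .loc, sorts the columns by the row labelled 1, and shows that sort_values returns a new frame, so the result has to be assigned.

```python
import pandas as pd

# Hypothetical 3x3 symmetric table with integer labels 1..3 (stand-in for new_df).
labels = pd.Index([1, 2, 3], name="Id")
small = pd.DataFrame(
    [[0.0, 4.0, 1.0],
     [4.0, 0.0, 2.0],
     [1.0, 2.0, 0.0]],
    index=labels,
    columns=labels,
)

subset = small.loc[1:2]  # label-based slice: keeps the rows labelled 1 and 2
ordered = subset.sort_values(by=1, ascending=True, kind="mergesort", axis=1)

print(list(ordered.columns))  # [1, 3, 2] -> columns ordered by the values in row 1
print(list(subset.columns))   # [1, 2, 3] -> the unsorted subset is unchanged
```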

Or, for a subset of columns, use : to select all rows:

i = 1
j = 2

new_dff=new_df.loc[:, i:j]
print (new_dff)
Id         1         2
Id                    
1   0.000000  4.123106
2   4.123106  0.000000
3   4.472136  2.236068
4   4.123106  1.414214
5   5.656854  4.123106
6   4.123106  1.414214
7   7.211103  6.082763
8   1.000000  3.162278
9   2.000000  2.236068
10  3.000000  1.414214
11  5.000000  1.414214
12  6.000000  2.236068

Or both:

i = 1
j = 2

new_dff=new_df.loc[i:j, i:j]
print (new_dff)
Id         1         2
Id                    
1   0.000000  4.123106
2   4.123106  0.000000
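
One detail worth keeping in mind with these .loc slices is that label slices include both endpoints, unlike positional slices. A small sketch, again on a made-up 3x3 frame (`small` is hypothetical, not the data above), comparing .loc with the positional equivalent .iloc:

```python
import pandas as pd

# Hypothetical 3x3 table with integer labels 1..3 on both axes.
labels = pd.Index([1, 2, 3], name="Id")
small = pd.DataFrame(
    [[0.0, 4.0, 1.0],
     [4.0, 0.0, 2.0],
     [1.0, 2.0, 0.0]],
    index=labels,
    columns=labels,
)

by_label = small.loc[1:2, 1:2]      # labels 1 through 2, both endpoints included
by_position = small.iloc[0:2, 0:2]  # positions 0 and 1, end position excluded

print(by_label.shape)                # (2, 2)
print(by_label.equals(by_position))  # True
```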

CodePudding user response:

The expected output is unclear.

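You are asking to sort the DataFrame's columns by the values in the row labelled 1.

However, new_dff = new_df[i:j] slices the rows by position, not by label. With i = 1 and j = 2 it keeps only the second row (Id 2), so the row labelled 1 is lost. The lookup by=1 therefore fails and sort_values raises a KeyError.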
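
The difference is easy to reproduce on a small table. This is a sketch under assumptions: the 3x3 frame `small` is made up and only mimics the integer Id labels used in the question.

```python
import pandas as pd

# Hypothetical 3x3 table with integer labels 1..3 (stand-in for new_df).
labels = pd.Index([1, 2, 3], name="Id")
small = pd.DataFrame(
    [[0.0, 4.0, 1.0],
     [4.0, 0.0, 2.0],
     [1.0, 2.0, 0.0]],
    index=labels,
    columns=labels,
)

positional = small[1:2]    # plain [] with an integer slice selects rows by position
by_label = small.loc[1:2]  # .loc selects rows by label, end label included

print(list(positional.index))  # [2]
print(list(by_label.index))    # [1, 2]

# Sorting by the row labelled 1 fails on the positional slice,
# because that row is no longer present.
try:
    positional.sort_values(by=1, axis=1)
    failed = False
except KeyError:
    failed = True
print(failed)  # True
```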

Do you want to subset the columns instead? new_dff = new_df.loc[:, i:j]
