How can I get the nth largest value in a Julia DataFrame?

I am looking for a way to find the n largest values in my Julia DataFrame, something like pd.Series.nlargest(n=5, keep='first') in Python.

In more detail, let's say I have a Julia DataFrame such as:

df = DataFrame(Data1 = rand(5), Data2 = rand(5));

    Data1       Data2
    Float64     Float64
1   0.125824    0.841358
2   0.612905    0.337965
3   0.210736    0.66849
4   0.172203    0.377226
5   0.898269    0.448477

How can I get the n largest values from the column Data1?

If n = 3, my expected output is:

5   0.898269
2   0.612905
3   0.210736

CodePudding user response:

Here is an efficient way to do it. First, to subset rows of a data frame:

julia> df = DataFrame(Data1 = rand(10), Data2 = rand(10));

julia> df[partialsortperm(df.Data1, 1:3, rev=true), :] # if you need a data frame with top 3 rows
3×2 DataFrame
 Row │ Data1     Data2
     │ Float64   Float64
─────┼────────────────────
   1 │ 0.959456  0.628431
   2 │ 0.856696  0.144034
   3 │ 0.824744  0.996384

julia> df[partialsortperm(df.Data1, 3, rev=true), :] # if you need only the row with the 3rd largest value
DataFrameRow
 Row │ Data1     Data2
     │ Float64   Float64
─────┼────────────────────
   4 │ 0.824744  0.996384

Both operations are efficient. A partial sort does only the minimal amount of work needed to put the requested elements in place, instead of ordering the whole column.
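To illustrate the difference, here is a small sketch comparing a full sortperm with partialsortperm on a plain vector. The vector is hypothetical sample data (the Data1 values from the question); with distinct values both approaches should return the same leading indices.

```julia
# Hypothetical sample data standing in for df.Data1 (values copied from the question).
data = [0.125824, 0.612905, 0.210736, 0.172203, 0.898269]

# Full sort: computes the order of every element, then keeps the first 3 indices.
full = sortperm(data, rev=true)[1:3]

# Partial sort: only guarantees that positions 1:3 of the ordering are correct.
partial = partialsortperm(data, 1:3, rev=true)

println(full)      # [5, 2, 3]
println(partial)   # [5, 2, 3]
```

For a column this small the difference does not matter; the partial variant is mainly useful when the column is long and n is small.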

If you do not need whole rows of the data frame, but only the values from a single column, the following is enough:

julia> partialsort(df.Data1, 1:3, rev=true) # top 3 values
3-element view(::Vector{Float64}, 1:3) with eltype Float64:
 0.959456038630526
 0.856695598334831
 0.8247444664227905

julia> partialsort(df.Data1, 3, rev=true) # 3rd largest value
0.8247444664227905
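
If you want output shaped like the one in the question (the original row number next to each value), one possible sketch is to keep the indices returned by partialsortperm alongside the values. This assumes DataFrames.jl is installed; the column name Row and the variable names n, idx and top are only illustrative, and the data is copied from the question rather than generated with rand.

```julia
using DataFrames

# Data copied from the question so the result is reproducible.
df = DataFrame(Data1 = [0.125824, 0.612905, 0.210736, 0.172203, 0.898269],
               Data2 = [0.841358, 0.337965, 0.66849, 0.377226, 0.448477])

n = 3

# Row numbers (in df) of the n largest Data1 values, largest first.
idx = partialsortperm(df.Data1, 1:n, rev=true)

# Pair each original row number with its value.
top = DataFrame(Row = collect(idx), Data1 = df.Data1[idx])

println(top)
```

Here top.Row holds 5, 2, 3 and top.Data1 holds 0.898269, 0.612905, 0.210736, matching the expected output for n = 3.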