I have this dataframe
Seller Product Price
1 A 10.00
2 A 5.0
3 A 7.50
4 A 3.75
What I want as Output:
Seller Product Price Quartil_1 Quartil_2 Quartil_3 Quartil_4 Rank_Price
1 A 10.00 2.5 5.0 7.5 10 4
2 A 5.0 2.5 5.0 7.5 10 2
3 A 7.50 2.5 5.0 7.5 10 3
4 A 3.35 2.5 5.0 7.5 10 1
So, my dataframe will have several sellers selling the same price for different prices. I need to calculate de 4 quartils of prices, considering all sellers for he same product and then classify each seller's price in this quartils, like the example above.
I know how to do it in python, using groupby, quartiles and rank but I am pretty new to Julia.
If someone can give a hand, please, i appreciate.
CodePudding user response:
Your input and output tables do not match. Also note that there are normally three quartiles defined Q1, Q2, and Q3 (see e.g. https://en.wikipedia.org/wiki/Quartile).
This is what I assume you want:
julia> using DataFrames, CategoricalArrays, DataFramesMeta
julia> df = DataFrame(seller=repeat(1:10, outer=2), product=repeat(["A", "B"], inner=10), price=1.5:1.0:20.5)
20×3 DataFrame
Row │ seller product price
│ Int64 String Float64
─────┼──────────────────────────
1 │ 1 A 1.5
2 │ 2 A 2.5
3 │ 3 A 3.5
4 │ 4 A 4.5
5 │ 5 A 5.5
6 │ 6 A 6.5
7 │ 7 A 7.5
8 │ 8 A 8.5
9 │ 9 A 9.5
10 │ 10 A 10.5
11 │ 1 B 11.5
12 │ 2 B 12.5
13 │ 3 B 13.5
14 │ 4 B 14.5
15 │ 5 B 15.5
16 │ 6 B 16.5
17 │ 7 B 17.5
18 │ 8 B 18.5
19 │ 9 B 19.5
20 │ 10 B 20.5
julia> @transform groupby(df, :product) begin
:quantile_1 = quantile(:price, 0.25)
:quantile_2 = quantile(:price, 0.5)
:quantile_3 = quantile(:price, 0.75)
:rank_price = levelcode.(cut(:price, 4))
end
20×7 DataFrame
Row │ seller product price quantile_1 quantile_2 quantile_3 rank_price
│ Int64 String Float64 Float64 Float64 Float64 Int64
─────┼──────────────────────────────────────────────────────────────────────────
1 │ 1 A 1.5 3.75 6.0 8.25 1
2 │ 2 A 2.5 3.75 6.0 8.25 1
3 │ 3 A 3.5 3.75 6.0 8.25 1
4 │ 4 A 4.5 3.75 6.0 8.25 2
5 │ 5 A 5.5 3.75 6.0 8.25 2
6 │ 6 A 6.5 3.75 6.0 8.25 3
7 │ 7 A 7.5 3.75 6.0 8.25 3
8 │ 8 A 8.5 3.75 6.0 8.25 4
9 │ 9 A 9.5 3.75 6.0 8.25 4
10 │ 10 A 10.5 3.75 6.0 8.25 4
11 │ 1 B 11.5 13.75 16.0 18.25 1
12 │ 2 B 12.5 13.75 16.0 18.25 1
13 │ 3 B 13.5 13.75 16.0 18.25 1
14 │ 4 B 14.5 13.75 16.0 18.25 2
15 │ 5 B 15.5 13.75 16.0 18.25 2
16 │ 6 B 16.5 13.75 16.0 18.25 3
17 │ 7 B 17.5 13.75 16.0 18.25 3
18 │ 8 B 18.5 13.75 16.0 18.25 4
19 │ 9 B 19.5 13.75 16.0 18.25 4
20 │ 10 B 20.5 13.75 16.0 18.25 4
or (maybe a bit more direct method but requiring an extra package)
julia> using StatsBase
julia> res = @transform groupby(df, :product) begin
:quantile_1 = quantile(:price, 0.25)
:quantile_2 = quantile(:price, 0.5)
:quantile_3 = quantile(:price, 0.75)
:rank_price = denserank(cut(:price, 4))
end
20×7 DataFrame
Row │ seller product price quantile_1 quantile_2 quantile_3 rank_price
│ Int64 String Float64 Float64 Float64 Float64 Int64
─────┼──────────────────────────────────────────────────────────────────────────
1 │ 1 A 1.5 3.75 6.0 8.25 1
2 │ 2 A 2.5 3.75 6.0 8.25 1
3 │ 3 A 3.5 3.75 6.0 8.25 1
4 │ 4 A 4.5 3.75 6.0 8.25 2
5 │ 5 A 5.5 3.75 6.0 8.25 2
6 │ 6 A 6.5 3.75 6.0 8.25 3
7 │ 7 A 7.5 3.75 6.0 8.25 3
8 │ 8 A 8.5 3.75 6.0 8.25 4
9 │ 9 A 9.5 3.75 6.0 8.25 4
10 │ 10 A 10.5 3.75 6.0 8.25 4
11 │ 1 B 11.5 13.75 16.0 18.25 1
12 │ 2 B 12.5 13.75 16.0 18.25 1
13 │ 3 B 13.5 13.75 16.0 18.25 1
14 │ 4 B 14.5 13.75 16.0 18.25 2
15 │ 5 B 15.5 13.75 16.0 18.25 2
16 │ 6 B 16.5 13.75 16.0 18.25 3
17 │ 7 B 17.5 13.75 16.0 18.25 3
18 │ 8 B 18.5 13.75 16.0 18.25 4
19 │ 9 B 19.5 13.75 16.0 18.25 4
20 │ 10 B 20.5 13.75 16.0 18.25 4
I am using DataFramesMeta.jl as then the code is probably a bit easier to read than vanilla DataFrames.jl code.