I have a Julia DataFrame:
using DataFrames
df = DataFrame(a = [1,1,1,2,2,2,2], b = 1:7)
7×2 DataFrame
Row │ a b
│ Int64 Int64
─────┼──────────────
1 │ 1 1
2 │ 1 2
3 │ 1 3
4 │ 2 4
5 │ 2 5
6 │ 2 6
7 │ 2 7
and want to create a new column that contains the row number within each group of a. It should look like this:
7×3 DataFrame
Row │ a b c
│ Int64 Int64 Int64
─────┼──────────────────────
1 │ 1 1 1
2 │ 1 2 2
3 │ 1 3 3
4 │ 2 4 1
5 │ 2 5 2
6 │ 2 6 3
7 │ 2 7 4
I am open to any solution, but I am especially looking for a DataFramesMeta solution that works nicely together with the Chain package. R's dplyr has a simple function named row_number() that does this. I feel like there must be something similar in Julia.
CodePudding user response:
Group by :a and call eachindex on any column inside @transform:
julia> using DataFrames, DataFramesMeta
julia> df = DataFrame(a = [1,1,1,2,2,2,2], b = 1:7)
7×2 DataFrame
Row │ a b
│ Int64 Int64
─────┼──────────────
1 │ 1 1
2 │ 1 2
3 │ 1 3
4 │ 2 4
5 │ 2 5
6 │ 2 6
7 │ 2 7
julia> @chain df begin
           groupby(:a)
           @transform(:c = eachindex(:b))
       end
7×3 DataFrame
Row │ a b c
│ Int64 Int64 Int64
─────┼─────────────────────
1 │ 1 1 1
2 │ 1 2 2
3 │ 1 3 3
4 │ 2 4 1
5 │ 2 5 2
6 │ 2 6 3
7 │ 2 7 4
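Since the question is open to any solution, here is a minimal sketch of the same idea in plain DataFrames.jl, without DataFramesMeta or Chain. It assumes the same df as above; the variable name result is only illustrative. The pair :b => eachindex => :c applies eachindex to column :b within each group and stores the result in :c.

```julia
using DataFrames

df = DataFrame(a = [1, 1, 1, 2, 2, 2, 2], b = 1:7)

# Within each group of :a, eachindex(:b) yields the indices 1:n of that group's rows.
# transform keeps the original row order of df.
result = transform(groupby(df, :a), :b => eachindex => :c)

# result.c == [1, 2, 3, 1, 2, 3, 4]
```

Any column would work in place of :b here, because only its per-group length matters.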
In the upcoming DataFrames.jl 1.4 release this will be even simpler; see https://github.com/JuliaData/DataFrames.jl/pull/3001.
The difference is that you will not have to pass a column name (:b in this case); you will be able to write :c = $eachindex instead.
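For comparison, a sketch of the column-free form in plain DataFrames.jl. It assumes DataFrames.jl 1.4 or later is installed and that eachindex is accepted as a source in the transformation pair, as the linked pull request describes; the variable name result_v14 is only illustrative.

```julia
using DataFrames  # assumption: DataFrames.jl 1.4 or later

df = DataFrame(a = [1, 1, 1, 2, 2, 2, 2], b = 1:7)

# No source column is named: eachindex refers to the row index within each group.
result_v14 = transform(groupby(df, :a), eachindex => :c)
```

On older DataFrames.jl versions this call is expected to fail, so the eachindex(:b) form remains the portable choice.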