I have a problem with removing the last N rows from a DataFrame in Julia.
N_SKIP = 3
df = DataFrame(:col1=>1:10,:col2=>21:30)
N = nrow(df)
Original example DataFrame:
10×2 DataFrame
│ Row │ col1 │ col2 │
│ │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1 │ 1 │ 21 │
│ 2 │ 2 │ 22 │
│ 3 │ 3 │ 23 │
│ 4 │ 4 │ 24 │
│ 5 │ 5 │ 25 │
│ 6 │ 6 │ 26 │
│ 7 │ 7 │ 27 │
│ 8 │ 8 │ 28 │
│ 9 │ 9 │ 29 │
│ 10 │ 10 │ 30 │
I want to get the first N - N_SKIP rows; in this example, the rows with id in the 1:7 range.
Result I'm trying to achieve with N_SKIP = 3:
7×2 DataFrame
│ Row │ col1 │ col2 │
│ │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1 │ 1 │ 21 │
│ 2 │ 2 │ 22 │
│ 3 │ 3 │ 23 │
│ 4 │ 4 │ 24 │
│ 5 │ 5 │ 25 │
│ 6 │ 6 │ 26 │
│ 7 │ 7 │ 27 │
I could use first(df::AbstractDataFrame, n::Integer) and pass the number of rows to keep as the second argument. It works, but it doesn't feel like the right way to do it.
julia> N_SKIP = 3
julia> df = DataFrame(:col1=>1:10,:col2=>21:30)
julia> N = nrow(df)
julia> first(df,N - N_SKIP)
7×2 DataFrame
│ Row │ col1 │ col2 │
│ │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1 │ 1 │ 21 │
│ 2 │ 2 │ 22 │
│ 3 │ 3 │ 23 │
│ 4 │ 4 │ 24 │
│ 5 │ 5 │ 25 │
│ 6 │ 6 │ 26 │
│ 7 │ 7 │ 27 │
Answer:
There are three ways to do it, depending on what you want.
- Create a new data frame:
julia> df[1:end-3, :]
7×2 DataFrame
Row │ col1 col2
│ Int64 Int64
─────┼──────────────
1 │ 1 21
2 │ 2 22
3 │ 3 23
4 │ 4 24
5 │ 5 25
6 │ 6 26
7 │ 7 27
julia> first(df, nrow(df) - 3)
7×2 DataFrame
Row │ col1 col2
│ Int64 Int64
─────┼──────────────
1 │ 1 21
2 │ 2 22
3 │ 3 23
4 │ 4 24
5 │ 5 25
6 │ 6 26
7 │ 7 27
- Create a view of a data frame:
julia> first(df, nrow(df) - 3, view=true)
7×2 SubDataFrame
Row │ col1 col2
│ Int64 Int64
─────┼──────────────
1 │ 1 21
2 │ 2 22
3 │ 3 23
4 │ 4 24
5 │ 5 25
6 │ 6 26
7 │ 7 27
julia> @view df[1:end-3, :]
7×2 SubDataFrame
Row │ col1 col2
│ Int64 Int64
─────┼──────────────
1 │ 1 21
2 │ 2 22
3 │ 3 23
4 │ 4 24
5 │ 5 25
6 │ 6 26
7 │ 7 27
- Update the source data frame in place (alternatively, deleteat! could be used, depending on which is more convenient for you):
julia> keepat!(df, 1:nrow(df)-3)
7×2 DataFrame
Row │ col1 col2
│ Int64 Int64
─────┼──────────────
1 │ 1 21
2 │ 2 22
3 │ 3 23
4 │ 4 24
5 │ 5 25
6 │ 6 26
7 │ 7 27
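The three options above differ mainly in whether the original data frame is affected. The following is a minimal sketch of that difference, not a definitive recipe: the names n_skip, copied and viewed are illustrative only, and it assumes a DataFrames.jl version recent enough to provide deleteat! for data frames (the keepat! call above suggests it is).

```julia
using DataFrames

df = DataFrame(:col1 => 1:10, :col2 => 21:30)
n_skip = 3  # illustrative name: how many trailing rows to drop

# 1. Copy: indexing with a row range allocates a new data frame,
#    so writing to it leaves df untouched.
copied = df[1:end-n_skip, :]
copied[1, :col1] = 100
println(df[1, :col1])   # still the original value

# 2. View: a SubDataFrame shares memory with df,
#    so writing through it changes df as well.
viewed = @view df[1:end-n_skip, :]
viewed[1, :col1] = 100
println(df[1, :col1])   # now reflects the write made through the view
df[1, :col1] = 1        # restore the original value

# 3. In place: deleteat! takes the rows to remove (here the last n_skip),
#    whereas keepat! takes the rows to keep.
deleteat!(df, (nrow(df) - n_skip + 1):nrow(df))
println(nrow(df))
```

If you hold a view of a data frame, it is generally safer to recreate it after resizing the parent in place, since the view was built against the parent's earlier row layout.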