Skipping last N rows of Julia Dataframe

Time:12-03

I am having trouble removing the last N rows from a DataFrame in Julia.

using DataFrames

N_SKIP = 3

df = DataFrame(:col1 => 1:10, :col2 => 21:30)
N = nrow(df)

Original example DataFrame:

10×2 DataFrame
│ Row │ col1  │ col2  │
│     │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1   │ 1     │ 21    │
│ 2   │ 2     │ 22    │
│ 3   │ 3     │ 23    │
│ 4   │ 4     │ 24    │
│ 5   │ 5     │ 25    │
│ 6   │ 6     │ 26    │
│ 7   │ 7     │ 27    │
│ 8   │ 8     │ 28    │
│ 9   │ 9     │ 29    │
│ 10  │ 10    │ 30    │

I want to get the first N - N_SKIP rows; in this example, that is rows 1 through 7.

Result I'm trying to achieve with N_SKIP = 3:

7×2 DataFrame
│ Row │ col1  │ col2  │
│     │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1   │ 1     │ 21    │
│ 2   │ 2     │ 22    │
│ 3   │ 3     │ 23    │
│ 4   │ 4     │ 24    │
│ 5   │ 5     │ 25    │
│ 6   │ 6     │ 26    │
│ 7   │ 7     │ 27    │

I could use first(df::AbstractDataFrame, n::Integer) and pass the number of rows to keep as the second argument. It works, but it doesn't feel like the right way to do it.

julia> N_SKIP = 3
julia> df = DataFrame(:col1=>1:10,:col2=>21:30)
julia> N = nrow(df)
julia> first(df,N - N_SKIP)
7×2 DataFrame
│ Row │ col1  │ col2  │
│     │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1   │ 1     │ 21    │
│ 2   │ 2     │ 22    │
│ 3   │ 3     │ 23    │
│ 4   │ 4     │ 24    │
│ 5   │ 5     │ 25    │
│ 6   │ 6     │ 26    │
│ 7   │ 7     │ 27    │

CodePudding user response:

There are three ways to do it, depending on what you need.

  1. Create a new data frame:
julia> df[1:end-3, :]
7×2 DataFrame
 Row │ col1   col2
     │ Int64  Int64
─────┼──────────────
   1 │     1     21
   2 │     2     22
   3 │     3     23
   4 │     4     24
   5 │     5     25
   6 │     6     26
   7 │     7     27

julia> first(df, nrow(df) - 3)
7×2 DataFrame
 Row │ col1   col2
     │ Int64  Int64
─────┼──────────────
   1 │     1     21
   2 │     2     22
   3 │     3     23
   4 │     4     24
   5 │     5     25
   6 │     6     26
   7 │     7     27
  2. Create a view of a data frame:
julia> first(df, nrow(df) - 3, view=true)
7×2 SubDataFrame
 Row │ col1   col2
     │ Int64  Int64
─────┼──────────────
   1 │     1     21
   2 │     2     22
   3 │     3     23
   4 │     4     24
   5 │     5     25
   6 │     6     26
   7 │     7     27

julia> @view df[1:end-3, :]
7×2 SubDataFrame
 Row │ col1   col2
     │ Int64  Int64
─────┼──────────────
   1 │     1     21
   2 │     2     22
   3 │     3     23
   4 │     4     24
   5 │     5     25
   6 │     6     26
   7 │     7     27
  3. Update the source data frame in place (alternatively, deleteat! could be used, depending on which is more convenient for you):
julia> keepat!(df, 1:nrow(df)-3)
7×2 DataFrame
 Row │ col1   col2
     │ Int64  Int64
─────┼──────────────
   1 │     1     21
   2 │     2     22
   3 │     3     23
   4 │     4     24
   5 │     5     25
   6 │     6     26
   7 │     7     27
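
To make the difference between the three options concrete, here is a minimal sketch that writes to each result and checks what happens to the source data frame. The names n_skip, copied, and viewed are only illustrative, and the last step assumes a DataFrames.jl version in which deleteat! is defined for data frames; it is meant as the deleteat! counterpart of the keepat! call above.

```julia
using DataFrames

n_skip = 3  # illustrative name for the number of trailing rows to drop
df = DataFrame(:col1 => 1:10, :col2 => 21:30)

# 1. Copy: row indexing with `:` for columns allocates a new data frame,
#    so writing to it leaves `df` unchanged.
copied = df[1:end-n_skip, :]
copied[1, :col1] = 100

# 2. View: a SubDataFrame shares storage with `df`,
#    so writing through it also changes `df`.
viewed = @view df[1:end-n_skip, :]
viewed[1, :col1] = -1

# 3. In place: delete the last `n_skip` rows from `df` itself
#    (same effect as keepat!(df, 1:nrow(df)-n_skip)).
deleteat!(df, nrow(df)-n_skip+1:nrow(df))
```

After running this, copied still holds 100 in its first row, while df holds -1 there (written through the view) and has 7 rows left. A copy is the safer choice if you plan to modify the result; a view avoids allocating new columns if you only need to read from it.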