Julia @Subset Dates

Time:09-17

This should be an easy one, but I can't find any documentation or prior Q&A on it. Subsetting in Julia is easy, especially with the @chain macro, but I haven't for the life of me figured out a way to subset on a date:

maindf = @chain rawdf begin
    @subset(Dates.year(:travel_date) .== 2019)
end

According to the documentation, Dates.year(today()) should return 2021, but my code throws an error:

ERROR: MethodError: no method matching +(::Vector{Date}, ::Int64)
Closest candidates are:
  +(::Any, ::Any, ::Any, ::Any...) at operators.jl:560
  +(::T, ::T) where T<:Union{Int128, Int16, Int32, Int64, Int8, UInt128, UInt16, UInt32, UInt64, UInt8} at int.jl:87
  +(::T, ::Integer) where T<:AbstractChar at char.jl:223

I'm not sure exactly why I am getting a MethodError.

In R with dplyr this would simply be:

maindf = rawdf %>% 
filter(., year(travel_date) == 2019)

Any ideas?

CodePudding user response:

Use:

julia> using DataFramesMeta, Dates

julia> df = DataFrame(travel_date=repeat([Date(2019,1,1), Date(2020,1,1)],3), id=1:6)
6×2 DataFrame
 Row │ travel_date  id
     │ Date         Int64
─────┼────────────────────
   1 │ 2019-01-01       1
   2 │ 2020-01-01       2
   3 │ 2019-01-01       3
   4 │ 2020-01-01       4
   5 │ 2019-01-01       5
   6 │ 2020-01-01       6

julia> @rsubset(df, year(:travel_date) == 2019)
3×2 DataFrame
 Row │ travel_date  id
     │ Date         Int64
─────┼────────────────────
   1 │ 2019-01-01       1
   2 │ 2019-01-01       3
   3 │ 2019-01-01       5

julia> @subset(df, year.(:travel_date) .== 2019)
3×2 DataFrame
 Row │ travel_date  id
     │ Date         Int64
─────┼────────────────────
   1 │ 2019-01-01       1
   2 │ 2019-01-01       3
   3 │ 2019-01-01       5

The difference is that @rsubset works by row and @subset works on whole columns.

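Your problem was that in Dates.year(:travel_date) .== 2019 you mix a non-broadcasted call of the year function with a broadcasted comparison, .== 2019. Inside @subset, :travel_date is the whole column (a Vector{Date}), and year is defined for a single Date rather than for a vector of dates, hence the MethodError involving Vector{Date}. You always need to make sure that you work either row-wise (using @rsubset in this case) or on whole columns (using @subset with broadcasting).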
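To see the mismatch outside of a data frame, here is a minimal sketch that uses only the Dates standard library. The dates vector is a made-up stand-in for the :travel_date column, and the exact error message may differ between Julia versions:

```julia
using Dates

# Hypothetical stand-in for the :travel_date column
dates = [Date(2019, 1, 1), Date(2020, 1, 1), Date(2019, 6, 15)]

# Calling year on the whole vector has no matching method and throws a MethodError
failed = try
    year(dates)
    false
catch err
    err isa MethodError
end

# Broadcasting applies year to each element, then compares element-wise
mask = year.(dates) .== 2019
```

The broadcasted form yields one Boolean per element, which is the kind of value @subset expects for a whole-column condition.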

Different scenarios might require a different approach. Here is an example where the whole-column approach is useful:

julia> using Statistics

julia> @subset(df, :id .> mean(:id))
3×2 DataFrame
 Row │ travel_date  id
     │ Date         Int64
─────┼────────────────────
   1 │ 2020-01-01       4
   2 │ 2019-01-01       5
   3 │ 2020-01-01       6

where you want mean to operate on a whole column.
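As a rough illustration of why the row-wise form would not work for this condition, the sketch below mimics both modes on a plain vector. The ids vector stands in for the :id column, and the comprehension is only an approximation of what a row-wise macro such as @rsubset does, namely evaluating the expression with one value at a time:

```julia
using Statistics

# Hypothetical stand-in for the :id column
ids = collect(1:6)

# Whole-column: mean sees all six values, so the threshold is 3.5
whole = ids .> mean(ids)

# Row-wise: each evaluation sees a single value, and x > mean(x) is never true
rowwise = [x > mean(x) for x in ids]
```

Here whole selects the last three entries, while rowwise selects nothing, because the mean of a single number is the number itself.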

EDIT

Here is the same with @chain:

julia> @chain df begin
           @subset year.(:travel_date) .== 2019
       end
3×2 DataFrame
 Row │ travel_date  id
     │ Date         Int64
─────┼────────────────────
   1 │ 2019-01-01       1
   2 │ 2019-01-01       3
   3 │ 2019-01-01       5
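
The row-wise variant should fit into @chain in the same way. This is a sketch that assumes a DataFramesMeta version providing @rsubset (as used earlier in this answer), with rawdf and maindf taken from the question and filled with the same toy data:

```julia
using DataFramesMeta, Dates

# Toy data; rawdf and maindf are the names used in the question
rawdf = DataFrame(travel_date=repeat([Date(2019,1,1), Date(2020,1,1)], 3), id=1:6)

# Row-wise condition: year receives one Date at a time, so no broadcasting dots
maindf = @chain rawdf begin
    @rsubset year(:travel_date) == 2019
end
```

This is the closest analogue of the dplyr filter call, since the condition is written as if for a single row.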