This should be an easy one, but I can't find any documentation or prior Q&A on it. Subsetting in Julia is easy, especially with the @chain macro, but I can't for the life of me figure out how to subset on a date:
maindf = @chain rawdf begin
    @subset(Dates.year(:travel_date) .== 2019)
end
According to the documentation, Dates.year(today()) should return 2021, but the code above throws an error:
ERROR: MethodError: no method matching (::Vector{Date}, ::Int64)
Closest candidates are:
(::Any, ::Any, ::Any, ::Any...) at operators.jl:560
(::T, ::T) where T<:Union{Int128, Int16, Int32, Int64, Int8, UInt128, UInt16, UInt32, UInt64, UInt8} at int.jl:87
(::T, ::Integer) where T<:AbstractChar at char.jl:223
Not sure exactly why I am getting a method error.
In R using dplyr this would simply be:
maindf = rawdf %>%
filter(., year(travel_date) == 2019)
Any ideas?
Answer:
Use:
julia> using DataFramesMeta, Dates
julia> df = DataFrame(travel_date=repeat([Date(2019,1,1), Date(2020,1,1)],3), id=1:6)
6×2 DataFrame
Row │ travel_date id
│ Date Int64
─────┼────────────────────
1 │ 2019-01-01 1
2 │ 2020-01-01 2
3 │ 2019-01-01 3
4 │ 2020-01-01 4
5 │ 2019-01-01 5
6 │ 2020-01-01 6
julia> @rsubset(df, year(:travel_date) == 2019)
3×2 DataFrame
Row │ travel_date id
│ Date Int64
─────┼────────────────────
1 │ 2019-01-01 1
2 │ 2019-01-01 3
3 │ 2019-01-01 5
julia> @subset(df, year.(:travel_date) .== 2019)
3×2 DataFrame
Row │ travel_date id
│ Date Int64
─────┼────────────────────
1 │ 2019-01-01 1
2 │ 2019-01-01 3
3 │ 2019-01-01 5
The difference is that @rsubset works by row and @subset works on whole columns.
Your problem was that in Dates.year(:travel_date) .== 2019 you mix a non-broadcasted call of the year function with a broadcasted comparison, .== 2019. You always need to make sure that you work either row-wise (using @rsubset in this case) or on whole columns (using @subset).
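The mismatch can be reproduced without any data frame, using only the Dates standard library. This is a minimal sketch: the vector `dates` is a made-up stand-in for the :travel_date column, which @subset hands to the expression as a whole vector.

```julia
using Dates

# Hypothetical stand-in for the :travel_date column.
dates = [Date(2019, 1, 1), Date(2020, 1, 1), Date(2019, 6, 15)]

# year is meant for a single Date; passing the whole vector
# throws a MethodError, even though the comparison is broadcasted.
failed = try
    year(dates) .== 2019
    false
catch err
    err isa MethodError
end

# Broadcasting both the call and the comparison works element by element.
mask = year.(dates) .== 2019

# Keep only the dates that fall in 2019.
dates_2019 = dates[mask]
```

The dot after `year` is the only change needed: it applies the function to each date separately, so the result is a vector of years that `.==` can then compare against 2019.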
Different scenarios might require a different approach. Here is an example where the whole-column approach is useful:
julia> using Statistics
julia> @subset(df, :id .> mean(:id))
3×2 DataFrame
Row │ travel_date id
│ Date Int64
─────┼────────────────────
1 │ 2020-01-01 4
2 │ 2019-01-01 5
3 │ 2020-01-01 6
where you want mean to operate on a whole column.
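The same comparison can be written on a plain vector to show why a row-wise evaluation would not give this result. This is only a sketch: `ids` is an assumed stand-in for the :id column above, and the comprehension merely imitates what a row-wise macro would see.

```julia
using Statistics

# Hypothetical stand-in for the :id column.
ids = collect(1:6)

# Whole-column: mean sees all values and returns a single number.
threshold = mean(ids)

# That number is then compared with every element via broadcasting.
mask = ids .> threshold
above_mean = ids[mask]

# Row-wise imitation: mean sees one value at a time, and the mean of a
# single number is that number, so no element is ever greater than it.
rowwise = [x > mean(x) for x in ids]
```

Here `above_mean` holds the ids 4, 5 and 6, matching the rows returned by @subset, while every entry of `rowwise` is false.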
EDIT
Here is the same with @chain:
julia> @chain df begin
           @subset year.(:travel_date) .== 2019
       end
3×2 DataFrame
Row │ travel_date id
│ Date Int64
─────┼────────────────────
1 │ 2019-01-01 1
2 │ 2019-01-01 3
3 │ 2019-01-01 5